
Fix GetPackageInfo request interface(s) once and for all!

Created: 12 Aug 2010 • Updated: 17 Aug 2010 | 8 comments
Ludovic Ferre's picture

And let the customer say bye-bye to the dreaded "Error while downloading package: Server is busy. Package sources request is backing off".

In case you are not aware of the issue with this interface (lucky you ;)), here are a few links pointing to the various problems reported by customers or documented by Symantec:

And there are many more. In short, we have a single point of failure that each and every computer needing packages from the environment will hit.

SMP 7.0 SP4 also includes an interesting hint on this problem, although no attempt to fix the interface design has been disclosed to this day (and the problem is considered a feature request rather than a bug):

Altiris Profiler GetPackageInfo DenialOfService report


So, if you have read this far my friend, please help us get this problem fixed by clicking on Agree!

8 Comments

Ludovic Ferre's picture

We had another issue with the interface at my customer's site, and they lost some of the 120 GiB of packages on their (100+) package servers.

It took a couple of days to recover using robocopy (thus stopping production roll-out) so the issue was escalated all the way to Product Management and Engineering.

And we have two changes in the making:

  1. A recovery tool to get the PS infrastructure in working order quickly and efficiently as a short term solution
  2. Introducing queueing on the GetPackageInfo.aspx to handle the Agent and PS requests differently as a full resolution
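To make the second item a bit more concrete, here is a minimal sketch of what handling Agent and PS requests differently could look like. It is not the product design (that has not been finalized); the class name, the `ps_share` ratio and the "agents first, package servers get a bounded share" policy are all assumptions made up for illustration.

```python
from collections import deque


class PackageInfoQueue:
    """Hypothetical two-lane queue for package info requests.

    Agents are served first; after every `ps_share` agent requests,
    one waiting package server (PS) request is let through so that
    package servers are never starved completely.
    """

    def __init__(self, ps_share=4):
        self.agent = deque()   # requests from ordinary agents
        self.ps = deque()      # requests from package servers
        self.ps_share = ps_share
        self._agents_since_ps = 0

    def submit(self, requester, kind):
        # kind is "ps" for package servers, anything else is an agent
        lane = self.ps if kind == "ps" else self.agent
        lane.append(requester)

    def next_request(self):
        # Serve a PS request when no agent is waiting, or when agents
        # have already had their share since the last PS request.
        ps_turn = not self.agent or self._agents_since_ps >= self.ps_share
        if self.ps and ps_turn:
            self._agents_since_ps = 0
            return self.ps.popleft()
        if self.agent:
            self._agents_since_ps += 1
            return self.agent.popleft()
        return None  # both lanes are empty
```

With `ps_share=2`, two queued package servers and five queued agents would be served as agent, agent, PS, agent, agent, PS, agent. A burst of synchronising package servers then slows the agents down instead of locking them out.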

I'll post updates as the tool and the design are finalized, but we have full commitment from the product team. So this is looking better than ever.

Thanks to all of those who voted on the idea!

Ludovic FERRÉ
Principal Remote Product Specialist
Symantec

Ludovic Ferre's picture

There's also a version 7.0 of this problem, although it should be a little harder to hit. Still, it's worth documenting here.

In short, the codebase cache, which was backed by the database in the 6.0 days, went to memory only.

However, the codebase cache maintenance is mainly done on package server package info requests (because they tend to request all packages, so they often get cache misses for packages that are rarely used on a site, etc.).

And when you have too many package servers synchronising at once (as my customer found) you hit a bug that causes the codebase request to take a slow path that makes package info requests about as bad as when they were backed by the DB.

This is because the bug that we hit causes all codebases in cache to be refreshed, and with enough PS you end up with a huge workload on the (in-memory) cache, as only one process can write to it at any point in time.
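To give a feel for why refreshing everything hurts so much, here is a toy model. It is not the SMP code: the `CodebaseCache` class, the `_load` stand-in and the assumption that the full refresh is triggered by a cache miss are all mine, and the only things taken from the description above are "all codebases in cache get refreshed" and "one writer at a time".

```python
import threading


class CodebaseCache:
    """Toy in-memory cache with a single writer lock (hypothetical)."""

    def __init__(self, refresh_all_on_miss):
        self._lock = threading.Lock()  # only one writer at a time
        self._entries = {}
        self.refresh_all_on_miss = refresh_all_on_miss
        self.writes = 0                # counts cache entries written

    def _load(self, package_id):
        # Stand-in for the expensive codebase computation.
        return f"codebases-for-{package_id}"

    def get(self, package_id):
        entry = self._entries.get(package_id)
        if entry is not None:
            return entry               # cache hit, no write needed
        with self._lock:
            if self.refresh_all_on_miss:
                # Slow path: refresh every cached codebase as well.
                targets = set(self._entries) | {package_id}
            else:
                # Fast path: only load the codebase that was missing.
                targets = {package_id}
            for pid in targets:
                self._entries[pid] = self._load(pid)
                self.writes += 1
        return self._entries[package_id]


def simulate(refresh_all_on_miss, packages=50):
    """One package server asking for `packages` distinct packages."""
    cache = CodebaseCache(refresh_all_on_miss)
    for pid in range(packages):
        cache.get(pid)
    return cache.writes
```

In this model, 50 distinct package requests cost 50 writes on the fast path and 1 + 2 + ... + 50 = 1275 writes on the slow path, all serialized behind the same lock. The real numbers will differ, but the quadratic growth is the point.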

In any case, I'm on leave now, but this will be further documented on the KB (SymWISE) and here on Connect.

Ludovic FERRÉ
Principal Remote Product Specialist
Symantec

cnx_steve's picture

I can find no updates on this in either SymWISE or Connect. Did anything ever come of it? We are using CMS 7.1 SP2 MP1 and have this problem.

Ludovic Ferre's picture

Hello cnx_steve,

I got a fix in place for one of my customers in 6.0 R14. I double-checked a few weeks back whether this was going to make it into Rollup v7, but it didn't.

The good news is that it will go to Rollup v8. This rollup is planned for January right now.

Ludovic FERRÉ
Principal Remote Product Specialist
Symantec

Stefan S.'s picture

I wonder if any of those fixes have made it into 7.5? It seems there is still a bug if many (~1000) package servers try to synchronize many packages at the same time.

Ludovic Ferre's picture

We do not recommend going above 500 package servers per SMP, so you are way off here.

I guess the issue with GetPackageInfo is not just queueing but the whole process of centrally maintaining codebases... that cannot be scalable.

An alternative (which would require some re-engineering) would be to have the agent do more of the intelligent work by pulling codebases directly from a package server (you remove the latency of package statuses flying up to the SMP and the bottleneck of GetPackageInfo).

The intelligence there would be that if the agent finds the PS is unavailable, or if the package is not ready on the PS, the agent would have the option to pull data from another PS or the SMP, or to wait, depending on administrator settings (via site-subnet configuration or global coresettings.config items).

That would get rid of a lot of moving parts that are proving not to work well under stress (and that should allow you to have many more package servers as well, given that only they would pull codebase data from the SMP).
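As a rough illustration of the agent-side decision described above, here is a small sketch. Nothing like this exists in the agent today as far as this thread goes: the function name, the `is_ready` callback (true only when the PS is reachable and has the package ready) and the `fallback` setting are hypothetical stand-ins for whatever the site-subnet or coresettings.config options would be.

```python
def choose_source(package_id, package_servers, is_ready, fallback="smp"):
    """Pick where a hypothetical agent would download a package from.

    package_servers: PS names in order of preference for the agent's site.
    is_ready(ps, package_id): True if that PS is reachable and the
        package is ready on it.
    fallback: administrator setting used when no PS can serve the
        package; "smp" pulls from the SMP, anything else means wait.

    Returns a (kind, name) tuple: ("ps", name), ("smp", None)
    or ("wait", None).
    """
    # Try the preferred PS first, then any other PS on the list.
    for ps in package_servers:
        if is_ready(ps, package_id):
            return ("ps", ps)
    # No package server can help right now: apply the admin setting.
    if fallback == "smp":
        return ("smp", None)
    return ("wait", None)


# Example: PS1 is down, PS2 has the package ready.
ready = {("PS2", "pkg-42")}
print(choose_source("pkg-42", ["PS1", "PS2"], lambda ps, p: (ps, p) in ready))
```

The SMP is only contacted when every package server on the list has failed and the administrator allows it, which is what would take the per-agent load off GetPackageInfo.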

Ludovic FERRÉ
Principal Remote Product Specialist
Symantec

Stefan S.'s picture

Unfortunately you were right. The implementation guide for 7.5 lists the recommended number of package servers per NS as 1 - 600.

So I guess I have to do some re-design, maybe move some of the small sites to another child NS.

Strangely though, we did not have much of a problem until the upgrade from 7.1 to 7.5, where a lot of packages (seemingly most security updates and internal packages) are now staying in invalid / stale mode and complaining about the signature not matching....

sdmayhew's picture

The issue is not so much the package servers as the 7.5 SMP server handling the package requests. I still get this issue in 7.5.

Altiris user since 2001, Asset Management for 25 years
