SMP7.x Agent Plugin Distribution Health 

Jun 05, 2015 03:46 PM

Summary

Altiris Agent health is getting good traction at the moment with version 7.6 of the Symantec Management Platform, because in some environments it can be a real issue. Unhealthy agents are suboptimal: they impact estate management by reducing both your visibility of, and your control over, your clients.

Locating and resolving unhealthy Altiris Agents is tricky. This often means that they are dealt with on an ad-hoc basis, purely as a consequence of users reporting issues.

This article describes how we handled the suspicion that all was not well with our agent population, how we used T-SQL to locate those agents, and finally how we leveraged our DS6.9 Server behind the scenes to rapidly implement a fix.


The Problem

We noticed some time after deploying our SMP7.5 server that our agent health wasn't so good. A small percentage of clients would execute the agent install job, but be left with a corrupted agent.

One symptom of this was that clients would show themselves in the console as having just one agent/plugin installed,

BadAgentInConsole.png

Notable in its absence in the above screen grab is the Altiris Agent itself. Machines with this corruption communicate with the Altiris server, but are in general unmanageable. These machines do present warnings in the Altiris Server's logs, but these just look like the normal warnings you see in deployment windows where it's entirely normal for machines to be light on agent plugins. Only by digging deeper into the resource guids themselves does the anomaly above become apparent.

To fix this, the agent needs to be re-installed. Luckily we still have our DS6.9 servers in production, so we gave our IT techs a job which they could execute in our Deployment Server 6.9 console to re-install the agent. The idea was that if they should ever encounter an issue with a bad agent install, they could swiftly resolve it themselves to get the machine back under management.

However, what we knew was missing was a more exact formulation of the scope of this issue, the root cause, and a better (automated) way to fix it.

 

General Approach

As every machine in our estate possesses a functioning DAgent, we were fortunate to have a remote path to resolving bad Altiris Agent installs.

To understand how widespread this issue really was, I crafted some T-SQL to reveal the plugin health of our estate. It is a 'general' piece of T-SQL which should work on anyone's SMP server. It simply reveals the spread of agent plugin registrations across the estate,

-- This script examines the agent plugin profile to catch install
-- anomalies across our estate.

DECLARE @MIN_DAYS_SINCE_FIRST_DISCOVERED as INT
--So that we only capture machines that have existed for enough time to have had
--all plugins installed. 7 days should be more than enough.

DECLARE @MAX_DAYS_SINCE_LAST_INVENTORY as INT
--So that we only capture machines which have sent inventory recently.
--This should be less than @MIN_DAYS_SINCE_FIRST_DISCOVERED, and generally I'd say keep
--this to a couple of days.

SET @MIN_DAYS_SINCE_FIRST_DISCOVERED=7
SET @MAX_DAYS_SINCE_LAST_INVENTORY=7

select xxx.Plugin_Count,COUNT(*) as 'Total' from
(
select COUNT(*) as 'Plugin_Count' from Inv_AeX_AC_Client_Agent
where _ResourceGuid in (
                         SELECT guid FROM  vComputer vc
                         join resourceupdatesummary rus               
                         on vc.guid = rus.resourceguid               
                         AND rus.inventoryclassguid = 'C74002B6-C7B9-47BB-A5D6-3031AF73BB8D'  
                         WHERE Datediff(dd,rus.[modifieddate],Getdate()) <= @MAX_DAYS_SINCE_LAST_INVENTORY
                         and Datediff(dd,vc.CreatedDate,Getdate()) > @MIN_DAYS_SINCE_FIRST_DISCOVERED
                        )
group by _ResourceGuid ) xxx
group by Plugin_Count
order by Plugin_Count asc

 

The result of this script is a table, which I've charted below for clarity,

Plugin-Distribution.png

The chart above shows three peaks in our agent plugin distribution,

  1. The largest peak is at 13 plugins, and this represents fully functioning agents with all the plugins we deliver.
     
  2. The second largest peak is at 11 plugins. Looking at these machines in more detail reveals that they are missing the patch management plugins (the software update and software delivery pickup plugins). This is due to recent patch policy re-assignments and we expect this peak to vanish in a few weeks.
     
  3. The smallest peak represents our mystery 1-plugin installed scenario.
     
  4. Not shown in the chart (due to the scale) is a very thin background of machines with other plugin counts outside these 3 peaks. On average this is about 2 machines for any given plugin count, and in many cases these represent incomplete upgrades from NS6 agents.
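The two nested GROUP BYs in the T-SQL above (count plugins per machine, then count machines per plugin count) can be sketched offline to make the shape of the result clearer. This is a minimal Python illustration only, not part of the original method: the function name, the machine names and the plugin names in the sample data are all made up, and the date filters from the T-SQL are left out.

```python
from collections import Counter

def plugin_distribution(rows):
    """Return {plugin_count: number_of_machines}.

    rows is an iterable of (resource_guid, plugin_name) pairs, one pair per
    registered plugin, mimicking rows of Inv_AeX_AC_Client_Agent.
    """
    # Inner GROUP BY: how many plugin rows does each machine have?
    per_machine = Counter(guid for guid, _plugin in rows)
    # Outer GROUP BY: how many machines share each plugin count?
    return dict(sorted(Counter(per_machine.values()).items()))

# Hypothetical sample data (invented machine and plugin names).
sample_rows = [
    ("pc-a", "Altiris Agent"), ("pc-a", "Inventory Agent"), ("pc-a", "Software Management"),
    ("pc-b", "Altiris Agent"), ("pc-b", "Inventory Agent"), ("pc-b", "Software Management"),
    ("pc-c", "Inventory Agent"),
]

distribution = plugin_distribution(sample_rows)
print(distribution)  # {1: 1, 3: 2}
```

In this toy data the machine with a single plugin shows up as its own small peak, which is the same signature as the 1-plugin anomaly in the chart.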

 

My first thought on seeing this chart was "Brilliant": digging into the problem hadn't revealed it to be worse than we had originally thought. We had just one issue to contend with, namely the nearly 200 effectively unmanaged machines which have only 1 plugin installed. The next step then was to formulate a strategy to resolve the problem.


Fixing The Agents

At this point we had 3 immediately apparent options to fix this,

  1. Generate a list of failed agents and hand this over to the techs to resolve manually (via the DS6.9 Drag'n'Drop Job we'd already built for this purpose)
  2. Generate a list of failed agents and go through it manually ourselves (via the DS6.9 Drag'n'Drop Job we'd already built for this purpose)
  3. Automate the resolution through DS6.9 and not bother anyone

Option (3) was the prime choice here (as it didn't involve a requirement to have a meeting) so we started looking at what we had,

  • Some T-SQL which could be easily modified to create a list of machines that required attention
  • An already functioning DS6.9 Job which could fix it
  • A mechanism in DS6.9 which could be used to automate the resolution (axsched)

So the simplest way forward seemed to be to use our T-SQL to craft a batch script. This batch script would call the DS6.9 Job scheduling utility (AxSched) on each affected machine to initiate the agent re-install.


For those not familiar with AxSched, I needed the batch script to have the following format,

axsched BAD-PC1 "ReInstall Altiris 7.5 Agent" /t "2015-1-1 9:00" 
PING -n 2 127.0.0.1>nul

axsched BADPC2 "ReInstall Altiris 7.5 Agent" /t "2015-1-1 9:00" 
PING -n 2 127.0.0.1>nul
...

...

where the Ping command between each call is just to give the system a second's rest, to be nice to the DS6.9 box.
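As an illustration of that format, here is a minimal Python sketch that builds the same batch text from a plain list of computer names. It is not how the batch was actually produced, and the function name and parameters are hypothetical; the axsched argument layout is simply copied from the sample lines above, so check it against your own DS6.9 installation before relying on it.

```python
def build_axsched_batch(computer_names, job_name, run_at):
    """Build batch text: one axsched call per machine, each followed by a short pause.

    The command layout (computer, quoted job name, /t "time") mirrors the
    sample lines in the article; nothing else about axsched is assumed.
    """
    lines = []
    for name in computer_names:
        lines.append(f'axsched {name} "{job_name}" /t "{run_at}"')
        # The ping is only a roughly one-second delay between calls.
        lines.append("PING -n 2 127.0.0.1>nul")
    # Windows batch files conventionally use CRLF line endings.
    return "".join(line + "\r\n" for line in lines)

batch = build_axsched_batch(
    ["BAD-PC1", "BADPC2"],
    "ReInstall Altiris 7.5 Agent",
    "2015-1-1 9:00",
)
print(batch)
```

Note that computer names are dropped in unquoted, exactly as in the sample, so a name containing spaces would need handling that this sketch does not attempt.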

Below is the T-SQL I crafted to generate the above batch,

DECLARE @MIN_DAYS_SINCE_FIRST_DISCOVERED as INT
--So that we only capture machines that have existed for enough time to have had
--all plugins installed. 7 days should be more than enough.

DECLARE @MAX_DAYS_SINCE_LAST_INVENTORY as INT
--So that we only capture machines which have sent inventory recently.
--This should be less than @MIN_DAYS_SINCE_FIRST_DISCOVERED, and generally I'd say keep
--this to a couple of days.

DECLARE @PluginCount as INT
--Target PCs based on the number of agent plugins.

SET @MIN_DAYS_SINCE_FIRST_DISCOVERED=7
SET @MAX_DAYS_SINCE_LAST_INVENTORY=7
SET @PluginCount=1

DECLARE @Result varchar(max)

select @Result=COALESCE(@Result,'') + 'axsched ' + vComputer.Name + ' "ALLINONE-AltirisAgentReInstall" /t "2015-1-1 9:00" *CRLF*PING -n 2 127.0.0.1>nul*CRLF*' from 

(
select _ResourceGuid from
(select  _ResourceGuid, COUNT(*) as 'Plugin_Count'  
  from Inv_AeX_AC_Client_Agent 
  where  _ResourceGuid in (
                         SELECT guid FROM  vComputer vc
                         join resourceupdatesummary rus               
                         on vc.guid = rus.resourceguid               
                         AND rus.inventoryclassguid = 'C74002B6-C7B9-47BB-A5D6-3031AF73BB8D'  
                         WHERE Datediff(dd,rus.[modifieddate],Getdate()) <= @MAX_DAYS_SINCE_LAST_INVENTORY
                         and Datediff(dd,vc.CreatedDate,Getdate()) > @MIN_DAYS_SINCE_FIRST_DISCOVERED
                        )
  group by _ResourceGuid
  ) xxx

where plugin_count = @PluginCount) yyy

join vcomputer on vComputer.Guid=yyy._ResourceGuid

select @Result

 

The output above isn't multiline, though: I've deliberately packed it into one long string in which each line break is represented by the embedded substring *CRLF*.

This string can be pasted into Notepad++ to replace the substring *CRLF* with a line break,

Notpad++.png

Which results finally in the batch script we require.
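If you would rather not do the replacement by hand, the same step can be scripted. The following is a minimal Python sketch under the assumption that the packed string has been copied out of the query result; the function names and the output file name are hypothetical, and the sample string is a shortened, single-machine version of the output.

```python
def expand_crlf_token(packed, token="*CRLF*"):
    """Replace every placeholder token with a real Windows line break."""
    return packed.replace(token, "\r\n")

def save_batch(path, script):
    """Write the script as-is; newline="" stops Python altering the CRLFs."""
    with open(path, "w", newline="") as handle:
        handle.write(script)

# Shortened example of the packed string returned by the T-SQL.
packed = (
    'axsched BAD-PC1 "ALLINONE-AltirisAgentReInstall" /t "2015-1-1 9:00" '
    "*CRLF*PING -n 2 127.0.0.1>nul*CRLF*"
)

script = expand_crlf_token(packed)
print(script)

# To write the result out (file name is just an example):
# save_batch("reinstall_agents.bat", script)
```

The token is a parameter only so that a different placeholder can be used if *CRLF* could ever appear in a job or computer name.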

Once this script was run on the DS6.9 server, within minutes the agent repair had begun and fixed agents started reporting in correctly. I then worked through the outliers by tweaking the plugin count variable in the T-SQL script.

 

Root Cause

As we were in rather a rush with other things, we never found the time to open a support case with Symantec to ascertain the root cause of these corrupted agent installs. What I can say, however, is that this issue seemed particular to the NS Client Package installer that came with SMP 7.5 prior to its service packing with SP1. We've not seen a single machine exhibiting this issue following our upgrade to 7.5SP1.

 

Conclusion

Agent Health is something we heard a great deal about at the 7.6 Summit in Utah, and with it the importance of having mechanisms to report on this in a timely and intuitive manner. This will continue to get good traction as the product line evolves, as it's absolutely critical for endpoint management; an endpoint with a malfunctioning agent cannot be considered to be a managed endpoint.

What I've tried to show today is that it's quite simple to establish a useful plugin health metric for your estate (from the point of view of client registered Altiris Agent plugins). This metric, however, cannot be considered in isolation; other metrics are of course needed to ascertain overall agent health: client configuration request history, full inventory history, and policy execution metric distributions.

Kind Regards,
Ian./
