Altiris Agent 7 blocking some network traffic
We've been running NS 6 for several years and just installed 7. Due to hardware requirements, we couldn't install to the same server so I turned off the old one and did a fresh install of NS 7 on a new server (named the same as the old one).
Everything appeared to be working nicely. The ver 6 agents were updating to 7 like they should. Then random machines started showing some strange and disturbing behaviour. I started getting calls that Internet Explorer was no longer connecting to websites and people were having trouble printing to shared network printers.
When I stop the Altiris process, everything starts working again. Restart the agent and it will stay working for a while, then eventually IE and shared network printers stop working again.
I've been unable to reproduce the problem and am at a loss as to what's causing it. Anyone seen this before or have any idea what could be causing it?
Thanks,
Mike
<Update> Noticed in the event viewer the following error keeps repeating on computers that are having issues:
Application Error: Windows cannot determine the user or computer name. (Not enough storage is available to complete this operation). Group Policy processing aborted.
The particular pc I checked has 29 GB free space. This error started after Altiris Agent 7 was installed.
<Update 2> Ran TCPView from Sysinternals on a pc with the above problem. AeXNSAgent.exe is showing over 3900 TCP connections in the CLOSE_WAIT state to our Altiris server along with a few UDP connections. On a PC where Altiris Agent 7 is working properly, there are only 3 AeXNSAgent.exe's listed. Two are UDP and one is TCP.
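For anyone who wants to repeat this check without TCPView, a rough equivalent is to tally CLOSE_WAIT rows per remote endpoint from saved `netstat -ano` output. This is only a sketch: the function name is made up, it assumes the usual five-column Windows netstat layout, and the sample rows are invented for illustration rather than captured from an affected machine.

```python
from collections import Counter


def count_close_wait(netstat_text):
    """Tally TCP connections in CLOSE_WAIT per remote endpoint.

    Assumes rows shaped like Windows `netstat -ano` output:
    Proto  Local Address  Foreign Address  State  PID
    """
    counts = Counter()
    for line in netstat_text.splitlines():
        parts = line.split()
        # Skip headers, UDP rows (which have no State column) and other states.
        if len(parts) >= 4 and parts[0] == "TCP" and parts[3] == "CLOSE_WAIT":
            counts[parts[2]] += 1
    return counts


# Invented sample rows, not taken from a real machine.
sample = """\
  TCP    10.0.0.5:1034    10.0.0.1:50124    CLOSE_WAIT     2176
  TCP    10.0.0.5:1035    10.0.0.1:50124    CLOSE_WAIT     2176
  TCP    10.0.0.5:1036    10.0.0.1:80       ESTABLISHED    2176
  UDP    10.0.0.5:1037    *:*                              2176
"""

for remote, total in count_close_wait(sample).most_common():
    print(remote, total)
```

A healthy machine should show few or no CLOSE_WAIT rows; a count in the thousands against a single server endpoint matches the symptom described here.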
Could be incompatible agents...
Mike,
Not using NS7 here yet, but a few thoughts...what Agents do you have deployed (both legacy 6.x agents and new 7.x)? There may be an agent plug-in/sub-agent that is incompatible with NS7 (i.e. there is no Recovery Solution agent compatible with NS7 yet). Are you using the Real-Time Systems Management agent? Also, Carbon Copy is not supported in NS7, maybe CC is still running on these boxes and trying to connect to the NS?
Thanks,
Kyle
Symantec Trusted Advisor
If your question has been resolved, please be sure to click "Mark as Solution"! Thank you.
all 7.x agents
All of the agents are 7.x. It even happens on a system with just the main Altiris Agent installed and none of the extra agents.
Hate to sound like
Hate to sound like a broken record, but what do the Altiris Agent log files indicate?
Jim Harings
Technical Solutions Consultant
Xcend Group
http://xcendgroup.com
logs
Completely forgot there was a logs directory. Duh! Found some interesting info. There is a lot of unauthorized tickling going on. Here's a small sample from the logs:
<event date='Mar 18 07:53:16' severity='4' hostName='NOCSPARELAPTOP8' source='Client Task Agent' module='client task agent.dll' process='aexnsagent.exe' pid='2176' thread='3424' tickCount='843903469' > <![CDATA[CWin32CTServerDirectConnection::Connect(): Tickle connection to server NOC413Altiris.airgas.com:50124]]></event>
<event date='Mar 18 07:53:16' severity='4' hostName='NOCSPARELAPTOP8' source='Client Task Agent' module='client task agent.dll' process='aexnsagent.exe' pid='2176' thread='3424' tickCount='843903659' > <![CDATA[CWin32CTServerDirectConnection::Connect(): Tickle connection to server NOC413Altiris.airgas.com:50124]]></event>
After a few thousand entries like that, then the errors start:
<event date='Mar 18 14:52:57' severity='2' hostName='NOCSPARELAPTOP8' source='Client Task Agent' module='client task agent.dll' process='aexnsagent.exe' pid='2176' thread='3392' tickCount='869081923' >
<![CDATA[CTaskServerNetCommsConnection::Register(): CAtrsException exception, error = "An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full.", OS error = 2147952455, at CTaskServerNetCommsConnection::Get
re-throw at CTaskServerNetCommsConnection::GetServerListXml
re-throw at CTaskServerNetCommsConnection::GetServerList
re-throw at CTaskServerNetCommsConnection::GetServersAndRegister]]></event>
So it looks like the big question is what is causing the machine to start all the tickles? I've randomly come across machines with this error but haven't figured out how to replicate it.
Thanks everyone for the help so far. Looks like we're heading in the right direction now.
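To see how quickly the tickle entries pile up, the agent log can be tallied per minute. The sketch below assumes the log keeps the `<event date='...'> <![CDATA[...]]>` shape shown in the sample above; the function name is hypothetical and the sample text is trimmed from those entries (most attributes dropped, the error message shortened).

```python
import re
from collections import Counter

# One <event ...> opening tag followed by its CDATA payload; DOTALL lets the
# payload span several lines, as the error entries do.
EVENT_RE = re.compile(
    r"<event date='([^']+)'[^>]*>\s*<!\[CDATA\[(.*?)\]\]>", re.DOTALL
)


def tickles_per_minute(log_text):
    """Count 'Tickle connection to server' events per 'Mon DD HH:MM' bucket."""
    counts = Counter()
    for date, message in EVENT_RE.findall(log_text):
        if "Tickle connection to server" in message:
            counts[date[:-3]] += 1  # drop the trailing ':SS'
    return counts


# Trimmed from the log entries quoted in this post.
sample = (
    "<event date='Mar 18 07:53:16' severity='4' source='Client Task Agent' > "
    "<![CDATA[CWin32CTServerDirectConnection::Connect(): Tickle connection to "
    "server NOC413Altiris.airgas.com:50124]]></event>\n"
    "<event date='Mar 18 07:53:16' severity='4' source='Client Task Agent' > "
    "<![CDATA[CWin32CTServerDirectConnection::Connect(): Tickle connection to "
    "server NOC413Altiris.airgas.com:50124]]></event>\n"
    "<event date='Mar 18 14:52:57' severity='2' source='Client Task Agent' >\n"
    "<![CDATA[CTaskServerNetCommsConnection::Register(): CAtrsException "
    "exception]]></event>\n"
)

for minute, total in sorted(tickles_per_minute(sample).items()):
    print(minute, total)
```

Comparing the per-minute counts from a misbehaving machine with those from a healthy one should make the difference obvious.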
DS agent issues
I am running DS 6.8 (build 206), and the Altiris agent is also causing issues here: IE and shared network printers stop working intermittently. It started about two weeks ago, after I ran Microsoft updates on my Server 2003 SP2 box. There are several versions of the Altiris Agent running. Could updating them all to the same version be a place to start, or am I grasping at straws?
Not necessarily the agent...
Jeanne,
It is not necessarily the Altiris Agent causing these issues; there were several threads I received from the PatchManagement.org listserv talking about printing issues after one of the recent MS security bulletins. It seemed to be specific to PCL printer drivers and some outdated .dll files. So, you may want to check into that. Looks like you can run this search:
http://marc.info/?l=patchmanagement&w=2&r=1&s=PCL&q=b
Thanks,
Kyle
Symantec Trusted Advisor
Client Task Agent 'tickle' process different in v7
Hi Mike,
It may be worth running by support. I know the tickle process was changed in v7, where it could be initiated from the client side (similar to the 'update configuration' button). This wasn't possible in v6, so there may be some correlation there.
Jim
Jim Harings
Technical Solutions Consultant
Xcend Group
http://xcendgroup.com
Me too
I am having the exact same issues that Mike describes. In addition, we are seeing our Diskeeper product crash (DkService.exe) on several workstations. I'm not sure if it's related. Altiris support says that my company and Airgas are the only two customers who have reported this problem. They are asking us to run TCPView on an affected workstation and then send the results to them.
As Mike said, stopping the Agent service fixes the issue. We are performing an across-the-board undeploy of all agents until we figure out what's going on.
Any help would be greatly appreciated.
removed 7
We took off NS 7. Symantec tech support couldn't provide an answer and it was messing up our network too badly.
We had always run the version 6 console, then jumped right to 7. Since reinstalling NS 6, I've been using the 6.5 console. The 7 console looks pretty, but functionally it's no better than 6 or 6.5. I'd actually say 7 is worse because the scrolling is awful. In 6 or 6.5 you can just grab the scroll bar and browse through a list of items. In 7, it just shows what record number you're on as you scroll. Not very handy at all when looking for a particular item.
My company tried to get us to drop Altiris this year already due to the price, but I was able to convince them we were making good use out of it. I'll see how long we can hold onto NS 6, because I'm seeing no advantage to moving to NS 7 (they threw in Ghost and pcAnywhere, but we already have images and remote access covered).
Save yourself the headache and run NS 6 with the 6.5 console. Works great and is very stable.
Same issue here
We're also having the same issue. Recently upgraded from 6 to 7 and also notice the large number of log entries for Client Task Agent tickle. Problem is, when I go to the client task agent folder, it is only a link and I get an error saying that the item can't be found. There's also no package or policies to be seen. The client task agent 7.0.4404 is installed on the clients.
Any help would be appreciated.
With Task Server 6.0 the
With Task Server 6.0 the Client Task agent was configured by default to check for a new task every 5 seconds. In my opinion this was severe overkill. This has been changed in CMS 7 to 15 minutes by default. There is a possibility that what you are experiencing is the Client Task agent still running on the old 6.0 setup, because it hasn't been able to register successfully with a new 7.0 Task Server.
If this were 6.0 I would recommend uninstalling the Client Task agent; however, with the release of 7.0 the Client Task agent became part of the core.
no 6.0 agent
I started by doing an upgrade of 6.0 clients to 7.0, then shortly after we started noticing the problems. After isolating a couple machines that I could play with, I uninstalled the agent and wiped out the Altiris directory. After a fresh install of Altiris Agent 7, it was still behaving the same way.
I thought that was odd about checking for tasks every 5 seconds too. I went and changed that right away.
SQL local or on a dedicated SQL box?
Out of curiosity, are you guys running SQL on the local Altiris server or on a dedicated SQL box/cluster? It shouldn't make a difference.
Local SQL
We were running SQL Server 2005 Express on the same box. We had the full SQL Server 2000 for Altiris 6, but since 7 no longer supported that I put on the free version of 2005 for the time being. That was going to be upgraded to SQL 2005 in the future.
I thought possibly it was due to SQL being overloaded, so I removed all but 7 clients. Some of them had been running fine and some were having the problems above. Even with just 7 clients, the problem ones still behaved the same way.
Server Busy, I don't buy it...
Working with tech support. They say that one of the reasons you get CLOSE_WAIT states on the client machine is a busy server. I don't personally buy it: I have never seen anything queued under the EvtQFast directory, which is where they are telling me to look (though I have tons under the Bad directory). Our server just isn't that busy. I will continue to work with the tech to at least rule it out as a cause.
It is good to know that you still get the issue with just 7 clients.
bounce IIS?
Patrick,
Did you try bouncing the IIS service (Start/Run/IISRESET)? I'm not sure whether that would force the connections to close on the clients, but it might help. You could also look at the Perfmon IIS counters on the NS for "Current connections" (that may not be the actual counter name, but it is something like that) and see how many it is showing.
Thanks,
Kyle
Symantec Trusted Advisor
Same here...
We upgraded to NS 7 and had the same problem. We had clients opening anywhere between 3,000 and 6,000 connections. We couldn't find a resolution, so we were forced to roll back to NS 6. It really did a number on our network, which ground to a halt due to the saturation. The clients running version 7 were behaving very strangely as well.
I really wanted to move to version 7 but until this issue is resolved we'll be sticking with version 6.
Support response
Here's what Symantec support have come back with initially. https://kb.altiris.com/display/1n/articleDirect/index.asp?aid=46199&r=0.1211206
I'm trying the resolution on a few problematic clients (apparently there's no differentiation between server class and client) and will post results.
Greg
resetting IIS
I never tried resetting IIS, but I did completely reboot the server. It's been long enough now that I forget whether that temporarily fixed the problem or not (I gave up weeks ago and went back to NS 6). Even after a server reboot, it wasn't too long before the problem came back again.
issues with NS 7
Did you talk to Symantec at all? They told me I was the only one having this problem, and they had never heard of it before.
"fix"
That "fix" was one of the first things they had me try. DNS wasn't the problem, as I was having no issues resolving names. If there was a DNS problem, we'd have over 1000 locations having issues, so that would raise a red flag awfully quick.
It's not really even so much a fix as it is a band-aid. All they say to do is bump up the # of ports to 65534, probably in the hope that the machine will be rebooted before it hits that max and starts having problems again.
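To put rough numbers on why this is a band-aid: assuming the change in question is the `MaxUserPort` TCP/IP parameter (the KB article isn't quoted in this thread, so that is an assumption), Windows XP and Server 2003 hand out outbound ports from 1025 up to `MaxUserPort`, which defaults to 5000. The sketch below estimates how long a leaking agent takes to use up that range. The leak rate is a placeholder, not a measurement, and the function names are made up.

```python
FIRST_DYNAMIC_PORT = 1025  # lowest port XP/2003 hands out for outbound connections


def usable_ports(max_user_port):
    """Number of outbound ports available for a given MaxUserPort value."""
    return max_user_port - FIRST_DYNAMIC_PORT + 1


def hours_until_exhaustion(max_user_port, leaked_per_hour):
    """Hours before every usable port is tied up, at a constant leak rate."""
    return usable_ports(max_user_port) / leaked_per_hour


LEAK_RATE = 500  # hypothetical number of connections leaked per hour

for limit in (5000, 65534):
    hours = hours_until_exhaustion(limit, LEAK_RATE)
    print(limit, usable_ports(limit), round(hours, 1))
```

The default leaves 3,976 usable ports, which lines up with the 3,900-plus CLOSE_WAIT connections seen in TCPView earlier in the thread. Raising the limit gives 64,510, so the same leak takes about 16 times longer to bite but still gets there.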
Another trick along those lines is to schedule the task that forces a restart of the Altiris agent on a pc. That works too, but some users are very conscious of their icons in the system tray and get a little freaked out when they disappear and reappear on their own.
Ticket logged
Yes, I logged a support ticket with Symantec.
Tickle
Mentioned above is something about tickle in the client. If I don't need power management at this stage, can I disable the Tickle / Power Management check box in the agent settings and possibly stop this problem for now?
Tickle
Two types of connection can occur: the original NS agent power management connection and the Task Server connection.
1) As far as I am aware the original NS agent power management connection is not a continuous connection. It only occurs when there is a request for the action. Remote machines are relays for the server but I think the remote machines would only be having an issue like this if power management had been requested for many, many machines. Also the possible number of connections would generally be much smaller as a relay is only per subnet and usually there are not so many machines per subnet.
2) The Task server agent connection is the more likely candidate for this issue for several reasons. 1. The port number is the number used by Task Server agent I believe. 2. Task server agent will try to maintain an open IP connection back to the Task Server it is assigned to. This open IP connection is the route the Task Server uses to tickle the Task Server agent and tell it to request via HTTP the tasks which have been received for it.
So I don't believe disabling the Tickle/Power Management check box will do anything, but it won't hurt anything either and if it is turned off will eliminate this as a possibility.
The best test would be to uninstall the NS agent and then reinstall, but do not allow the Task server plug-in to reinstall. Temporarily turn off the policy for it.
BBishop is correct The best
BBishop is correct. The best test would be to uninstall the NS agent and then reinstall, but do not allow the Task server plug-in to reinstall. Temporarily turn off the policy for it.
Thanks,
XianRain
Just wanted to know if
Just wanted to know if BBishop's suggestion worked?
Haven't tried
Haven't tried uninstalling and reinstalling without Task server agent. We really need the agent to do a lot of our work and run several tasks already set up.
Information on the issue
Here is information on the resolution of this issue or a very similar one. I don't know all the exact details of this issue, which is why I have to qualify this statement.
First thing to check is - make sure that the agents have unique GUIDs. Once that is done, then everything works as designed. Also check and report what version of the Task Server agent you have.
Details from the developers:
Subject: RE: Task Server tickle issue
Using the VMs as prescribed, dev was able to determine that the problem stemmed from using cloned VMs, which resulted in the agent in each VM having the same GUID.
Therefore each agent was competing for a connection with the Task Server.
It went something like this:
(1) Local agent attempts to register with TS.
(2) TS opens tickle connection to local agent.
(3) Remote agent attempts to register with TS.
(4) Since remote agent appears to be the same agent as local (due to both agents having the same GUID), TS opens a new connection with remote agent and ignores the original connection with the local agent.
(5) Once the local connection is ignored due to TS opening a new connection with remote agent, local agent attempts to re-register with TS.
(6) TS sees request for new connection and opens a connection with the local agent, ignoring the connection with the remote agent.
(7) …
(8) …
The net result is that there are an enormous number of tickle connections being made without resources properly being cleaned up.
Regarding the resource cleanup: TS 7.0 SP1 takes care of this. Before a new connection is established, previous connections are cleaned up.
Regarding the flooding of tickle connections: dev has added a delay that will prevent an agent from immediately re-registering in the event of a dropped connection. This will be in SP3.
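The registration loop in steps (1) to (6) can be modelled with a toy simulation. This only illustrates the sequence described in the email and is not Task Server code: the function name, the "one tickle connection per GUID" bookkeeping and the idea of rounds are all assumptions made for the sketch, and neither the SP1 cleanup nor the SP3 delay is modelled.

```python
def simulate_registrations(agent_guids, rounds):
    """Return (connections_opened, connections_abandoned).

    agent_guids maps agent name -> GUID. The server is modelled as keeping
    one tickle connection per GUID; when a different agent registers with
    the same GUID, the previous connection is ignored (never cleaned up)
    and its owner re-registers in the next round.
    """
    active = {}                  # GUID -> agent currently holding the connection
    opened = abandoned = 0
    pending = list(agent_guids)  # every agent registers once at start-up
    for _ in range(rounds):
        if not pending:
            break                # nobody was displaced: the system is stable
        next_pending = []
        for agent in pending:
            guid = agent_guids[agent]
            displaced = active.get(guid)
            active[guid] = agent
            opened += 1
            if displaced is not None and displaced != agent:
                abandoned += 1   # old connection left behind
                next_pending.append(displaced)
        pending = next_pending
    return opened, abandoned


# Hypothetical GUID values, for illustration only.
cloned = {"local": "GUID-A", "remote": "GUID-A"}
unique = {"local": "GUID-A", "remote": "GUID-B"}
print(simulate_registrations(cloned, rounds=10))
print(simulate_registrations(unique, rounds=10))
```

With unique GUIDs each agent registers once and the loop stops. With a shared GUID every round displaces one agent, so the number of abandoned connections grows for as long as the simulation runs.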
Resolution
Resolution of the root cause, however, is to make sure that the agents have unique GUIDs. Once that is done, then everything works as designed.
Regarding the support cases reported in the Symantec Forums: I am not sure whether the problem reported there is the same as the one we have looked at here. I suggest that this be determined ASAP. If the root cause is the same, then it can quickly be resolved on the customer’s end. If not, then we will spend additional time isolating their specific problems in order to find the root cause.
--------------------------------------------------------------------------------
Just a side note - the server code was early SP2, but the agents were still using Client Task Agent 2904, which was our original 7.0 release. I don't believe we would have seen the socket buildup had the agents been upgraded to SP2. [Note: Of course SP2 is not available publicly quite yet, but if you have SP1, which is publicly available, you should be good to go.]
Conflict with DS 6.8?
Hi All,
We're having the same issue...
For those having the issue, are you also running DS? We're still running DS 6.8 and have discovered that if we restart both the AClient and Altiris Agent services, the machines can reconnect to the domain, web and Exchange. Maybe a conflict between the clients? We haven't had a chance to monitor extensively whether the issue comes back, but on a few machines it has. We're about to disable the AClient services on all machines to test for one week. Will let you know the results.
Symantec/Altiris support remoted in, checked our system and witnessed the issues first hand. They ran a duplicate GUID report and only 25-35 machines showed up out of 3700 that we have in the system. I'll be verifying to see if the report is accurate.
Conflict with DS 6.8?
Didn't take long to find out it's NOT a conflict between the two. We disabled the AClient on all machines and this morning the problem arose again. The pc we checked this morning had the AClient disabled and after restarting the Altiris Agent (7) the machine functioned normally. I checked the GUIDs, exported a Computer Summary report, no duplicates. I also noticed the machines listed in the duplicate GUID report do not show up in any NS7 reports. Is it possible that these few machines could be wreaking havoc with the server or other NS7 clients?
This is from one of the log files. Did I understand correctly that this is nothing to worry about and will be taken care of in SP3?
<event date='May 19 07:23:34' severity='4' hostName='SAM-000IT-A33' source='Client Task Agent' module='client task agent.dll' process='aexnsagent.exe' pid='1696' thread='4040' tickCount='420825000' >
<![CDATA[CWin32CTServerDirectConnection::Connect(): Tickle connection to server sachem-ans.SACHEM.SCHOOLS.LOCAL:50124]]></event>
<event date='May 19 07:23:37' severity='4' hostName='SAM-000IT-A33' source='Client Task Agent' module='client task agent.dll' process='aexnsagent.exe' pid='1696' thread='4040' tickCount='420827359' >
<![CDATA[CWin32CTServerDirectConnection::Connect(): Tickle connection to server sachem-ans.SACHEM.SCHOOLS.LOCAL:50124]]></event>
<event date='May 19 07:23:37' severity='4' hostName='SAM-000IT-A33' source='Client Task Agent' module='client task agent.dll' process='aexnsagent.exe' pid='1696' thread='4040' tickCount='420827390' >
<![CDATA[CWin32CTServerDirectConnection::Connect(): Tickle connection to server sachem-ans.SACHEM.SCHOOLS.LOCAL:50124]]></event>
<event date='May 19 07:23:40' severity='4' hostName='SAM-000IT-A33' source='Client Task Agent' module='client task agent.dll' process='aexnsagent.exe' pid='1696' thread='4040' tickCount='420830421' >
<![CDATA[CWin32CTServerDirectConnection::Connect(): Tickle connection to server sachem-ans.SACHEM.SCHOOLS.LOCAL:50124]]></event>
<event date='May 19 07:25:36' severity='4' hostName='SAM-000IT-A33' source='HttpConnection' module='AeXNetComms.dll' process='aexnsagent.exe' pid='1696' thread='280' tickCount='420946859' >
<![CDATA[Direct connection using IP: 10.45.30.25, Port: 80]]>
Thanks,
Lee
Agent versions
Could you report which versions (screen shot as below or just the version numbers for each of the agents) of the agents you have?
That will help determine whether the fixes implemented cover the problem you are seeing.
Thanks
Agent versions
Here's what we have:
Altiris Agent 7.0.3356
Altiris AClient 6.9.355
Altiris Application Metering Agent 7.0.1085
Altiris Base Task Handlers 7.0.4044
Altiris Client Task Agent 7.0.4044
Altiris Inventory Agent 7.0.1085
Altiris Inventory Rule Agent 7.0.3032
Software Management Framework Agent 7.0.2626
Software Management Solution Agent 7.0.1291
Altiris Software Update Agent 7.0.3127
Software Virtualization Agent 2.1.3066
Symantec pcA Agent 12.5.285
Agent versions
Here's ours:
Altiris AClient 6.8.378
Altiris Agent 7.0.3356
Altiris Application Metering Agent 7.0.1.085
Altiris Base Task Handlers 7.0.4044
Altiris Client Task Agent 7.0.4044
Altiris Inventory Agent 7.0.1085
Altiris Software Update Agent 7.0.3127
Dell Client Manager Agent 3.0.1238
Inventory Rule Agent 7.0.3032
Software Management Framework Agent 7.0.2626
Software Management Solution Agent 7.0.1291
How do I disable Client task agent now?
With some things, we're completely lost in NS7!!! I looked in the Agent settings link but the Client task agent policy is not there. I did a search for it and a folder link was returned but nothing else and when I click on it it says it's an invalid link? I think there's something scwewy going on with our NS???
Greg
Disable Client task agent
Hey Greg,
I believe the setting for that is in Settings>Notification Server>Task Agent Settings
Lee
Not there
No link for it in that location?? :(
Greg
Info from Symantec support
reply from Symantec support
I haven't been able to identify a clear way to remove the client task agent - without directly unregistering the DLL for it. Incidentally I have been working with our core team on this (and a number of other customers) and we now have the issue documented and a potential workaround. The knowledge base article is here:
https://kb.altiris.com/article.asp?article=47324&p=1
I had already confirmed this was your issue via the TCPView results - which did show many connections to the NS on port 50124.
Info from Symantec support
Thank you for the info!
Will run the report again to see if there are any remaining duplicate GUIDs.
If all is clear I'll re-enable all the Agents again and give it another go.
Thanks again :)
Any luck on your missing link?
Info from Symantec support
I ran the report "GUIDS shared between 2 or more computers" again and I'm now coming up with over 200 machines that have duplicates. How are they getting duplicate GUIDs if the NS Agent isn't included in our images?
I'll run the Reset GUID Task and see what happens...
Info from Symantec support
Never mind that last bit. I re-read the part about software packages; that may be our issue, since most of our software installations are done through packages.
Thanks.
GUID reset
I tested the GUID reset utility on a few workstations and the same GUID came back.
Anyone else having that happen?
Possible solution
I've been running the duplicate GUID reports and fixing the NS7 Altiris Agents by doing a re-install. The machines have been getting new GUIDs but they still kept showing up in the duplicate reports...
I ran a registry search for some machines that were reported as having duplicate GUIDs and found the GUID in question (47783B4D-371E-467A-B1EB-AEA0EC8227B5) in the following reg key:
HKEY_LOCAL_MACHINE\SOFTWARE\Altiris\Client Service: LastUpdateComputerInfo
The NS7 Altiris Agent had a different GUID altogether.
Seems the DS Client was holding on to the duplicate GUID somehow. I pushed out a new DS Client and it updated the above reg key to the new GUID.
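The same two checks can be run offline against exported data: which GUIDs are shared by more than one machine, and which machines have a DS-side value that no longer matches the NS agent's GUID. The sketch below is only an illustration. The function names and hostnames are made up, and it assumes the GUIDs have already been exported into plain Python structures (it does not read the registry or query the NS).

```python
from collections import defaultdict


def normalize(guid):
    """Compare GUIDs without regard to braces, whitespace or letter case."""
    return guid.strip().strip("{}").upper()


def shared_guids(records):
    """records: iterable of (hostname, guid) pairs.

    Returns {guid: [hostnames]} for every GUID used by two or more hosts.
    """
    hosts_by_guid = defaultdict(set)
    for host, guid in records:
        hosts_by_guid[normalize(guid)].add(host)
    return {g: sorted(h) for g, h in hosts_by_guid.items() if len(h) > 1}


def stale_ds_guids(agent_guids, ds_guids):
    """Hosts whose DS-side GUID differs from the NS agent's GUID."""
    return sorted(
        host
        for host, guid in agent_guids.items()
        if host in ds_guids and normalize(ds_guids[host]) != normalize(guid)
    )


# Hypothetical hosts; the shared GUID is the one quoted in this post.
ds_side = {
    "LAB-PC-01": "{47783B4D-371E-467A-B1EB-AEA0EC8227B5}",
    "LAB-PC-02": "47783b4d-371e-467a-b1eb-aea0ec8227b5",
}
ns_agent = {
    "LAB-PC-01": "11111111-1111-1111-1111-111111111111",
    "LAB-PC-02": "22222222-2222-2222-2222-222222222222",
}
print(shared_guids(ds_side.items()))
print(stale_ds_guids(ns_agent, ds_side))
```

In this made-up data both machines would appear in a duplicate report because of the DS-side value, even though their NS agent GUIDs are unique, which is the situation described above.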
My plan is to test in a few labs before I redo the entire district and re-enable the Altiris services.
Possible Solution
Status Update:
I needed to also uninstall/re-install NS Altiris Agent in addition to re-pushing the DS AClient.
Tested in one building, roughly 500 machines, and all is working fine.
Same problem here - I sort of fixed it.
We seem to have this problem as well. Symptoms for us are:
Drive mappings missing when user logged in. Have to reboot for it to work again.
Internet webpages not displaying all the time.
I work at a school, so testing a room full of 30 PCs or so is quite easy. I removed the Altiris Notification Agent from 30 PCs and powered off the Altiris NS for a few days to see if it fixed the problem. The users said the problem went away.
I haven't tried reinstalling the agent like it was suggested above. Originally we upgraded from 6.9 to 7. So all the agents would have been upgraded. Now that I have removed them, I don't know if it will install a fresh agent. We still have the Altiris Deployment Console agent and SVS agent running on the PC's and we have no problem.
Our PCs are running Windows XP SP2. I don't know if SP3 might make a difference? We plan on rolling out a new build to all students' PCs over the summer holidays. This will be Windows XP SP3 with almost all our applications running as SVS apps.
Possible Solution
Status Update:
The solution I found above has been working well. No upgrade to Windows SP3 required.
NS 7.0 SP2 has been released
This should now be fixed with NS 7.0 SP2. Task Server version 7.0.4248. Please let us know if you continue to see this issue after upgrading the server and clients.
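A quick way to spot machines that still need the update is to compare the reported agent versions against that build. The sketch below assumes that Task Server related agent versions can be compared directly with 7.0.4248, which this thread does not confirm; the function names are made up and the version strings are copied from the agent lists posted earlier.

```python
def parse_version(text):
    """Turn '7.0.4044' into (7, 0, 4044) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def below_minimum(installed, minimum="7.0.4248"):
    """Names of agents whose version is lower than the minimum build.

    The default minimum is the Task Server version quoted for SP2; treating
    it as the floor for client-side agents is an assumption.
    """
    floor = parse_version(minimum)
    return sorted(
        name for name, version in installed.items()
        if parse_version(version) < floor
    )


# Task Server related agents from the lists posted earlier in the thread.
installed = {
    "Altiris Client Task Agent": "7.0.4044",
    "Altiris Base Task Handlers": "7.0.4044",
}
print(below_minimum(installed))
```

Comparing tuples of integers avoids the trap of plain string comparison, where "7.0.999" would sort after "7.0.4248".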
I've just had our tech
I've just had our tech support person tell me that this issue is resolved by SP2. Apparently it was left out of the issues resolved list. We'll be testing it over the next few days.
Greg
See KB
See KB 46065 kb.altiris.com/article.asp.
Specifically the fixed item labeled "Client task agent immediately retries to connect with server in cases where the connection with the server is dropped, creating a race condition"