We've been testing GSS3 HF1 and are seeing apparently random DAgent crashes. Machines booting into WinPE5 will sometimes have an agent crash.
We've opened a case with support, but it's puzzling.
I can deploy a machine a dozen times and not see this. Then we'll get a period where it happens with a high probability... then just as frustration peaks it vanishes again. This suggests an environmental factor creeping in that eludes me.
We've initiated thousands of deployments with the Altiris DS6.9 engine in this environment and have never seen this issue (we still have Altiris DS6.9 running in parallel to GSS and still no issues).
I know we aren't alone, as a colleague in the UK has reported this for his new GSS installation too. One thing we have in common is that GSS3 also seems to be automatically, though not consistently, rescheduling jobs to 2071 on completion.
What I'm doing at the moment to track this is turning on agent logging in automation as per HOWTO:3066 and using WinCrashReport to collect crash data.
Environment Details,
I've tried artificially loading the network and server to force this, all to no effect. I've also tried randomly disconnecting agent comms, but all these outages are handled smoothly by the agent (which is a credit to it).
Anyone else seen this?
Hi Sebastian,
Symantec Engineering contacted me with the same date for the new agent release, so that hopefully means it really will come out then ;-)
We've been testing the fixed agent for the last couple of weeks and I am pleased to report a total of ZERO agent crashes since. Most pleased.
Kind Regards, Ian
Hi Ianatkin,
A couple of days ago our assigned Symantec Support Engineer informed me that their internal software engineering team has identified a buffer overrun in the data dumps, and that they anticipate fixing it will resolve both the crash and the silent fail.
He also gave me the "unofficial info" that they want to fix it in GSS 3.1 HF2, currently scheduled for 13th June 2016.
I'm keeping my fingers crossed that they can keep their promise and schedule and finally fix this major problem, almost a year after it was first reported to them.
Best Regards,
Sebastian
I can confirm that this is still an issue for GSS 3.1 MP1.
Thanks for the reply. I did some additional testing and enabled gratuitous ARP sending on our core switch. This reduced the number of clients with disconnects considerably. For example, in an imaging session of 120 machines we had 3 failures caused by the WinPE client disconnecting after the image was pushed. Beforehand it would be 10+ failing clients. But the issue still remains and should not happen anyway. We also have a support request running. I'm trying to get a packet capture on the GSS server at the moment the issue happens, but this is quite difficult in a production environment. You don't want to capture all the image traffic.
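For anyone attempting a similar capture, one way to keep it small is to filter at capture time so that only agent control traffic and ARP are recorded, and to write into a ring buffer of fixed-size files. The sketch below only builds the filter string and the dumpcap command line; it does not run anything. The agent port (TCP 402), the interface name and the file sizes are assumptions for illustration, not values confirmed in this thread, so check them against your own server configuration.

```python
def build_capture_filter(agent_port=402, client_ips=()):
    """Build a pcap capture filter for agent control traffic plus ARP.

    agent_port=402 is an assumed default; adjust to your server's setting.
    client_ips optionally narrows the capture to specific test clients.
    """
    base = f"(arp or tcp port {agent_port})"
    if not client_ips:
        return base
    hosts = " or ".join(f"host {ip}" for ip in client_ips)
    return f"{base} and ({hosts})"


def build_dumpcap_command(interface, out_file, capture_filter,
                          file_kb=51200, files=10):
    """Return a dumpcap argument list using a ring buffer of capture files.

    -b filesize is given in kB and -b files caps how many files are kept,
    so the capture can run unattended until the disconnect occurs.
    """
    return [
        "dumpcap",
        "-i", interface,
        "-f", capture_filter,
        "-w", out_file,
        "-b", f"filesize:{file_kb}",
        "-b", f"files:{files}",
    ]


if __name__ == "__main__":
    # Hypothetical interface name and client addresses.
    flt = build_capture_filter(client_ips=["10.0.0.5", "10.0.0.6"])
    print(" ".join(build_dumpcap_command("Ethernet0", "dagent.pcapng", flt)))
```

Because the multicast image stream does not match the filter, it never reaches the capture files, and the ring buffer keeps only the most recent traffic around the disconnect.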
Best regards,
Wilco
That's not additional info really snruebes72.... as this thread dates back prior to that technote.
Have you raised a call with Symantec to add your customer details as one who suffers from this issue?
Hi Wilco,
This is exactly the same issue we are facing. We have now updated our Win10 x64 PE build to the latest stable version (10586) and are using the latest GSS 3.1 version (build 275) and dagent/aclient (6.9.2037), with no improvement: from time to time we see silent shutdowns of the agent in WinPE, as well as crashes (less frequently).
I will contact our Symantec support engineer again and try to open a ticket to get this fixed, as they now officially support Win 10 PE, if I'm not mistaken.
As soon as I have news I will post it here.
Regards,
One addition:
Symantec has been tracking this issue for quite a while under Tech Item TECH232291:
https://support.symantec.com/en_US/article.TECH232291.html
It has been open since September 2015 without any substantial progress, which makes the situation even worse.
What you are describing is similar to our issue. We are now using GSS 3.1 with WinPE 10 build 10586, and the issue remains. It's not a WinPE 10 issue: we also tested and reproduced it on GSS 3.0 at different HF levels, with WinPE 5 and 5.1.
Also, after we distribute our multicast image, some clients randomly throw the application error and some clients just get silently disconnected from the console. The remaining clients go through the next steps in the task just fine and finish correctly.
Cheers,
The only thing I can say for sure at the moment is that even GSS 3.0 HF5 did not solve the problem, but Symantec declined to support our scenario because we use WinPE based on Windows 10.
They promised us to support it in GSS 3.1 if I remember correctly.
We just upgraded our GSS staging environment to GSS 3.1 but have not yet updated our WinPE ISO with the corresponding latest dagent version, 6.9.2037.
We have two problems: one is the visible crash of dagent.exe in WinPE and the other is a "silent termination". As in your case, both happen only sometimes, so they cannot be reproduced on every attempt and require bulk testing.
Once we have tested with the new WinPE image containing the GSS 3.1 dagent and have results, I will post them here.
Maybe one further detail: We currently use WinPE 10 based on build 10240 (TH1 LTSB) but will try to lift it to 10586 (TH2) or RS1 (latest Insider preview build).
The hope is that Microsoft also fixed some bugs in these later releases which might have caused the issue.
Also, this post looks quite similar to yours. Note that this is also happening on our WinPE-connected clients: after deployment of a multicast image, randomly 6 out of 30 clients get disconnected and the task won't continue until dagent is manually restarted in WinPE.
http://www.symantec.com/connect/forums/dagentexe-applcation-error-windows-pe
Did you get the issue with the dagent crashes resolved? We have the same error, but on GSS 3.1.
We are still working here with Symantec to resolve this. My advice is to ask support for the latest version of the agent (on ETrack 3827763) so that you can test whether this resolves the issue in your environment.
Hi ianatkin,
I opened a case with Symantec and mentioned this thread. The newest information I have is that they will probably fix it with HF4, scheduled for 14-DEC 2015.
Our support engineer also mentioned that this does look similar to the known issue flagged in the release notes for GSS 3.0 HF3 (http://www.symantec.com/docs/TECH232291).
Hi Sebastian, please ask support to link this to case number 09148269.
We are struggling with the same problem in two different flavors. We are using GSS 3.0 HF3 and 64-bit WinPE based on Windows 10.
From time to time we see memory crashes of the dagent.exe:
On top of this we have "silent" dagent shutdowns more frequently, especially when we try to image a workstation on which the previous imaging attempt failed and the old computer object with the same serial number is still in the eXpress DB. To avoid the resulting reboot, we delete the corresponding entry from the computer table in the eXpress DB via script before we start the dagent in WinPE on the client, but it looks like the server sends some kind of exit command to the dagent anyway.
A log of such a silent shutdown is attached to this post, but it is hard to read and it does not mention why the dagent exits.
The only strange thing I see is a weird string after the task type at the very end of the log file:
[11/09/2015 15:08:9.750 756 2] RequestWorkToDo.cpp:145 Mapped Task-Type Ї®îû in TASK_TYPE_TBL
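Since garbled strings like this are easy to miss in a long agent log, a small script can flag them. The following is a minimal sketch that assumes the log line layout seen in this thread (timestamp, two numeric fields whose meaning is not documented here, source file, line number, message) and reports any entry whose message contains non-printable or non-ASCII characters. The function and field names are made up for illustration.

```python
import re

# Layout assumed from the log excerpts in this thread:
# [MM/DD/YYYY HH:MM:S.mmm <num> <num>] Source.cpp:<line> <message>
LINE_RE = re.compile(
    r"^\[(?P<stamp>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{1,2}\.\d{3}) "
    r"(?P<field_a>\d+) (?P<field_b>\d+)\] "
    r"(?P<source>\S+\.cpp):(?P<line>\d+) (?P<message>.*)$"
)


def find_suspicious_lines(lines):
    """Return (source, line_no, message) for entries with unusual characters.

    An entry is flagged when its message contains any character outside
    the printable ASCII range, which may indicate corrupted data.
    """
    hits = []
    for raw in lines:
        match = LINE_RE.match(raw.strip())
        if match is None:
            continue  # not a log entry in the assumed layout
        message = match.group("message")
        if any(ord(ch) < 32 or ord(ch) > 126 for ch in message):
            hits.append((match.group("source"),
                         int(match.group("line")),
                         message))
    return hits


if __name__ == "__main__":
    import sys
    # Usage: python scan_log.py dagent.log   (the file name is hypothetical)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for source, line_no, message in find_suspicious_lines(handle):
            print(f"{source}:{line_no}: {message!r}")
```

Running this over logs from several failed sessions would show whether the odd task-type string appears in every silent shutdown or only in some of them.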
I have already spoken to Symantec regarding this issue and asked them to open cases, but they have not come back to me yet.
As soon as I have news, I will let you know.
Adding more agent and crash log files.
Client running as multicast master.
Adding some more Agent and Crash logs.
This time I added a startup script to WinPE to consume some RAM. Now, when the DAgent starts, we can force the crash more reliably.
Just been informed of someone else on Symantec Connect who has this issue:
http://www.symantec.com/connect/forums/gss-30-error-da-agent
I have now attached some logs from a recently crashed agent session. The last entries refer to agent garbage collection.
[07/24/2015 00:27:8.456 1676 1] OutputThread.cpp:42 Waiting on data in output queue
[07/24/2015 00:27:8.565 476 2] GCThread.cpp:33 - Garbage Collector - Thread 344 is still busy
[07/24/2015 00:27:8.565 476 2] GCThread.cpp:150 - Garbage Heap is not empty, Collecting list of 1 objects on which to wait
[07/24/2015 00:27:8.565 476 2] GCThread.cpp:170 - Waiting on table of 1 objects
[07/24/2015 00:27:8.565 476 2] GCThread.cpp:175 - GC Notified of new object to check
[07/24/2015 00:27:8.565 476 2] GCThread.cpp:135 - Garbage Heap is not empty, Removing dead objects
[07/24/2015 00:27:8.565 476 2] GCThread.cpp:22 - Garbage Collector checking status of Thread 344
[07/24/2015 00:27:8.675 476 2] GCThread.cpp:33 - Garbage Collector - Thread 344 is still busy
[07/24/2015 00:27:8.675 476 2] GCThread.cpp:150 - Garbage Heap is not empty, Collecting list of 1 objects on which to wait
[07/24/2015 00:27:8.675 476 2] GCThread.cpp:170 - Waiting on table of 1 objects
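To compare crashed sessions, it can help to count how often the garbage collector reports a given thread as still busy before the log stops. The sketch below assumes only the entry prefix and the "Thread N is still busy" wording visible in the excerpt above; it splits on the timestamp prefix, so it works whether the log has one entry per line or was pasted as a single run of text. The function names are illustrative.

```python
import re
from collections import Counter

# Each entry is assumed to start with "[MM/DD/YYYY HH:MM:S.mmm ".
ENTRY_START = re.compile(r"\[\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{1,2}\.\d{3} ")
BUSY_RE = re.compile(r"Thread (\d+) is still busy")


def split_entries(text):
    """Split raw log text into entries at each timestamp prefix."""
    starts = [match.start() for match in ENTRY_START.finditer(text)]
    if not starts:
        return []
    bounds = starts + [len(text)]
    return [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]


def busy_thread_counts(text):
    """Count 'still busy' garbage collector reports per thread ID."""
    counts = Counter()
    for entry in split_entries(text):
        match = BUSY_RE.search(entry)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    import sys
    # Usage: python gc_busy.py dagent.log   (the file name is hypothetical)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for thread_id, count in busy_thread_counts(handle.read()).most_common():
            print(f"Thread {thread_id}: reported busy {count} times")
```

A thread that is reported busy many times right up to the end of the log is a reasonable candidate to mention to support, although this alone does not show that it caused the crash.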