Ghost Solution Suite

DAgent Application Crashes GSS3HF1-Environmental Cause? 

Jul 23, 2015 07:51 AM

We've been testing GSS3HF1 and are seeing apparently random DAgent crashes. Machines booting into WinPE5 will sometimes have an agent crash, as shown below:

IMG_2797_0.JPG

 

We've opened a case with support, but it's puzzling.

I can deploy a machine a dozen times and not see this. Then we'll get a period where it happens with high probability... then, just as frustration peaks, it vanishes again. This suggests an environmental factor that eludes me.

We've initiated thousands of deployments with the Altiris DS6.9 engine in this environment and have never seen this issue (we still have Altiris DS6.9 running in parallel to GSS and still no issues).

I know we aren't alone, as a colleague in the UK has reported this for his new GSS installation too. One thing we have in common is that GSS3 also seems to be automatically, though not consistently, rescheduling jobs to 2071 on completion.

What I'm doing at the moment to track this is turning on agent logging in automation as per HOWTO:3066 and using WinCrashReport to collect crash data.

Environment details:

  1. Fresh GSS3 install on a 2012 VM
  2. GSS install is a simple install
  3. No drivers added to automation, and no other tools added
  4. Clients are physical clients on the same subnet as the server.
  5. Have seen the issue on various Dell models (9010, 760, 990), a Fujitsu 420, and a Viglen Genie (whitebox hardware that's 100% Intel inside).
  6. Clients are PXE booting
  7. No other software installed on Server (not even AV)

I've tried artificially loading the network and server to force this, all to no effect. I've also tried randomly disconnecting agent comms, but all these outages are handled smoothly by the agent (which is a credit to it).

Anyone else seen this?

Comments

Jun 08, 2016 11:08 AM

Hi Sebastian,

Symantec Engineering contacted me with the same date for the new agent release, so that hopefully means it really will come out then ;-)

We've been testing the fixed agent for the last couple of weeks and I am pleased to report a total of ZERO agent crashes since. Most pleased.

Kind Regards,
Ian./

Jun 08, 2016 09:58 AM

Hi Ianatkin,

A couple of days ago our assigned Symantec support engineer informed me that their internal software engineering team has identified a buffer overrun from the data dumps, and that they anticipate the fix will resolve both the crash and the silent failure.

He also gave me the "unofficial info" that they want to fix it in GSS 3.1 HF2, currently scheduled for 13th June 2016.

I'm keeping my fingers crossed that they can keep to their schedule and finally fix this major problem, almost a year after it was first reported to them.

Best Regards,

Sebastian

May 17, 2016 05:27 AM

I can confirm that this is still an issue for GSS3.1 MP1

May 10, 2016 08:19 AM

Hi Sebastian,

Thanks for the reply. I did some additional testing and enabled gratuitous ARP sending on our core switch.
This greatly reduced the number of clients with disconnects. For example, in an imaging session of 120 machines we had 3 failures caused by the WinPE client disconnecting after the image was pushed. Previously it would have been 10+ failing clients.
But the issue still remains, and it should not happen at all. We also have a support request open. I'm trying to get a packet capture on the GSS server at the moment the issue happens, but this is quite difficult in a production environment, since you don't want to capture all the image traffic.
 

Best regards,

 

Wilco


 

Apr 25, 2016 04:58 AM

That's not really additional info, snruebes72, as this thread predates that technote.

Have you raised a call with Symantec to add your customer details as one who suffers from this issue?

Apr 25, 2016 04:43 AM

Hi Wilco,

This is exactly the same issue we are facing. We have now updated our Win10 x64 PE build to the latest stable version (10586) and are using the latest GSS 3.1 version (build 275) and dagent/aclient (6.9.2037), with no improvement: from time to time we still see silent shutdowns of the agent in WinPE, as well as crashes (less frequently).

I will contact our Symantec support engineer again and try to open a ticket to get this fixed, as they now officially support Win 10 PE, if I'm not mistaken.

As soon as I have news I will post it here.

 

Regards,

Sebastian

Apr 25, 2016 04:35 AM

One addition:

Symantec has been tracking this issue for quite a while under Tech Item TECH232291:

https://support.symantec.com/en_US/article.TECH232291.html

It has been open since September 2015 without any substantial progress, which makes the situation even worse.

Apr 15, 2016 05:22 AM

Hi Sebastian,

What you are describing is similar to our issue. We are now using GSS 3.1 with WinPE 10 build 10586, and the issue remains. It's not a WinPE 10 issue: we also reproduced it in GSS 3.0 at different HF levels, with WinPE 5 and 5.1.

Also, after we distribute our multicast image, some clients randomly throw the application error and some clients just get silently disconnected from the console. The remaining clients go through the next steps in the task just fine and finish correctly.

 

Cheers,


Wilco

Apr 15, 2016 04:36 AM

Hi Wilco,

The only thing I can say for sure at the moment is that even GSS 3.0 HF5 did not solve the problem, and Symantec declined to support our scenario because we use WinPE based on Windows 10.

They promised us to support it in GSS 3.1 if I remember correctly.

We just upgraded our GSS staging environment to GSS 3.1 but have not yet updated our WinPE ISO with the corresponding latest dagent version, 6.9.2037.

We have two problems: one is the visible crash of dagent.exe in WinPE and the other is a "silent termination". As in your case, both happen only sometimes, so they cannot be reproduced on every attempt and require bulk testing.

Once we have tested with the new WinPE image and the GSS 3.1 dagent, I will post the results here.

Maybe one further detail: We currently use WinPE 10 based on build 10240 (TH1 LTSB) but will try to lift it to 10586 (TH2) or RS1 (latest Insider preview build).

The hope is that Microsoft also fixed some bugs in these later releases which might have caused the issue.

Cheers,

Sebastian

Apr 15, 2016 03:17 AM

Also, this post looks quite similar to yours. Note that this is also happening on our WinPE-connected clients: after deployment of a multicast image, a random 6 out of 30 clients get disconnected and the task won't continue. It only continues after manually restarting dagent in WinPE.

http://www.symantec.com/connect/forums/dagentexe-applcation-error-windows-pe

Apr 15, 2016 03:12 AM

Hi Sebastian,

Did you get the issue with the dagent crashes resolved? We have the same error, but on GSS 3.1.

 


Best regards,

Wilco

Nov 10, 2015 07:12 AM

Hi Sebastian,

We are still working here with Symantec to resolve this. My advice is to ask support for the latest version of the agent (on ETrack 3827763) so that you can test whether this resolves the issue in your environment.

Kind Regards,
Ian./

 

Nov 10, 2015 02:26 AM

Hi ianatkin,

I opened a case with Symantec and mentioned this thread. The latest information I have is that they will probably fix it in HF4, scheduled for 14-DEC 2015.

Our support engineer also mentioned that this does look similar to the known issue flagged in the release notes for GSS 3.0 HF3 (http://www.symantec.com/docs/TECH232291).

Regards,

Sebastian

Nov 09, 2015 03:52 PM

Hi Sebastian, please ask support to link this to case number 09148269.

 

Nov 09, 2015 09:22 AM

Hi ianatkin,

We are struggling with the same problem in two different flavors. We are using GSS 3.0 HF3 and 64-bit WinPE based on Windows 10.

From time to time we see memory crashes of the dagent.exe:

2015-11-04_1031.png

On top of this we have "silent" dagent shutdowns more frequently, especially when we try to image a workstation on which the previous imaging attempt failed and the old computer object with the same serial number is still in the eXpress DB. To avoid the resulting reboot, we delete the corresponding entry from the computer table in the eXpress DB via script before we start the dagent in WinPE on the client, but it looks like the server is sending some kind of exit command to the dagent anyway.

A log of such a silent shutdown is attached to this post, but it is hard to read and it does not mention why the dagent exits.

The only strange thing I see is a weird string after the task type at the very end of the log file:

[11/09/2015 15:08:9.750 756 2] RequestWorkToDo.cpp:145 Mapped Task-Type Ї®îû in TASK_TYPE_TBL
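
To check other agent logs for the same symptom, a minimal Python sketch like the one below could flag "Mapped Task-Type" entries whose value is not a plain ASCII name. It assumes the log line layout seen in the line above, and it assumes that valid task types consist only of ASCII letters, digits, spaces, underscores and hyphens. Neither assumption is confirmed by Symantec documentation, and `find_garbled_task_types` is a hypothetical helper name.

```python
import re

# Assumed layout, taken from the single log line quoted in this post:
# [MM/DD/YYYY H:M:S.mmm <number> <number>] <file>:<line> Mapped Task-Type <value> in TASK_TYPE_TBL
TASK_LINE_RE = re.compile(
    r"^\[\d{2}/\d{2}/\d{4} \d{1,2}:\d{1,2}:\d{1,2}\.\d+ \d+ \d+\] "
    r"\S+:\d+ Mapped Task-Type (?P<task>.*) in TASK_TYPE_TBL$"
)

# Assumption: a healthy task type is a plain ASCII name.
PLAIN_NAME_RE = re.compile(r"[A-Za-z0-9_ -]+")


def find_garbled_task_types(lines):
    """Return (line_number, task_value) for every suspicious task type."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = TASK_LINE_RE.match(line.rstrip("\r\n"))
        if match is None:
            continue  # not a "Mapped Task-Type" entry
        task = match.group("task")
        if PLAIN_NAME_RE.fullmatch(task) is None:
            hits.append((number, task))
    return hits
```

Running this over the lines of a dagent log would list the line numbers where the task type looks corrupted, which may help correlate the garbled string with the silent shutdowns.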

I have already spoken to Symantec about this issue and asked them to open cases, but they have not come back to me yet.

As soon as I have news, I will let you know.

Regards,

Sebastian

Jul 24, 2015 09:32 AM

Adding more agent and crash log files.

 

Client running as multicast master.

Jul 23, 2015 11:25 AM

Adding some more Agent and Crash logs.

This time I added a startup script to WinPE to consume some RAM. Now, when the DAgent starts, we can more reliably force the crash.

 

 

Jul 23, 2015 10:40 AM

Just been informed of someone else on Symantec Connect who has this issue:

http://www.symantec.com/connect/forums/gss-30-error-da-agent

Jul 23, 2015 10:29 AM

Now attached are some logs from a recently crashed agent session. The last entries refer to agent garbage collection.

[07/24/2015 00:27:8.456 1676 1] OutputThread.cpp:42 Waiting on data in output queue
[07/24/2015 00:27:8.565 476 2] GCThread.cpp:33 - Garbage Collector - Thread 344 is still busy
[07/24/2015 00:27:8.565 476 2] GCThread.cpp:150 - Garbage Heap is not empty, Collecting list of 1 objects on which to wait
[07/24/2015 00:27:8.565 476 2] GCThread.cpp:170 - Waiting on table of 1 objects
[07/24/2015 00:27:8.565 476 2] GCThread.cpp:175 - GC Notified of new object to check
[07/24/2015 00:27:8.565 476 2] GCThread.cpp:135 - Garbage Heap is not empty, Removing dead objects
[07/24/2015 00:27:8.565 476 2] GCThread.cpp:22 - Garbage Collector checking status of Thread 344
[07/24/2015 00:27:8.675 476 2] GCThread.cpp:33 - Garbage Collector - Thread 344 is still busy
[07/24/2015 00:27:8.675 476 2] GCThread.cpp:150 - Garbage Heap is not empty, Collecting list of 1 objects on which to wait
[07/24/2015 00:27:8.675 476 2] GCThread.cpp:170 - Waiting on table of 1 objects
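
To get a feel for how long the garbage collector was polling before a crash, a small Python sketch such as the following could count the "still busy" messages per thread. It assumes the message wording shown in the excerpt above; `busy_thread_counts` is a hypothetical helper and not part of any Symantec tool.

```python
import re
from collections import Counter

# Assumed wording, copied from the GCThread.cpp entries in the excerpt above.
BUSY_RE = re.compile(
    r"GCThread\.cpp:\d+ - Garbage Collector - Thread (\d+) is still busy"
)


def busy_thread_counts(lines):
    """Count how often each thread ID is reported as still busy."""
    counts = Counter()
    for line in lines:
        match = BUSY_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

For the ten-line excerpt above this would report thread 344 twice. A much larger count in a full log would suggest the collector was waiting on the same thread for a long time before the agent died.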

 
