Video Screencast Help

BE 2010 Backups to deduplication folder stuck in queued status

Created: 04 Nov 2013 | 9 comments

Hi All,

I just noticed that my BE server has had backup jobs stuck in a queued state for 157+ hours. It's been doing this for quite awhile now, but now I'm sick of it and want to address the problem.

In the past my fix was to reboot the server. It would run fine for 2 weeks to 2 months and then jobs would get stuck in queued again.

Here's the rundown on what I've tried / checked so far

  • There are no active alerts
  • Nothing unusual in the Windows event logs
  • Backup jobs to standalone devices work properly (B2D, LTO library, etc)
  • Stuck jobs include traditional Windows file system backups and VMWare backups
  • Backup Exec is on Service Pack 2
  • Media Server is Windows 2008 R2 Enterprise w/ 80 GB RAM
  • Deduplication target is 25 TB array w/ 12 TB of free space
  • Client side dedupe or Media server dedupe are in use (depending on the job) both have this issue
  • Backups occur on LAN/WAN/SAN - all have this issue

Here's the latest from the debug monitor:

BEREMOTE: [11/04/13 07:47:03] [6764]     [fsys\shared]        - VDDK-Log: diskLibPlugin: 2013-11-04T07:47:03.909-08:00 [06764 trivia 'transport'] Main session keepalive.
BEREMOTE: [11/04/13 07:47:03] [6764]     [fsys\shared]        - VDDK-Log: diskLibPlugin: 2013-11-04T07:47:03.909-08:00 [06764 trivia 'vmomi.soapStub[488]'] Sending soap request to [TCP:vcenter.do.sduhsd.net:443]: retrieveContent {}
BEREMOTE: [11/04/13 07:47:03] [6764]     [fsys\shared]        - VDDK-Log: diskLibPlugin: 2013-11-04T07:47:03.909-08:00 [06764 trivia 'HttpConnectionPool'] [PopPendingConnection] Pending connection returned for scheduling
BEREMOTE: [11/04/13 07:47:03] [6764]     [fsys\shared]        - VDDK-Log: diskLibPlugin: 2013-11-04T07:47:03.910-08:00 [06764 trivia 'vmomi.soapStub[488]'] Request started [class Vmacore::Http::UserAgentImpl::AsyncSendRequestHelper:0000000025E58108]
BEREMOTE: [11/04/13 07:47:03] [5484]     [fsys\shared]        - VDDK-Log: diskLibPlugin: 2013-11-04T07:47:03.920-08:00 [05484 trivia 'vmomi.soapStub[488]'] Request completed [class Vmacore::Http::UserAgentImpl::AsyncSendRequestHelper:0000000025E58108]
BEREMOTE: [11/04/13 07:47:03] [6764]     [fsys\shared]        - VDDK-Log: diskLibPlugin: 2013-11-04T07:47:03.920-08:00 [06764 trivia 'vmomi.soapStub[488]'] Received soap response from [TCP:vcenter.do.sduhsd.net:443]: retrieveContent
BEREMOTE: [11/04/13 07:47:03] [6764]     [fsys\shared]        - VDDK-Log: diskLibPlugin: 2013-11-04T07:47:03.922-08:00 [06764 trivia 'HttpConnectionPool'] [PopPendingConnection] Nothing to reschedule
BESERVER: [11/04/13 07:47:08] [4900]     01 No archive option installed. EV Embedded status defaults to true and not supported version.
BESERVER: [11/04/13 07:47:08] [4900]     05 AdammAdminBO::Get() - Parent = NULL , Entity = OST_SERVER_ENTITY
BESERVER: [11/04/13 07:47:08] [4900]     05 AdammAdminBO::Get() - hr = 0x0
BESERVER: [11/04/13 07:47:08] [4900]     05 AdammAdminBO::Get() - Parent = NULL , Entity = OST_SERVER_ENTITY
BESERVER: [11/04/13 07:47:08] [4900]     05 AdammAdminBO::Get() - hr = 0x0
BESERVER: [11/04/13 07:47:08] [4900]     05 AdammAdminBO::Get() - Parent = NULL , Entity = OST_SERVER_ENTITY
BESERVER: [11/04/13 07:47:08] [4900]     05 AdammAdminBO::Get() - hr = 0x0
BESERVER: [11/04/13 07:47:13] [4900]     01 No archive option installed. EV Embedded status defaults to true and not supported version.
BESERVER: [11/04/13 07:47:13] [4900]     05 AdammAdminBO::Get() - Parent = NULL , Entity = OST_SERVER_ENTITY
BESERVER: [11/04/13 07:47:13] [4900]     05 AdammAdminBO::Get() - hr = 0x0
BESERVER: [11/04/13 07:47:13] [4900]     05 AdammAdminBO::Get() - Parent = NULL , Entity = OST_SERVER_ENTITY
BESERVER: [11/04/13 07:47:13] [4900]     05 AdammAdminBO::Get() - hr = 0x0
BESERVER: [11/04/13 07:47:13] [4900]     05 AdammAdminBO::Get() - Parent = NULL , Entity = OST_SERVER_ENTITY
BESERVER: [11/04/13 07:47:13] [4900]     05 AdammAdminBO::Get() - hr = 0x0
BENGINE:  [11/04/13 07:47:16] [4160]     [server]             - DeviceManager: timeout event fired
BENGINE:  [11/04/13 07:47:16] [4160]     [server]             - DeviceManager: processing pending requests
BENGINE:  [11/04/13 07:47:16] [4160]     [server]             - DeviceManager: going to sleep for 900000 msecs
BESERVER: [11/04/13 07:47:18] [3828]     12 ForcedDisplay
BESERVER: [11/04/13 07:47:18] [3828]     12 Backup server status:
BESERVER: [11/04/13 07:47:18] [3828]     12 BACKUP          (BACKUP          ) (0300/0900) ONLINE        Drift:0(0), 0.
BESERVER: [11/04/13 07:47:18] [4900]     01 No archive option installed. EV Embedded status defaults to true and not supported version.
BESERVER: [11/04/13 07:47:19] [4900]     05 AdammAdminBO::Get() - Parent = NULL , Entity = OST_SERVER_ENTITY
BESERVER: [11/04/13 07:47:19] [4900]     05 AdammAdminBO::Get() - hr = 0x0
BESERVER: [11/04/13 07:47:19] [4900]     05 AdammAdminBO::Get() - Parent = NULL , Entity = OST_SERVER_ENTITY
BESERVER: [11/04/13 07:47:19] [4900]     05 AdammAdminBO::Get() - hr = 0x0
BESERVER: [11/04/13 07:47:19] [4900]     05 AdammAdminBO::Get() - Parent = NULL , Entity = OST_SERVER_ENTITY
BESERVER: [11/04/13 07:47:19] [4900]     05 AdammAdminBO::Get() - hr = 0x0
BESERVER: [11/04/13 07:47:24] [4900]     01 No archive option installed. EV Embedded status defaults to true and not supported version.
BESERVER: [11/04/13 07:47:24] [4900]     "Cluster" key does not appear to be present in the registry
BESERVER: [11/04/13 07:47:24] [4900]     Failed to open Microsoft cluster ()
BESERVER: [11/04/13 07:47:24] [4900]     VCS cluster keys do not appear to be present in the registry
BESERVER: [11/04/13 07:47:24] [4900]     Failed to open VCS cluster ()
BESERVER: [11/04/13 07:47:24] [4900]     05 AdammAdminBO::Get() - Parent = NULL , Entity = OST_SERVER_ENTITY
BESERVER: [11/04/13 07:47:24] [4900]     05 AdammAdminBO::Get() - hr = 0x0
BESERVER: [11/04/13 07:47:24] [4900]     05 AdammAdminBO::Get() - Parent = NULL , Entity = OST_SERVER_ENTITY
BESERVER: [11/04/13 07:47:24] [4900]     05 AdammAdminBO::Get() - hr = 0x0
BESERVER: [11/04/13 07:47:24] [4900]     05 AdammAdminBO::Get() - Parent = NULL , Entity = OST_SERVER_ENTITY
BESERVER: [11/04/13 07:47:24] [4900]     05 AdammAdminBO::Get() - hr = 0x0
BESERVER: [11/04/13 07:47:29] [4900]     01 No archive option installed. EV Embedded status defaults to true and not supported version.
 
I noticed SP3 is out - but it doens't look like it addresses this issue.
 
What can I do besides baby sit my server?
 
Thanks
Operating Systems:

Comments 9 CommentsJump to latest comment

lmosla's picture

Hello,

To fix this issue make sure 'all' of your live updates have been run.  

SP3 fixes a number of Deduplication issues http://www.symantec.com/docs/TECH208601 

sidd3009's picture

We dont have anything associated to Deduplication(PVLSVR) logged.

Since client-side Deduplication is in use, please refer to the Event Viewer for the Event ID mentioned in the following article. If it matches then perform the steps as mentioned in it: http://www.symantec.com/business/support/index?page=content&id=TECH174741

 

Regards,

Siddhant Saini
Advanced Technical Support Engineer, Symantec Corporation 
www.symantec.com

Matt12345's picture

Updated to SP3 - still have issue.

sidd3009 - It doesn't appear that the tech note is related to my issue.

 

 

Matt12345's picture

Stuck in queued again.. Looks like this happens about every 7 days.

 

Problem seems to have gotten worse with SP3.

 

Matt12345's picture

Here's the lateset from the debug monitor:

PVLSVR:   [11/20/13 08:36:17] [1500]     PvlEntityDatabase::QueryEntity()
            ParentEntityType = NULL_ENTITY_TYPE
            ParentEntityGuid = {00000000-0000-0000-0000-000000000000}
            EntityType = MACHINE_ENTITY_TYPE
            EntityGuid = {00000000-0000-0000-0000-000000000000}
            EntityName = "
            EntityFlags = 0x00008000
PVLSVR:   [11/20/13 08:36:17] [1500]     PvlEntityDatabase::QueryEntity()
            ParentEntityType = NULL_ENTITY_TYPE
            ParentEntityGuid = {00000000-0000-0000-0000-000000000000}
            EntityType = VAULT_ENTITY_TYPE
            EntityGuid = {00000000-0000-0000-0000-000000000000}
            EntityName = "
            EntityFlags = 0x00008000
PVLSVR:   [11/20/13 08:36:17] [1500]     PvlEntityDatabase::QueryEntity()
            ParentEntityType = NULL_ENTITY_TYPE
            ParentEntityGuid = {00000000-0000-0000-0000-000000000000}
            EntityType = STORAGE_ARRAY_ENTITY_TYPE
            EntityGuid = {00000000-0000-0000-0000-000000000000}
            EntityName = "
            EntityFlags = 0x00008000
PVLSVR:   [11/20/13 08:36:17] [1500]     PvlEntityDatabase::QueryEntity()
            ParentEntityType = NULL_ENTITY_TYPE
            ParentEntityGuid = {00000000-0000-0000-0000-000000000000}
            EntityType = DEVICE_POOL_ENTITY_TYPE
            EntityGuid = {00000000-0000-0000-0000-000000000000}
            EntityName = "
            EntityFlags = 0x00008000
PVLSVR:   [11/20/13 08:36:18] [1500]     PvlEntityDatabase::QueryEntity()
            ParentEntityType = MACHINE_ENTITY_TYPE
            ParentEntityGuid = {00000000-0000-0000-0000-000000000000}
            EntityType = BACKUP_TO_DISK_ENTITY_TYPE
            EntityGuid = {00000000-0000-0000-0000-000000000000}
            EntityName = "
            EntityFlags = 0x00008000
PVLSVR:   [11/20/13 08:36:18] [1500]     PvlEntityDatabase::QueryEntity()
            ParentEntityType = NULL_ENTITY_TYPE
            ParentEntityGuid = {00000000-0000-0000-0000-000000000000}
            EntityType = DRIVE_POOL_DRIVE_ENTITY_TYPE
            EntityGuid = {00000000-0000-0000-0000-000000000000}
            EntityName = "
            EntityFlags = 0x00008000
PVLSVR:   [11/20/13 08:36:18] [1500]     PvlEntityDatabase::QueryEntity()
            ParentEntityType = MACHINE_ENTITY_TYPE
            ParentEntityGuid = {00000000-0000-0000-0000-000000000000}
            EntityType = LIBRARY_ENTITY_TYPE
            EntityGuid = {00000000-0000-0000-0000-000000000000}
            EntityName = "
            EntityFlags = 0x00008000
PVLSVR:   [11/20/13 08:36:18] [1500]     PvlEntityDatabase::QueryEntity()
            ParentEntityType = NULL_ENTITY_TYPE
            ParentEntityGuid = {00000000-0000-0000-0000-000000000000}
            EntityType = OST_SERVER_ENTITY_TYPE
            EntityGuid = {DF7A1F2B-A923-469E-B0E5-E78BBC7ED0C7}
            EntityName = ""
            EntityFlags = 0x00000900
PVLSVR:   [11/20/13 08:36:18] [1500]     PvlEntityDatabase::QueryEntity()
            ParentEntityType = NULL_ENTITY_TYPE
            ParentEntityGuid = {00000000-0000-0000-0000-000000000000}
            EntityType = OST_SERVER_ENTITY_TYPE
            EntityGuid = {DF7A1F2B-A923-469E-B0E5-E78BBC7ED0C7}
            EntityName = ""
 
 
What would the logs show if BE was sequentially mounting, erasing, and dismounting media?
Gurvinder Rait's picture

use sgmon.exe and select first checkbox, device and media (from tools-> Settings-> Click device and media -> select verbose) capture to file

Start the Job and let it run for 20 minutes. Provide the SGMON.log.

Check if this can be the case (Refer www.symantec.com/docs/TECH174741)

Matt12345's picture

I finally got fed up with this and called "support". We updated to SP4, which doens't appear when you use the live update tool - you have to search for it.

I've got a ticket open and have spent a more than 8 hours on the phone (over almost 2 weeks) trying to get this issue resolved.

This *might* relate to using client side deduplication during backups. We'll see.

Matt12345's picture

...still working with support. I'll keep this thread updated in hopes that it might help someone

 

So - it's hard to detremine if this is a client side dedupe issue or not. I'm thinking not.

My media server is running 8 jobs at the same time to the same dedupe target. Support's theroy is that this is overwhelming the dedupe target and making everything hang.

I'm shocked that support is telling me that BE2010 cannot run 8 simultaneous jobs to the same target. It's only backing up about 100 GB at most...

So now I'm going through the process of re-enginering my entire backup routine to stagger my jobs.

At this point I truely believe that support is taking wild guesses at to what the problem might be, and hoping I'll just give up and leave them alone.

Gurvinder Rait's picture

did you eventually get this fixed ? adamm.log can provide you some valuable info. Attach it if you are still running into this problem