Video Screencast Help
Symantec to Separate Into Two Focused, Industry-Leading Technology Companies. Learn more.

No way to tell in NBU that this backup is hung

Created: 11 Dec 2013 • Updated: 15 Jan 2014 | 5 comments
Jay_for_customer's picture
This issue has been solved. See solution.

There are conditions that can occur on a windows backup that can lead you to believe the backup is in progress, but it is hung indefinitely.

 

Example:

Job details:

 

 

10/22/2013 20:18:54 - Info nbjm (pid=19111) starting backup job (jobid=3688964) for client ironbark-bk, policy ITSS-ironbark, schedule Daily_Inc

10/22/2013 20:18:54 - estimated 2241187 kbytes needed

10/22/2013 20:18:54 - Info nbjm (pid=19111) started backup (backupid=ironbark-bk_1382487534) job for client ironbark-bk, policy ITSS-ironbark, schedule Daily_Inc on storage unit fimbo-disk-pool-01

10/22/2013 20:18:57 - started process bpbrm (pid=15885)

10/22/2013 20:18:59 - Info bpbrm (pid=15885) ironbark-bk is the host to backup data from

10/22/2013 20:18:59 - Info bpbrm (pid=15885) reading file list from client

10/22/2013 20:18:59 - connecting

10/22/2013 20:19:00 - Info bpbrm (pid=15885) starting bpbkar on client

10/22/2013 20:19:00 - connected; connect time: 0:00:00

10/22/2013 20:19:06 - Info bpbkar (pid=3420) Backup started

10/22/2013 20:19:06 - Info bpbrm (pid=15885) bptm pid: 15886

10/22/2013 20:19:08 - Info bptm (pid=15886) start

10/22/2013 20:19:08 - Info bptm (pid=15886) using 1048576 data buffer size

10/22/2013 20:19:08 - Info bptm (pid=15886) setting receive network buffer to 1048576 bytes

10/22/2013 20:19:08 - Info bptm (pid=15886) using 48 data buffers

10/22/2013 20:19:14 - Info bptm (pid=15886) start backup

10/22/2013 20:19:15 - Info bptm (pid=15886) backup child process is pid 15916

10/22/2013 20:19:15 - begin writing

 

 

 

 

bpbrm

 

 

20:18:56.981 [15885] <2> bpbrm main: INITIATING (VERBOSE = 0): version NetBackup 7.5 2013061020

20:18:56.982 [15885] <2> logparams: -backup -S walleye -c ironbark-bk -ct 13 -ru root -cl ITSS-ironbark -sched Daily_Inc -bt 1382487534 -dt 86396 -st 1 -b ironbark-bk_1382487534 -mediasvr fimbo -jobid 3688964 -jobgrpid 3688824 -masterversion 750000 -maxfrag 524287 -bpstart_time 1382487835 -reqid -1379845250 -mt 0 -to 0 -stunit fimbo-disk-pool-01 -rl 3 -rp 2678400 -eari 0 -p /ux1848 -mst 6 -flags 0 -storagesvr fimbo -sts_type AdvancedDisk -use_ofb -use_otm -WOFB_enabled 1 -WOFB_fim 1 -WOFB_error 1 -nbu -jm -secure 1 -kl 8 -rg root -fso -tir -tir_plus -connect_options 16974338

20:18:57.001 [15885] <2> vnet_pbxConnect: pbxConnectEx Succeeded

 

 

20:19:06.087 [15885] <2> write_file_names: successfully wrote buffer to COMM_SOCK

20:19:06.088 [15885] <2> bpbrm main: wrote CONTINUE on COMM_SOCK

20:19:06.089 [15885] <2> bpbrm main: from client ironbark-bk: INF - BACKUP 10/22/2013 8:19:05 PM ironbark-bk ITSS-ironbark Daily_Inc INCR 10/21/2013 7:19:09 PM

20:19:15.405 [15885] <2> bpbrm mm_sig: received ready signal from media manager

Nothing else

 

 

bptm

 

 

20:19:08.240 [15886] <2> bptm: INITIATING (VERBOSE = 0): -w -c ironbark-bk -dpath /ux1848 -stunit fimbo-disk-pool-01 -cl ITSS-ironbark -bt 1382487534 -b ironbark-bk_1382487534 -st 1 -cj 1 -reqid -1379845250 -jm -brm -hostname ironbark-bk -ru root -rclnt ironbark-bk -rclnthostname ironbark-bk -rl 3 -rp 2678400 -sl Daily_Inc -ct 13 -maxfrag 524287 -tir -eari 0 -mediasvr fimbo -nonrsvdports -no_callback -connect_options 0x01010101 -jobid 3688964 -jobgrpid 3688824 -masterversion 750000 -bpbrm_shm_id 419430423 -blks_per_buffer 2048

20:19:08.257 [15886] <2> vnet_pbxConnect: pbxConnectEx Succeeded

20:19:08.258 [15886] <2> job_connect: SO_KEEPALIVE set on socket 1 for client walleye

.....

20:19:15.339 [15886] <2> write_backup: backup child process is pid 15916

20:19:15.341 [15916] <2> close_disk_descriptors: descriptor close: files = FALSE sockets = FALSE

The above line is the only child PID  for 15916 I see.

 

20:19:15.394 [15886] <2> db_end: Need to collect reply

20:19:15.405 [15886] <4> write_backup: begin writing backup id ironbark-bk_1382487534, copy 1, fragment 1, destination path /ux1848

20:19:15.405 [15886] <2> signal_parent: sending SIGUSR1 to bpbrm (pid = 15885)

20:19:15.405 [15886] <2> write_data: twin_index: 0 active: 1 dont_process: 0 wrote_backup_hdr: 0 finished_buff: 0 saved_cindex: -1 twin_is_disk 1 delay_brm: 0

20:19:15.405 [15886] <2> write_data: Total Kbytes transferred 0

bpbkar

 

8:19:20.067 PM: [3420.3168] <4> dos_backup::tfs_scannext: INF - no more path list entries

8:19:20.239 PM: [3420.3168] <4> VssSnapshot::V_VssVolumeSnapshot(): INF - Drive 'D:' added to VSS Volume Set

8:19:20.239 PM: [3420.3168] <4> VssSnapshot::V_VssVolumeSnapshot(): INF - Drive 'd:' added to VSS Volume Set

8:19:59.319 PM: [3420.3168] <4> VssSnapshot::V_VssVolumeSnapshot(): INF - VSS Volume Set snapped

8:19:59.319 PM: [3420.3168] <4> VssSnapshot::V_VssVolumeSnapshot(): INF - About to commence post-snap unbounded association checks

8:19:59.319 PM: [3420.3168] <4> backup_create: INF - NetBackup Temp Directory: 'D:\Program Files\Veritas\\NetBackup\Temp'

8:19:59.319 PM: [3420.3168] <4> tar_backup_tfi::UpdateExcludeListWithVHD: INF - UpdateExludeListWithVHD begin

Nothing else

 

the windows event log shows the cause:

 

Here is what we see for this hung backup:

 

Wed Oct 23 2013 01:24:53             Virtual Disk Service          E1           Unexpected failure. Error code: 2@0200001D

Wed Oct 23 2013 01:04:54             Virtual Disk Service          E1           Unexpected failure. Error code: 2@0200001D

Wed Oct 23 2013 00:44:54             Virtual Disk Service          E1           Unexpected failure. Error code: 2@0200001D

Wed Oct 23 2013 00:24:54             Virtual Disk Service          E1           Unexpected failure. Error code: 2@0200001D

Wed Oct 23 2013 00:04:54             Virtual Disk Service          E1           Unexpected failure. Error code: 2@0200001D

Tue Oct 22 2013 20:19:59               Service Control Manager              I7036     The Virtual Disk Service service entered the running state.

Tue Oct 22 2013 20:19:59               Service Control Manager              I7035     The Virtual Disk Service service was successfully sent a start control.

Tue Oct 22 2013 20:19:59               Virtual Disk Service          I3            Service started

Can something in NBU be changed so it can know of a fault like this and not have the backup hang indefinitely?

Operating Systems:

Comments 5 CommentsJump to latest comment

Marianne's picture

Virtual Disk Service is an OS service, not NBU.

If a problem at OS level is causing backup to hang, that is where you should focus troubleshooting attempts.

You did not mention OS version and patch level or NBU patch level.

Maybe start with latest Windows updates followed by NBU latest patches (if not already installed).

If this does not fix the issue, a call to Microsoft should be the next step...

Supporting Storage Foundation and VCS on Unix and Windows as well as NetBackup on Unix and Windows
Handy NBU Links

Mark_Solutions's picture

There is an OpsCenter alert for hung backups - not sure exactly how it works it out but perhaps that would help?

Authorised Symantec Consultant

Don't forget to "Mark as Solution" if someones advice has solved your issue - and please bring back the Thumbs Up!!.

jim dalton's picture

Yes opscenter might have it , perhaps trigger on job duration or throuhput or both maybe.

Jim

Mark_Solutions's picture

It is a specific alert "hung job" - havent used it so not sure what setting you need to set

Authorised Symantec Consultant

Don't forget to "Mark as Solution" if someones advice has solved your issue - and please bring back the Thumbs Up!!.

Jay_for_customer's picture

My intention was to get a change request in also

Ths has been published to note this:

http://www.symantec.com/business/support/index?page=content&id=TECH213267

 

SOLUTION