BUG REPORT: Backup jobs are hanging on a Veritas NetBackup (tm) 6.0 media server.

Article:TECH45475  |  Created: 2005-01-06  |  Updated: 2006-01-31  |  Article URL http://www.symantec.com/docs/TECH45475
Article Type
Technical Solution


Environment

Issue



BUG REPORT: Backup jobs are hanging on a Veritas NetBackup (tm) 6.0 media server.

Error



[allocateTwin] RetryInfo_Record: reason = 13 (MEDIA SERVER IS CURRENTLY NOT CONNECTED TO MASTER SERVER)

Solution



Bug ID: 509509: allocation failure while spanning media, EMM says media server is down, but it looks ok.

Symptoms:

A backup job can hang when it attempts to mount new media to continue the backup. If the connection to the media server is lost at some point, the nbrb daemon marks the resources as down on the EMM server. This loss of connection may be intermittent and cause no problems for running backups. When the connection to the media server returns, the nbrb daemon updates the status on the EMM server. However, the state of the robotic library is not reset in this situation, so the library is still marked as down for new backup jobs. As a result, backup jobs hang when attempting to mount a new tape. There is no indication of an actual robotic library failure, and the devices remain up and running.

Log Files:

Unified logs on the master server show a "MEDIA SERVER IS CURRENTLY NOT CONNECTED TO MASTER SERVER" message for the mds originator. The mds originator logs media and device selection information, which helps determine which NetBackup resource the hung job is waiting for.
# vxlogview -p 51216 -o 143 -b '11/13/05 23:00:00' -d all
11/13/05 23:00:00.249 [Debug] NB 51216 mds 143 PID:5074 TID:3066026928 [jobid=1427] 2 [is_host_online] master to media server connection is down, host name = nbmedia1, host state = 6
11/13/05 23:00:00.249 [Debug] NB 51216 mds 143 PID:5074 TID:3066026928 [jobid=1427] 2 [select_from_stu_list] cannot select storage unit, media server is offline, name = nbmedia1-stu
...
11/13/05 23:00:00.249 [Debug] NB 51216 mds 143 PID:5074 TID:3066026928 [jobid=1427] 1 [allocateTwin] RetryInfo_Record: reason = 13 (MEDIA SERVER IS CURRENTLY NOT CONNECTED TO MASTER SERVER), mediaServerName = nbmedia1...
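
When the log output is long, it can help to pull out only the job IDs and media server names tied to this condition. The following is a minimal sketch, assuming the vxlogview output has been saved as text and that its lines look like the sample above. The function name find_disconnected and the regular expressions are illustrative and derived only from those sample lines; they are not part of NetBackup, and other releases may format these messages differently.

```python
import re

# Patterns derived only from the sample log lines in this article (an assumption;
# other NetBackup releases may word these messages differently).
JOBID_RE = re.compile(r"\[jobid=(\d+)\]")
HOST_DOWN_RE = re.compile(
    r"\[is_host_online\] master to media server connection is down, "
    r"host name = ([^,\s]+)"
)
RETRY_RE = re.compile(
    r"\[allocateTwin\] RetryInfo_Record: reason = 13 \(.*?\), "
    r"mediaServerName = ([^,\s]+)"
)

# Sample lines copied from the log excerpt above.
SAMPLE = [
    "11/13/05 23:00:00.249 [Debug] NB 51216 mds 143 PID:5074 TID:3066026928 [jobid=1427] 2 [is_host_online] master to media server connection is down, host name = nbmedia1, host state = 6",
    "11/13/05 23:00:00.249 [Debug] NB 51216 mds 143 PID:5074 TID:3066026928 [jobid=1427] 2 [select_from_stu_list] cannot select storage unit, media server is offline, name = nbmedia1-stu",
    "...",
    "11/13/05 23:00:00.249 [Debug] NB 51216 mds 143 PID:5074 TID:3066026928 [jobid=1427] 1 [allocateTwin] RetryInfo_Record: reason = 13 (MEDIA SERVER IS CURRENTLY NOT CONNECTED TO MASTER SERVER), mediaServerName = nbmedia1...",
]


def find_disconnected(lines):
    """Map each job ID to the media servers reported as disconnected for it."""
    hits = {}
    for line in lines:
        job = JOBID_RE.search(line)
        if not job:
            continue  # skip lines without a job ID, such as elided "..." lines
        match = HOST_DOWN_RE.search(line) or RETRY_RE.search(line)
        if match:
            # Trailing dots come from truncated log text, not from the host name.
            server = match.group(1).rstrip(".")
            hits.setdefault(int(job.group(1)), set()).add(server)
    return hits


if __name__ == "__main__":
    for job_id, servers in sorted(find_disconnected(SAMPLE).items()):
        print(job_id, ", ".join(sorted(servers)))
```

To use it on real data, pass the lines of the saved log file to find_disconnected instead of SAMPLE. A job that appears in the result while its media server and devices are actually up matches the symptom described above.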

Workaround:

Restarting NetBackup causes the hung job to fail. Once NetBackup has been restarted, new jobs will run. To restart NetBackup, run the following commands:
# /usr/openv/netbackup/bin/goodies/netbackup stop
# /usr/openv/netbackup/bin/goodies/netbackup start
Note: This workaround allows new jobs to run, but it does not prevent the situation from occurring again.

Fix:

This issue is resolved in NetBackup 6.0 Maintenance Pack 1. With the maintenance pack applied, the state of the robotic library is handled properly, and backups can span tapes without the job hanging.

To download NetBackup 6.0 Maintenance Pack 1, visit the Support Web site:
 http://support.veritas.com/menu_ddProduct_NBUESVR_view_DOWNLOAD.htm

Supplemental Materials

Source: ETrack
Value: 509509
Description: Etrack (NetBackup) 509509: allocation failure while spanning media, EMM says media server is down, but it looks ok. (6.0 MP1 Approved)

Legacy ID



280823

