
VAULT failure error code 10

Created: 17 Aug 2013 | 6 comments

Hi folks,

Vault has been failing for the past 3 days. It fails with error code 10. Failure logs:

14:06:45.922 [25341] <16> vltrun@ImageInterface::next_image^1622: db_IMAGEreceive FAIL: 10 (allocation failed)
14:06:45.923 [25341] <16> vltrun@SelectImagesForDuplication^1622: Leaving with DMN=1 SC=10
14:06:45.923 [25341] <16> vltrun@VltSession::lock_and_operate^1622 OP_STEP=select FAILED
14:06:45.958 [25341] <16> vltrun@VltSession::lock_and_operate^1622 FAILed NB_EC=10 NB_MSG=allocation failed
14:06:45.959 [25341] <2> vltrun@VaultJobMonitor::_send_jobrundata^1622: SENT JobRunDataEx_t to JOBD at 1376762805
14:06:45.959 [25341] <2> vltrun@VaultJobMonitor::IncrementJobProgress^1622: SENT completion pct=3 to JOBD
14:06:45.959 [25341] <16> vltrun@VltSession::lock_and_operate^1622: Leaving with DMN=1 SC=10
14:06:45.966 [25341] <8> vltrun@VltSession::sessionStep()^1622: Session STEP COMPLETE
End Time                :2013.08.17 02:06:45 PM (1376762805)
Elapsed                 :12:5
14:06:45.966 [25341] <16> vltrun@VltSession::runSession^1622 Aborting session...
14:06:45.970 [25341] <16> vltrun@VltSession::runSession^1622 FAILed NB_EC=10 NB_MSG=allocation failed
14:06:45.984 [25341] <8> vltrun@VltSession::runSession^1622 Destructor called during stack unwinding
14:06:45.984 [25341] <16> vltrun@VltSession::runSession^1622: Leaving with DMN=1 SC=10
14:06:45.985 [25341] <2> vltrun@main^1622: End Time:            2013.08.14 09:06:45 PM (1376762805)     Elapsed: 15:56 (776)
 
14:06:45.985 [25341] <2> vltrun@main^1622: >>********************************************************************<<
14:06:45.993 [25341] <2> vltrun@main: cleanup vltrun
14:06:45.993 [25341] <2> vltrun@VaultLockProxy::release_all_locks^1622: UpdJobd=1
14:06:45.993 [25341] <2> vltrun@VaultLockProxy::raw_release_lock^1622: Type=NBVAULT.MAXJOBS Key= Limit=100
14:06:45.994 [25341] <4> vltrun@VaultLockProxy::raw_release_lock^1622: lock released. Key=MASTERSERVER.NBVAULT.MAXJOBS
14:06:45.994 [25341] <2> vltrun@VaultJobMonitor::_send_try_msg^1622: At 1376762805 sent TRY_MSG: VAULT_GLOBAL_LOCK_RELEASED 1376762805
 
14:06:45.994 [25341] <2> vltrun@cancel_keepalive_process^1622: Entering to cancel PID=25349, JOB_ID=3736561 with FD=13
14:06:45.994 [25341] <2> vltrun@cancel_keepalive_process^1622: Msg to Child: Exit now
 
14:06:45.994 [25341] <4> vltrun@cancel_keepalive_process^1622: write end of pipe closed
14:06:45.994 [25341] <2> vltrun@cancel_keepalive_process^1622: Harvesting child's exit status
14:06:46.018 [25341] <8> vltrun@cancel_keepalive_process^1622: Child exited with EC=0 SIG=0
14:06:46.018 [25341] <4> vltrun@main: Reporting Status=10 (allocation failed)
14:06:46.018 [25341] <16> vltrun@main Vault Session FAILED [PRFL=BC_eject_lib0 SID=1622 JID=3736561 EC=10]
14:06:46.044 [25341] <16> vltrun@main FAILed NB_EC=10 NB_MSG=allocation failed
14:06:46.044 [25341] <2> vltrun@main: EXIT STATUS 10
 
Disk space is not an issue; there is more than 80 GB free.
 
Please share your thoughts.
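A small Python sketch (not a NetBackup tool) for pulling the failing step and status code out of vltrun log lines like the ones above. It assumes every line follows the `HH:MM:SS.mmm [pid] <severity> vltrun@<function> <message>` layout seen in this excerpt, and it treats `<16>` lines as the failure lines only because that is where the failure messages appear here. The sample is trimmed from the log above; the function names are hypothetical.

```python
import re
from typing import NamedTuple

# Assumed layout, taken from the log excerpt above:
# HH:MM:SS.mmm [pid] <severity> vltrun@<function> <message>
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) \[(?P<pid>\d+)\] <(?P<sev>\d+)> "
    r"vltrun@(?P<where>\S+)\s+(?P<msg>.*)$"
)


class Entry(NamedTuple):
    time: str
    pid: int
    severity: int
    where: str
    msg: str


def parse(lines):
    """Turn raw log lines into Entry tuples, skipping lines that don't match."""
    entries = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            entries.append(Entry(m["time"], int(m["pid"]), int(m["sev"]),
                                 m["where"].rstrip(":"), m["msg"]))
    return entries


def summarize_failures(entries):
    """Collect failed OP_STEP names and NB_EC/SC codes from <16> lines."""
    steps, codes = [], set()
    for e in entries:
        if e.severity != 16:
            continue
        step = re.search(r"OP_STEP=(\w+) FAILED", e.msg)
        if step:
            steps.append(step.group(1))
        codes.update(int(c) for c in re.findall(r"\b(?:NB_EC|SC)=(\d+)", e.msg))
    return steps, codes


SAMPLE = """\
14:06:45.922 [25341] <16> vltrun@ImageInterface::next_image^1622: db_IMAGEreceive FAIL: 10 (allocation failed)
14:06:45.923 [25341] <16> vltrun@VltSession::lock_and_operate^1622 OP_STEP=select FAILED
14:06:45.958 [25341] <16> vltrun@VltSession::lock_and_operate^1622 FAILed NB_EC=10 NB_MSG=allocation failed
14:06:45.959 [25341] <2> vltrun@VaultJobMonitor::IncrementJobProgress^1622: SENT completion pct=3 to JOBD
"""

if __name__ == "__main__":
    entries = parse(SAMPLE.splitlines())
    steps, codes = summarize_failures(entries)
    first = next(e for e in entries if e.severity == 16)
    print("first error in:", first.where)
    print("failed steps:", steps, "codes:", sorted(codes))
```

On this sample it points at `ImageInterface::next_image` during the `select` step, which matches what the thread concludes further down.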

Nagalla:

Please attach the full detail.log and summary.log files for this Vault session.

Also, let us know about your environment:

1) Master server OS version and NetBackup version

2) Robot control host OS version and NetBackup version

3) Which tasks you perform with this Vault job (duplication/catalog backup/ejection)

4) The detailed status of the failed job

watsons:

How many images are the vault trying to process?

"Error 10 allocation failed" is usually related to a memory issue.

http://www.symantec.com/docs/TECH57161

http://www.symantec.com/docs/TECH19038
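If memory is the suspect, one quick check on a Linux master server is /proc/meminfo. Below is a Python sketch, with hypothetical sample values, that parses it and computes a rough free-memory figure. It falls back to MemFree + Buffers + Cached because the MemAvailable field only exists on kernels 3.14 and later, and the 2.6.18 kernel reported later in this thread predates it. Treat the result as an estimate of system memory, not of what the Vault process itself can allocate.

```python
from pathlib import Path

# Hypothetical sample values, for illustration only.
SAMPLE = """\
MemTotal:       16314468 kB
MemFree:          512000 kB
Buffers:          100000 kB
Cached:          2000000 kB
SwapTotal:       8388604 kB
SwapFree:        8000000 kB
"""


def meminfo_kb(text):
    """Parse /proc/meminfo-style text into {field: numeric value} (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def rough_available_kb(info):
    """Prefer MemAvailable; on older kernels estimate as MemFree + Buffers + Cached."""
    if "MemAvailable" in info:
        return info["MemAvailable"]
    return info.get("MemFree", 0) + info.get("Buffers", 0) + info.get("Cached", 0)


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    text = meminfo.read_text() if meminfo.exists() else SAMPLE
    info = meminfo_kb(text)
    print(f"approx. available: {rough_available_kb(info) // 1024} MB")
    print(f"swap free: {info.get('SwapFree', 0) // 1024} MB")
```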

rookie11:

Master server OS version and NetBackup version -- Linux 2.6.18-274.3.1.el5, NBU 7.1.0.2

Which tasks do you perform with this Vault job (duplication/catalog backup/ejection)? -- Duplication by SLP and ejection by Vault.

Detailed status of the failed job:

08/17/2013 13:53:49 - vault waiting for session ID lock
08/17/2013 13:53:49 - vault session ID lock acquired
08/17/2013 13:53:49 - vault session ID lock released
08/17/2013 13:53:49 - Info nbjm (pid=20823) starting backup job (jobid=3736561) for client MASTER, policy Library_0_daily_eject, schedule Daily2
08/17/2013 13:53:49 - Info nbjm (pid=20823) requesting NO_STORAGE_UNIT resources from RB for backup job (jobid=3736561, request id:{F85EA3B8-0765-11E3-B341-90E8706B4779})
08/17/2013 13:53:49 - requesting resource MASTER.NBVAULT.MAXJOBS
08/17/2013 13:53:49 - requesting resource MASTER.NBU_POLICY.MAXJOBS.Library_0_daily_eject
08/17/2013 13:53:49 - granted resource  MASTER.NBVAULT.MAXJOBS
08/17/2013 13:53:49 - granted resource  MASTER.NBU_POLICY.MAXJOBS.Library_0_daily_eject
08/17/2013 13:53:49 - estimated 0 kbytes needed
08/17/2013 13:53:49 - begin Parent Job
08/17/2013 13:53:49 - begin Vault: Start Notify Script
08/17/2013 13:53:49 - Info RUNCMD (pid=25334) started
08/17/2013 13:53:49 - Info RUNCMD (pid=25334) exiting with status: 0
Operation Status: 0
08/17/2013 13:53:49 - end Vault: Start Notify Script; elapsed time 0:00:00
08/17/2013 13:53:49 - begin Vault: Execute Script
08/17/2013 13:53:49 - started process bpbrm (pid=25341)
08/17/2013 13:53:49 - requesting resource MASTER.VAULT_CREATE_SESSION_ID.LOCK_TLD(0)_Daily_eject_Lib0
08/17/2013 13:53:49 - granted resource  MASTER.VAULT_CREATE_SESSION_ID.LOCK_TLD(0)_Daily_eject_Lib0
08/17/2013 14:06:45 - vault global lock released
08/17/2013 14:06:46 - end writing
Operation Status: 10
08/17/2013 14:06:46 - end Vault: Execute Script; elapsed time 0:12:57
08/17/2013 14:06:46 - begin Vault: Stop On Error
Operation Status: 0
08/17/2013 14:06:46 - end Vault: Stop On Error; elapsed time 0:00:00
08/17/2013 14:06:46 - begin Vault: End Notify Script
08/17/2013 14:06:46 - Info RUNCMD (pid=4340) started
08/17/2013 14:06:46 - Info RUNCMD (pid=4340) exiting with status: 0
Operation Status: 0
08/17/2013 14:06:46 - end Vault: End Notify Script; elapsed time 0:00:00
Operation Status: 10
08/17/2013 14:06:46 - end Parent Job; elapsed time 0:12:57
allocation failed  (10)
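In the job detail above, each `Operation Status: N` line comes just before the `end <phase>` line it belongs to. A minimal Python sketch, assuming that ordering holds throughout the job detail (it does in this excerpt), that pairs them and shows which phases returned a non-zero status. The function name is hypothetical and the sample is trimmed from the output above.

```python
import re

JOB_DETAIL = """\
08/17/2013 13:53:49 - begin Vault: Start Notify Script
08/17/2013 13:53:49 - Info RUNCMD (pid=25334) exiting with status: 0
Operation Status: 0
08/17/2013 13:53:49 - end Vault: Start Notify Script; elapsed time 0:00:00
08/17/2013 13:53:49 - begin Vault: Execute Script
Operation Status: 10
08/17/2013 14:06:46 - end Vault: Execute Script; elapsed time 0:12:57
Operation Status: 10
08/17/2013 14:06:46 - end Parent Job; elapsed time 0:12:57
"""

END_RE = re.compile(r" - end (?P<phase>.+?); elapsed time (?P<elapsed>[\d:]+)$")
STATUS_RE = re.compile(r"^Operation Status: (?P<code>\d+)$")


def phase_statuses(lines):
    """Pair each 'Operation Status: N' line with the 'end <phase>' line that follows it."""
    pending = None
    results = []
    for line in lines:
        line = line.strip()
        status = STATUS_RE.match(line)
        if status:
            pending = int(status["code"])
            continue
        end = END_RE.search(line)
        if end and pending is not None:
            results.append((end["phase"], pending, end["elapsed"]))
            pending = None
    return results


if __name__ == "__main__":
    for phase, code, elapsed in phase_statuses(JOB_DETAIL.splitlines()):
        flag = "FAILED" if code else "ok"
        print(f"{phase:<28} status={code:<3} elapsed={elapsed} {flag}")
```

Here only `Vault: Execute Script` (and therefore the parent job) reports status 10; the notify scripts exit cleanly.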
 
summary.log:
more summary.log
Robot:                TLD(0)
Vault:                Daily_eject_Lib0
Profile:              BC_eject_lib0
Session ID:           0
Job_Id:               3736561
====================================
 
Session STEP Information
RVP                     = BC_eject_lib0
SID                     = 1622
STEP                    = start_notify
StartTime               = 2013.08.17 01:53:49 PM (1376762029)
Session STEP COMPLETE
End Time                :2013.08.17 01:53:49 PM (1376762029)
Elapsed                 :0:0
Session STEP Information
RVP                     = BC_eject_lib0
SID                     = 1622
STEP                    = changegroup_torobot
StartTime               = 2013.08.17 01:53:49 PM (1376762029)
Session STEP COMPLETE
End Time                :2013.08.17 01:54:40 PM (1376762080)
Elapsed                 :0:51
Session STEP Information
RVP                     = BC_eject_lib0
SID                     = 1622
STEP                    = select
StartTime               = 2013.08.17 01:54:40 PM (1376762080)
NetBackup Error: [2013:08:17::14:06:45] allocation failed (10)
Session STEP COMPLETE
End Time                :2013.08.17 02:06:45 PM (1376762805)
Elapsed                 :12:5
NetBackup Error: [2013:08:17::14:06:45] allocation failed (10)
 
====================================
 
SUMMARY REPORT
================
Master Server:        usa0300uv832
Started:              2013.08.17 01:53:49 PM
Stopped:              2013.08.17 02:06:45 PM
Exit Status:          allocation failed (10)
 
For detailed information, please refer to:
    /usr/openv/netbackup/vault/sessions/Daily_eject_Lib0/sid1622/logs/detail.log
 
====================================
 
NetBackup Error: [2013:08:17::14:06:46] allocation failed (10)
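The summary.log records an epoch timestamp in parentheses on each `StartTime` and `End Time` line, so per-step durations can be recomputed directly. A Python sketch assuming the layout shown above (function name hypothetical). On this session it suggests the `Elapsed` field is minutes:seconds without zero padding (`12:5` is 725 s), and that the `select` step took 725 of the session's 776 seconds (1376762805 - 1376762029).

```python
import re

SUMMARY = """\
STEP                    = start_notify
StartTime               = 2013.08.17 01:53:49 PM (1376762029)
End Time                :2013.08.17 01:53:49 PM (1376762029)
STEP                    = changegroup_torobot
StartTime               = 2013.08.17 01:53:49 PM (1376762029)
End Time                :2013.08.17 01:54:40 PM (1376762080)
STEP                    = select
StartTime               = 2013.08.17 01:54:40 PM (1376762080)
NetBackup Error: [2013:08:17::14:06:45] allocation failed (10)
End Time                :2013.08.17 02:06:45 PM (1376762805)
"""

STEP_RE = re.compile(r"^STEP\s*=\s*(\S+)")
EPOCH_RE = re.compile(r"\((\d+)\)\s*$")


def step_durations(lines):
    """Return {step: seconds} from the epoch values on StartTime/End Time lines."""
    durations = {}
    step = start = None
    for line in lines:
        line = line.strip()
        m = STEP_RE.match(line)
        if m:
            step, start = m.group(1), None
            continue
        epoch = EPOCH_RE.search(line)
        # Error lines also end in "(10)", so only StartTime/End Time lines count.
        if epoch is None or step is None:
            continue
        if line.startswith("StartTime"):
            start = int(epoch.group(1))
        elif line.startswith("End Time") and start is not None:
            durations[step] = int(epoch.group(1)) - start
    return durations


if __name__ == "__main__":
    for name, secs in step_durations(SUMMARY.splitlines()).items():
        minutes, seconds = divmod(secs, 60)
        print(f"{name}: {secs} s ({minutes}:{seconds:02d})")
```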

 

rookie11:

How many images is Vault trying to process? -- I don't know how to find that.

But on an average weekday, Vault ejects around 45 LTO3 media.

 

watsons:

detail.log should show how many images were selected.

According to the error, it fails at the image/media selection step:

STEP                    = select
StartTime               = 2013.08.17 01:54:40 PM (1376762080)
NetBackup Error: [2013:08:17::14:06:45] allocation failed (10)
 
So it looks more likely to be related to:
http://www.symantec.com/docs/TECH19038
http://www.symantec.com/docs/TECH159907

Nagalla:

Are the robot control host and the master server the same machine?

Please attach the detail.log file for this Vault job.

Look in /usr/openv/volmgr/misc/ and check whether there are any *.txt files there. If there are, please delete them.
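The thread does not say what those *.txt files are, so it may be worth listing them before removing anything. A minimal Python sketch that does a dry run by default and only deletes when `dry_run=False` is passed; the function names are hypothetical, and the directory is the one given above.

```python
from pathlib import Path


def find_txt_files(directory):
    """Return the *.txt files directly inside `directory`, sorted by name."""
    return sorted(p for p in Path(directory).glob("*.txt") if p.is_file())


def remove_txt_files(directory, dry_run=True):
    """List the *.txt files in `directory`; delete them only if dry_run is False."""
    names = []
    for path in find_txt_files(directory):
        if not dry_run:
            path.unlink()
        names.append(path.name)
    return names


if __name__ == "__main__":
    target = "/usr/openv/volmgr/misc"
    if Path(target).is_dir():
        for name in remove_txt_files(target, dry_run=True):
            print("would delete:", name)
```

Running it once as-is shows what would be removed; rerun with `dry_run=False` only after confirming the list.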