
NDMP backups failing with 23/83 Error

Created: 28 Mar 2013 | 8 comments

Hi,

I have NDMP backups running in multiple locations on an NB6.5.6 master server. I am facing issues when backing up NetApp volumes that contain a very large number of files (for example, 10,000,000 or more): the jobs fail with status 23.

3/22/2013 6:00:04 PM - requesting resource proton-tape-drives
3/22/2013 6:00:04 PM - requesting resource strut.NBU_CLIENT.MAXJOBS.proton.soco.agilent.com
3/22/2013 6:00:04 PM - requesting resource strut.NBU_POLICY.MAXJOBS.proton
3/22/2013 6:00:04 PM - awaiting resource proton-tape-drives - Maximum job count has been reached for the storage unit
3/23/2013 4:01:48 PM - awaiting resource proton-tape-drives - No drives are available
3/23/2013 4:04:34 PM - awaiting resource proton-tape-drives - Maximum job count has been reached for the storage unit
3/23/2013 4:06:48 PM - awaiting resource proton-tape-drives - No drives are available
3/23/2013 4:14:28 PM - awaiting resource proton-tape-drives - Maximum job count has been reached for the storage unit
3/23/2013 5:11:30 PM - granted resource strut.NBU_CLIENT.MAXJOBS.proton.soco.agilent.com
3/23/2013 5:11:30 PM - granted resource strut.NBU_POLICY.MAXJOBS.proton
3/23/2013 5:11:30 PM - granted resource CN6946
3/23/2013 5:11:30 PM - granted resource strut-ULTRIUM-TD4-003
3/23/2013 5:11:30 PM - granted resource proton-tape-drives
3/23/2013 5:11:30 PM - estimated 0 kbytes needed
3/23/2013 5:11:30 PM - started process bpbrm (4268)
3/23/2013 5:11:30 PM - connecting
3/23/2013 5:11:30 PM - connected; connect time: 00:00:00
3/23/2013 5:12:18 PM - mounted
3/23/2013 5:12:18 PM - positioning CN6946 to file 1
3/23/2013 5:13:06 PM - positioned CN6946; position time: 00:00:48
3/23/2013 5:13:06 PM - begin writing
3/26/2013 4:15:28 PM - Error bptm(pid=10048) NDMP SDK: stub called for missing shared library entry "ndmp_get_error_name"  
3/26/2013 4:15:28 PM - Error bptm(pid=10048) NDMP SDK: continuing without looking up error name; returning "?"  
3/26/2013 4:15:28 PM - end writing; write time: 2 23:02:22
3/26/2013 4:15:31 PM - Error ndmpagent(pid=3692) terminated by parent process        
3/26/2013 4:15:31 PM - Error ndmpagent(pid=3692) send error status = 18 (NDMP_XDR_DECODE_ERR)      
3/26/2013 4:15:31 PM - Error ndmpagent(pid=3692) SendControlMessage failed, disabling connection 00EEC440 and exiting     
3/26/2013 4:15:31 PM - Error ndmpagent(pid=3692) Connection was closed but has not yet been destroyed.   
3/26/2013 4:15:31 PM - Error ndmpagent(pid=3692) Connection was closed but has not yet been destroyed.   
3/26/2013 4:15:31 PM - Error ndmpagent(pid=3692) MoverGetState called with no session       
3/26/2013 4:15:31 PM - Error ndmpagent(pid=3692) NDMP backup failed, path = /vol/vol19      
3/26/2013 4:15:31 PM - Error ndmpagent(pid=3692) Connection was closed but has not yet been destroyed.   
3/26/2013 4:15:31 PM - Error ndmpagent(pid=3692) Connection was closed but has not yet been destroyed.   
3/26/2013 4:15:31 PM - Error ndmpagent(pid=3692) Connection was closed but has not yet been destroyed.   
3/26/2013 4:15:31 PM - Error ndmpagent(pid=3692) Connection was closed but has not yet been destroyed.   
3/26/2013 4:15:31 PM - Error ndmpagent(pid=3692) MoverGetState called with no session       
3/26/2013 4:15:31 PM - Error ndmpagent(pid=3692) Connection was closed but has not yet been destroyed.   
3/26/2013 4:15:31 PM - Error ndmpagent(pid=3692) Connection was closed but has not yet been destroyed.   
socket read failed(23)

I have tried manually creating the MAX_FILES_PER_ADD file and setting the entry to 25000, but still no luck.


Suraj_Hegde's picture

Hi Nagalla,

I am not getting error 84. I have gone through the bptm log and I don't see any error 84 in it. As I mentioned earlier, only the volumes that contain many files (say, more than 10,000,000) are failing.

Yasuhisa Ishikawa's picture

Can you post the jobid.t file located under /usr/openv/netbackup/db/jobs/trylogs or install_path\NetBackup\db\jobs\trylogs?

Authorized Symantec Consultant(ASC) Data Protection in Tokyo, Japan

Suraj_Hegde's picture

Try 1
REQUESTING_RESOURCE 1364246041 server-tape-drives
REQUESTING_RESOURCE 1364246041 strut.NBU_CLIENT.MAXJOBS.server.soco.agilent.com
REQUESTING_RESOURCE 1364246041 strut.NBU_POLICY.MAXJOBS.server
RESOURCE_GRANTED 1364246041 strut.NBU_CLIENT.MAXJOBS.server.soco.agilent.com
RESOURCE_GRANTED 1364246041 strut.NBU_POLICY.MAXJOBS.server
RESOURCE_GRANTED 1364246041 CN6948
RESOURCE_GRANTED 1364246041 strut-ULTRIUM-TD4-002
RESOURCE_GRANTED 1364246041 server-tape-drives
ESTIMATED_KBYTES 1364246042 0
BEGIN_OPERATION 1364246042 ParentJob
BEGIN_OPERATION 1364246042 STREAM_DISCOVERY:StartNotifyScript
PROCESS 1364246042 9056 RUNCMD
PROCESS_END 1364246043 9056 0
OPERATION_STATUS 0
END_OPERATION 1364246043
BEGIN_OPERATION 1364246043 STREAM_DISCOVERY:StreamDiscovery
OPERATION_STATUS 0
END_OPERATION 1364246043
BEGIN_OPERATION 1364246043 STREAM_DISCOVERY:PEMPreprocessed
OPERATION_STATUS 23
END_OPERATION 1364445189
BEGIN_OPERATION 1364445189 STREAM_DISCOVERY:StopOnError
OPERATION_STATUS 0
END_OPERATION 1364445189
BEGIN_OPERATION 1364445189 STREAM_DISCOVERY:EndNotifyScript
PROCESS 1364445190 5620 RUNCMD
PROCESS_END 1364445190 5620 0
OPERATION_STATUS 0
END_OPERATION 1364445190
OPERATION_STATUS 23
END_OPERATION 1364445190
Started 1364246041
Status 23
DestStorageUnit server-tape-drives
DestMediaServer strut
Transport 0
Ended 1364445193
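
The timestamps in the try log are epoch seconds, so it is hard to see at a glance how long the failing step ran. The following is a minimal Python sketch (not a NetBackup tool) that pairs BEGIN_OPERATION/END_OPERATION lines and prints the elapsed time for each operation. The function names `operation_spans` and `to_utc` are made up for illustration, only the failing operation is pasted in as sample input, and the times are shown in UTC because the site's local time zone is not stated in the thread.

```python
from datetime import datetime, timedelta, timezone

# Lines copied from the try log above (only the failing operation).
TRY_LOG = """\
BEGIN_OPERATION 1364246043 STREAM_DISCOVERY:PEMPreprocessed
OPERATION_STATUS 23
END_OPERATION 1364445189
"""


def operation_spans(text):
    """Return (name, start, end, status) for each BEGIN/END pair found."""
    spans = []
    name = start = status = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "BEGIN_OPERATION":
            start, name, status = int(fields[1]), fields[2], None
        elif fields[0] == "OPERATION_STATUS" and name is not None:
            status = int(fields[1])
        elif fields[0] == "END_OPERATION" and name is not None:
            spans.append((name, start, int(fields[1]), status))
            name = None  # an END with no open BEGIN is ignored
    return spans


def to_utc(epoch):
    """Convert epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc)


spans = operation_spans(TRY_LOG)
for name, start, end, status in spans:
    elapsed = timedelta(seconds=end - start)
    print(f"{name}: status {status}, started {to_utc(start).isoformat()}, ran {elapsed}")
```

For the excerpt above, the operation that returned status 23 ran for a little over 2 days and 7 hours before it ended.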
 

Yasuhisa Ishikawa's picture

Sorry, there is no clue of the kind I expected here.

Please try Paramesh's suggestion, or decrease MAX_FILES_PER_ADD to 5000 or so.

Authorized Symantec Consultant(ASC) Data Protection in Tokyo, Japan

Mark_Solutions's picture

I think you probably need to split your backup into more streams (making sure you only allow 4 to run at a time if you have that NDMP restriction).

Looking at your log, it fails after 3 days!

See if you can split it into more, smaller streams that complete in a shorter time, as you may well be affected by timeouts, keep-alive issues, etc.

3 days seems just too long for a backup to be running.
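
As a quick check of the "3 days" figure, here is a small Python sketch that computes the queue time and write time from three timestamps in the job details posted at the top of the thread. The format string assumes the month/day/year, 12-hour layout seen in that log, and the variable names are only illustrative.

```python
from datetime import datetime

# Assumed layout of the job-details timestamps: month/day/year, 12-hour clock.
FMT = "%m/%d/%Y %I:%M:%S %p"


def parse(stamp):
    """Parse one job-details timestamp into a naive datetime."""
    return datetime.strptime(stamp, FMT)


# Values copied from the job details in the original post.
requested = parse("3/22/2013 6:00:04 PM")      # first "requesting resource"
begin_writing = parse("3/23/2013 5:13:06 PM")  # "begin writing"
end_writing = parse("3/26/2013 4:15:28 PM")    # "end writing"

queued = begin_writing - requested
writing = end_writing - begin_writing

print("waited for resources:", queued)
print("write time:", writing)
```

The write time comes out to 2 days, 23:02:22, which matches the "write time: 2 23:02:22" line in the log, and the job also waited about 23 hours for a drive before it started writing.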

Authorised Symantec Consultant
