Video Screencast Help

Large NDMP jobs fail with status 23 but small jobs succeed

Created: 04 Apr 2014 | 7 comments
Matt Musgrove's picture

We are installing NetBackup 7.6.0.1 on Windows 2012 and backing up NetApp OnTap 8.2.1RC2 Cluster-Mode.

Large jobs (over 800 GB) are failing with Status 23 after the dump to tape has successfully completed.  Small jobs are successful.  I have not determined the threshhold at which they will fail but a test job of 37 GB ran successfully.  Successful and failed jobs are all on the same NetApp storage node and SVM and use the same tape devices.  The large jobs always fail and the small jobs always pass so the problem is consistent.

I've done some digging around and thought it could be solved by setting a NDMP timeout but I could not see this documented in 7.6

End of Activity Monitor log:

01/04/2014 19:48:28 - Info ndmpagent(pid=5764) xxxxsn0202: DUMP: Tue Apr  1 19:48:29 2014 : We have written 817613361 KB.
01/04/2014 19:53:28 - Info ndmpagent(pid=5764) xxxxsn0202: DUMP: Tue Apr  1 19:53:29 2014 : We have written 853167693 KB.
01/04/2014 19:57:02 - Info ndmpagent(pid=5764) xxxxsn0202: ENHANCED_DAR_ENABLED is 'T'      
01/04/2014 19:57:02 - Info ndmpagent(pid=5764) xxxxsn0202: ACL_START is '910273288192'      
01/04/2014 19:57:02 - Info ndmpagent(pid=5764) xxxxsn0202: DUMP: dumping (Pass V) [ACLs]    
01/04/2014 19:57:03 - Info ndmpagent(pid=5764) xxxxsn0202: DUMP: 888938765 KB      
01/04/2014 19:57:03 - Info ndmpagent(pid=5764) xxxxsn0202: DUMP: DUMP IS DONE     
01/04/2014 19:57:03 - Info ndmpagent(pid=5764) NDMP backup successfully completed, path = /xxxxsvminmixpb0001/xxxxvolindtspb_vsvr_datastore_0001   
01/04/2014 19:57:17 - Info bptm(pid=5040) EXITING with status 23 <----------     
01/04/2014 19:57:17 - Critical bpbrm(pid=616) unexpected termination of client xxxxsn0202     
01/04/2014 19:57:17 - Info ndmpagent(pid=0) done. status: 23: socket read failed    
01/04/2014 19:57:17 - end writing; write time: 1:01:34
socket read failed(23)

BPTM log

19:57:05.337 [5040.5132] <2> NdmpMoverSession[0]: ndmp_mover_set_window(0, 18446744073709551615)
19:57:05.337 [5040.5132] <2> NdmpMoverSession[0]: Attempting to create server to which remote host xxxxsn0202 can connect - IPv6
19:57:05.337 [5040.5132] <2> NdmpMoverSession[0]: Using hostname from config file: NDMP_DATA_CONNECTION_HOST_NAME=192.168.120.80
19:57:05.337 [5040.5132] <8> retry_getaddrinfo_for_real: [vnet_addrinfo.c:1059] getaddrinfo() failed RV=11001 NAME=192.168.120.80 SVC=0
19:57:05.337 [5040.5132] <8> retry_getaddrinfo: [vnet_addrinfo.c:894] retry_getaddrinfo_for_real failed RV=11001 NAME=192.168.120.80 SVC=0
19:57:05.337 [5040.5132] <8> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1643] retry_getaddrinfo() failed RV=11001 NAME=192.168.120.80 SVC=NULL
19:57:05.337 [5040.5132] <8> vnet_cached_getaddrinfo: [vnet_addrinfo.c:1271] vnet_cached_getaddrinfo_and_update() failed 6 0x6
19:57:05.337 [5040.5132] <2> NdmpMoverSession[0]: CreateServer: vnet_cached_getaddrinfo failed, host = 192.168.120.80, status = 6, error = 11001
19:57:05.337 [5040.5132] <2> NdmpMoverSession[0]: MoverConnect method failed for IPv6. Reinitializing and attempting to use IPv4.
19:57:05.353 [5040.5132] <2> NdmpMoverSession[0]: ndmp_mover_set_window(0, 18446744073709551615)
19:57:05.353 [5040.5132] <2> NdmpMoverSession[0]: Attempting to create server to which remote host xxxxsn0202 can connect - IPv4
19:57:05.353 [5040.5132] <2> NdmpMoverSession[0]: Using hostname from config file: NDMP_DATA_CONNECTION_HOST_NAME=192.168.120.80
19:57:05.353 [5040.5132] <2> vnet_cached_getaddrinfo_and_update: [vnet_addrinfo.c:1649] found via getaddrinfo NAME=192.168.120.80 SVC=NULL
19:57:05.353 [5040.5132] <2> NdmpMoverSession[0]: Using IP address: 192.168.120.80 (for configured host 192.168.120.80)
19:57:05.353 [5040.5132] <2> NdmpMoverSession[0]: Creating server for 192.168.120.80 in Server Port Window 1025 to 5000
19:57:05.368 [5040.5132] <2> vnet_bind_to_port_addr_extra_ipi: vnet_bind.c.195: 0: Function failed: 10 0x0000000a
19:57:05.368 [5040.5132] <2> NdmpMoverSession[0]: Failed to bind to port in Server Port Window (check Port Ranges in Host Properties)
19:57:05.368 [5040.5132] <2> NdmpMoverSession[0]: MoverConnect method failed. Reinitializing and attempting to use MoverListen method.
19:57:05.384 [5040.5132] <2> NdmpMoverSession[0]: ndmp_mover_set_window(0, 18446744073709551615)
19:57:05.384 [5040.5132] <2> NdmpMoverSession[0]: ndmp_mover_listen failed, status = NDMP_CONN_TYPE_UNCONFIGURED_ERR
19:57:05.384 [5040.5132] <16> NdmpMoverSession[0]: ERROR Start failed
19:57:05.384 [5040.5132] <16> check_and_process_mover_tasks: NDMP Mover Client Setup failed
19:57:05.399 [5040.5132] <2> NdmpMoverSession[0]: Shutdown complete
19:57:05.399 [5040.5132] <2> write_data_tir: ndmp_task = 1
19:57:05.399 [5040.5132] <2> write_data_tir: status_to_return = -1
19:57:05.399 [5040.5132] <2> check_error_history: just tpunmount: called from bptm line 21767, EXIT_Status = 23
19:57:05.399 [5040.5132] <2> io_close: closing D:\Program Files\Veritas\NetBackup\db\media\tpreq\drive_HP.ULTRIUM6-SCSI.003, from bptm.c.16640

 

 

 

Operating Systems:

Comments 7 CommentsJump to latest comment

Marianne's picture

Have you tried to set NDMP progress timeout ?

See http://www.symantec.com/docs/TECH21246

Similar discussion over here, but unfortunately no solution:
https://www-secure.symantec.com/connect/forums/ndmp-backup-failed-status-23 

Supporting Storage Foundation and VCS on Unix and Windows as well as NetBackup on Unix and Windows
Handy NBU Links

Matt Musgrove's picture

Yes Marianne, I have tried the NDMP progress timeout as suggested in TECH21246 but it made no difference. I wasn't sure if it was still valid in NBU 7.6. I couldn't find reference to NDMP_PROGRESS_TIMEOUT in my ndmp log file.

I also had already seen that very promising previous discussion and had sent a personal message to the poster to see if he had a resolution.  Unfortunately I've had no reply.

Amol Nair's picture

Create the following entry in the "ndmp.cfg" file on the media server and restart the netbackup services on the media server.

NDMP_MOVER_CLIENT_DISABLE

This may affect the performance of ndmp backups but it should resolve your status 23 issue.

You can take a look at the following technote for the impact of this entry on your backups.

http://www.symantec.com/docs/TECH162041

Matt Musgrove's picture

Hi Amol.  Before reading your post we were already in the process of trying that :)

Unfortunately the backup we started before the weekend is still running at a pathetic 916 KB/Sec!  So useless :(

mnolan's picture

This is your error here.

 

19:57:05.353 [5040.5132] <2> NdmpMoverSession[0]: Creating server for 192.168.120.80 in Server Port Window 1025 to 5000
19:57:05.368 [5040.5132] <2> vnet_bind_to_port_addr_extra_ipi: vnet_bind.c.195: 0: Function failed: 10 0x0000000a
19:57:05.368 [5040.5132] <2> NdmpMoverSession[0]: Failed to bind to port in Server Port Window (check Port Ranges in Host Properties)

Here is one technote with the exact error.

http://www.symantec.com/docs/TECH138437

Which points back to an issue with the ndmp.cfg which we know you have modified from the following line.

19:57:05.353 [5040.5132] <2> NdmpMoverSession[0]: Using hostname from config file: NDMP_DATA_CONNECTION_HOST_NAME=192.168.120.80

I suggest removing this line and allowing NetBackup to setup the ndmp server based on the interface being used to talk to the filer.

 

Matt Musgrove's picture

I think that error is a red herring as I got the same error before I tinkered with the ndmp.cfg file.  We've taken it out now anyway.

mnolan's picture

Is there anything else on the server running on the following? "Server Port Window 1025 to 5000"

Or blocking it?

Even if the job is not doing Remote NDMP, NetBackup may still open a server port for easier meta data trasmission.

Ndmpagent log should show exactly what port was being opened.