Video Screencast Help

Issue restoring a large number of files

Created: 17 Apr 2013 | 4 comments
francesco.arioldi@nispro.it's picture

Hi all

I Have a problem with restoring about 9000 files, all files are located on the same tape.

My Environment:

Windows Server 2003 Std Sp2

4Gb Ram

Netbackup 6.5.3

Tape Library IBM System Storage TS3100 - product id 3573-TL - firmware A.60 / 3.10e

About 2 hours after I started the restore job, it don't end successfully

When I run same job with a smaller number of files, the job ends successfully.

In bprestore log :

18:25:14.138 [9416.4232] <2> init_cache: ..\libvlibs\vnet_hosts.c.990: host_cache_size: 200 0x000000c8
18:25:14.138 [9416.4232] <2> init_cache: ..\libvlibs\vnet_hosts.c.991: cache_time: 3600 0x00000e10
18:25:14.138 [9416.4232] <2> init_cache: ..\libvlibs\vnet_hosts.c.1003: host_failed_cache_size: 40 0x00000028
18:25:14.138 [9416.4232] <2> init_cache: ..\libvlibs\vnet_hosts.c.1004: cache_time: 3600 0x00000e10
18:25:14.138 [9416.4232] <2> init_cache: ..\libvlibs\vnet_hosts.c.990: host_cache_size: 200 0x000000c8
18:25:14.138 [9416.4232] <2> init_cache: ..\libvlibs\vnet_hosts.c.991: cache_time: 3600 0x00000e10
18:25:14.138 [9416.4232] <2> init_cache: ..\libvlibs\vnet_hosts.c.1003: host_failed_cache_size: 40 0x00000028
18:25:14.138 [9416.4232] <2> init_cache: ..\libvlibs\vnet_hosts.c.1004: cache_time: 3600 0x00000e10
18:25:14.154 [9416.4232] <2> vnet_vnetd_service_socket: ..\libvlibs\vnet_vnetd.c.2043: VN_REQUEST_SERVICE_SOCKET: 6 0x00000006
18:25:14.154 [9416.4232] <2> vnet_vnetd_service_socket: ..\libvlibs\vnet_vnetd.c.2057: service: bprd

 

In bprd log :

20:25:21.537 [9020.2928] <2> find_files: Got CONTINUE from bpdbm
20:25:21.537 [9020.2928] <2> find_files:    file_path = /D/sir_fs_01/000/000/005/139/103529
20:25:21.537 [9020.2928] <2> find_files:    start_date = 0
20:25:21.537 [9020.2928] <2> find_files:    end_date = 1366129514
20:25:46.465 [9020.2928] <2> find_files: Got CONTINUE from bpdbm
20:25:46.465 [9020.2928] <2> get_long: (1) cannot read (byte 1) from network: An existing connection was forcibly closed by the remote host.
20:25:46.465 [9020.2928] <2> get_next_triplet: get_string() failed, An existing connection was forcibly closed by the remote host.  (10054), network read error
20:25:46.465 [9020.2928] <2> db_error_add_to_file: dberrorq.c:midnite = 1366063200
20:25:46.497 [9020.2928] <2> mail_msg_and_set_exit_status: entered; status = 42
20:25:46.497 [9020.2928] <2> mail_msg_and_set_exit_status: Attempting to send mail to root on cs3
20:25:46.575 [9020.2928] <2> vnet_vnetd_service_socket: vnet_vnetd.c.2043: VN_REQUEST_SERVICE_SOCKET: 6 0x00000006
20:25:46.575 [9020.2928] <2> vnet_vnetd_service_socket: vnet_vnetd.c.2057: service: bpcd
20:25:46.590 [9020.2928] <2> logconnections: BPCD CONNECT FROM 172.16.80.57.2882 TO 172.16.80.57.13724
20:25:46.590 [9020.2928] <2> vnet_connect_to_vnetd_extra: vnet_vnetd.c.179: msg: VNETD CONNECT FROM 172.16.80.57.2883 TO 172.16.80.57.13724 fd = 1688
20:25:46.622 [9020.2928] <2> vnet_vnetd_connect_forward_socket_begin: vnet_vnetd.c.532: VN_REQUEST_CONNECT_FORWARD_SOCKET: 10 0x0000000a
20:25:46.653 [9020.2928] <2> vnet_vnetd_connect_forward_socket_begin: vnet_vnetd.c.549: ipc_string: 2884
20:25:47.371 [9020.2928] <2> mail_msg_and_set_exit_status: CLIENT_CMD_SOCK from bpcr = 1704
20:25:47.371 [9020.2928] <2> mail_msg_and_set_exit_status: CLIENT_STAT_SOCK from bpcr = 1688
20:25:47.403 [9020.2928] <2> bpcr_get_version_rqst: bpcd version: 06530000
20:25:47.403 [9020.2928] <2> bpcr_get_version_rqst: bpcd version: 06530000
20:25:47.418 [9020.2928] <2> bpcr_get_version_rqst: bpcd version: 06530000
20:25:47.418 [9020.2928] <2> mail_msg_and_set_exit_status: RESTORE EXIT STATUS = 42
20:25:47.418 [9020.2928] <2> db_error_add_to_file: dberrorq.c:midnite = 1366063200
20:25:47.434 [9020.2928] <2> job_end_try: Done
20:25:47.434 [9020.2928] <2> job_monitoring_exex: ACK disconnect
20:25:47.434 [9020.2928] <2> job_disconnect: Disconnected
20:25:47.449 [9020.2928] <2> vnet_vnetd_service_socket: vnet_vnetd.c.2043: VN_REQUEST_SERVICE_SOCKET: 6 0x00000006
20:25:47.449 [9020.2928] <2> vnet_vnetd_service_socket: vnet_vnetd.c.2057: service: bpcd
20:25:47.449 [9020.2928] <2> logconnections: BPCD CONNECT FROM 172.16.80.57.2891 TO 172.16.80.57.13724
20:25:47.449 [9020.2928] <2> vnet_connect_to_vnetd_extra: vnet_vnetd.c.179: msg: VNETD CONNECT FROM 172.16.80.57.2892 TO 172.16.80.57.13724 fd = 1740
20:25:47.465 [9020.2928] <2> vnet_vnetd_connect_forward_socket_begin: vnet_vnetd.c.532: VN_REQUEST_CONNECT_FORWARD_SOCKET: 10 0x0000000a
20:25:47.465 [9020.2928] <2> vnet_vnetd_connect_forward_socket_begin: vnet_vnetd.c.549: ipc_string: 2893
20:25:47.543 [9020.2928] <2> vnet_vnetd_service_socket: vnet_vnetd.c.2043: VN_REQUEST_SERVICE_SOCKET: 6 0x00000006
20:25:47.543 [9020.2928] <2> vnet_vnetd_service_socket: vnet_vnetd.c.2057: service: bpcd
20:25:47.543 [9020.2928] <2> logconnections: BPCD CONNECT FROM 172.16.80.57.2898 TO 172.16.80.57.13724
20:25:47.543 [9020.2928] <2> vnet_connect_to_vnetd_extra: vnet_vnetd.c.179: msg: VNETD CONNECT FROM 172.16.80.57.2899 TO 172.16.80.57.13724 fd = 1756
20:25:47.559 [9020.2928] <2> vnet_vnetd_connect_forward_socket_begin: vnet_vnetd.c.532: VN_REQUEST_CONNECT_FORWARD_SOCKET: 10 0x0000000a
20:25:47.559 [9020.2928] <2> vnet_vnetd_connect_forward_socket_begin: vnet_vnetd.c.549: ipc_string: 2900
20:25:47.637 [9020.2928] <2> vnet_vnetd_service_socket: vnet_vnetd.c.2043: VN_REQUEST_SERVICE_SOCKET: 6 0x00000006
20:25:47.637 [9020.2928] <2> vnet_vnetd_service_socket: vnet_vnetd.c.2057: service: bpcd
20:25:47.637 [9020.2928] <2> logconnections: BPCD CONNECT FROM 172.16.80.57.2905 TO 172.16.80.57.13724
20:25:47.637 [9020.2928] <2> vnet_connect_to_vnetd_extra: vnet_vnetd.c.179: msg: VNETD CONNECT FROM 172.16.80.57.2906 TO 172.16.80.57.13724 fd = 1740
20:25:47.652 [9020.2928] <2> vnet_vnetd_connect_forward_socket_begin: vnet_vnetd.c.532: VN_REQUEST_CONNECT_FORWARD_SOCKET: 10 0x0000000a
20:25:47.652 [9020.2928] <2> vnet_vnetd_connect_forward_socket_begin: vnet_vnetd.c.549: ipc_string: 2907
20:25:47.715 [9020.2928] <2> vnet_vnetd_service_socket: vnet_vnetd.c.2043: VN_REQUEST_SERVICE_SOCKET: 6 0x00000006
20:25:47.715 [9020.2928] <2> vnet_vnetd_service_socket: vnet_vnetd.c.2057: service: bpcd
20:25:47.715 [9020.2928] <2> logconnections: BPCD CONNECT FROM 172.16.80.57.2913 TO 172.16.80.57.13724
20:25:47.715 [9020.2928] <2> vnet_connect_to_vnetd_extra: vnet_vnetd.c.179: msg: VNETD CONNECT FROM 172.16.80.57.2915 TO 172.16.80.57.13724 fd = 1756
20:25:47.731 [9020.2928] <2> vnet_vnetd_connect_forward_socket_begin: vnet_vnetd.c.532: VN_REQUEST_CONNECT_FORWARD_SOCKET: 10 0x0000000a
20:25:47.731 [9020.2928] <2> vnet_vnetd_connect_forward_socket_begin: vnet_vnetd.c.549: ipc_string: 2916
20:25:47.793 [9020.2928] <2> vnet_vnetd_service_socket: vnet_vnetd.c.2043: VN_REQUEST_SERVICE_SOCKET: 6 0x00000006
20:25:47.793 [9020.2928] <2> vnet_vnetd_service_socket: vnet_vnetd.c.2057: service: bpcd
20:25:47.793 [9020.2928] <2> logconnections: BPCD CONNECT FROM 172.16.80.57.2921 TO 172.16.80.57.13724
20:25:47.793 [9020.2928] <2> vnet_connect_to_vnetd_extra: vnet_vnetd.c.179: msg: VNETD CONNECT FROM 172.16.80.57.2922 TO 172.16.80.57.13724 fd = 1740
20:25:47.809 [9020.2928] <2> vnet_vnetd_connect_forward_socket_begin: vnet_vnetd.c.532: VN_REQUEST_CONNECT_FORWARD_SOCKET: 10 0x0000000a
20:25:47.809 [9020.2928] <2> vnet_vnetd_connect_forward_socket_begin: vnet_vnetd.c.549: ipc_string: 2923
20:25:47.887 [9020.2928] <2> vnet_vnetd_service_socket: vnet_vnetd.c.2043: VN_REQUEST_SERVICE_SOCKET: 6 0x00000006
20:25:47.887 [9020.2928] <2> vnet_vnetd_service_socket: vnet_vnetd.c.2057: service: bpcd
20:25:47.902 [9020.2928] <2> logconnections: BPCD CONNECT FROM 172.16.80.57.2931 TO 172.16.80.57.13724

 

In bpcd log :

20:25:51.073 [7460.7488] <2> bpcd main: BPCD_LOG_RQST_NO_STATUS
20:25:51.073 [7460.7488] <2> bpcd main: BPCD_LOG_RQST_NO_STATUS
20:25:51.073 [7460.7488] <2> bpcd main: BPCD_CLOSE_FILE_RQST
20:25:51.073 [7460.7488] <2> bpcd main: BPCD_DISCONNECT_RQST
20:25:51.073 [7460.7488] <2> bpcd exit_bpcd: exit status 0  ----------->exiting
20:25:51.120 [6720.4564] <2> bpcd main: offset to GMT -3600
20:25:51.120 [6720.4564] <2> bpcd main: Got socket for input 548
20:25:51.120 [6720.4564] <2> logconnections: BPCD ACCEPT FROM 172.16.80.57.3203 TO 172.16.80.57.13724
20:25:51.120 [6720.4564] <2> bpcd main: setup_sockopts complete
20:25:51.135 [6720.4564] <2> bpcd peer_hostname: Connection from host cs3.bonifatius-lingen.de (172.16.80.57) port 3203
20:25:51.135 [6720.4564] <2> bpcd valid_server: comparing cs3 and cs3.bonifatius-lingen.de
20:25:51.135 [6720.4564] <4> bpcd valid_server: hostname comparison succeeded
20:25:51.135 [6720.4564] <2> bpcd main: output socket port number = 1
20:25:51.135 [6720.4564] <2> bpcd main: Duplicated vnetd socket on stderr
20:25:51.135 [6720.4564] <2> bpcd main: <---- NetBackup 6.5 0 ------------initiated
20:25:51.135 [6720.4564] <2> bpcd main: VERBOSE = 0
20:25:51.135 [6720.4564] <2> bpcd main: Not using VxSS authentication with cs3.bonifatius-lingen.de
20:25:51.135 [6720.4564] <2> bpcd main: BPCD_OPEN_FOR_WRITE_RQST
20:25:51.151 [6720.4564] <2> bpcd main: open for write file name = C:\Migration\SIRRestore_IMS.NB.log
20:25:51.151 [6720.4564] <2> bpcd main: buffer length = 4096
20:25:51.151 [6720.4564] <2> bpcd main: our open flags = 0x119
20:25:51.151 [6720.4564] <2> bpcd main: client open flags = 0x109
20:25:51.151 [6720.4564] <2> bpcd main: completed the write of the file number to the output socket
20:25:51.151 [6720.4564] <2> bpcd main: BPCD_LOG_RQST_NO_STATUS
20:25:51.151 [6720.4564] <2> bpcd main: BPCD_CLOSE_FILE_RQST
20:25:51.151 [6720.4564] <2> bpcd main: BPCD_DISCONNECT_RQST
20:25:51.151 [6720.4564] <2> bpcd exit_bpcd: exit status 0  ----------->exiting

 

In tar log :

 

6:03:16.298 PM: [6220.6224] <4> ov_log::OVInit: INF - Starting log file: C:\Program Files\Veritas\NetBackup\logs\TAR\041613.LOG

6:03:16.298 PM: [6220.6224] <4> ov_log::OVInit: GENERAL Log Level: 0
6:03:16.298 PM: [6220.6224] <4> ov_log::OVInit: TCP Log Level: 0
6:03:16.298 PM: [6220.6224] <4> ov_log::OVInit: INF - the log mutex: 1776
TAR  NetBackup TAR  6.5GA  [Oct 30 2008]
Copyright 1993 - 2007 VERITAS Software Corporation
All Rights Reserved.

6:03:16.298 PM: [6220.6224] <2> WinMain: DAT - _pgmptr = 'C:\Program Files\Veritas\NetBackup\bin\tar32.exe'
6:03:16.298 PM: [6220.6224] <2> WinMain: DAT - lpCmdLine = '-x -v -Y -p -P -I 1366128169 -U 13 -J clnt_lc_messages=en -J clnt_lc_time=en -J clnt_lc_ctype=en -J clnt_lc_collate=en -J clnt_lc_numeric=en -J restoreid=18883.001 -J job_total=1 -J shm_restore -J client=cs3 -J requesting_client=cs3 -J browse_client=cs3 -J backup_time=1286587274 -L /C/PROGRA��1/Kodak/System5/log/algotec_class_restore_log -f - -J shm_restore_fd=1864 '
6:03:16.298 PM: [6220.6224] <2> date_debug: DAT - timezone: W. Europe Standard Time, offset=-3600, dst: W. Europe Daylight Time
6:03:16.298 PM: [6220.6224] <2> date_debug: DAT - current time: 1366128196, 4/16/2013 6:03:16 PM
6:03:16.298 PM: [6220.6224] <2> date_debug: DAT - 01/01/94 UCT:  757382400, 1/1/1994 2:00:00 AM
6:03:16.298 PM: [6220.6224] <2> date_debug: DAT - 07/01/94 UCT:  773020800, 7/1/1994 2:00:00 AM
6:03:16.782 PM: [6220.6224] <2> WinMain: DAT - standard input handle = 1748
6:03:16.782 PM: [6220.6224] <2> WinMain: DAT - standard output handle = 1744
6:03:16.782 PM: [6220.6224] <2> WinMain: DAT - standard error handle = 1740
6:03:17.001 PM: [6220.6224] <4> backup_list: INF - creating restore object
6:03:17.782 PM: [6220.6224] <4> tar_restore_tfi::create: INF - dwJobData: ffffffff
6:03:17.782 PM: [6220.6224] <4> tar_restore_tfi::create: INF -     dwJob: ffffffff
6:03:17.782 PM: [6220.6224] <4> base_restore::tfs_enable_restore_privileges: INF - restore privileges enabled, previous = 0
6:03:17.782 PM: [6220.6224] <4> base_restore::tfs_enable_restore_privileges: INF - backup privileges enabled, previous = 0
6:03:17.782 PM: [6220.6224] <4> base_restore::tfs_enable_restore_privileges: INF - security privileges enabled, previous = 0
6:03:17.782 PM: [6220.6224] <4> base_restore::tfs_enable_restore_privileges: INF - create token privileges enabled, previous = 0
6:03:17.782 PM: [6220.6224] <4> base_restore::tfs_enable_restore_privileges: INF - take ownership privileges enabled, previous = 0
6:03:18.079 PM: [6220.6224] <4> OVGetRestoreToken: INF - successfully wrote restore information to named pipe
6:03:18.079 PM: [6220.6224] <4> OVGetRestoreToken: INF - success obtaining access token
6:03:18.079 PM: [6220.6224] <4> base_restore::tfs_enable_restore_privileges: INF - success obtaining access token: (1632)
6:03:18.079 PM: [6220.6224] <4> base_restore::tfs_enable_restore_privileges: INF - user has restore privileges
6:03:18.157 PM: [6220.9600] <2> ov_log::V_GlobalLog: INF - BEDS_Init() Enter InitFlags:0x0
6:03:19.829 PM: [6220.9600] <2> ov_log::V_GlobalLog: INF - DumpDleInfo() DLE Device Name: C:
6:03:19.829 PM: [6220.9600] <2> ov_log::V_GlobalLog: INF - DumpDleInfo() DLE Device Name: D:
6:03:19.829 PM: [6220.9600] <2> ov_log::V_GlobalLog: INF - DumpDleInfo() DLE Device Name: E:
6:03:19.829 PM: [6220.9600] <2> ov_log::V_GlobalLog: INF - DumpDleInfo() DLE Device Name: I:
6:03:19.829 PM: [6220.9600] <2> ov_log::V_GlobalLog: INF - DumpDleInfo() DLE Device Name: Microsoft Terminal Services
6:03:19.829 PM: [6220.9600] <2> ov_log::V_GlobalLog: INF - DumpDleInfo() DLE Device Name: Microsoft Windows Network
6:03:19.829 PM: [6220.9600] <2> ov_log::V_GlobalLog: INF - DumpDleInfo() DLE Device Name: Web Client Network
6:03:19.829 PM: [6220.9600] <2> ov_log::V_GlobalLog: INF - DumpDleInfo() DLE Device Name: Oracle
6:03:19.829 PM: [6220.9600] <2> ov_log::V_GlobalLog: INF - DumpDleInfo() DLE Device Name: Shadow?Copy?Components
6:03:45.471 PM: [6220.9632] <4> tar_base::keepaliveThread: INF - keepalive thread is active with an interval of 30 seconds
6:03:45.502 PM: [6220.6224] <2> tar_base::backup_finish: TAR - restore:                     0 files
6:03:45.502 PM: [6220.6224] <2> tar_base::backup_finish: TAR - restore:          file data:          0 bytes
6:03:45.502 PM: [6220.6224] <2> tar_base::backup_finish: TAR - restore:         image data:          0 bytes
6:03:45.502 PM: [6220.6224] <2> tar_base::backup_finish: TAR - restore:       elapsed time:          1 secs            0 bps
6:03:45.502 PM: [6220.6224] <8> tar_restore::processStateEOT: WRN - processStateEOT before keepalive
6:03:45.502 PM: [6220.9632] <4> tar_base::keepaliveThread: INF - keepalive thread terminating (reason: WAIT_OBJECT_0)
6:03:45.502 PM: [6220.6224] <4> tar_base::stopKeepaliveThread: INF - keepalive thread has exited. (reason: WAIT_OBJECT_0)
6:03:45.502 PM: [6220.6224] <2> tar_base::V_Close:
6:03:45.502 PM: [6220.6224] <4> BufferManagerLegacySharedMemory::~BufferManagerLegacySharedMemory(): INF - [1528] waited for full buffer 0 times, delayed 0 times.
6:03:45.533 PM: [6220.9600] <2> ov_log::V_GlobalLog: INF - BEDS_Term() Enter InitFlags:0x1
6:03:45.565 PM: [6220.6224] <4> OVStopCmd: INF - EXIT - status = 0
6:03:45.565 PM: [6220.6224] <4> OVShutdown: INF - Finished process
6:03:45.565 PM: [6220.6224] <4> WinMain: INF - Exiting C:\Program Files\Veritas\NetBackup\bin\tar32.exe
6:03:47.565 PM: [6220.6224] <4> ov_log::OVClose: INF - Closing log file: C:\Program Files\Veritas\NetBackup\logs\TAR\041613.LOG
 

Can anyone help me??
 

Operating Systems:

Comments 4 CommentsJump to latest comment

Yasuhisa Ishikawa's picture

This seems an issue of TCP KeepAlive.

Please try to set longer KeepAliveTime value(2 hour by default) on both servers and client. This parameter is of Windows TCP stack, so check MS doc or ask to OS admin for further information.

Authorized Symantec Consultant(ASC) Data Protection in Tokyo, Japan

francesco.arioldi@nispro.it's picture

Thank-you Yasuhisa for your kindly response. I have applied your suggestion and a job with more than 10k files is terminated successfully after 8 hours.

Now I have to restore about 1M of files from different tapes, so I'm scheduling a  lot of jobs to accomplish this.

 

rookie11's picture

i think u tried to restore from NBU GUI. ur NBU GUI might have gone blank or sort of hung while viewing those files inside folder. 

please give it a try using bplist command to view files which were backedup then use bprestore command to restore them.

 

Marianne's picture

bprd running out of memory is a known issue with large amount of files to be restored in older versions of NBU.

Please upgrade to NBU 7.1.0.4 or 7.5.0.5. NBU 6.x reached EOSL in Oct last year.
A 64-bit master server with more memory will also be a good idea.

In the meantime, try to select smaller amount of files to restore at a time.

Supporting Storage Foundation and VCS on Unix and Windows as well as NetBackup on Unix and Windows
Handy NBU Links