Restoring huge files fails with status 5, small files ok
Hi, need your help.
We have a NetBackup master and media server, version 7.0, running on Windows Server 2008 Enterprise 64-bit.
We have been trying to restore several files from tape to a Linux server. When we trigger a restore, the job runs for several hours and fails with status 5.
The smallest file is a little over 2 gigabytes in size. When we tried restoring just that one file, the job ran for over an hour and then failed with status 5.
Just to test, we tried to restore the bp.conf file to the exact same server and path, and it was successful.
There is a firewall between the NetBackup servers and the Linux client. We initially thought the cause was the TCP idle timeout setting on the firewall. Our network admin said it was set to 30 minutes. Following http://www.symantec.com/docs/HOWTO56221, we set the keepalive time to 15 minutes. We still get a status 5.
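To illustrate the idea behind the keepalive change, here is a minimal Python sketch of a TCP socket that sends keepalive probes before a firewall idle timeout would expire. This is only an illustration of the mechanism, not how NetBackup itself is configured (there the keepalive time is an operating system setting). The function name make_keepalive_socket, the probe interval, and the probe count are made up for the example; the 30 and 15 minute values are the ones mentioned above.

```python
import socket

FIREWALL_IDLE_TIMEOUT_S = 30 * 60  # firewall idle timeout reported by the network admin
KEEPALIVE_TIME_S = 15 * 60         # keepalive time we configured


def make_keepalive_socket(idle_s, interval_s=60, probes=5):
    """Return a TCP socket with keepalive enabled (hypothetical helper).

    interval_s and probes are illustrative defaults, not values from our setup.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Ask the OS to send keepalive probes on an otherwise idle connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket tuning is platform specific; these constants exist on Linux.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock


# The first probe must go out before the firewall drops the idle session.
keepalive_beats_firewall = KEEPALIVE_TIME_S < FIREWALL_IDLE_TIMEOUT_S

sock = make_keepalive_socket(KEEPALIVE_TIME_S)
keepalive_enabled = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```

With a 15 minute keepalive time and a 30 minute firewall timeout, the probes should in principle keep an idle connection from being dropped.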
We tried to restore from other dates to make sure it wasn't the tape, but got the same error.
Here are the job details.
3/05/2014 11:55:55 PM - begin Restore
3/05/2014 11:56:04 PM - 1 images required
3/05/2014 11:56:04 PM - media GQR422 required
3/05/2014 11:56:41 PM - restoring image sdc-drrac01.inside.edirail.com.au_1399055641
3/05/2014 11:56:52 PM - requesting resource GQR422
3/05/2014 11:56:52 PM - awaiting resource GQR422 Reason: Media is in use, Media Server: N/A,
Robot Number: NONE, Robot Type: NONE, Media ID: GQR422, Drive Name: N/A,
Volume Pool: N/A, Storage Unit: N/A, Drive Scan Host: N/A
3/05/2014 11:56:57 PM - connecting
3/05/2014 11:57:01 PM - Warning bpbrm(pid=7284) expected start message from sdc-drrac01.inside.edirail.com.au; read: Unrecognized -J string spsrestoreoptions=0
3/05/2014 11:57:01 PM - connected; connect time: 00:00:04
3/05/2014 11:57:21 PM - granted resource GQR422
3/05/2014 11:57:21 PM - granted resource HP.ULTRIUM4-SCSI.000
3/05/2014 11:57:25 PM - mounted
3/05/2014 11:57:27 PM - positioning GQR422 to file 3
3/05/2014 11:58:21 PM - positioned GQR422; position time: 00:00:54
3/05/2014 11:58:21 PM - begin reading
4/05/2014 1:00:26 AM - Error bptm(pid=10904) cannot write data to socket, 10053
4/05/2014 1:00:29 AM - Error bptm(pid=10904) The following files/folders were not restored:
4/05/2014 1:00:30 AM - Error bptm(pid=10904) UTF - /ora-backup/RAILMDR1/RAILMDR/backupset/2014_05_02/o1_mf_annnn_TAG20140502T021113_9p7765z4_.bkp
4/05/2014 1:00:37 AM - restored image sdc-drrac01.inside.edirail.com.au_1399055641 - (socket write failed(24)); restore time 01:03:56
4/05/2014 1:00:42 AM - Warning bprd(pid=9924) Restore must be resumed prior to first image expiration on 4/08/2014 4:34:01 AM
4/05/2014 1:00:47 AM - end Restore; elapsed time: 01:04:52
the restore failed to recover the requested files(5)
I already have some of the logs that may help, but I'm trying to figure out which portion to post because they're pretty long.
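For reference, here is a small Python sketch that works out how long the restore had been reading before the socket error, using two lines copied from the job details above. It assumes the timestamps are in day/month/year format with a 12-hour clock; the helper name parse_ts is just for this example.

```python
from datetime import datetime

# Assumed timestamp layout of the job details: d/mm/yyyy h:mm:ss AM/PM
FMT = "%d/%m/%Y %I:%M:%S %p"


def parse_ts(line):
    """Parse the timestamp that precedes the first ' - ' in a job detail line."""
    stamp = line.split(" - ", 1)[0]
    return datetime.strptime(stamp, FMT)


begin_reading = "3/05/2014 11:58:21 PM - begin reading"
socket_error = (
    "4/05/2014 1:00:26 AM - Error bptm(pid=10904) "
    "cannot write data to socket, 10053"
)

# Time between the start of the tape read and the socket write failure.
elapsed = parse_ts(socket_error) - parse_ts(begin_reading)
print(elapsed)  # 1:02:05
```

So the socket write failed a little over an hour after the read began, which is longer than the 30 minute firewall idle timeout.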