
Restoring huge files fails with status 5, small files ok

Created: 11 May 2014 | 6 comments

Hi, need your help.

We have a Netbackup master and media server with version 7.0 running on Windows Server 2008 Enterprise 64bit. 

We have been trying to restore several files from tape to a Linux server. When we trigger a restore, the job runs for several hours and fails with status 5.

The smallest file is a little over 2 gigabytes in size. We tried restoring just that one file, the job ran over an hour, then failed with status 5. 

Just to test, we tried to restore the bp.conf file to the exact same server and path, and it was successful.

There is a firewall between the Netbackup servers and the Linux client. We initially suspected the TCP idle timeout setting on the firewall. Our network admin said it was set to 30 minutes. Following http://www.symantec.com/docs/HOWTO56221, we set the keepalive time to 15 minutes. We still get a status 5.
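To make the keepalive idea concrete: a keepalive only helps if the first probe goes out before the firewall's idle timer expires, so 15 minutes against a 30-minute firewall timeout should in principle be enough. The Python sketch below only illustrates the mechanism on a single socket. It is not how NetBackup is configured (there the keepalive time is an OS-wide setting). The function names `enable_keepalive` and `keepalive_beats_firewall` are made up for this example, and the `TCP_KEEP*` timer options are applied only where the platform exposes them.

```python
import socket

def enable_keepalive(sock, idle_s, interval_s=60, probes=5):
    """Turn on TCP keepalive for one socket and tune its timers where possible."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket timer options are platform-specific (present on Linux),
    # so each one is applied only if this Python build defines it.
    for name, value in (("TCP_KEEPIDLE", idle_s),
                        ("TCP_KEEPINTVL", interval_s),
                        ("TCP_KEEPCNT", probes)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)

def keepalive_beats_firewall(idle_s, firewall_idle_s):
    """True if the first probe is sent before the firewall drops the idle session."""
    return idle_s < firewall_idle_s

if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s, idle_s=15 * 60)              # 15 minutes, as in the post
    print(keepalive_beats_firewall(15 * 60, 30 * 60))  # firewall timeout: 30 minutes
    s.close()
```

If the check passes on paper but the restore still drops, it is worth confirming that the setting is actually in effect on the host that owns the idle connection.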

We tried to restore from other dates to make sure it wasn't the tape, but got the same error. 

Here are the job details. 

3/05/2014 11:55:55 PM - begin Restore
3/05/2014 11:56:04 PM - 1 images required
3/05/2014 11:56:04 PM - media GQR422 required
3/05/2014 11:56:41 PM - restoring image sdc-drrac01.inside.edirail.com.au_1399055641
3/05/2014 11:56:52 PM - requesting resource GQR422
3/05/2014 11:56:52 PM - awaiting resource GQR422 Reason: Media is in use, Media Server: N/A, 
                 Robot Number: NONE, Robot Type: NONE, Media ID: GQR422, Drive Name: N/A, 
                 Volume Pool: N/A, Storage Unit: N/A, Drive Scan Host: N/A
                
3/05/2014 11:56:57 PM - connecting
3/05/2014 11:57:01 PM - Warning bpbrm(pid=7284) expected start message from sdc-drrac01.inside.edirail.com.au; read: Unrecognized -J string spsrestoreoptions=0   
3/05/2014 11:57:01 PM - connected; connect time: 00:00:04
3/05/2014 11:57:21 PM - granted resource GQR422
3/05/2014 11:57:21 PM - granted resource HP.ULTRIUM4-SCSI.000
3/05/2014 11:57:25 PM - mounted
3/05/2014 11:57:27 PM - positioning GQR422 to file 3
3/05/2014 11:58:21 PM - positioned GQR422; position time: 00:00:54
3/05/2014 11:58:21 PM - begin reading
4/05/2014 1:00:26 AM - Error bptm(pid=10904) cannot write data to socket, 10053       
4/05/2014 1:00:29 AM - Error bptm(pid=10904) The following files/folders were not restored:       
4/05/2014 1:00:30 AM - Error bptm(pid=10904) UTF - /ora-backup/RAILMDR1/RAILMDR/backupset/2014_05_02/o1_mf_annnn_TAG20140502T021113_9p7765z4_.bkp          
4/05/2014 1:00:37 AM - restored image sdc-drrac01.inside.edirail.com.au_1399055641 - (socket write failed(24)); restore time 01:03:56
4/05/2014 1:00:42 AM - Warning bprd(pid=9924) Restore must be resumed prior to first image expiration on 4/08/2014 4:34:01 AM
4/05/2014 1:00:47 AM - end Restore; elapsed time: 01:04:52
the restore failed to recover the requested files(5)
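For reference, 10053 in the bptm error is the Winsock code WSAECONNABORTED (software caused connection abort). When comparing the failure against any timeout value, it helps to know exactly how long the job had been reading before the socket write failed. A minimal Python sketch, assuming the job details use day/month/year dates with a 12-hour clock; the helper name `elapsed` is made up for this example:

```python
from datetime import datetime

# Assumed timestamp layout of the job details: d/mm/yyyy h:mm:ss AM/PM
FMT = "%d/%m/%Y %I:%M:%S %p"

def elapsed(start, end):
    """Return the time between two job-detail timestamps as a timedelta."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)

begin_reading = "3/05/2014 11:58:21 PM"   # "begin reading"
socket_error = "4/05/2014 1:00:26 AM"     # "cannot write data to socket, 10053"

print(elapsed(begin_reading, socket_error))  # 1:02:05
```

So the write failed about 62 minutes after reading began, which is the figure to hold up against the firewall and keepalive timers.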

I already have some of the logs that may help, but I'm trying to figure out which portion to post because they're pretty long.

 


6 Comments

RamNagalla

Hi,

What are the read timeout values on the media server?

Please increase the read timeout values on the media server and try the restore again.

Also set the TCP timeout value to 300 seconds (5 min).

Marianne

"There is a firewall between the Netbackup servers and the Linux client."

 

KeepAlive timeout usually helps. Where exactly did you adjust KeepAlive settings? Master? Media? Client?

Best to adjust on all of them. 
Media server seems to be most important. See http://www.symantec.com/docs/TECH145234
 

Supporting Storage Foundation and VCS on Unix and Windows as well as NetBackup on Unix and Windows
Handy NBU Links

mtimbol

Hi Marianne,

We changed the keepalive setting on both the master and media server.

 

Hi Nagalla,

You said to decrease the "tcp timeout value" to 5 mins. Don't you mean the keepalive settings?

Wouldn't that create too much unnecessary network traffic?

 

By the way, we were originally trying to restore to a Linux server with one network card. Both production and backup traffic went through that card. As a workaround, we restored to another Linux server with a second network card dedicated to backup only. That path does not go through the firewall, and the restore was successful.

However, I would still like to be able to restore through the firewall. 

Marianne

Apologies - something wrong with the URL in my previous post.
Try this: http://www.symantec.com/business/support/index?page=content&pmv=print&impressions=&viewlocale=&id=S:TECH145234 

About creating too much traffic - 

Seems there are 2 choices:

1. Increase firewall timeout

2. Reduce keepalive setting.

Advantages and disadvantages in this TN: http://www.symantec.com/docs/TECH167276 

Try adjusting keepalive on the client.
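On the traffic question, a rough back-of-the-envelope estimate in Python may help. It assumes one keepalive exchange (probe plus ACK) costs about 80 bytes of IPv4 and TCP headers, and that probes are only sent while a connection is completely idle. Both are simplifying assumptions, and `keepalive_load` is a made-up helper name.

```python
def keepalive_load(idle_s, connections=1, bytes_per_exchange=80):
    """Return (probes per hour, approx bytes per hour) for fully idle connections.

    bytes_per_exchange is an assumed figure: one probe plus its ACK,
    each roughly 40 bytes of IPv4 + TCP headers.
    """
    probes_per_hour = 3600 / idle_s * connections
    return probes_per_hour, probes_per_hour * bytes_per_exchange

for minutes in (15, 5):
    probes, nbytes = keepalive_load(minutes * 60)
    print(f"{minutes} min keepalive: {probes:.0f} probes/hour, ~{nbytes:.0f} bytes/hour")
```

Under these assumptions, even a 5-minute keepalive adds on the order of a kilobyte per hour per idle connection, which is negligible next to restore traffic.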


Michael G Andersen

Does the Linux client have its own firewall running?

If you have not already, get the network admin to set up logging on the firewall so you can see whether anything is rejected during the restore attempt.

I would also run netstat on the client and compare it to the logs to see whether the expected connections were established.

Maybe a stupid question, but have you configured the Linux client to run through firewalls in NetBackup (the connect back option)?
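For the netstat comparison suggested above, a small Python sketch like the following could filter saved `netstat -tn` output from the Linux client down to established connections on NetBackup ports. The port set (1556 for pbx, 13724 for vnetd, 13782 for bpcd) is assumed to match the defaults in use, the helper name `established` is made up, and the sample rows are invented purely for illustration using documentation addresses.

```python
# Assumed default NetBackup ports: pbx, vnetd, bpcd
NBU_PORTS = {1556, 13724, 13782}

def established(netstat_text, ports=NBU_PORTS):
    """Return (local, remote) pairs for ESTABLISHED TCP rows touching the given ports."""
    hits = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Linux `netstat -tn` rows: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) < 6 or not fields[0].startswith("tcp") or fields[5] != "ESTABLISHED":
            continue
        local, remote = fields[3], fields[4]
        if any(int(addr.rsplit(":", 1)[1]) in ports for addr in (local, remote)):
            hits.append((local, remote))
    return hits

# Invented sample output, for illustration only
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 192.0.2.10:13724        192.0.2.20:51234        ESTABLISHED
tcp        0      0 192.0.2.10:22           192.0.2.30:40000        ESTABLISHED
tcp        0      0 192.0.2.10:13782        192.0.2.20:51240        TIME_WAIT
"""

for local, remote in established(SAMPLE):
    print(local, "<->", remote)
```

Running it at intervals during a restore and lining the results up with the job timestamps shows whether a connection disappears before the job reports the socket error.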

Deb Wilmot

Marianne - TECH145234 had expired (which is why the first link didn't work). I don't know if everyone can view expired content, so I republished that technote. It should be viewable again at: http://www.symantec.com/docs/TECH145234
 

mtimbol - please review the above technote - it does have a lot of good info.

Deb