
Restore failing with error : Error bptm ... cannot write data to socket, Broken pipe

Created: 28 Mar 2013 | 7 comments

 

Hi

While restoring data to a Unix client, we hit this error:

"Cannot write data to socket, Broken pipe."

We are running NetBackup 7.5.0.3 with a Solaris 10 master server; the client is HP-UX 11.23.
Please find the job log below. This is a very critical restore for us. Please advise.

Thanks

28/03/2013 12:44:30 - begin Restore
28/03/2013 12:44:33 - media needed: 006116
28/03/2013 12:44:33 - media needed: 006108
28/03/2013 12:44:34 - restoring from image xxxxxxx_1363861974
28/03/2013 12:44:34 - Info bprd (pid=19456) Restoring from copy 1 of image created Thu Mar 21 07:32:54 2013
28/03/2013 12:44:44 - started process bptm (pid=28126)
28/03/2013 12:44:46 - requesting resource 006108
28/03/2013 12:44:47 - granted resource 006108
28/03/2013 12:44:47 - granted resource HP.ULTRIUM4-SCSI.005
28/03/2013 12:44:48 - started process bptm (pid=28126)
28/03/2013 12:44:48 - mounting 006108
28/03/2013 12:46:05 - mounted 006108; mount time: 0:01:17
28/03/2013 12:46:07 - positioning 006108 to file 8
28/03/2013 12:47:32 - positioned 006108; position time: 0:01:25
28/03/2013 12:47:34 - begin reading
28/03/2013 12:48:50 - Error bptm (pid=28127) cannot write data to socket, Broken pipe
===============================================================

 


Marianne's picture

Thanks for the logs - I will go through them a bit later in the day.

We need more logs, please:

bpbrm on the media server

bpcd and tar logs on the client.

If these log folders don't exist, please create them and retry the restore. Collect a full set of logs (including the ones previously posted) and upload.
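
A minimal sketch of creating those legacy log folders, assuming the default UNIX log root /usr/openv/netbackup/logs (the same root that shows up in the bpbrm output quoted later in this thread). The helper name make_nbu_log_dirs is made up for illustration, and the demo runs against a scratch directory so nothing is touched on a real host:

```shell
#!/bin/sh
# Create NetBackup legacy log folders under a given root.
# make_nbu_log_dirs is a hypothetical helper name, not a NetBackup command.
make_nbu_log_dirs() {
    root=$1
    shift
    for name in "$@"; do
        mkdir -p "$root/$name" || return 1
    done
}

# Demo against a scratch directory. On a real host the root is assumed to be
# /usr/openv/netbackup/logs; create only the folders relevant to each role
# (bpbrm on the media server, bpcd and tar on the client).
demo_root=$(mktemp -d)
make_nbu_log_dirs "$demo_root" bpbrm bpcd tar
ls "$demo_root"
```

Once the folders exist, retry the restore so that the processes write fresh logs covering the failure.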

If this matter is urgent, please log a call with Symantec Support as today is a public/bank holiday in most parts of the world.

Supporting Storage Foundation and VCS on Unix and Windows as well as NetBackup on Unix and Windows
Handy NBU Links

Nicolai's picture

If not done already, set CLIENT_READ_TIMEOUT to 36000 on the client and on the master/media server.

If there is a firewall between the master/media server and the client, set TCP_KEEPALIVE_INTERVAL to 15 minutes using ndd -set /dev/tcp tcp_keepalive_interval {time}. The time is in milliseconds.
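
As a quick check of the unit conversion, assuming the 15-minute target above: ndd takes the value in milliseconds, so the number to pass works out as follows. The script only prints the command rather than running it, since ndd -set changes a live kernel tunable and needs root:

```shell
#!/bin/sh
# Convert the desired keepalive interval from minutes to milliseconds.
minutes=15
keepalive_ms=$((minutes * 60 * 1000))

# Print the command instead of executing it.
echo "ndd -set /dev/tcp tcp_keepalive_interval $keepalive_ms"
```

This prints the command with a value of 900000.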

Assumption is the mother of all mess ups.

If this post answered your question, please mark it as a solution.

msanches's picture

Hi guys,

The restore is from a FlashBackup (granular) backup.

Attached are the logs Marianne requested.

The TCP_KEEPALIVE_INTERVAL parameter (from Nicolai's comment) is set to 7200000.
But we have no firewall between master/media.

ndd -get /dev/tcp tcp_keepalive_interval
7200000

The CLIENT_READ_TIMEOUT parameter (from Nicolai's comment) is set to 10800.

Thanks

Attachment  Size
LOG_MEDIA_BPCD.log_.rar 38.19 KB
LOG_MEDIA_BPBRM.log_.rar 45.35 KB
Marianne's picture

We are still missing bpcd and tar logs from the client.... 

We need to see what is happening on the client as well.

*** EDIT ***

The timestamps in the logs also do not correspond with the Job details.

We see above that restore was started at 12:44 and failed at 12:48.

We see in bpbrm a restore that completed successfully at 12:24.
The next timestamp is 14:36:

bptm starts at 14:59.

bprd seems to come from media server instead of master server:
bprd: canopus is not the primary server pabkp.nextel.com.br...exiting
 

We need a full set of logs that contain information about start and end of failed restore from:
master: bprd  
media server: bptm and bpbrm
client: bpcd and tar.
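
One way to check whether a log actually covers the failed restore is to filter it by the job's time window. A rough sketch, assuming each legacy log line starts with an HH:MM:SS.mmm timestamp, as in the bpbrm lines quoted further down in this thread (two of them are reused here as sample input). The helper name in_window is hypothetical:

```shell
#!/bin/sh
# Print only the log lines whose leading HH:MM:SS timestamp falls inside
# the window [lo, hi]. in_window is a hypothetical helper name.
in_window() {
    awk -v lo="$1" -v hi="$2" '
        { t = substr($1, 1, 8) }       # keep HH:MM:SS, drop the .mmm part
        t >= lo && t <= hi { print }
    '
}

# Sample input: two bpbrm lines taken from this thread.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
12:24:24.537 [12964] <2> bpbrm write_msg_to_progress_file: (2201645.001) INF - TAR EXITING WITH STATUS = 0
12:24:24.538 [12964] <2> bpbrm handle_restore: from client canopus: INF - TAR RESTORED 1 OF 1 FILES SUCCESSFULLY
EOF

# Window of the failed job according to the job details: 12:44 to 12:48.
failed_window=$(in_window 12:44:00 12:48:59 < "$sample_log" | wc -l | tr -d ' ')
# Window of the earlier, successful restore.
earlier_window=$(in_window 12:24:00 12:24:59 < "$sample_log" | wc -l | tr -d ' ')

echo "lines in failed-restore window: $failed_window"
echo "lines in 12:24 window: $earlier_window"
```

A count of zero for the failed-restore window means the log does not cover the failure and a fresh set is needed.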
 


msanches's picture

Marianne,

Some very important information about this problem:

The media server is the machine that was backed up (FlashBackup), and it is also where I need to restore.

I believe the tar log is clean (because restore not initialized on netbackup); please verify.

Thanks

Mauricio

 

 

Attachment  Size
BPCD_log.rar 38.18 KB
TAR_log.rar 694 bytes
Marianne's picture

The tar log shows a successful restore of a single file that completed at 12:24.

Please read through my previous post again.

You have not given us any logs that contain evidence of the failed restore between 12:44 and 12:48.

We also need media server's bptm log that corresponds with timestamps in bpbrm log.

As per my post above: 

We need a full set of logs that contain information about start and end of failed restore from:

master: bprd  
media server: bptm and bpbrm
client: bpcd and tar.
 

If media server is also the client, then we need those logs on the media server.
We still need the master's bprd log.

If there is a time difference between media server and master, please tell us exactly how much. 

I have no idea what this means:

....  restore not initialized on netbackup

How else are restores done if not on NetBackup?

It seems there was a problem reporting the successful restore back to the master:

12:24:24.537 [12964] <2> bpbrm write_msg_to_progress_file: (2201645.001) INF - TAR EXITING WITH STATUS = 0

12:24:24.538 [12964] <2> bpbrm handle_restore: from client canopus: INF - TAR RESTORED 1 OF 1 FILES SUCCESSFULLY

12:24:24.678 [12964] <16> bpbrm close_progress_log: could not close progress file /usr/openv/netbackup/logs/user_ops/pmsanche/logs/jbp-24494364483841946230000000094-uqaW1V.log on pabkp

Where did you check keepalive and timeout settings?
Master or media server?
Check both, please.

 


msanches's picture

Hi, 

I ran the restore again today, 29/03/2013 at 17:26, and attached new logs for analysis (all logs from the media server).
I have the same problem with the restore.

The keepalive and timeout parameters are shown below:
========================================

 

ndd -get /dev/tcp tcp_keepalive_interval (same on master and media server)
7200000
 
The timeout parameters (bp.conf) are shown below:
=====================================
SERVER_CONNECT_TIMEOUT = 1800
CLIENT_READ_TIMEOUT = 36000
LIST_FILES_TIMEOUT = 10800
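
For reference, a small sketch that turns the reported values into friendlier units. It assumes bp.conf uses the `NAME = value` layout shown in this post, with timeouts in seconds, and that tcp_keepalive_interval is in milliseconds as noted earlier in the thread. The values are copied from this post into a temporary file rather than read from a live /usr/openv/netbackup/bp.conf, and get_setting is a hypothetical helper name:

```shell
#!/bin/sh
# Sample settings copied from this post into a scratch file.
conf=$(mktemp)
cat > "$conf" <<'EOF'
SERVER_CONNECT_TIMEOUT = 1800
CLIENT_READ_TIMEOUT = 36000
LIST_FILES_TIMEOUT = 10800
EOF

# Look up one setting by name and print its value.
# get_setting is a hypothetical helper name.
get_setting() {
    awk -F' *= *' -v key="$1" '$1 == key { print $2 }' "$conf"
}

client_read_timeout=$(get_setting CLIENT_READ_TIMEOUT)
keepalive_ms=7200000   # value reported by: ndd -get /dev/tcp tcp_keepalive_interval

echo "CLIENT_READ_TIMEOUT: $((client_read_timeout / 60)) minutes"
echo "tcp_keepalive_interval: $((keepalive_ms / 60000)) minutes"
```

With these values the keepalive interval is two hours, well above the 15 minutes suggested earlier; whether that matters here depends on whether a firewall or similar device sits in the data path.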
 

YES: media server is also the client

 

Thanks