Video Screencast Help

Error bpbrm: listen for client timeout during accept from data listen socket after 60 seconds

Created: 18 Apr 2011 | 7 comments
bsb's picture

I keep receiving the error message in the title even though I have gone through and set my timeout settings to at least 900 seconds. I have also run all the bpclntcmd commands and all of them return appropriately.

 

Is the 60 second timeout referred to in the error message a client side OS setting? I am running Red-Hat 5.

Thanks

Comments 7 CommentsJump to latest comment

Nicolai's picture

This is a issue when the OS doesn't return a list of files after 60 seconds. Try do a "ls -l" on the directory with the most files and see if it takes more than 60 seconds complete. Be aware that re-issuing the command properly will return the file list pretty quick since the list is in the file cache.

To find the directory that causes the timeout follow this T/N:

http://www.symantec.com/docs/TECH31513

You can't change the 60 second timeout, with CLIENT_READ_TIME or CLIENT_CONNENCT_TIMEOUT. To remedy the issue you can reduce the number of files, exclude it for backup or tune the file system.

Assumption is the mother of all mess ups.

If this post answered your'e qustion -  Please mark as a soloution.

bsb's picture

Sorry I should have mentioned this is an RMAN backup not filesystem. I looked at the tech doc you linked to though and turned on the higher level tracing as it suggested. I am seeing some errors from vnetd:

 

13:21:56.140 [18374] <2> vnet_sock_ready: vnet.c.507: max_time: 120 0x00000078
13:21:56.140 [18374] <2> vnet_sock_ready: vnet.c.508: sock: 8 0x00000008
13:21:56.140 [18374] <2> vnet_sock_ready: vnet.c.509: Function failed: 11 0x0000000b
13:21:56.140 [18374] <2> vnet_pop_byte: vnet.c.173: Function failed: 11 0x0000000b
13:21:56.140 [18374] <2> vnet_pop_string: vnet.c.268: Function failed: 11 0x0000000b
13:21:56.140 [18374] <2> vnet_pop_signed: vnet.c.312: Function failed: 11 0x0000000b
13:21:56.140 [18374] <2> vnet_pop_status: vnet.c.390: Function failed: 11 0x0000000b
13:21:56.140 [18374] <2> vnet_vnetd_connect_forward_socket_end: vnet_vnetd.c.614: try again: 540 0x0000021c

 

Thanks

rizwan84tx's picture

Could you enable bpcd log on the client (Verbose 5) and check if a response was received.

Best Regards,

Rizwan

bsb's picture

I turned on logging. I do not see much in the BPCD logs. In the activity monitor I see a status 54 at 12:57:59 AM and in my RMAN logs I see several errors like the following:

 

        RMAN-03009: failure of backup command on s1 channel at 04/19/2011 01:14:10
        ORA-19506: failed to create sequential file, name="c-2237647430-20110412-00", parms=""
        ORA-27028: skgfqcre: sbtbackup returned error
        ORA-19511: Error received from media manager layer, error text:
        VxBSACreateObject: Failed with error:
        Server Status:  timed out connecting to client
        ORA-19600: input file is backup piece  (+ARCHIVE/autobackup/2011_04_12/s_748224764.305.748224765)
        ORA-19601: output file is backup piece  (c-2237647430-20110412-00)
        channel s1 disabled, job failed on it will be run on another channel

 

In the BPCD log at that time I see where the failure occured but there is no actual error message. It just submits a close request then another open.

 

00:57:59.825 [6374] <2> process_requests: BPCD_OPEN_FOR_WRITE_RQST
00:57:59.826 [6374] <2> process_requests: open for write file name = /usr/openv/netbackup/logs/user_ops/dbext/logs/4769.0.1303189075
00:57:59.828 [6374] <2> process_requests: buffer length = 22
00:57:59.829 [6374] <2> process_requests: our open flags = 0x119
00:57:59.829 [6374] <2> process_requests: client open flags = 0x1441
00:57:59.829 [6374] <2> process_requests: completed the write of the file number to the output socket
00:57:59.833 [6374] <2> process_requests: BPCD_LOG_RQST
00:57:59.900 [6374] <2> process_requests: BPCD_CLOSE_FILE_RQST
00:58:01.858 [6374] <2> process_requests: BPCD_OPEN_FOR_WRITE_RQST
00:58:01.860 [6374] <2> process_requests: open for write file name = /usr/openv/netbackup/logs/user_ops/dbext/logs/4751.0.1303189077
00:58:01.861 [6374] <2> process_requests: buffer length = 22
00:58:01.863 [6374] <2> process_requests: our open flags = 0x119
00:58:01.863 [6374] <2> process_requests: client open flags = 0x1441
00:58:01.863 [6374] <2> process_requests: completed the write of the file number to the output socket
00:58:01.867 [6374] <2> process_requests: BPCD_LOG_RQST
00:58:01.930 [6374] <2> process_requests: BPCD_CLOSE_FILE_RQST
00:58:09.286 [7145] <2> process_requests: BPCD_DISCONNECT_RQST
00:58:09.286 [7145] <2> bpcd exit_bpcd: exit status 0  ----------->exiting

Since the backups write many of the files it is not a connection issue. It is consistent but not constant. Do I need to set one of the timeouts to be as long as the max file write time? I do not see any realted to how long a file takes to write.

Thanks

 

Nicolai's picture

Try ramp-up CLIENT_READ_TIMEOUT. RMAN sometimes need very long time to determine what archives is needed. All this time Netbackup need to wait. Try setting CLIENT_READ_TIMEOUT = 86400 and retry the operation. Today where all communication is passed thru 13724, more meaningful messages may be masked as sockets errors because of vnetd.

Assumption is the mother of all mess ups.

If this post answered your'e qustion -  Please mark as a soloution.

bsb's picture

I set the read timeout to the max it would allow me (327667) and it still timed out.

 

readCommFile: ERR - timed out after 32767 seconds while reading from /usr/openv/netbackup/logs/user_ops/dbext/logs/20626.0.1303445742

 

Brian

Nicolai's picture

Can you confirm that a connection is really established ?

How fragmented is the recover catalog ?. I have seen really long response time from a recover catalog because some household routines of the RMAN catalog wasn't running (index update etc etc). Talk to the DBA about this.

Assumption is the mother of all mess ups.

If this post answered your'e qustion -  Please mark as a soloution.