Video Screencast Help
Symantec to Separate Into Two Focused, Industry-Leading Technology Companies. Learn more.

Catalog Backup fails with network connection timed out(41)

Created: 06 Feb 2013 • Updated: 07 Feb 2013 | 6 comments
This issue has been solved. See solution.

 

Gidday,
 
We recently manifested an issues where our daily full catalog backup started to fail with a network connection timed out(41) error. After a lot of research, troubleshooting, restarting, bouncing services etc I came up empty. In desperation, I froze the current tape it was using for the catalog backups, assigned some new ones to the pool and it magically started to work again (!?!?).
 
However, it began to periodically fail again, and now fails consistently. I’ tried the old trick of assigned some new tapes to sue, but the issue persists and I am completely at a loose end. Any ideas on where to look for the cause?
 
 
Master: Win2k3 R2 Sp2, NBU 7.5.0.3 Ent
Media: Win2k8 R2 Sp1, NBU 7.5.0.3 Ent
 
 
Console logs below:
 
 
CATALOG_DRIVEN_BACKUP staging begins.
 
 
7/02/2013 1:26:30 p.m. - Info bpdbm(pid=4796) staging relational database files for catalog backup     
7/02/2013 1:26:30 p.m. - Info bpdbm(pid=4796) staging NBAZDB backup to D:\VERITAS\NetBackupDB\staging       
7/02/2013 1:26:30 p.m. - Info bpdbm(pid=4796) done staging NBAZDB backup to D:\VERITAS\NetBackupDB\staging      
7/02/2013 1:26:30 p.m. - Info bpdbm(pid=4796) staging NBDB backup to D:\VERITAS\NetBackupDB\staging       
7/02/2013 1:26:49 p.m. - Info bpdbm(pid=4796) done staging NBDB backup to D:\VERITAS\NetBackupDB\staging      
7/02/2013 1:28:32 p.m. - Info bpdbm(pid=4796) validating NBAZDB backup in D:\VERITAS\NetBackupDB\staging       
7/02/2013 1:28:34 p.m. - Info bpdbm(pid=4796) done validating NBAZDB backup in D:\VERITAS\NetBackupDB\staging      
7/02/2013 1:28:34 p.m. - Info bpdbm(pid=4796) validating NBDB backup in D:\VERITAS\NetBackupDB\staging       
7/02/2013 1:28:46 p.m. - Info bpdbm(pid=4796) done validating NBDB backup in D:\VERITAS\NetBackupDB\staging      
the requested operation was successfully completed(0)
 
 
D:\VERITAS\NetBackupDB\staging\DARS_DATA.db writes to tape:
 
 
7/02/2013 1:26:52 p.m. - Info nbjm(pid=7348) starting backup job (jobid=4698153) for client ucbackupmaster, policy UCCatalog, schedule Full 
7/02/2013 1:26:52 p.m. - Info nbjm(pid=7348) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=4698153, request id:{7CD98CB4-9847-4DF3-8F08-F9A593FE4EDC}) 
7/02/2013 1:26:52 p.m. - requesting resource UCBackupMedia1-SL500-STU
7/02/2013 1:26:52 p.m. - requesting resource ucbackupmaster.NBU_CLIENT.MAXJOBS.ucbackupmaster
7/02/2013 1:26:52 p.m. - requesting resource ucbackupmaster.NBU_POLICY.MAXJOBS.UCCatalog
7/02/2013 1:26:52 p.m. - granted resource ucbackupmaster.NBU_CLIENT.MAXJOBS.ucbackupmaster
7/02/2013 1:26:52 p.m. - granted resource ucbackupmaster.NBU_POLICY.MAXJOBS.UCCatalog
7/02/2013 1:26:52 p.m. - granted resource UC0514
7/02/2013 1:26:52 p.m. - granted resource HP.ULTRIUM4-SCSI.001
7/02/2013 1:26:52 p.m. - granted resource UCBackupMedia1-SL500-STU
7/02/2013 1:26:54 p.m. - estimated 0 Kbytes needed
7/02/2013 1:26:54 p.m. - Info nbjm(pid=7348) started backup (backupid=ucbackupmaster_1360196813) job for client ucbackupmaster, policy UCCatalog, schedule Full on storage unit UCBackupMedia1-SL500-STU
7/02/2013 1:26:55 p.m. - started process bpbrm (7484)
7/02/2013 1:26:56 p.m. - connecting
7/02/2013 1:26:57 p.m. - Info bpbrm(pid=7484) ucbackupmaster is the host to backup data from    
7/02/2013 1:26:57 p.m. - Info bpbrm(pid=7484) reading file list from client       
7/02/2013 1:26:58 p.m. - Info bpbrm(pid=7484) listening for client connection        
7/02/2013 1:27:07 p.m. - Info bpbrm(pid=7484) INF - Client read timeout = 300     
7/02/2013 1:27:07 p.m. - connected; connect time: 00:00:11
7/02/2013 1:27:08 p.m. - Info bpbrm(pid=7484) accepted connection from client        
7/02/2013 1:27:08 p.m. - Info bpbrm(pid=7484) start bpbkar on client        
7/02/2013 1:27:12 p.m. - Info bphdb(pid=5596) Backup started          
7/02/2013 1:27:12 p.m. - Info bphdb(pid=4796) Backup started          
7/02/2013 1:27:12 p.m. - Info bptm(pid=9240) start           
7/02/2013 1:27:12 p.m. - Info bptm(pid=9240) using 262144 data buffer size       
7/02/2013 1:27:13 p.m. - Info bptm(pid=9240) setting receive network buffer to 1048576 bytes     
7/02/2013 1:27:13 p.m. - Info bptm(pid=9240) using 64 data buffers        
7/02/2013 1:27:14 p.m. - Info bptm(pid=9240) start backup          
7/02/2013 1:27:14 p.m. - Info bptm(pid=9240) backup child process is pid 8256.13428      
7/02/2013 1:27:14 p.m. - Info bptm(pid=9240) Waiting for mount of media id UC0514 (copy 1) on server ucbackupmedia1.
7/02/2013 1:27:14 p.m. - Info bptm(pid=8256) start           
7/02/2013 1:27:14 p.m. - mounting UC0514
7/02/2013 1:27:58 p.m. - Info bptm(pid=9240) media id UC0514 mounted on drive index 4, drivepath {8,0,0,0}, drivename HP.ULTRIUM4-SCSI.001, copy 1
7/02/2013 1:27:58 p.m. - mounted; mount time: 00:00:44
7/02/2013 1:28:03 p.m. - positioning UC0514 to file 1
7/02/2013 1:28:09 p.m. - positioned UC0514; position time: 00:00:06
7/02/2013 1:28:09 p.m. - begin writing
7/02/2013 1:28:16 p.m. - Info bptm(pid=9240) waited for full buffer 439 times, delayed 445 times   
7/02/2013 1:28:16 p.m. - Info bphdb(pid=4796) done. status: 0         
7/02/2013 1:28:24 p.m. - Info bptm(pid=9240) EXITING with status 0 <----------       
7/02/2013 1:28:24 p.m. - Info bpbrm(pid=7484) validating image for client ucbackupmaster       
7/02/2013 1:28:30 p.m. - end writing; write time: 00:00:21
the requested operation was successfully completed(0)
 
 
The third step seems to fail:
 
 
7/02/2013 1:28:50 p.m. - Info nbjm(pid=7348) starting backup job (jobid=4698159) for client ucbackupmaster, policy UCCatalog, schedule Full 
7/02/2013 1:28:50 p.m. - Info nbjm(pid=7348) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=4698159, request id:{0B669D3A-E130-40F6-BC95-B1E9AC203379}) 
7/02/2013 1:28:50 p.m. - requesting resource UCBackupMedia1-SL500-STU
7/02/2013 1:28:50 p.m. - requesting resource ucbackupmaster.NBU_CLIENT.MAXJOBS.ucbackupmaster
7/02/2013 1:28:50 p.m. - requesting resource ucbackupmaster.NBU_POLICY.MAXJOBS.UCCatalog
7/02/2013 1:28:50 p.m. - granted resource ucbackupmaster.NBU_CLIENT.MAXJOBS.ucbackupmaster
7/02/2013 1:28:50 p.m. - granted resource ucbackupmaster.NBU_POLICY.MAXJOBS.UCCatalog
7/02/2013 1:28:50 p.m. - granted resource UC0514
7/02/2013 1:28:50 p.m. - granted resource HP.ULTRIUM4-SCSI.001
7/02/2013 1:28:50 p.m. - granted resource UCBackupMedia1-SL500-STU
7/02/2013 1:28:51 p.m. - estimated 107257414 Kbytes needed
7/02/2013 1:28:51 p.m. - Info nbjm(pid=7348) started backup (backupid=ucbackupmaster_1360196930) job for client ucbackupmaster, policy UCCatalog, schedule Full on storage unit UCBackupMedia1-SL500-STU
7/02/2013 1:28:52 p.m. - started process bpbrm (9968)
7/02/2013 1:28:54 p.m. - Info bpbrm(pid=9968) ucbackupmaster is the host to backup data from    
7/02/2013 1:28:54 p.m. - Info bpbrm(pid=9968) reading file list from client       
7/02/2013 1:28:54 p.m. - connecting
7/02/2013 1:28:56 p.m. - Info bpbrm(pid=9968) starting bpbkar32 on client        
7/02/2013 1:28:56 p.m. - connected; connect time: 00:00:02
7/02/2013 1:28:58 p.m. - Info bpbkar32(pid=7520) Backup started          
7/02/2013 1:28:58 p.m. - Info bptm(pid=13836) start           
7/02/2013 1:28:58 p.m. - Info bptm(pid=13836) using 262144 data buffer size       
7/02/2013 1:28:58 p.m. - Info bptm(pid=13836) setting receive network buffer to 1048576 bytes     
7/02/2013 1:28:58 p.m. - Info bptm(pid=13836) using 64 data buffers        
7/02/2013 1:28:59 p.m. - Info bptm(pid=13836) start backup          
7/02/2013 1:28:59 p.m. - Info bptm(pid=13836) backup child process is pid 6932.5972      
7/02/2013 1:28:59 p.m. - Info bptm(pid=13836) media id UC0514 mounted on drive index 4, drivepath {8,0,0,0}, drivename HP.ULTRIUM4-SCSI.001, copy 1
7/02/2013 1:28:59 p.m. - Info bptm(pid=6932) start           
7/02/2013 1:28:59 p.m. - mounted
7/02/2013 1:28:59 p.m. - positioning UC0514 to file 2
7/02/2013 1:29:01 p.m. - positioned UC0514; position time: 00:00:02
7/02/2013 1:29:01 p.m. - begin writing
7/02/2013 1:34:10 p.m. - end writing; write time: 00:05:09
network connection timed out(41)
 
I’ve tried increasing timeouts to no avail.
 
Cheers,
 
Matthew
 

Comments 6 CommentsJump to latest comment

sazz.'s picture
7/02/2013 1:29:01 p.m. - begin writing
7/02/2013 1:34:10 p.m. - end writing; write time: 00:05:09
network connection timed out(41)
 
Failing after 5 minutes which is by deafult. Are you sure you have increased the CLIENT_READ_TIMEOUT.Try to increase this to 1800 and see if it makes any difference.
http://www.symantec.com/docs/HOWTO50656
 
Burpythehippo's picture

Hi Sazz,

Yes, client read timeout is set to 3600 and has been for some time. The write time always fails at around 00:05:09 irrespective of timeout values. 

sazz.'s picture

Please post bpbkar, bptm and bpbrm logs as file attachment and give them respective names. The best would be run the manual catalog and post the output of detailed status and bpbkar,bptm and bpbrm.

Burpythehippo's picture

Hi Sazz,

Master server logs attached (nothing was logged to bpbrm even with logging set to 5).

AttachmentSize
bptm.zip 18.23 KB
bpbkar.zip 30.03 KB
Marianne's picture

Have you tried to backup normal filesystem/folder on master server to the media server?

I doubt that this is a media issue. Media is mounted, postioned and waiting for data.
We see a Client Connect timeout, (not Read timeout) although initial connection was successful. 

Please ensure all of the following logs exist:

master: bpcd and bpbkar

media: bpbrm and bptm
(These logs go on media server - not master)

PS: Where did you increase timeouts? Master or media server?
Timeouts should be increased on the media server.

Supporting Storage Foundation and VCS on Unix and Windows as well as NetBackup on Unix and Windows
Handy NBU Links

SOLUTION
Burpythehippo's picture

Hi Marianne,

Thanks for the clarifcations! I had foolishly only increased the Client Connect Timeout on the Master server, not the Media. I've set this to 3600 and the catalog started writing after about 20 minutes. I think i might have some fragmentation on the catalog disk causing some slowdown.

Thanks to you and Sazz for pointing me in the right direction. I'll monitor things for a while, but for the immediate moment I think I'll mark this as resolved. 

Cheers,

Matthew