
NetBackup 7.5 Duplication Failing after 2 Hours - Cisco ASA timeout?

Created: 14 Feb 2013 | 3 comments

Hi All,

We have an issue whereby our duplication is failing after exactly 2 hours.

We are duplicating across a Virgin Media managed WAN link. The NBU appliances are behind Cisco ASA firewalls, and we suspect this is a TCP connection timeout issue.

The job details reported on the NBU appliance are as follows:

14/02/2013 13:45:25 - begin Duplicate

14/02/2013 13:45:25 - requesting resource LCM_nbu-xxxx_dedupe_stu

14/02/2013 13:45:25 - granted resource LCM_nbu-xxxx_dedupe_stu

14/02/2013 13:45:25 - started process RUNCMD (9872)

14/02/2013 13:45:25 - ended process 0 (9872)

14/02/2013 13:45:26 - requesting resource nbu-xxxx_dedupe_stu

14/02/2013 13:45:26 - reserving resource @aaabK

14/02/2013 13:45:26 - reserved resource @aaabK

14/02/2013 13:45:26 - granted resource MediaID=@aaabL;DiskVolume=PureDiskVolume;DiskPool=dp_disk_nbu-xxxx;Path=PureDiskVolume;StorageServer=nbu-xxxx;MediaServer=nbu-xxxx

14/02/2013 13:45:26 - granted resource nbu-xxxx_dedupe_stu

14/02/2013 13:45:27 - requesting resource @aaabK

14/02/2013 13:45:28 - Info Duplicate(pid=9872) Initiating optimized duplication from @aaabK to @aaabL     

14/02/2013 13:45:28 - granted resource MediaID=@aaabK;DiskVolume=PureDiskVolume;DiskPool=dp_disk_nbu-yyyy;Path=PureDiskVolume;StorageServer=nbu-yyyy;MediaServer=nbu-xxxx

14/02/2013 13:46:09 - Info bpdm(pid=22962) started           

14/02/2013 13:46:09 - started process bpdm (22962)

14/02/2013 13:46:11 - Info bpdm(pid=22962) requesting nbjm for media        

14/02/2013 13:46:21 - begin writing

14/02/2013 13:46:24 - end writing; write time: 00:00:03

14/02/2013 13:46:26 - begin writing

14/02/2013 13:46:29 - end writing; write time: 00:00:03

14/02/2013 13:46:30 - begin writing

14/02/2013 13:46:34 - end writing; write time: 00:00:04

14/02/2013 13:46:35 - begin writing

14/02/2013 13:46:38 - end writing; write time: 00:00:03

14/02/2013 13:46:40 - begin writing

14/02/2013 13:46:43 - end writing; write time: 00:00:03

14/02/2013 13:46:44 - begin writing

14/02/2013 13:46:47 - end writing; write time: 00:00:03

14/02/2013 13:46:49 - begin writing

14/02/2013 13:46:52 - end writing; write time: 00:00:03

14/02/2013 13:46:53 - begin writing

14/02/2013 13:46:56 - end writing; write time: 00:00:03

14/02/2013 13:46:57 - begin writing

14/02/2013 13:47:00 - end writing; write time: 00:00:03

14/02/2013 13:47:01 - begin writing

14/02/2013 13:47:05 - end writing; write time: 00:00:04

14/02/2013 13:47:06 - begin writing

14/02/2013 13:47:09 - end writing; write time: 00:00:03

14/02/2013 13:47:10 - begin writing

14/02/2013 13:47:13 - end writing; write time: 00:00:03

14/02/2013 13:47:15 - begin writing

14/02/2013 13:47:17 - end writing; write time: 00:00:02

14/02/2013 13:47:19 - begin writing

14/02/2013 13:47:22 - end writing; write time: 00:00:03

14/02/2013 13:47:23 - begin writing

14/02/2013 13:47:26 - end writing; write time: 00:00:03

14/02/2013 13:47:27 - begin writing

14/02/2013 13:47:31 - end writing; write time: 00:00:04

14/02/2013 13:47:32 - begin writing

14/02/2013 13:47:35 - end writing; write time: 00:00:03

14/02/2013 13:47:36 - begin writing

14/02/2013 13:47:39 - end writing; write time: 00:00:03

14/02/2013 13:47:40 - begin writing

14/02/2013 13:47:44 - end writing; write time: 00:00:04

14/02/2013 13:47:45 - begin writing

14/02/2013 13:47:48 - end writing; write time: 00:00:03

14/02/2013 13:47:50 - begin writing

14/02/2013 13:47:53 - end writing; write time: 00:00:03

14/02/2013 13:47:55 - begin writing

14/02/2013 15:02:29 - end writing; write time: 01:14:34

14/02/2013 15:02:58 - Info bpdm(pid=22962) EXITING with status 0        

14/02/2013 15:02:59 - Info nbu-xxxx(pid=22962) StorageServer=PureDisk:nbu-yyyy; Report=PDDO Stats for (nbu-yyyy): scanned: 534147084 KB, CR sent: 15703012 KB, CR sent over FC: 0 KB, dedup: 97.1%

14/02/2013 15:45:30 - Error bpduplicate(pid=9872) socket read failed: errno = 10054 - An existing connection was forcibly closed by the remote host.

14/02/2013 15:45:30 - Error bpduplicate(pid=9872) host nbu-xxxx backup id sap-enp5.cbc.int_1360782263 optimized duplication failed, file read failed (13).

14/02/2013 15:45:30 - Error bpduplicate(pid=9872) Duplicate of backupid sap-enp5.cbc.int_1360782263 failed, file read failed (13).   

14/02/2013 15:45:30 - Error bpduplicate(pid=9872) Status = no images were successfully processed.     

14/02/2013 15:45:30 - end Duplicate; elapsed time: 02:00:05

file read failed(13)
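
The timings in the job details above can be checked with a few lines of Python. This is only a sketch: the helper name `parse_stamp` is made up for illustration, and the three lines are copied from the log above. It computes the total run time and how long the job sat between bpdm exiting and the job ending with the socket error.

```python
from datetime import datetime

STAMP_FORMAT = "%d/%m/%Y %H:%M:%S"  # day/month/year, as in the job details


def parse_stamp(line):
    """Parse the leading 'dd/mm/yyyy hh:mm:ss' timestamp of a job-detail line."""
    return datetime.strptime(line[:19], STAMP_FORMAT)


# Three lines taken from the job details above.
begin = parse_stamp("14/02/2013 13:45:25 - begin Duplicate")
bpdm_exit = parse_stamp("14/02/2013 15:02:58 - Info bpdm(pid=22962) EXITING with status 0")
failure = parse_stamp("14/02/2013 15:45:30 - end Duplicate; elapsed time: 02:00:05")

elapsed = failure - begin      # total run time of the duplication job
quiet = failure - bpdm_exit    # time between bpdm exiting and the failure

print("elapsed:", elapsed)
print("after bpdm exit:", quiet)
```

The elapsed time comes out five seconds over two hours, which is what makes a fixed two-hour timer somewhere in the path look likely.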

===========================================

A log entry from one of the firewalls around the time of the failure is as follows:

6 Feb 14 2013 15:39:26 106015 nbu2.x.y 39775 10.x.x.x 1556 Deny TCP (no connection) from nbu2.x.y/39775 to 10.102.5.29/1556 flags ACK on interface xxxxxx

 

And the timeout config on both firewalls is:

xxxxxxx-FW1# show run | i timeout
arp timeout 3600
timeout xlate 3:00:00
timeout conn 1:00:00 half-closed 0:10:00 udp 0:02:00 icmp 0:00:02
timeout sunrpc 2:00:00 h323 2:00:00 h225 2:00:00 mgcp 0:05:00
timeout mgcp-pat 0:05:00 sip 0:30:00 sip_media 0:02:00
timeout sip-invite 0:03:00 sip-disconnect 0:02:00
timeout uauth 0:05:00 absolute
 
yyyyyy-FW-1# show run | i timeout
arp timeout 14400
timeout xlate 3:00:00
timeout conn 1:00:00 half-closed 0:10:00 udp 0:02:00 icmp 0:00:02
timeout sunrpc 2:00:00 h323 2:00:00 h225 2:00:00 mgcp 0:05:00
timeout mgcp-pat 0:05:00 sip 0:30:00 sip_media 0:02:00
timeout sip-invite 0:03:00 sip-disconnect 0:02:00
timeout uauth 0:05:00 absolute
 
-----------------------------
 
We notice that timeout sunrpc is configured for 2 hours.
 
Could this be the issue?
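
To compare the timers side by side, the `timeout` lines can be converted to seconds. The following is a rough sketch, not a Cisco tool: `parse_asa_timeouts` is a hypothetical helper, and the sample text is the FW1 output pasted above. Note that `timeout conn` is the idle timeout the ASA applies to ordinary TCP connections, so for NetBackup traffic on port 1556 it is probably the 1 hour `conn` value, rather than `sunrpc`, that is in play.

```python
import re

# The FW1 output from the post above.
FW1_CONFIG = """\
arp timeout 3600
timeout xlate 3:00:00
timeout conn 1:00:00 half-closed 0:10:00 udp 0:02:00 icmp 0:00:02
timeout sunrpc 2:00:00 h323 2:00:00 h225 2:00:00 mgcp 0:05:00
timeout mgcp-pat 0:05:00 sip 0:30:00 sip_media 0:02:00
timeout sip-invite 0:03:00 sip-disconnect 0:02:00
timeout uauth 0:05:00 absolute
"""

_DURATION = re.compile(r"^\d+:\d{2}:\d{2}$")


def parse_asa_timeouts(config_text):
    """Return {timer_name: seconds} from 'show run | i timeout' output."""
    timeouts = {}
    for line in config_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        # "arp timeout 3600" is already expressed in plain seconds.
        if tokens[:2] == ["arp", "timeout"] and len(tokens) == 3 and tokens[2].isdigit():
            timeouts["arp"] = int(tokens[2])
            continue
        if tokens[0] != "timeout":
            continue
        name = None
        for token in tokens[1:]:
            if _DURATION.match(token):
                if name is not None:
                    hours, minutes, seconds = (int(part) for part in token.split(":"))
                    timeouts[name] = hours * 3600 + minutes * 60 + seconds
                    name = None
            else:
                # A timer name; modifiers such as "absolute" are never
                # followed by a duration and are therefore dropped.
                name = token
    return timeouts


if __name__ == "__main__":
    for name, seconds in sorted(parse_asa_timeouts(FW1_CONFIG).items()):
        print(f"{name:15s} {seconds:6d} s")
```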
 
Thanks in advance for any input or help on this matter.
 
Mark.

 

 

 


NetRestore:

Hi Mark,

 

The crazy thing is that we are now experiencing the same issue over a WAN link, except that we are behind FortiGate 310B firewalls.

 

 

Error bpduplicate (pid=8460) socket read failed: errno = 10054 - An existing connection was forcibly closed by the remote host. 
02/15/2013 06:36:03 - Error bpduplicate (pid=8460) host (HOST) backup id (CLIENT)_1360867209 optimized duplication failed, file read failed (13).
02/15/2013 06:36:03 - Error bpduplicate (pid=8460) Duplicate of backupid (CLIENT)_1360867209 failed, file read failed (13).
02/15/2013 06:36:03 - Error bpduplicate (pid=8460) Status = no images were successfully processed.
02/15/2013 06:36:03 - end Duplicate; elapsed time 2:00:15
 
I see the same pattern for any job that reaches two hours.
 
What media servers are you running on?

NetRestore:

Also, from what I could gather, timeouts only apply to stale (idle) connections, not to active connections with data traversing them.

I could be wrong, but we've checked and rechecked our OS timeouts and firewalls, and supposedly they are not causing the problem.

Mark_Solutions:

The firewall setting can certainly cause this, even though the link appears to be active. A steady stream of data is flowing, but the media servers report the progress of that stream over a separate connection, and it is this connection that gets broken.

So the data is flowing fine, but when a progress report is expected and none arrives, the waiting process considers the stream to have failed and kills the job.

The servers themselves can also cause it through their keepalive settings. Appliances already have the keepalive interval, probes and time optimised, so they should be fine.

The Master Server may also need its settings changed to match (how to do this varies depending on the O/S). Your 2 hour setting can still affect things, though, because the issue is not the data flow but the progress communications, and those can cause the whole job to fail. It is best to put the firewall timeouts right up.

If the timeouts are IP-specific, make sure that media server to media server and media server to master server connections are all allowed with a generous timeout.
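
To illustrate the keepalive mechanism mentioned above, here is a minimal Python sketch of a socket that sends keepalive probes well before a 1 hour firewall idle timeout would expire. It is not how NetBackup is configured: NetBackup relies on the OS-wide keepalive settings (on Linux, for example, the `net.ipv4.tcp_keepalive_time` sysctl, which defaults to 7200 seconds), and the function name and the 900/60/5 values here are assumptions chosen for the example.

```python
import socket


def enable_keepalive(sock, idle_s=900, interval_s=60, probes=5):
    """Turn on TCP keepalive and, where the platform allows, tune its timers."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket timer options are platform-specific (all three exist
    # on Linux), so each one is applied only if this Python build has it.
    tunables = (
        ("TCP_KEEPIDLE", idle_s),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between unanswered probes
        ("TCP_KEEPCNT", probes),        # unanswered probes before giving up
    )
    for option_name, value in tunables:
        option = getattr(socket, option_name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        enable_keepalive(s)
        enabled = s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
        print("keepalive enabled:", bool(enabled))
```

The point of the design is simply that the first probe (15 minutes here) must go out sooner than the shortest idle timeout on any firewall in the path; with a 2 hour OS default and a 1 hour `timeout conn`, the firewall would forget an idle control connection before the first probe is ever sent.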

Hope this helps

Authorised Symantec Consultant

Don't forget to "Mark as Solution" if someone's advice has solved your issue - and please bring back the Thumbs Up!!