Backups failing for Linux clients with status 24, and 41

Created: 29 Apr 2013 • Updated: 29 Apr 2013 | 8 comments

NetBackup 7504, Windows 2008 R2 master and media servers.

Hello all,

I have two Linux clients that started failing with status 41 and, later, with status 24.

****History -

These clients were backing up fine. The networking team moved them, along with a few others, to a new switch, and backups still succeeded. Later, the team moved a few clients (including these two) back to the original switch, and that is when this issue started.

****What I have tried -

1) I spoke with the networking team. They told me they connected these back to the original ports, and they see no communication issues between the clients, media, or master servers.

2) I went to Host Properties > Clients, right-clicked each of the two clients, and clicked Connect. Both connected immediately, and I can browse the client properties with no issues.

3) I created a test policy with just one of the two clients (noc-edi-102), changed the backup selection from ALL_LOCAL_DRIVES to /etc, and kicked off a backup. It failed.

4) After raising the logging verbosity wherever needed, I was seeing timeout errors, so I increased the client read timeout from 5 minutes (300 seconds) to 20 minutes (1200 seconds) on both the client and the media server, increased the client connect timeout on the media server to 20 minutes, and retried. ****This is when the error went from status 41 to status 24.

5) I ran bpclntcmd -hn between the master, media server, and client, and all names resolved, forward and reverse.

6) I am also already working with my technical support vendor on this issue, but they are drawing a blank so far. They thought they saw a timeout at a particular path, so they had me change the backup selection to a location that doesn't include that path. Same error.

7) I confirmed with the Unix team that no firewall settings have been changed on these boxes.

8) I added the client IP and name (both the long and short names) to the hosts file on the master and media servers.

9) I even tried writing directly to tape rather than disk, because I'm grasping at straws.
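Steps 5 and 8 both come down to checking that each host name resolves to an IP address and back to the same name. A minimal Python sketch of that forward/reverse round trip is below. It uses the standard `socket` resolver, which typically consults the hosts file as well as DNS, depending on the system's resolver order. It is not a replacement for `bpclntcmd`, `gethostbyname` only returns IPv4 addresses, and the `roundtrip.py` file name is hypothetical.

```python
import socket
import sys
from typing import Callable, List, Tuple

# Resolver hooks: the defaults use the system resolver (IPv4 only for
# gethostbyname). Passing fake resolvers makes the check testable offline.
Forward = Callable[[str], str]
Reverse = Callable[[str], Tuple[str, List[str], List[str]]]


def check_round_trip(name: str,
                     forward: Forward = socket.gethostbyname,
                     reverse: Reverse = socket.gethostbyaddr) -> Tuple[bool, str, str]:
    """Resolve name -> IP -> name and report whether the round trip agrees.

    The reverse answer matches if the original name (case-insensitive) is the
    canonical name or an alias, or, for a short name, if it equals the first
    label of one of those names.
    """
    ip = forward(name)
    canonical, aliases, _ = reverse(ip)
    wanted = name.lower()
    candidates = {canonical.lower(), *(a.lower() for a in aliases)}
    short_labels = {c.split(".")[0] for c in candidates}
    ok = wanted in candidates or ("." not in wanted and wanted in short_labels)
    return ok, ip, canonical


if __name__ == "__main__":
    # Usage: python roundtrip.py noc-edi-102 noc-edi-102.ohlogistics.com
    for host in sys.argv[1:]:
        try:
            ok, ip, back = check_round_trip(host)
            print(f"{host} -> {ip} -> {back}: {'OK' if ok else 'MISMATCH'}")
        except OSError as exc:  # socket.gaierror / socket.herror
            print(f"{host}: lookup failed ({exc})")
```

Run it on the master, the media server, and the client so each side's view of the others can be compared.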
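Step 7 rules out firewall changes on the clients themselves. A direct TCP connect test from the media server to the client's NetBackup ports is a quick independent check of the path through the switch. This sketch assumes the default ports 1556 (PBX), 13724 (vnetd), and 13782 (bpcd). Keep in mind that a successful connect only proves the handshake works. On its own, it cannot explain a transfer that times out after about 16 minutes of writing.

```python
import socket
from typing import Dict, Iterable

# Assumed defaults: NetBackup PBX (1556), vnetd (13724) and bpcd (13782).
NBU_PORTS = (1556, 13724, 13782)


def probe_ports(host: str, ports: Iterable[int] = NBU_PORTS,
                timeout: float = 5.0) -> Dict[int, str]:
    """Attempt a TCP connect to each port; map port -> 'open' or the error."""
    results: Dict[int, str] = {}
    for port in ports:
        try:
            # create_connection resolves the host and applies the timeout
            # to the connect attempt; the socket is closed immediately.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = "open"
        except OSError as exc:  # refused, timed out, unreachable, unresolvable
            results[port] = f"{type(exc).__name__}: {exc}"
    return results
```

For example, running `print(probe_ports("noc-edi-102.ohlogistics.com"))` on the media server should report `open` for every port if nothing in between is blocking them.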

****Reporting

Job detail status -

4/27/2013 7:00:00 PM - requesting resource PDCDD_SU_1
4/27/2013 7:00:00 PM - requesting resource pdc00nbua801w.ohlogistics.com.NBU_CLIENT.MAXJOBS.noc-edi-102.ohlogistics.com
4/27/2013 7:00:00 PM - requesting resource pdc00nbua801w.ohlogistics.com.NBU_POLICY.MAXJOBS.PDC_NOC-EDI-102
4/27/2013 7:00:00 PM - awaiting resource PDCDD_SU_1 - Maximum job count has been reached for the storage unit
4/28/2013 12:26:32 AM - granted resource pdc00nbua801w.ohlogistics.com.NBU_CLIENT.MAXJOBS.noc-edi-102.ohlogistics.com
4/28/2013 12:26:32 AM - granted resource pdc00nbua801w.ohlogistics.com.NBU_POLICY.MAXJOBS.PDC_NOC-EDI-102
4/28/2013 12:26:32 AM - granted resource MediaID=@aaaae;DiskVolume=PDCDisk2;DiskPool=PDCDD_DP;Path=PDCDisk2;StorageServer=pdc00ddma901;MediaServer=pdc00nbua802w
4/28/2013 12:26:32 AM - granted resource PDCDD_SU_1
4/28/2013 12:26:32 AM - estimated 0 Kbytes needed
4/28/2013 12:26:32 AM - Info nbjm(pid=4532) started backup (backupid=noc-edi-102.ohlogistics.com_1367126792) job for client noc-edi-102.ohlogistics.com, policy PDC_NOC-EDI-102, schedule Full on storage unit PDCDD_SU_1
4/28/2013 12:26:33 AM - started process bpbrm (1368)
4/28/2013 12:26:36 AM - Info bpbrm(pid=1368) noc-edi-102.ohlogistics.com is the host to backup data from    
4/28/2013 12:26:36 AM - Info bpbrm(pid=1368) reading file list from client       
4/28/2013 12:26:37 AM - connecting
4/28/2013 12:26:42 AM - Info bpbrm(pid=1368) starting bpbkar32 on client        
4/28/2013 12:26:42 AM - Info bpbkar32(pid=0) Backup started          
4/28/2013 12:26:42 AM - Info bptm(pid=1752) start           
4/28/2013 12:26:42 AM - Info bptm(pid=1752) using 1048576 data buffer size       
4/28/2013 12:26:42 AM - Info bptm(pid=1752) setting receive network buffer to 1048576 bytes     
4/28/2013 12:26:42 AM - Info bptm(pid=1752) using 128 data buffers        
4/28/2013 12:26:43 AM - connected; connect time: 00:00:06
4/28/2013 12:26:47 AM - Info bptm(pid=1752) start backup          
4/28/2013 12:26:48 AM - Info bptm(pid=1752) backup child process is pid 8180.2272      
4/28/2013 12:26:48 AM - Info bptm(pid=8180) start           
4/28/2013 12:26:48 AM - begin writing
4/28/2013 12:42:33 AM - Error bpbrm(pid=1368) from client noc-edi-102.ohlogistics.com: ERR - Cannot write to STDOUT. Errno = 110: Connection timed out
4/28/2013 12:42:42 AM - Error bpbrm(pid=1368) cannot send mail to etyree@ohl.com,tsmith@ohl.com       
4/28/2013 12:42:43 AM - end writing; write time: 00:15:55
socket write failed(24)
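The job details above contain enough timestamps to compare the failure against the timeout change in step 4. This hypothetical helper (not a NetBackup tool) parses lines of the form `M/D/YYYY h:mm:ss AM/PM - message` and measures the time between two events. For this job, "begin writing" to the error is 945 seconds, and "begin writing" to "end writing" is 955 seconds, which matches the reported write time of 00:15:55. Both are below the 1200-second client read timeout. That may suggest the Errno 110 comes from the TCP connection itself rather than from the NetBackup timeout, though the thread does not confirm this.

```python
from datetime import datetime
from typing import List, Optional, Tuple

# Timestamp format used in the Job Details text, e.g. "4/28/2013 12:26:48 AM".
STAMP = "%m/%d/%Y %I:%M:%S %p"


def parse_job_details(text: str) -> List[Tuple[datetime, str]]:
    """Return (timestamp, message) for each line shaped like '<stamp> - <message>'."""
    events: List[Tuple[datetime, str]] = []
    for line in text.splitlines():
        stamp, sep, message = line.partition(" - ")
        if not sep:
            continue  # e.g. the trailing "socket write failed(24)" line
        try:
            events.append((datetime.strptime(stamp.strip(), STAMP), message.strip()))
        except ValueError:
            continue  # a " - " that isn't preceded by a timestamp
    return events


def seconds_between(events: List[Tuple[datetime, str]],
                    start: str, end: str) -> Optional[float]:
    """Seconds from the first message starting with `start` to the first
    message at or after it that starts with `end`; None if either is missing."""
    t0 = next((t for t, m in events if m.startswith(start)), None)
    if t0 is None:
        return None
    t1 = next((t for t, m in events if t >= t0 and m.startswith(end)), None)
    return None if t1 is None else (t1 - t0).total_seconds()
```

To use it, paste the full Job Details text into `parse_job_details` and call `seconds_between(events, "begin writing", "Error")`.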

****Bpbkar from client

15:23:43.788 [30956] <4> bpbkar PrintFile: /etc/minicom.users
15:23:43.788 [30956] <2> bpbkar SelectFile: INF - cwd = /etc
15:23:43.788 [30956] <2> bpbkar SelectFile: INF - path = modprobe.conf
15:23:43.788 [30956] <4> bpbkar PrintFile: /etc/modprobe.conf
15:23:43.788 [30956] <2> bpbkar SelectFile: INF - cwd = /etc
15:23:43.788 [30956] <2> bpbkar SelectFile: INF - path = modprobe.conf.dist
15:23:43.788 [30956] <4> bpbkar PrintFile: /etc/modprobe.conf.dist
15:23:43.788 [30956] <2> bpbkar SelectFile: INF - cwd = /etc
15:23:43.788 [30956] <2> bpbkar SelectFile: INF - path = modprobe.conf~
15:23:43.788 [30956] <4> bpbkar PrintFile: /etc/modprobe.conf~
15:23:43.788 [30956] <2> bpbkar SelectFile: INF - cwd = /etc
15:23:43.788 [30956] <2> bpbkar SelectFile: INF - path = motd
15:23:43.788 [30956] <4> bpbkar PrintFile: /etc/motd
15:23:43.788 [30956] <2> bpbkar SelectFile: INF - cwd = /etc
15:23:43.788 [30956] <2> bpbkar SelectFile: INF - path = mtab
15:39:37.051 [30956] <16> flush_archive(): ERR - Cannot write to STDOUT. Errno = 110: Connection timed out
15:39:37.051 [30956] <16> bpbkar Exit: ERR - bpbkar FATAL exit status = 24: socket write failed
15:39:37.051 [30956] <4> bpbkar Exit: INF - EXIT STATUS 24: socket write failed
15:39:37.051 [30956] <2> bpbkar Exit: INF - Close of stdout complete
15:39:37.051 [30956] <4> bpbkar Exit: INF - setenv FINISHED=0
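In the bpbkar excerpt, the last file selected (`mtab`, at 15:23:43.788) is followed by the `flush_archive()` error at 15:39:37.051. That pause of roughly 16 minutes is in line with the job details. Because step 6 showed the same error with a different selection, the pause more likely reflects the client blocked on its socket write than a problem with that particular file, although this is an interpretation. The small sketch below finds the longest pause in a timestamped bpbkar log excerpt. It assumes the `HH:MM:SS.mmm` prefix shown above and an excerpt that does not cross midnight.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def largest_gap(log_text: str) -> Optional[Tuple[timedelta, str, str]]:
    """Return (gap, line_before, line_after) for the longest pause between
    consecutive timestamped lines, or None if fewer than two are found."""
    entries: List[Tuple[datetime, str]] = []
    for line in log_text.splitlines():
        head = line.split(" ", 1)[0]
        try:
            # "15:23:43.788": %f accepts the 3-digit millisecond field.
            entries.append((datetime.strptime(head, "%H:%M:%S.%f"), line.strip()))
        except ValueError:
            continue  # blank or non-timestamped line
    best: Optional[Tuple[timedelta, str, str]] = None
    for (t0, l0), (t1, l1) in zip(entries, entries[1:]):
        gap = t1 - t0
        if best is None or gap > best[0]:
            best = (gap, l0, l1)
    return best
```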

 

Please let me know what other information I can provide. I've been working on this for a while, so it's possible I left out something I tried.

Thanks all,

Todd


8 Comments

Dyneshia

Did you reboot the clients after they were moved back to the original switch?

Toddman214

No. I confirmed with our Unix admins that these were not rebooted after the move, but would a Linux box even need a reboot after being reconnected to another switch? NetBackup services have, however, been bounced since then. Oh, yeah, that's another thing we tried: bouncing NetBackup services on the client.

Dyneshia

Hello Toddman214, were you able to have the admin reboot?

Will Restore

>> flush_archive(): ERR - Cannot write to STDOUT. Errno = 110

Article URL: http://www.symantec.com/docs/TECH74690

Disable IPv6 ...

A restart of the network or a reboot of the server is required.

Will Restore -- where there is a Will there is a way
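The thread doesn't say how IPv6 was disabled on the clients. One quick confirmation on a Linux client is to read the standard `net.ipv6.conf.all.disable_ipv6` sysctl, exposed under `/proc`. The sketch below is an assumption-laden check, not the procedure from the linked article. It only reads that one flag, and a missing file may mean the IPv6 stack isn't loaded at all, or that the host isn't Linux.

```python
from pathlib import Path
from typing import Optional

# Standard Linux sysctl file; "1" means IPv6 is disabled on all interfaces.
DEFAULT_FLAG = Path("/proc/sys/net/ipv6/conf/all/disable_ipv6")


def ipv6_disabled(flag_file: Path = DEFAULT_FLAG) -> Optional[bool]:
    """True if the flag reads 1, False if it reads anything else,
    None if the flag file does not exist."""
    try:
        return flag_file.read_text().strip() == "1"
    except FileNotFoundError:
        return None
```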

LucSkywalker1957

Linux/Unix is very sensitive to network changes. I've occasionally seen Linux machines panic after a change like that. The best way to rule it out is to take the outage and reboot the servers. If the problem persists after you reboot the clients, did you make any changes in your backup environment?

Toddman214

I had our Unix admin reboot both boxes and disable IPv6. I'm still getting these status 24 failures:

5/14/2013 8:26:28 PM - Info bptm(pid=8164) backup child process is pid 7140.7216      
5/14/2013 8:26:28 PM - Info bptm(pid=7140) start           
5/14/2013 8:26:28 PM - begin writing
5/14/2013 8:42:17 PM - Error bpbrm(pid=3112) from client noc-edi-xxx.xxxogistics.com: ERR - Cannot write to STDOUT. Errno = 110: Connection timed out
5/14/2013 8:42:26 PM - Error bpbrm(pid=3112) cannot send mail to xxxxxxx       
5/14/2013 8:42:27 PM - end writing; write time: 00:15:59
socket write failed(24)

RonCaplinger

It looks like you may be using MSDP/PureDisk. From my recent experience with Linux media servers, make sure TCP offloading is (still) disabled on the clients and the media servers. There is also a ring buffer parameter in the TCP setup on Linux hosts (I'm not sure of its name); make sure it is set to the maximum. You *do* need to reboot the Linux client for these changes to take effect, even if the system doesn't tell you to.
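On most Linux NIC drivers, the ring buffer setting mentioned above is the RX/TX ring size. `ethtool -g <iface>` reports it and `ethtool -G` changes it. Offload flags appear in `ethtool -k <iface>` output. The sketch below parses output captured from those two commands to flag rings below their pre-set maximum and offloads that are still on. Field names vary by driver and ethtool version, the interface name `eth0` is hypothetical, and the thread does not establish that either setting caused this failure.

```python
from typing import Dict, Tuple


def parse_ring_params(text: str) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Split `ethtool -g` output into (pre-set maximums, current settings)."""
    maxima: Dict[str, int] = {}
    current: Dict[str, int] = {}
    target = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Pre-set maximums"):
            target = maxima
        elif line.startswith("Current hardware settings"):
            target = current
        elif target is not None and ":" in line:
            key, _, value = line.partition(":")
            value = value.strip()
            if value.isdigit():  # skip "n/a" and other non-numeric fields
                target[key.strip()] = int(value)
    return maxima, current


def rings_below_max(text: str) -> Dict[str, Tuple[int, int]]:
    """Return {ring: (current, maximum)} for rings not set to their maximum."""
    maxima, current = parse_ring_params(text)
    return {k: (current[k], maxima[k]) for k in maxima
            if k in current and current[k] < maxima[k]}


def offloads_enabled(text: str) -> Dict[str, bool]:
    """Parse `ethtool -k` output into {feature: is_on} for '*-offload' features."""
    features: Dict[str, bool] = {}
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if sep and key.endswith("offload"):
            parts = value.split()  # e.g. ["off", "[fixed]"]
            features[key] = bool(parts) and parts[0] == "on"
    return features
```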

Michael G Andersen

If it is still a problem, I would try manually deleting the host cache on the client, the media server, and the master server.

If it has been solved, please tell us what the solution was.