This issue has been solved.

Backup ends in Status Code 24

Created: 13 Jan 2011 • Updated: 03 Feb 2011

Dear all,

I have upgraded a number of my NetBackup clients from version 6.0 MP7 to version 7.0.1. Since then, all of the upgraded clients report status code 24 (socket write failed). Sometimes the error appears 8 times during a backup; other times it appears only once.

I raised a support ticket with Symantec, who advised me that this is a networking issue and that I need to get my network engineer to look at it. I was also advised to disable TCP Chimney offload, as per one of the technotes.

Unfortunately, both the Master server and some of the clients are connected to the network using LACP port channels. If you disable TCP offload, the LACP bond does not get created, so I cannot disable it.

I also asked my network engineer to take a look. He checked the LAN switch and noticed that a high number of output packet drops were being recorded on the network interfaces for both the Master server and the clients.

Curiously, these output packet drops do not appear for servers that still have the NetBackup 6.0 client installed, only for the ones upgraded to version 7.0.

According to Cisco's documentation, output packet drops happen when the NIC is overloaded, so the switch cannot process the packets fast enough. Cisco recommend reducing the bandwidth used to resolve this.

Has anyone else seen this issue?  If so, how did you resolve it?

Many thanks,

Richard.

Comments

Rajesh_s
Certified
13 Jan 2011

You may have already verified this, but check the network settings on your network card. It is recommended to use full duplex.

 Rajesh 
13 Jan 2011

If this is just on Windows clients, it may be the NIC drivers

I can't find it right at the moment, but I just saw a Symantec Tech Note that referenced an issue with older Broadcom-based NIC drivers that encounter this problem. They recommend upgrading the NIC drivers; we seem to be about 4 major revisions behind on at least one of our clients that is encountering this status 24 problem.

14 Jan 2011

Cannot change NIC settings

Dear all,

Many thanks for your help.  Unfortunately, I have not been able to make the changes you recommend for the following reasons:

1. To use an LACP port channel, TCP offload must be enabled.

2. It is not possible to force a network card to 1 Gb/s full duplex; the 802.3 specification requires autonegotiation at that speed. If you try to force it, the switch ports reject it, which is the correct behaviour.

3. Some of the servers experiencing status code 24 are VMware guest virtual machines. The VMware guest NIC driver (an AMD one) does not allow you to set the NIC speed and duplex.

However, on the servers worst affected by this problem, I have uninstalled NetBackup 7.0.1 and re-installed NetBackup 6.0 MP7. When the backups run over the weekend, I will see whether this makes a difference to both the backup speed and the "output packet drops" recorded on the Cisco switch.

Many thanks.

Rajesh_s
Certified
14 Jan 2011

Hi Richard,

Are you facing this issue with VMware guest OSes or with physical machines?

Can you tell me the guest OS type, and the OS of the physical machines too?

For VMs:

I hope VMware Tools is installed on the guest OS and updated to the latest version. Please re-check the drivers once again.

Try running a manual backup of a single client and check whether you still see the issue in that situation (meanwhile, check the network utilization on the client side).

Regards,

Rajesh
14 Jan 2011

Both physical and VM

Hi Rajesh,

I see the problem with both physical servers and virtual machines.

All servers, both physical and virtual, where I see this problem are running either Windows 2003 32-bit or Windows 2008 64-bit. All virtual machines have VMware Tools installed and are running on ESX 4.1.

I suspect that the problem may be with the LACP port channel I have on the Master Server. I will run this weekend's backups and the test you suggest. If this does not work, I will remove the LACP port channel and use a different method of load balancing on the NIC ports.

Regards,

Richard.

Rajesh_s
Certified
14 Jan 2011

Hi Richard,

One more question: how exactly are you taking the backups? Is your master server also a media server, or is there another media server that backs up your clients?

Also, try excluding the System State backup.

Regards,

Rajesh
21 Jan 2011

The Master Server is also a Media Server

Hi Rajesh,

The Master Server is also a Media server. The backups are taken directly to tape (LTO3) in a Sun StorageTek SL500 tape library using HP LTO3 tape drives.

The System State is not being backed up on any of the clients showing the problem.

On some of the clients that have this problem, I removed NetBackup 7.0.1 and re-installed NetBackup 6.0 MP7. When I did this, the problem went away: no more status code 24. Some of the clients with this problem are virtual machines; others are physical servers.

I have also removed the LACP port channel on the Master/Media server. This did not work either; I still get status code 24.

My network engineer has looked at the switch ports on the Cisco switch and has noticed a high level of output packet drops. He has stated that this is caused by the server sending out more packets than the switch can handle.

The only thing I can think of now is to replace the network cards in the Master/Media server with a different brand. It is an HP DL385 server using the built-in NICs, which I believe are Broadcom ones. I will put in a PCI card with Intel NICs and see if that fixes it.

Regards.

Richard.

03 Feb 2011
SOLUTION

Finally cracked it

The problem was caused by there not being enough memory on the Master Server to receive the incoming packets. We could see that by running netstat -i, which showed a high number of receive packet drops. This corresponds to the output packet drops seen on the Cisco switches.
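
As a rough sketch of the check described above, the shell function below scans netstat -i output for interfaces with a non-zero receive-drop counter. It assumes the Linux net-tools column layout, where the header row names the column RX-DRP; other platforms label their columns differently and may not report drops in netstat -i at all. The function name rx_drops is made up for this illustration.

```shell
#!/bin/sh
# Print "<interface> <rx-drop-count>" for every interface whose RX-DRP
# counter is non-zero. Reads netstat -i style text on stdin.
# Assumption: Linux net-tools layout with an "RX-DRP" header column.
rx_drops() {
    awk '
        # Until the header row is found, look for the RX-DRP column.
        col == 0 {
            for (i = 1; i <= NF; i++) if ($i == "RX-DRP") col = i
            next
        }
        # Data rows: report interfaces with a non-zero drop counter.
        $col + 0 > 0 { print $1, $col }
    '
}

# Usage: netstat -i | rx_drops
```

If the function prints nothing, either no interface has recorded receive drops or the header did not contain an RX-DRP column, so it is worth looking at the raw netstat -i output as well.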

The problem was resolved by setting /usr/openv/netbackup/NET_BUFFER_SZ to 262144. Now it all works properly.
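
For illustration, applying that setting could look like the sketch below. It assumes NET_BUFFER_SZ is a plain text file holding a single integer (the buffer size in bytes). The helper name set_net_buffer_sz and its optional path argument are hypothetical, added so the function can be tried on a scratch file first. The thread does not say whether anything needs restarting, so it is worth confirming the effect with a fresh backup.

```shell
#!/bin/sh
# Write a buffer size (in bytes) to a NET_BUFFER_SZ file.
# $1 = size, $2 = target file (defaults to the path given in the post).
set_net_buffer_sz() {
    size=$1
    file=${2:-/usr/openv/netbackup/NET_BUFFER_SZ}
    # Reject anything that is not made up purely of digits.
    case $size in
        ''|*[!0-9]*) echo "size must be a positive integer" >&2; return 1 ;;
    esac
    # Reject zero as well.
    [ "$size" -gt 0 ] || { echo "size must be a positive integer" >&2; return 1; }
    # The file holds just the number on a single line.
    printf '%s\n' "$size" > "$file"
}

# Usage (as root on the server): set_net_buffer_sz 262144
```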