Clients failing backups with status code 24 errors

Article:TECH145234  |  Created: 2010-11-30  |  Updated: 2014-05-18  |  Article URL http://www.symantec.com/docs/TECH145234
Article Type
Technical Solution

Issue



Client backups of large amounts of data fail with status code 24.


Error



 

From the client's bpbkar log:
 
1:23:06.352 PM: [3348.488] <32> TransporterRemote::write[2](): FTL - SocketWriteException: send() call failed, could not write data to the socket, possible broken connection.
1:23:06.352 PM: [3348.488] <16> NBUException::traceException(): (
An Exception of type [Symantec::NetBackup::Ncf::OperationFailedException] was thrown. Details about the exception follow...:
Error code  = (-1008).
Src file    = (d:\653\src\cl\clientpc\util\tar_tfi.cpp).
Src Line    = (275).
Description = (%s getBuffer operation failed).
Operation type=().
)
1:23:06.352 PM: [3348.488] <16> NBUException::traceException(): (
An Exception of type [Symantec::NetBackup::Ncf::SocketWriteException] was thrown. Details about the exception follow...:
Error code  = (-1027).
Src file    = (TransporterRemote.cpp).
Src Line    = (310).
Description = (send() call failed, could not write data to the socket, possible broken connection).
Local IP=(). Remote IP=(). Remote Port No.=(0).
No. of bytes to write=(32768) while No. of bytes written=(0).
)
1:23:06.352 PM: [3348.488] <4> tar_base::V_vTarMsgW: INF - tar message received from tar_backup_tfi::processException
1:23:06.352 PM: [3348.488] <2> tar_base::V_vTarMsgW: FTL - socket write failed
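When triaging many clients, it can help to scan bpbkar logs for this signature automatically. The following is a minimal sketch, not a NetBackup tool: the regular expression is inferred only from the excerpt above, the two bracketed numbers are assumed to be process and thread IDs, and the function name find_socket_write_failures is hypothetical.

```python
import re

# Line layout inferred from the excerpt above, for example:
# 1:23:06.352 PM: [3348.488] <32> TransporterRemote::write[2](): FTL - ...
# The two numbers in brackets are ASSUMED to be process ID and thread ID.
LINE_RE = re.compile(
    r"^(?P<time>\d{1,2}:\d{2}:\d{2}\.\d{3} [AP]M): "
    r"\[(?P<pid>\d+)\.(?P<tid>\d+)\] "
    r"<(?P<severity>\d+)> "
    r"(?P<message>.*)$"
)

# Substrings copied from the log excerpt in this article.
SIGNATURES = ("SocketWriteException", "socket write failed")


def find_socket_write_failures(lines):
    """Return (time, severity, message) for each line matching a signature."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and any(sig in match.group("message") for sig in SIGNATURES):
            hits.append(
                (match.group("time"), int(match.group("severity")), match.group("message"))
            )
    return hits


# Three lines taken from the excerpt above (the first one shortened).
sample = [
    "1:23:06.352 PM: [3348.488] <32> TransporterRemote::write[2](): FTL - SocketWriteException: send() call failed",
    "1:23:06.352 PM: [3348.488] <4> tar_base::V_vTarMsgW: INF - tar message received from tar_backup_tfi::processException",
    "1:23:06.352 PM: [3348.488] <2> tar_base::V_vTarMsgW: FTL - socket write failed",
]
hits = find_socket_write_failures(sample)
```

Continuation lines of a multi-line exception block do not start with a timestamp, so this sketch skips them and reports only the timestamped entries.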

Environment



NetBackup media server, client, and master server separated by a firewall, router, or any other device known to enforce a TCP idle timeout.


Cause



The idle timeout setting on the firewall or router is too short for backups or restores of large amounts of data to complete. Smaller backups may succeed.


Solution



The problem lies with the idle timeout setting on the firewall. While the DMZ media server backs up a large amount of data from a DMZ client, it sends only occasional metadata updates back to the master server to update the image catalog.

 
If the firewall idle timeout is too short to allow backups and restores to complete, the firewall can drop the connection between the DMZ media server and the master server. The DMZ media server then breaks its connection to the client, producing status 24 or status 40 errors.
 
Increasing the firewall's idle timeout setting so that it is longer than the time it takes to complete the backup or restore should resolve the issue.
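As a rough illustration of sizing the idle timeout, the sketch below estimates how long a backup runs and compares that to a firewall idle timeout. The data size, throughput, safety margin, and function names are all hypothetical values chosen for the example, not figures from NetBackup or from any firewall vendor.

```python
def required_idle_timeout_s(data_gib, throughput_mib_s, margin=1.5):
    """Estimated backup duration in seconds, padded by a safety margin.

    data_gib, throughput_mib_s and margin are illustrative inputs only.
    """
    duration_s = (data_gib * 1024) / throughput_mib_s
    return duration_s * margin


def timeout_is_sufficient(idle_timeout_s, data_gib, throughput_mib_s, margin=1.5):
    """True if the firewall idle timeout covers the padded backup duration."""
    return idle_timeout_s >= required_idle_timeout_s(data_gib, throughput_mib_s, margin)


# Hypothetical backup: 2048 GiB at a sustained 80 MiB/s.
needed_s = required_idle_timeout_s(2048, 80)

# A one-hour idle timeout is far too short for this job; twelve hours is enough.
one_hour_ok = timeout_is_sufficient(3600, 2048, 80)
twelve_hours_ok = timeout_is_sufficient(12 * 3600, 2048, 80)
```

In practice the measured duration of a previous full backup is a better input than a throughput estimate.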
 
Another workaround may be to increase the frequency of the TCP keepalive packets sent from the DMZ media server to help maintain the TCP socket during idle periods.
 
UNIX servers (commands to display the current values):

# Solaris : ndd -get /dev/tcp tcp_keepalive_interval
# HP-UX   : ndd -get /dev/tcp tcp_keepalive_interval
# AIX     : no -a | grep keepintvl
#         : no -a | grep keepidle

Windows servers:
 
Value Name: KeepAliveTime
Key: Tcpip\Parameters
Value Type: REG_DWORD (time in milliseconds)
Valid Range: 1-0xFFFFFFFF
Default: 7,200,000 (two hours)
Recommendation: 300,000
Description: The parameter controls how often TCP attempts to verify that an idle connection is still intact by sending a keep-alive packet. If the remote system is still reachable and functioning, it acknowledges the keep-alive transmission. Keep-alive packets are not sent by default. This feature may be enabled on a connection by an application.
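The description above notes that keep-alive is enabled per connection by the application, and that the system-wide setting only controls timing. The sketch below shows the general mechanism using the standard SO_KEEPALIVE socket option, and checks whether a given keep-alive time is shorter than a firewall idle timeout. It is not NetBackup's own code; the helper names and the one-hour firewall timeout are assumptions made for illustration.

```python
import socket


def enable_keepalive(sock):
    """Turn on TCP keep-alive for one socket and report whether it took effect."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0


def keepalive_beats_idle_timeout(keepalive_time_ms, idle_timeout_s):
    """True if a keep-alive packet is sent before the firewall idle timeout expires."""
    return keepalive_time_ms / 1000 < idle_timeout_s


# Values from the table above.
DEFAULT_MS = 7_200_000      # two hours
RECOMMENDED_MS = 300_000    # five minutes

# ASSUMPTION: a firewall with a one-hour idle timeout, used only as an example.
HYPOTHETICAL_FIREWALL_IDLE_S = 3600

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    enabled = enable_keepalive(sock)
finally:
    sock.close()

default_ok = keepalive_beats_idle_timeout(DEFAULT_MS, HYPOTHETICAL_FIREWALL_IDLE_S)
recommended_ok = keepalive_beats_idle_timeout(RECOMMENDED_MS, HYPOTHETICAL_FIREWALL_IDLE_S)
```

With the assumed one-hour firewall timeout, the two-hour default sends its first keep-alive too late, while the recommended 300,000 ms value keeps the idle connection active.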


