Error code 41

Created: 20 Oct 2012 | 8 comments

All of a sudden I'm receiving error code 41 on my Oracle servers. I have increased the timeout to 1800, but the backup will not write; it just stays active.

Nicolai's picture

General Symantec recommendation for status code 41: http://www.symantec.com/docs/HOWTO34927

Status 41 is "network connection timed out", which means the network connection was not answered at the other end.

Are you able to ping the Oracle servers?

And if you are able to ping, what does this command return: bptestbpcd -client xxx -verbose
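
The two checks above can be combined in a small script. This is only a rough sketch: it assumes it is run on the master or media server, that ping accepts the Linux-style -c option, and that bptestbpcd is in the default UNIX location. The function name check_client is made up for illustration.

```shell
#!/bin/sh
# Assumed default UNIX install path; override BPTESTBPCD if yours differs.
BPTESTBPCD="${BPTESTBPCD:-/usr/openv/netbackup/bin/admincmd/bptestbpcd}"

# check_client (hypothetical helper): ping the client, then test the
# connection to its bpcd daemon with the command quoted in the post.
check_client() {
    client="$1"
    # Linux-style ping; Solaris and HP-UX use a different syntax.
    if ! ping -c 3 "$client" >/dev/null 2>&1; then
        echo "ping failed: $client"
        return 1
    fi
    if ! "$BPTESTBPCD" -client "$client" -verbose; then
        echo "bptestbpcd failed: $client"
        return 2
    fi
    echo "ok: $client"
}
```

Call it as check_client followed by the client name exactly as it appears in the policy.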

 

Assumption is the mother of all mess ups.

If this post answered your question, please mark it as a solution.

Dyneshia's picture

In addition to the tech note Nicolai referenced, here is some additional information:

A status 41 during backup/restore is 99% of the time caused by network performance and tuning.

1) The first item we tune is the Client read timeout.

During a backup, the NetBackup client agent may not be able to send keepalives to the media server if reading the client data takes a long time to fill one buffer. When this occurs, the media server may time out after the time specified in the media server's host properties.

Database systems, and clients with very large filesystems, active filesystems, or lots of small files, may need a value of 1800, 3600, 7200, 10800 seconds or higher.

For your system:

Please log in to the master and/or media server and increase the following timeout to 3600:

In the Admin GUI go to Host Properties => Master Server Host Properties => Timeouts => "Client Read Timeout" -> 3600

Also set the "Client Read Timeout" to 3600 for the client on which the backup fails:
Go to Host Properties => Clients => Properties (of the specific client) => Timeouts

The two values must match; otherwise the backup/restore will use the lower one.

After performing the above steps, restart the NetBackup services on the master server so that these settings take effect. If you cannot restart the NetBackup services, run:

\netbackup\bin\admincmd\bprdreq -rereadconfig

Try the backup again.
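
To double-check what is actually configured, something like the following could be used. It is a sketch under assumptions: UNIX hosts, where these settings normally live in /usr/openv/netbackup/bp.conf as lines of the form CLIENT_READ_TIMEOUT = 3600 (on Windows they are in the registry instead). The helper names get_read_timeout and lower_timeout are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers; the bp.conf path and layout are assumptions (UNIX).

# Print the CLIENT_READ_TIMEOUT value from a bp.conf-style file,
# or "unset" if the file is unreadable or has no such entry.
get_read_timeout() {
    conf="${1:-/usr/openv/netbackup/bp.conf}"
    [ -r "$conf" ] || { echo "unset"; return 0; }
    val=$(awk -F'=' '$1 ~ /^[ \t]*CLIENT_READ_TIMEOUT[ \t]*$/ {
              gsub(/[ \t]/, "", $2); v = $2
          } END { print v }' "$conf")
    echo "${val:-unset}"
}

# Print the lower of two timeout values in seconds. Per the post, this is
# the value that takes effect when server and client do not match.
lower_timeout() {
    if [ "$1" -le "$2" ]; then echo "$1"; else echo "$2"; fi
}
```

For example, with the server set to 3600 and the client still at 1800, lower_timeout 3600 1800 prints 1800, which is why both sides need to be raised.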

 

3) Ensure the NIC and switch ports are set to 100 Mbps/full duplex. (Some gigabit adapters can run at 10 Mbps, 100 Mbps, or 1000 Mbps and can be hard-coded to 1000 Mbps.)

 

4) Do you have a netbackup/NET_BUFFER_SZ file on the master/media server or client?
If so what value is in the file?
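
A quick way to answer that on a UNIX host, as a sketch only: it assumes the default install path /usr/openv/netbackup/NET_BUFFER_SZ and that the file holds a single number. The function name show_net_buffer_sz is made up for illustration.

```shell
#!/bin/sh
# show_net_buffer_sz (hypothetical helper): print the value stored in the
# NET_BUFFER_SZ touch file, or say that the file does not exist.
# The default path below is an assumption (standard UNIX install).
show_net_buffer_sz() {
    file="${1:-/usr/openv/netbackup/NET_BUFFER_SZ}"
    if [ -f "$file" ]; then
        # Strip whitespace and newlines so only the number is shown.
        echo "NET_BUFFER_SZ=$(tr -d '[:space:]' < "$file")"
    else
        echo "no NET_BUFFER_SZ file at $file"
    fi
}
```

Run it on the master, the media server, and the client, and post the three results.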
 

5) Also check the TCP kernel parameters by running the following commands on the master/media server and client (ndd, as used on Solaris):

ndd -get /dev/tcp tcp_recv_hiwat
ndd -get /dev/tcp tcp_xmit_hiwat
 

For Windows, please see: http://www.symantec.com/docs/TECH60844

Note that any changes will cause a momentary hitch in the network on this box, so don't make them while anyone is using it or while jobs are running.

6) NIC teaming

Many network administrators use NIC teaming to aggregate the bandwidth of their NICs and get higher throughput in their networks. NIC teaming presents several problems to NetBackup, the most common being out-of-order packets. If a high percentage of packets arrive out of order, backup performance can be impacted to the point where backups fail.

From a network design standpoint, it makes more sense to use a dedicated network for backing up your servers, which has several advantages over NIC teaming. Even with 100 Mbps NICs at full duplex you should get above 180 Mbps of throughput, and it also takes the NetBackup traffic off the production network. By keeping the backup network at layer 2, using just switches, you can take advantage of the high throughput of LAN switches that are designed to switch at close to wire speed. Also, using non-routable private addressing (following RFC 1918) and not advertising that network within the environment provides a limited amount of security, in that the backup network would not be seen throughout the environment.

If you are using NIC teaming, please disable it and have the client go over one IP address, then try the backup again.

Here are some other suggestions; however, your best bet is the above.
 - Update the NIC drivers on both NICs, making sure both are on the same version
 - Check for updates to the NIC teaming software

7) For NBU 6.x and earlier, check whether IPv6 is being used on the master server, media server or client. If it is, it must be disabled.

8) Once all of the above has been done, run another test backup/restore. If it fails again, try adding the NOSHM touch file as described in http://www.symantec.com/docs/TECH29294

Giri_S's picture

Hi Dyneshia,

Thanks for your brilliant post :)

Thanks.

Netbackup Admin (Unix)

Marianne's picture

All of a sudden I'm receiving error code 41 on my Oracle servers..

So, backups have been working fine up to now? Has anything changed in NBU?
If nothing has changed in NBU, you need to find out what is happening at the RMAN and Oracle level that could cause an unresponsive backup.
 

I have increase the time out to 1800

Which timeout have you increased: Client Connect Timeout or Client Read Timeout?
Status 41 can be either a Client Connect Timeout or a Client Read Timeout.
It is important to know which timeout we are troubleshooting.
 

Supporting Storage Foundation and VCS on Unix and Windows as well as NetBackup on Unix and Windows
Handy NBU Links

Dyneshia's picture

Glad to help, G.S. Please don't forget to mark the solution :)

Marianne's picture

G.S. is not the user with the problem.....


setaylor's picture

Thanks for all of your help. After changing the timeout setting, the backup stayed active but still would not write to tape. Once I deactivated all of my policies and suspended the backups, I rebooted the master and media servers. I then restarted the backups that were giving error 41, and they completed without any errors.

Nicolai's picture

And this is where we as NetBackup admins pull our hair out (what's left) and scream at the screen.

Thanks for letting us know.
