
Backup failing with status 40

Created: 04 Apr 2011 • Updated: 31 Oct 2013 | 13 comments
This issue has been solved. See solution.

After writing some data, the backup fails with status 40.

Please see the Detailed Status:

Apr 2, 2011 10:52:52 PM - requesting resource ptldexnbu001-hcart2-robot-tld-0
Apr 2, 2011 10:52:52 PM - requesting resource memcf-master.NBU_CLIENT.MAXJOBS.exptldarc01
Apr 2, 2011 10:52:52 PM - requesting resource memcf-master.NBU_POLICY.MAXJOBS.Toledo-Windows-Production_4
Apr 2, 2011 10:53:02 PM - granted resource  memcf-master.NBU_CLIENT.MAXJOBS.exptldarc01
Apr 2, 2011 10:53:02 PM - granted resource  memcf-master.NBU_POLICY.MAXJOBS.Toledo-Windows-Production_4
Apr 2, 2011 10:53:02 PM - granted resource  XC0323
Apr 2, 2011 10:53:02 PM - granted resource  R10_F1_D1
Apr 2, 2011 10:53:02 PM - granted resource  ptldexnbu001-hcart2-robot-tld-0
Apr 2, 2011 10:53:04 PM - estimated 0 kbytes needed
Apr 2, 2011 10:53:53 PM - connecting
Apr 2, 2011 10:54:02 PM - connected; connect time: 0:00:00
Apr 2, 2011 10:54:02 PM - begin writing
Apr 4, 2011 6:41:56 AM - current media XC0323 complete, requesting next media Any
Apr 4, 2011 6:41:58 AM - current media -- complete, awaiting next media Any. Waiting for resources.
          Reason: Drives are in use, Media server: ptldexnbu001,
          Robot Type(Number): TLD(0), Media ID: N/A, Drive Name: N/A,
          Volume Pool: NetBackup, Storage Unit: ptldexnbu001-hcart2-robot-tld-0, Drive Scan Host: N/A,
          Disk Pool: N/A, Disk Volume: N/A
Apr 4, 2011 6:42:59 AM - granted resource  XC0326
Apr 4, 2011 6:42:59 AM - granted resource  R10_F1_D1
Apr 4, 2011 6:42:59 AM - granted resource  ptldexnbu001-hcart2-robot-tld-0
Apr 4, 2011 6:43:00 AM - mounting XC0326
Apr 4, 2011 6:43:53 AM - mounted XC0326; mount time: 0:00:53
Apr 4, 2011 6:43:53 AM - positioning XC0326 to file 1
Apr 4, 2011 6:44:02 AM - positioned XC0326; position time: 0:00:09
Apr 4, 2011 6:44:02 AM - begin writing
Apr 4, 2011 10:51:50 PM - Error bpbrm (pid=5392) could not write FILE ADDED message to stderr
Apr 4, 2011 10:51:58 PM - Error bpbrm (pid=5392) could not write FILE ADDED message to stderr
Apr 4, 2011 10:52:20 PM - Error bpbrm (pid=5392) could not write FILE ADDED message to stderr
Apr 4, 2011 10:52:40 PM - Error bpbrm (pid=5392) could not write FILE ADDED message to stderr
Apr 4, 2011 10:53:00 PM - Error bpbrm (pid=5392) could not write FILE ADDED message to stderr
Apr 4, 2011 10:53:30 PM - Error bpbrm (pid=5392) could not write FILE ADDED message to stderr
Apr 4, 2011 10:54:04 PM - Error bpbrm (pid=5392) could not write FILE ADDED message to stderr
Apr 4, 2011 10:54:45 PM - Error bpbrm (pid=5392) could not write FILE ADDED message to stderr
Apr 4, 2011 10:55:17 PM - Error bpbrm (pid=5392) could not write FILE ADDED message to stderr
Apr 4, 2011 10:55:43 PM - Error bpbrm (pid=5392) could not write FILE ADDED message to stderr
Apr 4, 2011 10:56:16 PM - Error bpbrm (pid=5392) could not write FILE ADDED message to stderr
Apr 4, 2011 10:56:42 PM - Error bpbrm (pid=5392) could not write FILE ADDED message to stderr
Apr 4, 2011 10:57:09 PM - Error bpbrm (pid=5392) could not write FILE ADDED message to stderr
Apr 4, 2011 10:57:40 PM - Error bpbrm (pid=5392) could not write FILE ADDED message to stderr
Apr 4, 2011 11:29:55 PM - Error bpbrm (pid=5392) could not write FILE ADDED message to stderr
Apr 4, 2011 11:36:23 PM - Error bpbrm (pid=5392) could not write FILE ADDED message to stderr
Apr 4, 2011 11:47:26 PM - Error bpbrm (pid=5392) could not write FILE ADDED message to stderr
Apr 4, 2011 11:52:15 PM - Error bpbrm (pid=5392) could not write FILE ADDED message to stderr
Apr 5, 2011 12:07:28 AM - Error bpbrm (pid=5392) could not write FILE ADDED message to stderr
Apr 5, 2011 12:15:38 AM - Error bpbrm (pid=5392) could not write FILE ADDED message to stderr
Apr 5, 2011 12:23:15 AM - Error bpbrm (pid=5392) could not write FILE ADDED message to stderr
Apr 5, 2011 12:30:01 AM - Error bpbrm (pid=5392) could not write FILE ADDED message to stderr
Apr 5, 2011 12:38:16 AM - Error bpbrm (pid=5392) could not write FILE ADDED message to stderr
Apr 5, 2011 12:47:01 AM - Error bpbrm (pid=5392) could not write FILE ADDED message to stderr
Apr 5, 2011 12:52:20 AM - Error bpbrm (pid=5392) could not write FILE ADDED message to stderr
Apr 5, 2011 12:55:44 AM - Critical bpbrm (pid=5392) from client exptldarc01: FTL - socket write failed
network connection broken  (40)


Yogesh9881

Status code 40 is a network connectivity issue.

bpclntcmd -hn <client hostname>

bpclntcmd -ip <client IP> (check from both directions)

bptestbpcd -client <client hostname> (check from both directions)

Also check the port status.
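As a rough illustration of the port check, the sketch below tries a plain TCP connect to the ports NetBackup commonly uses. The port numbers (1556 for PBX, 13724 for vnetd, 13782 for bpcd) are the usual defaults and are an assumption about this environment; `port_open` and `check_host` are hypothetical helper names, not NetBackup tools. A successful connect only shows that the port is reachable, not that the daemon behind it is healthy.

```python
import socket

# Commonly used NetBackup ports (assumed defaults; a site may use others).
NBU_PORTS = {"pbx": 1556, "vnetd": 13724, "bpcd": 13782}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_host(host, ports=NBU_PORTS):
    """Map each service name to True (reachable) or False (not reachable)."""
    return {name: port_open(host, port) for name, port in ports.items()}
```

Run it from the master or media server against the client, and again from the client against the servers, for example `check_host("exptldarc01")`, since the check has to pass in both directions.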

Please also post your NBU environment details, such as:

Master server OS, NBU version, backup policy type, client OS, etc.

 

Regds,

Yogesh

Happy to help

If this post has helped you, please vote or mark as solution.

Before break-up, make sure you have a good backup.....  ;-)

Zahid.Haseeb

The connection between the client and the server was broken. This status code can also appear if the connection is broken between the master and the media server during a backup.

Try pinging the client from the server. If pinging is not possible, check for loose connections or other network problems.

For more details, see the following TechNote as well:

http://www.symantec.com/business/support/index?page=content&id=HOWTO34926

Any comment will be appreciated. Mark as Solution if your query is resolved
__________________
Thanks in Advance
Zahid Haseeb

zahidhaseeb.wordpress.com

Marianne

"Error bpbrm (pid=5392) could not write FILE ADDED message to stderr"

We are experiencing the same problem at a customer.

In the bpbrm log on the (SAN) media server, I could see that the connection to bpdbm on the master (to update the file list) was failing.

At the same time, the bpbrm connection to nbjm on the master (to update job info) was successful.

So I see this problem as an inter-process communication failure and not a network problem. If the job is rerun often enough, it eventually goes through. (The customer has enabled checkpoint restart, which helps to get the job done eventually...)

The problem was first seen when the OS on the master server was upgraded to W2008 R2 while the NBU version was still 6.5.x. I told the customer that this was not supported (support for W2008 R2 was introduced with 7.0).

The customer has upgraded to 7.0.1 in the meantime, but the problem is still present.

We have logged a Support call on the customer's behalf with no progress whatsoever in about 10 days...

Supporting Storage Foundation and VCS on Unix and Windows as well as NetBackup on Unix and Windows
Handy NBU Links

Michael Simcorp

I have seen something similar on Windows machines when they exhaust the non-interactive desktop heap.

We install a program called dheapmon to verify whether that is the issue.

Regards

Michael

Marianne

Thanks Michael - will definitely recommend this to the customer.

The problem seems to be linked to load on the master server. A rerun of the backup during the day (no load on the master) is always successful.


Michael Simcorp

http://www.symantec.com/business/support/index?page=content&id=TECH48099 might also be helpful

If I remember correctly, you can also see some pop-ups on the console.

Another cause could be network card buffer congestion on the master or media server. I know we have increased the number of RX descriptors on the network cards in our master/media servers.

Regards

Michael

 

Zahid.Haseeb

 

Marianne: Yes, an inter-process communication failure could be one reason, as described in the link I mentioned above, but I don't think it has to be the cause.
  • This status code may occur if nbjm was unable to connect to bpbrm or to bpmount. Examine the nbjm unified log (originator ID 117) or the bpbrm or the bpmount legacy logs for more detail on the cause of the error.


Marianne

Thanks for your clever observation, Zahid.

My memory failed me a bit here - connection to bpjobd (not nbjm) on master was successful, bpdbm connection failed.

Extract from my observations that were submitted with the support call:

All the failures have this in common: Connection to bpdbm (NetBackup Database Management Service) on the Master server fails. Extract from bpbrm:

19:04:37.768 [4968.5680] <2> logconnections: BPDBM CONNECT FROM 10.110.0.1.52623 TO 10.110.8.10.13724
19:04:57.674 [4968.5680] <2> put_strlen_str: cannot write data to network:  An existing connection was forcibly closed by the remote host.
19:04:57.674 [4968.5680] <16> bpbrm main: could not write FILE ADDED message to stderr

 

Connections to bpjobd on the master are fine:

19:13:18.437 [4968.5680] <2> logconnections: BPJOBD CONNECT FROM 10.110.0.1.52758 TO 10.110.8.10.13724
19:13:18.437 [4968.5680] <2> job_authenticate_connection: ignoring VxSS authentication check for now...
19:13:18.437 [4968.5680] <2> job_connect: Connected to the host nbumaster contype 10 jobid <8223> socket <476>
19:13:18.437 [4968.5680] <2> job_connect: Connected on port 52758
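Lines in the format of the extracts above can be filtered by severity with a small script, which helps when a bpbrm log is too large to read by eye. This is only a sketch: the line layout (time, `[pid.tid]`, `<severity>`, text) is inferred from the extracts, treating `<16>` and above as errors is an assumption based on the `<16>` line shown, and `parse_line` and `errors_with_context` are hypothetical helper names.

```python
import re

# Assumed layout of a legacy log line: time [pid.tid] <severity> text
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<pid>\d+)(?:\.(?P<tid>\d+))?\] "
    r"<(?P<sev>\d+)> (?P<text>.*)$"
)


def parse_line(line):
    """Return a dict of fields for a log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    rec["sev"] = int(rec["sev"])
    return rec


def errors_with_context(lines, min_sev=16, before=2):
    """Yield (error record, preceding records) for lines at or above min_sev."""
    history = []
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue
        if rec["sev"] >= min_sev:
            yield rec, (history[-before:] if before else [])
        history.append(rec)


# Sample taken from the bpbrm extract in this post.
SAMPLE = """\
19:04:37.768 [4968.5680] <2> logconnections: BPDBM CONNECT FROM 10.110.0.1.52623 TO 10.110.8.10.13724
19:04:57.674 [4968.5680] <2> put_strlen_str: cannot write data to network:  An existing connection was forcibly closed by the remote host.
19:04:57.674 [4968.5680] <16> bpbrm main: could not write FILE ADDED message to stderr
""".splitlines()

for err, context in errors_with_context(SAMPLE):
    for rec in context:
        print("   ", rec["time"], rec["text"])
    print("ERR", err["time"], err["text"])
```

Printing the lines that precede each error makes it easy to see which connection (bpdbm or bpjobd) was opened just before the write failed.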

 

All the logs are with a senior Symantec engineer. I will let you know when we get feedback.


Frédéric RIGAL

Hi all

I have the same problem on a PDC (W2K3 R2) with NBU 7.1.0.1.

Did you get any answers from Symantec Support?

Best regards

ASC Symantec Partner

Marianne

The support engineer NEVER came back to us - even my requests for escalation got ignored.

We did not bother to pursue the case after the client rebooted the media server during scheduled downtime. Backups started working again after the reboot.

Mark's explanation below possibly explains why a reboot seemed to fix the problem, only for it to return after a couple of weeks...


mpatt

I had a bunch of my backups failing with status 24, 40 or 42; the 40s with the reason

 Error bpbrm (pid=21887) could not write FILE ADDED message to stderr

and 42 with

cannot add fragment to image database, error = network read failed

After many weeks of trying to troubleshoot through very high loads on the media servers, I doubled my NUMBER_BUFFERS to 128 and halved my SIZE to 131072, but left NET_BUFFER_SZ at 524288.

Since then I see more 40s in my backup report. Our master/media servers are Solaris 10. As with Marianne's issue earlier, the backups are successful during the day when the load is lower.

P.S.: The bptm log has a larger number of "Wait for EMPTY Buffer" than "Wait for FULL Buffer" entries, a clear indication of high load on the server in spite of 2 quad-core CPUs.
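The effect of the buffer change on memory can be checked with simple arithmetic. The sketch below assumes that NUMBER_BUFFERS and SIZE refer to NetBackup's NUMBER_DATA_BUFFERS and SIZE_DATA_BUFFERS settings, and it uses the commonly quoted estimate of shared memory on a media server (number of buffers x buffer size x drives x multiplexing). The "before" values of 64 and 262144 are inferred from "doubled" and "halved"; the drive and multiplexing counts are placeholders, and `shared_memory_bytes` is a hypothetical helper name.

```python
def shared_memory_bytes(number_buffers, buffer_size, drives=1, mpx=1):
    """Estimate shared memory used for data buffers (assumed formula)."""
    return number_buffers * buffer_size * drives * mpx


# Values before the change are inferred from the post, not stated in it.
before = shared_memory_bytes(number_buffers=64, buffer_size=262144)
after = shared_memory_bytes(number_buffers=128, buffer_size=131072)

MIB = 1024 * 1024
print(f"before: {before / MIB:.0f} MiB per drive and stream")
print(f"after:  {after / MIB:.0f} MiB per drive and stream")
print("unchanged" if before == after else "changed")
```

Under these assumptions both configurations come to 16 MiB per drive and stream, so the change trades buffer size for buffer count and would not by itself alter memory pressure on the media server.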

Mark_Solutions

Not sure from the above but are your Master / Media Servers Windows based?

If so, the memory could be getting exhausted, or you could be running out of TCP/IP ports (or both).

Try the following registry settings on all Windows master/media servers, and reboot after the changes for them to take effect:

HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\
DWORD - TcpTimedWaitDelay - Decimal value of 30
DWORD - MaxUserPort - Decimal value of 65534

W2008 does not use this in the same way as 2003, so for 2008 use the following from a command line:

Netsh int ipv4 set dynamicport tcp start=10000 num=50000

HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management\

DWORD - PoolUsageMaximum - Decimal value of 40

DWORD - PagedPoolSize - Hex value of FFFFFFFF (this is 8 x F)

This should improve memory performance and vastly increase the number of TCP/IP ports available, whilst reducing the network wait time from the default 4 minutes down to 30 seconds, so freeing your ports up sooner.
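To get a feel for why these two settings matter together, here is a back-of-the-envelope sketch. It assumes a simplified model in which every closed outbound connection keeps its local port unavailable for the full wait time, so the sustainable rate of new connections is roughly the number of ports divided by the wait time. The W2003 default range of 1025 to 5000 is an assumption about the untuned system, the 30-second wait is assumed to apply in the W2008 case as well, and the function names are hypothetical.

```python
def port_count(first, last):
    """Number of ports in the inclusive range first..last."""
    return last - first + 1


def max_connection_rate(ports, time_wait_seconds):
    """Rough upper bound on new outbound connections per second (simplified)."""
    return ports / time_wait_seconds


# (available ports, wait time in seconds) for each scenario.
scenarios = {
    "W2003 default (1025-5000, 240 s)": (port_count(1025, 5000), 240),
    "W2003 tuned (1025-65534, 30 s)": (port_count(1025, 65534), 30),
    "W2008 netsh (num=50000, 30 s)": (50000, 30),
}

for name, (ports, wait) in scenarios.items():
    rate = max_connection_rate(ports, wait)
    print(f"{name}: {ports} ports, about {rate:.0f} new connections/s")
```

In this model the untuned system sustains only about 17 new connections per second before ports run out, against roughly 2150 with the tuned W2003 values and about 1667 with the netsh range, which fits the pattern of failures appearing only under load.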

Hope this helps

Authorised Symantec Consultant

Don't forget to "Mark as Solution" if someones advice has solved your issue - and please bring back the Thumbs Up!!.

SOLUTION
mpatt

Solaris 10. I may agree that we are exhausting ports. I even have tcp_time_wait_interval set to just 1000 (1 Sec).

Additionally:


# ndd -get /dev/tcp tcp_conn_req_max_q
256
# ndd -get /dev/tcp tcp_conn_req_max_q0
10000