Info bpbkar32(pid=3124) done. status: 44: network write failed network write failed(44)
Environment
Veritas NetBackup = 7.1
OS of NetBackup server = Windows 2008
Tape Library attached with six drives
Problem
I am running a catalog backup to tape. At around 80% completion the backup failed.
1/26/2012 10:36:15 AM - Info nbjm(pid=4108) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=143574, request id:{12B409E1-989A-442B-A1EA-8F10361D8B3D})
1/26/2012 10:36:15 AM - requesting resource NBU-Server-hcart-robot-tld-0
1/26/2012 10:36:15 AM - requesting resource NBU-Server.NBU_CLIENT.MAXJOBS.NBU
1/26/2012 10:36:15 AM - requesting resource NBU-Server.NBU_POLICY.MAXJOBS.Catalog_Backup
1/26/2012 10:36:15 AM - awaiting resource NBU-Server-hcart-robot-tld-0 - No drives are available
1/26/2012 10:39:53 AM - Info bpbrm(pid=5288) NBU-Server is the host to backup data from
1/26/2012 10:39:53 AM - Info bpbrm(pid=5288) reading file list from client
1/26/2012 10:39:53 AM - granted resource NBU-Server.NBU_CLIENT.MAXJOBS.NBU
1/26/2012 10:39:53 AM - granted resource NBU-Server.NBU_POLICY.MAXJOBS.Catalog_Backup
1/26/2012 10:39:53 AM - granted resource 0014L5
1/26/2012 10:39:53 AM - granted resource IBM.ULT3580-TD5.002
1/26/2012 10:39:53 AM - granted resource NBU-Server-hcart-robot-tld-0
1/26/2012 10:39:53 AM - estimated 48899691 Kbytes needed
1/26/2012 10:39:53 AM - Info nbjm(pid=4108) started backup job for client NBU-Server, policy Catalog_Backup, schedule Full on storage unit NBU-hcart-robot-tld-0
1/26/2012 10:39:53 AM - started process bpbrm (5288)
1/26/2012 10:39:53 AM - connecting
1/26/2012 10:39:54 AM - Info bpbrm(pid=5288) starting bpbkar32 on client
1/26/2012 10:39:54 AM - connected; connect time: 00:00:01
1/26/2012 10:39:55 AM - Info bpbkar32(pid=3124) Backup started
1/26/2012 10:39:55 AM - Info bptm(pid=960) start
1/26/2012 10:39:55 AM - Info bptm(pid=960) using 65536 data buffer size
1/26/2012 10:39:55 AM - Info bptm(pid=960) setting receive network buffer to 263168 bytes
1/26/2012 10:39:55 AM - Info bptm(pid=960) using 30 data buffers
1/26/2012 10:39:55 AM - Info bptm(pid=960) start backup
1/26/2012 10:39:55 AM - Info bptm(pid=960) Waiting for mount of media id 0014L5 (copy 1) on server NBU-Server.
1/26/2012 10:39:55 AM - mounting 0014L5
1/26/2012 10:40:32 AM - Info bptm(pid=960) media id 0014L5 mounted on drive index 2, drivepath {3,0,4,0}, drivename IBM.ULT3580-TD5.002, copy 1
1/26/2012 10:40:32 AM - mounted; mount time: 00:00:37
1/26/2012 10:40:32 AM - positioning 0014L5 to file 281
1/26/2012 10:41:30 AM - positioned 0014L5; position time: 00:00:58
1/26/2012 10:41:30 AM - begin writing
1/26/2012 10:55:28 AM - Info bptm(pid=960) waited for full buffer 31786 times, delayed 40142 times
1/26/2012 10:55:28 AM - Info bpbkar32(pid=3124) bpbkar waited 10922 times for empty buffer, delayed 11000 times.
1/26/2012 10:55:28 AM - Error bpbrm(pid=5288) db_FLISTsend failed: network write failed (44)
1/26/2012 11:00:28 AM - Error bpbrm(pid=5288) could not send server status message
1/26/2012 11:00:28 AM - end writing; write time: 00:18:58
1/26/2012 11:00:33 AM - Info bpbkar32(pid=3124) done. status: 44: network write failed
network write failed(44)
Comments
Please ensure you check out this post when starting new threads.
https://www-secure.symantec.com/connect/blogs/minimum-information-required-when-logging-problem-details
Did this ever work?
If it did, when did it last work?
What has changed? (If it did work, something HAS changed.)
Is the master server also the media server running the backup?
Does it fail every time, always at the same point?
Can you run other similar-sized backups from this server? Run a test if necessary.
From this, the performance looks poor :
1/26/2012 10:55:28 AM - Info bptm(pid=960) waited for full buffer 31786 times, delayed 40142 times
1/26/2012 10:55:28 AM - Info bpbkar32(pid=3124) bpbkar waited 10922 times for empty buffer, delayed 11000
This could well be the main issue :
1/26/2012 11:00:33 AM - Info bpbkar32(pid=3124) done. status: 44: network write failed
network write failed(44)
The main question at the moment, I guess, is whether the master is acting as the media server.
If the master is the media server, then the problem may be a little more complex.
If the media server is separate from the master server, I would suspect the network.
Martin
Try to increase Client Read Timeout
This may help...or not.
Authorized Symantec Consultant(ASC) Data Protection in Tokyo, Japan
Try running the backup to NULL to test local performance.
http://www.symantec.com/docs/TECH17541
Then move onto checking disk defragmentation.
Check network speed on uplinks and switch ports are matched. (Had a mismatch last week on new client and this fixed it.)
Tip: Get overview/document your NBU environment. Run 'nbsu' and review the output.
• If this provides help, please vote or mark appropriate solution.
Also, please explain what troubleshooting steps you have taken, and what you suspect the issue to be.
Apologies, but I am unsure why Accredited members are not posting up some seriously detailed troubleshooting steps, and are not running basic testing before logging a new post.
These lines:
1/26/2012 10:55:28 AM - Info bptm(pid=960) waited for full buffer 31786 times, delayed 40142 times
1/26/2012 10:55:28 AM - Info bpbkar32(pid=3124) bpbkar waited 10922 times for empty buffer, delayed 11000 times.
... should show that some performance testing needs to be done. They show that the issue is most likely between the client and the media server (which could be the same box) - hence my questions, and Stuart's suggestions.
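As a rough aid for comparing one run against another, the two counters in those lines can be pulled out with a short script. This is only a sketch: the regular expressions are written against the exact wording of the two job-detail lines quoted in this thread, and the name buffer_waits is made up for illustration.

```python
import re

# Patterns match the wording of the two job-detail lines quoted above.
FULL = re.compile(r"waited for full buffer (\d+) times, delayed (\d+) times")
EMPTY = re.compile(r"waited (\d+) times for empty buffer, delayed (\d+) times")

def buffer_waits(lines):
    """Return {'full': (waited, delayed), 'empty': (waited, delayed)} for matching lines."""
    found = {}
    for line in lines:
        for key, pattern in (("full", FULL), ("empty", EMPTY)):
            match = pattern.search(line)
            if match:
                found[key] = (int(match.group(1)), int(match.group(2)))
    return found

SAMPLE = [
    "1/26/2012 10:55:28 AM - Info bptm(pid=960) waited for full buffer 31786 times, delayed 40142 times",
    "1/26/2012 10:55:28 AM - Info bpbkar32(pid=3124) bpbkar waited 10922 times for empty buffer, delayed 11000 times.",
]

print(buffer_waits(SAMPLE))
```

Running the same extraction against a later job makes it easy to see whether a tuning change moved the numbers at all.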
The questions in https://www-secure.symantec.com/connect/blogs/minimum-information-required-when-logging-problem-details
should be answered by accredited members in every post, without having to be asked for them. I appreciate that some questions will not be relevant to every problem, but they are a good guide and show the level of detail needed. If a question is not relevant, think "what other details can I give that may be needed in place of this question".
Thanks,
Martin
Unpatched NBU 7.1?
W2008 patch level? Physical resources on master?
From the opening post, The Master seems to be Media server as well - NBU-Server-hcart-robot-tld-0 .... client NBU-Server
There seems to be a disconnect between bpbrm and bpdbm:
Error bpbrm(pid=5288) db_FLISTsend failed: network write failed (44)
You will need all relevant logs....
Supporting Storage Foundation and VCS on Unix and Windows as well as NetBackup on Unix and Windows.
Handy NBU links
bpbrm and bptm at VERBOSE = 5 please ...
@ mph999
The same machine is master and media server. (Backups of other machines are running fine, and the catalog backup is taken from the NetBackup server itself, so NBU does not seem to be loaded.)
@ Yasuhisa
Client read timeout is 6000
@ Stuart
My Network seems fine and all backups are going good.
=========================
Will share the bpbrm and bptm logs soon
Any comment will be appreciated. Mark as Solution if your query is resolved
__________________
Thanks in Advance
Zahid Haseeb
zahidhaseeb.wordpress.com
Thanks all for kind inputs and follow-up on my Post.
I increased the Browse timeout from 300 to 6000. The Read timeout was already 6000 while the backup was failing. After increasing the Browse timeout, the backup succeeded. I have triggered the backup again to confirm whether this was the cause.
Zahid Haseeb
OK, I would argue that changing the client browse timeout for a catalog backup on a 'non-busy' (???) server is NOT a fix. It is a workaround that is used to hide the problem caused by some other issue.
The above is likely to be true IF .... the increase in browse timeout required to make it work is quite large, but we need to know the history ...
If this was working at a browse timeout of 300, and if you tested and found that you only needed to increase the browse timeout to, say, 320, then yes, I would accept that - the system just got a bit bigger ...
If, however, you have to increase the timeout to 6000 before it works again, something is very wrong - a system doesn't suddenly require an extra 5700 seconds to complete a 'task' that it could previously do in under 300.
So, I would do further testing to find the value at which the backup fails ... for example, reduce the timeout value until it fails. If this value is 'close to' 300 then OK, that is probably fine. If the value is high, then you haven't fixed the problem, you have only hidden it.
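One way to run that test without trying every value is a binary search over the timeout. The sketch below is hypothetical: works() stands in for "set the browse timeout to this value, run the catalog backup, report whether it succeeded", and it assumes that a backup which succeeds at one timeout also succeeds at any larger one, which may not hold if the failure is intermittent.

```python
from typing import Callable

def smallest_working_timeout(works: Callable[[int], bool], low: int, high: int) -> int:
    """Binary-search the smallest timeout in [low, high] for which works() is True.

    Assumes success is monotonic: if the backup completes at timeout t,
    it also completes at any larger value.
    """
    if not works(high):
        raise ValueError("backup fails even at the upper bound")
    while low < high:
        mid = (low + high) // 2
        if works(mid):
            high = mid       # mid is enough, try smaller values
        else:
            low = mid + 1    # mid is too short, go higher
    return low

# Stand-in for a real test run: pretend the backup needs at least 320 seconds.
print(smallest_working_timeout(lambda t: t >= 320, 300, 6000))  # 320
```

Each call to works() is a full backup run, so between 300 and 6000 this costs roughly a dozen test backups rather than thousands.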
Martin - Senior Symantec UK TSE
Yes, you are right. That was not the solution; it failed again. I have opened a case with Symantec and will share the outcome at the end.
Zahid Haseeb
Re-reading the original post it looks like the backup actually completed but then failed to update itself with the final result.
You have said that the system is not busy, but this could actually be caused by a port lockout.
The internal communication gets its ports locked out, so by the time the backup finishes it can no longer communicate.
It writes for 13 minutes 58 seconds (guess your catalog is not very big?), but then has the failure exactly after 5 minutes.
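Those intervals can be checked directly from the timestamps in the job details. A minimal sketch, assuming the month/day/year 12-hour format shown in the log and an English locale for the AM/PM marker; the helper name elapsed is just for illustration:

```python
from datetime import datetime

# Timestamp format used in the job details, e.g. "1/26/2012 10:41:30 AM".
FMT = "%m/%d/%Y %I:%M:%S %p"

def elapsed(start: str, end: str):
    """Return the time between two job-detail timestamps as a timedelta."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)

begin_writing = "1/26/2012 10:41:30 AM"   # "begin writing"
flist_error = "1/26/2012 10:55:28 AM"     # "db_FLISTsend failed"
status_error = "1/26/2012 11:00:28 AM"    # "could not send server status message"

print(elapsed(begin_writing, flist_error))   # 0:13:58
print(elapsed(flist_error, status_error))    # 0:05:00
print(elapsed(begin_writing, status_error))  # 0:18:58
```

The last figure matches the "write time: 00:18:58" reported by the job itself, so the reported write time includes the five minutes spent waiting after the first error.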
Four things worth doing here:
1. Add the following registry key to the Master Server:
HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\
DWORD – TcpTimedWaitDelay - Decimal Value of 30
2. From a command prompt started with "Run as administrator":
Netsh int ipv4 set dynamicport tcp start=10000 num=50000
This gives it 50000 dynamic ports (10000 through 59999); the default is 16384 (49152 through 65535).
3. On the Master Server's Host Properties, ensure that in the Timeouts section the Client Connect, Client Read, File Browse and Media Server Connect timeouts are set to 600, just to be sure.
4. Now it could be something to do with bpdbm or bpcd which have hard coded 5 minute timeouts.
Check to make sure you don't have a load of bpdbm processes running. If the system is quiet, do a bpdown and see if the bpdbm processes are still running, then do another bpdown and see if they go. If not, reboot the Master for a full cleanup - regular process cleanups / reboots are always worth doing.
Also, plain 7.1 has issues, especially with bpdbm, and there is an EEB to help overcome this, which came to light through GRT restore browsing errors. It increases the bpdbm hardcoded timeout, and I believe it is included in 7.1.0.3, so I would strongly recommend patching your Master Server.
Hope this helps
Authorised Symantec Consultant
Don't forget to give a "Thumbs Up" or mark as "Solution" if someones advice has helped you.
Why have you opened a case with Symantec? You should open the case with either your network support guys, or whoever supports the operating systems.
This is not a NetBackup issue.
Martin
I have a single machine which is master / media server. My catalog is on the same machine. All other backups of different servers (including SQL, flat file and Exchange 2010 backups) to DSU and the tape library are running fine, 100% perfect. How could this be a network error?
Zahid Haseeb
Perhaps I was a little
1. The exact error
Zahid Haseeb
How much disk space do you have on the drive where your databases are held?
The total size of the C drive is 80GB and free space is around 8GB. NetBackup is installed in the default location, which is on the C drive.
Zahid Haseeb
You are starting to get low on disk space
I asked because during the catalog backup the databases are staged, which takes up additional space.
When EMM detects a shortage of disk space it can shut itself down, which would cause this error during the catalog backup.
This is not supposed to happen until 1 or 2 % free space but I have seen many issues when it gets to 10% as well
See if you can clear down your logs to see how much free space you can make then see if it works
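To put the figures quoted earlier in the thread (80GB total, around 8GB free) against those percentages, here is a small sketch. The function names are made up for illustration; shutil.disk_usage is the standard-library call for reading a live drive, and the "C:\\" path in the comment is only an example.

```python
import shutil

def free_percent(total_bytes: int, free_bytes: int) -> float:
    """Free space as a percentage of total capacity."""
    return 100.0 * free_bytes / total_bytes

def drive_free_percent(path: str) -> float:
    """Same calculation for a live drive, e.g. drive_free_percent("C:\\")."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return free_percent(usage.total, usage.free)

GB = 1024 ** 3
# Figures from the thread: 80GB drive with about 8GB free.
print(free_percent(80 * GB, 8 * GB))  # 10.0
```

So the server is sitting right at the 10% mark mentioned above, before any staging space for the catalog backup is taken into account.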
Log folder size is around 10GB.
(I feel that in future I will have to move the catalog to another partition. Is there any way to shrink/compact/defragment the catalog?)
Zahid Haseeb
OK, my error :
Try this ...
bpimmedia -mediaid ddd
Thanks,
Martin
I would clear down your logs to free up the disk space and try the catalog backup again.
If that works, then I would look at relocating your catalog files (the db folder can be relocated relatively easily), or as an alternative you can relocate your logs elsewhere so that they stop using up your space.
Give it a try and I will get you the links for moving your db folder and the logs - but let's see if it works first.
See the logs below. It seemed that the drive was out of space, but now my backup is running successfully with less free space.
For example:
When my backup was failing, free space was around 12GB; now it is around 9GB and the backup is running fine.
See the below logs for reference:
log entry in bpdbm
08:55:27.560 [6420.1388] <2> get_adaptable_string: (4) network read() error: No buffer space available.
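For anyone wanting to count how often this message appears in a log file, a short filter along these lines would do. It is a sketch built only from the shape of the single line quoted above; reading the bracketed pair as process id and thread id is an assumption, and the names LEGACY_LINE and buffer_errors are made up for illustration.

```python
import re

# Shape of the quoted line: time, [number.number], <level>, function name, message.
# Treating the bracketed numbers as pid.tid is an assumption.
LEGACY_LINE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"\[(?P<pid>\d+)\.(?P<tid>\d+)\]\s+"
    r"<(?P<level>\d+)>\s+"
    r"(?P<function>\w+):\s+(?P<message>.*)$"
)

def buffer_errors(lines):
    """Return parsed fields for every line reporting 'No buffer space available'."""
    hits = []
    for line in lines:
        match = LEGACY_LINE.match(line.strip())
        if match and "No buffer space available" in match.group("message"):
            hits.append(match.groupdict())
    return hits

SAMPLE_LINE = (
    "08:55:27.560 [6420.1388] <2> get_adaptable_string: "
    "(4) network read() error: No buffer space available."
)

for hit in buffer_errors([SAMPLE_LINE]):
    print(hit["time"], hit["function"], hit["message"])
```

Comparing the timestamps of the hits with the failure times in the job details would show whether the message lines up with the status 44 failures or also appears during good runs.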
Zahid Haseeb
Network buffer space and disk space are not the same...
Please do not leave logging level at 5 if you are not troubleshooting a specific problem.
Level 5 logs grow large, make your system slow, consume system resources, etc... ONLY increase logging level while troubleshooting specific problems when you have an open case with Symantec. Drop down logging levels to 0 as soon as all necessary logs have been collected.
Level 0 logs are sufficient to troubleshoot day-to-day issues.
Please also check at Windows level for disk fragmentation.
Couldn't this be the problem?
08:55:27.560 [6420.1388] <2> get_adaptable_string: (4) network read() error: No buffer space available.
Zahid Haseeb
Yes, that could be causing the problem.
I was trying to say that it points to a network problem, not disk space.
What sort of buffer could that be?
(4) network read() error: No buffer space available
Zahid Haseeb
I also pointed out it could be a network problem, but my advice was discounted ...
To answer the latest questions ...
The send /receive buffers on the NIC ...
OK, let's straighten something out ...
If we suggest something, or ask for detail that seems not relevant, please provide it anyway - we are not asking for the fun of it ...
Just because you can see no reason for the network being the issue does not mean it is not the issue, until that has been PROVED 100%.
A little story to demonstrate ...
I had a case recently, NBU would not mount a tape in the drive, hung when mounting.
Now, I am the first to point out that tape issues are usually not NBU ... but ...
Robtest could load a tape
The operating system could access that tape, and, write to it using tar ... etc ...
Only fault, was NBU not mounting the tape ...
So, even I had to admit it was looking like NBU was the fault ....
Nope, turns out, it was the firmware on the drive(s).
So, you see, no matter what the issue does, or does not, look like, it can be the most unexpected thing that causes the problem.
Martin
GOOGLE found some info for us.
Firstly a Symantec TechNote with a different status code but the same network buffer error.
The TN is very old, so I cannot tell if it will be applicable to W2008.
http://www.symantec.com/docs/TECH55906
The error is caused when the master server ran out of network buffer space.
Resolution:
Check the boot.ini on the master and verify that the /3GB switch is not used. What the /3GB switch does is allocate 1GB of address space to the Windows operating system and 3GB of address space to user mode processes. This allows Windows to better accommodate demanding applications such as Exchange and SQL Servers.
By only allowing the operating system to allocate 1GB of address space, a limit is also placed on the amount of network buffers your operating system can use. Remove the /3GB entry from your boot.ini and the "No buffer space available" error from the operating system should not reappear.
Second Google find:
http://docs.dal.net/docs/connection.html#6
6 · [10055] No buffer space available
Scenario: Joe wanted to call Mary, but his hands were already full.
This means mIRC is having a problem creating a new a network socket; it cannot use your Internet connection to connect to an IRC server. If you are using a lot of other network applications at the same time, you might get this error. Close some other applications and/or reset your Internet connection to fix this problem. This error also indicates a shortage of resources on your system. It can occur if you're trying to run too many applications (of any kind) simultaneously on your machine. If this tends to occur after running certain applications for a while, it might be a symptom of an application that doesn't return system resources (like memory) properly. It may also indicate you are not closing the applications properly. If it persists, exit Windows or reboot your machine to remedy the problem. You can monitor available memory with Windows Explorer's "Help/About..." command.
Third Google find:
http://msdn.microsoft.com/en-us/library/windows/de...
Windows Sockets Error Codes
No buffer space available.
An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full.
*******************************************************************
So, it seems that you have a resource problem at OS level?
Fellow Connect expert AAlmroth has written an excellent article regarding performance tuning on Windows servers: https://www-secure.symantec.com/connect/articles/t...
Maybe this article will provide some useful info?
Also check out the post I did earlier in the thread to tune your network - did you do these and reboot the server?:
1. Add the following registry key to the Master Server:
HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\
DWORD – TcpTimedWaitDelay - Decimal Value of 30
2. From a command prompt started with "Run as administrator":
Netsh int ipv4 set dynamicport tcp start=10000 num=50000
This gives it 50000 dynamic ports (10000 through 59999); the default is 16384 (49152 through 65535).
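The range that a given start/num pair covers can be checked with simple arithmetic. A minimal sketch; the helper name dynamic_port_range is made up for illustration, and the only limit it enforces is the 65535 ceiling on TCP port numbers:

```python
def dynamic_port_range(start: int, num: int):
    """Return (first_port, last_port) covered by a netsh-style start/num pair."""
    last = start + num - 1
    if start < 1 or last > 65535:
        raise ValueError("range must fit within TCP ports 1-65535")
    return start, last

# Values from the netsh command in step 2.
print(dynamic_port_range(10000, 50000))  # (10000, 59999)

# Default dynamic range on Windows Server 2008.
print(dynamic_port_range(49152, 16384))  # (49152, 65535)
```

The current setting on the server can be read back with "netsh int ipv4 show dynamicport tcp" to confirm that the change took effect.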
Also, what NET_BUFFER_SZ values are you using (if any - \netbackup\bin\)?
mph999, first of all I would like to thank you very much for your kind help; I do value it. If at any point you felt that your suggestions were being ignored, I am sorry for that.
Mark and Marianne, let me try this and share the result.
Zahid Haseeb