
NetBackup 7.x and media read errors - Are the basics done?

Created: 04 Oct 2013 • Updated: 07 Oct 2013 | 2 comments
Speedy1205's picture

This article applies only to NetBackup 7.5 running on Linux as the operating system, for example SLES 11 or RHEL.

When you use the SLP or Vault features in NetBackup, you may see a lot of errors like these:

failed, media read error (85).
cannot position to correct image (94)
with error 84 (media write error)

Over the last weeks and months I had a lot of the above errors and looked into each of them. At first glance these errors do not seem to be related or to share one simple resolution, but they do.

Please be aware that I am not claiming the steps below always resolve these issues, but they are a good place to start. All of these settings belong to the operating system, in this case Linux, and should help in big environments where errors like the above are seen very often. The values can be set differently on each master server, so you should find the best settings for your environment.

The file we are talking about is /etc/security/limits.conf, which a SLES installation ships without any active values because every entry is commented out. This can again depend on which Linux distribution you are using.

This leaves us with the default values, for example mine:
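To see whether the file really has no active entries, it can help to print only the lines that are neither blank nor comments. The following is a minimal sketch; `active_limits` is a hypothetical helper name I made up for illustration, not part of NetBackup or PAM.

```shell
#!/bin/sh
# Hypothetical helper: print only the active lines of a limits file.
# Defaults to /etc/security/limits.conf when no file is given.
active_limits() {
    file="${1:-/etc/security/limits.conf}"
    # Drop blank lines and lines whose first non-blank character is '#'.
    grep -Ev '^[[:space:]]*(#|$)' "$file"
}
```

Running `active_limits` with no argument reads /etc/security/limits.conf. If it prints nothing, no limits are set in that file and the defaults apply, unless files under /etc/security/limits.d (where your distribution uses that directory) set something.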

ulimit -a

core file size          (blocks, -c) 1
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 30558
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) 3334548
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 30558
virtual memory          (kbytes, -v) 4821040
file locks                      (-x) unlimited

The open files value is definitely too small for a big environment and can cause all of the above errors. The master server is the central part of the backup environment and, depending on the size of the environment (media servers, clients, etc.), the limit of 1024 will be reached very fast. Because the errors I got always looked different, or showed up with different features (Vault and SLP), I did not believe this would solve all of them. But I gave it a try and used the Symantec recommended values from the following Technote:

Example Operating System Tuning Values for Linux Master server running 7.x
http://www.symantec.com/business/support/index?page=content&id=TECH167095
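Before changing anything, it is worth checking how far the current open files limit is from the value you are aiming for. The sketch below compares the soft and hard limits of the current shell against a target. The helper name `nofile_ok`, the variable `WANTED_NOFILE` and the number 8000 are placeholders of mine, not values from the Technote, so take the real target from there. Keep in mind that `ulimit` reports the limits of the shell you run it in, which may differ from what an already running daemon has.

```shell
#!/bin/bash
# Hypothetical helper: succeeds when a limit is at least the wanted value.
nofile_ok() {
    current="$1"
    wanted="$2"
    # "unlimited" is always enough.
    [ "$current" = "unlimited" ] && return 0
    [ "$current" -ge "$wanted" ]
}

# Placeholder target; replace it with the value you take from the Technote.
wanted="${WANTED_NOFILE:-8000}"

soft=$(ulimit -Sn)   # soft limit for open files in this shell
hard=$(ulimit -Hn)   # hard limit, the ceiling for the soft limit

for value in "$soft" "$hard"; do
    if nofile_ok "$value" "$wanted"; then
        echo "open files limit $value: OK (target $wanted)"
    else
        echo "open files limit $value: below target $wanted"
    fi
done
```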

After I set these values and rebooted the server, all of the above errors were resolved. It will certainly not resolve these issues in every environment, but it is an easy thing to check before starting deep troubleshooting.

I hope this helps!
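To confirm that a running process actually picked up the new limit, Linux exposes the effective limits of every process in /proc/<pid>/limits. The following is a small sketch that pulls the open files line out of such a file; `nofile_from_limits` is a hypothetical helper name, and which daemon you check is up to you.

```shell
#!/bin/sh
# Hypothetical helper: print the soft and hard "Max open files" values
# from a file in the format of /proc/<pid>/limits.
nofile_from_limits() {
    # Fields on that line: Max open files <soft> <hard> files
    awk '/^Max open files/ { print "soft=" $4, "hard=" $5 }' "$1"
}
```

For example, `nofile_from_limits /proc/$$/limits` shows the values for the current shell. For a NetBackup daemon, pass the path for its PID instead (bpdbm is one process you could look up with `pgrep`, used here only as an example name).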

Comments (2)

thesanman's picture

Thanks for this!

I have been struggling with seemingly random tape media read or write failures for 6+ months now during my weekend vault process.

A re-run would almost always work on the 1st or 2nd attempt.

I couldn't see a pattern in the drive(s) or the media involved, nor in the OS patches or the NBU version.

I had applied the sysctl changes years ago, but what you pointed me at was the ulimit values, which I applied across my multiple RHEL media servers and the master server, in particular open files.

I have only had one weekend run since making the changes but no failures seen!  Time will tell but so far a big thumbs up from me!

Thanks.

NBU v7.5.0.6 Master and Media servers on RHEL 5/6 & Win2008; SAN based LTO3, 4 and 6 tape libraries
Linux, Solaris, Windows and OpenVMS clients.
PureDisk, SLP, VMware, HyperV, Oracle, Netezza, SQL/Server,

Speedy1205's picture

Hi thesanman,

Glad it helped and pointed you in the right direction. I wrote it up because I had the same issue as you; it went on for a long time, and then this small change fixed all of the issues I had seen.

I just don't want everyone in the community to have to work so hard to figure out what is going on, especially when it looks like random media failures. It may not fix these issues for all users, but it is a step in the right direction.

Thanks for the Feedback!

 
