NetBackup: Flash backups on Zimbra mail servers - Performance improvement

Created: 02 Apr 2013 • Updated: 02 Apr 2013 | 11 comments

Can anybody help me with my environment (details below)? I am facing frequent failures with status codes 40, 13, 20, 50, etc.

One Master server, 165 SAN Media Servers, HP VLS 9000 (VTL).

We use flash backups on all 165 servers and run the backups through a separate policy for each server.

Backups run Monday to Sunday: six days of differential incremental and one day of full backup. Each has a two-week retention.

I need help with the setup and with improving performance.

Please let me know if any further information is required.


Mark_Solutions's picture

165 SAN Media Servers - that is quite a few!

So all of these constantly have to check in with the Master Server about their storage unit status etc.

From the error numbers you are getting, it is possible that the Master Server is not coping. It may be short of ports or memory, or the NetBackup processes may be getting overloaded.

Let us know the O/S and specification of the Master Server (RAM, network speed, etc.) so that we can advise further.

Authorised Symantec Consultant

Don't forget to "Mark as Solution" if someone's advice has solved your issue - and please bring back the Thumbs Up!!

Anand Avaala's picture

Hi Mark,

Thanks for your response.

Master server OS:
[/usr/openv/lib]# cat /etc/*-release
Red Hat Enterprise Linux Server release 5.1 (Tikanga)
[/usr/openv/lib]# uname -mrs
Linux 2.6.18-53.1.6.el5 x86_64

RAM: 16GB
Network Speed:
Speed: 1000Mb/s
Duplex: Full
Port: FIBRE

We are using 7.5.3 on the Master as well as on all SAN Media Servers.

 

Mark_Solutions's picture

OK - that doesn't look too bad, so it may just need a little tuning to prevent the overload and/or timeouts.

First look at the keepalive settings. These are the typical defaults:

# cat /proc/sys/net/ipv4/tcp_keepalive_time
7200
# cat /proc/sys/net/ipv4/tcp_keepalive_intvl
75
# cat /proc/sys/net/ipv4/tcp_keepalive_probes
9

If they are at these values then change them as follows:

# echo 510 > /proc/sys/net/ipv4/tcp_keepalive_time
# echo 3 > /proc/sys/net/ipv4/tcp_keepalive_intvl
# echo 3 > /proc/sys/net/ipv4/tcp_keepalive_probes

To keep the changes persistent after a reboot, add the following to /etc/sysctl.conf (using vi or another editor):

## Keepalive at 8.5 minutes

# start probing for heartbeat after 8.5 idle minutes (default 7200 sec)
net.ipv4.tcp_keepalive_time=510

# close connection after 3 unanswered probes (default 9)
net.ipv4.tcp_keepalive_probes=3

# wait 3 seconds for response to each probe (default 75)
net.ipv4.tcp_keepalive_intvl=3

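You don't need a restart for them to take effect. To load the values from /etc/sysctl.conf immediately, run: sysctl -p
On SUSE, also run: chkconfig boot.sysctl on
so that the file is applied at boot (Red Hat reads /etc/sysctl.conf at boot by default).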

See if these help to start with. It may also need some nbrb tuning (using nbrbutil) to maximise its capabilities, but the Status 50 points towards a possible keepalive issue.
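
As a rough check on what these settings mean, the sketch below adds up how long an idle, dead connection can linger before the kernel drops it: the idle time before the first probe plus one interval per unanswered probe. The function name keepalive_timeout is only illustrative, and the formula is the usual approximation rather than an exact kernel guarantee.

```shell
#!/bin/sh
# Approximate seconds before the kernel gives up on a dead idle connection:
# idle time before the first probe, plus one interval per unanswered probe.
keepalive_timeout() {
    # $1 = tcp_keepalive_time, $2 = tcp_keepalive_intvl, $3 = tcp_keepalive_probes
    echo $(( $1 + $2 * $3 ))
}

keepalive_timeout 7200 75 9   # typical defaults quoted above -> 7875
keepalive_timeout 510 3 3     # values suggested above        -> 519

# On a live Linux host, the same sum for the values currently in effect.
if [ -r /proc/sys/net/ipv4/tcp_keepalive_time ]; then
    keepalive_timeout \
        "$(cat /proc/sys/net/ipv4/tcp_keepalive_time)" \
        "$(cat /proc/sys/net/ipv4/tcp_keepalive_intvl)" \
        "$(cat /proc/sys/net/ipv4/tcp_keepalive_probes)"
fi
```

With the defaults a dead connection can sit for over two hours; with the suggested values it is cleared in under nine minutes.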


Anand Avaala's picture

Hi Mark,

I have made the changes you advised above and will monitor the environment for a couple of days and get back to you.

However, do we need to change the kernel semaphore values to tune performance on the master?

Current Kernel Sem values:
# Syntax of the following parameter:  kernel.sem = SEMMSL SEMMNS SEMOPM SEMMNI
# 4 values defining limits for System V IPC semaphores.
# These fields are, in order:
#     SEMMSL  The maximum semaphores per semaphore set.
#     SEMMNS  A system-wide limit on the number of semaphores in all semaphore sets.
#     SEMOPM  The maximum number of operations that may be specified in a semop(2) call.
#     SEMMNI  A system-wide limit on the maximum number of semaphore identifiers.
kernel.sem = 300 32000 64 1024

 

I ask because I have read about those values in some of the Symantec tech notes.

Please advise.

Thanks a ton Mark.
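
For anyone comparing against a tech note, the four kernel.sem fields quoted above can be labelled with a small sketch like this one. The helper name describe_sem is hypothetical; the field order is the one given in the comments of the sysctl.conf excerpt, and no recommended values are implied.

```shell
#!/bin/sh
# Label the four kernel.sem fields in the order SEMMSL SEMMNS SEMOPM SEMMNI.
describe_sem() {
    printf 'SEMMSL=%s SEMMNS=%s SEMOPM=%s SEMMNI=%s\n' "$1" "$2" "$3" "$4"
}

describe_sem 300 32000 64 1024   # values from the post above

# On a live Linux host the running values come from /proc
# (the unquoted command substitution is intentional: it splits the four fields).
if [ -r /proc/sys/kernel/sem ]; then
    describe_sem $(cat /proc/sys/kernel/sem)
fi
```

Comparing the live line with the one in /etc/sysctl.conf shows whether the configured values have actually been loaded.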

Mark_Solutions's picture

I'd love to help you, but this one is out of my scope, I am afraid!

If you have found a tuning tech note from Symantec then I can't see that it would hurt, but maybe try one thing at a time so that you know what actually does the trick for you.


Anand Avaala's picture

Even after tuning the parameters you suggested, backups are still failing with errors 13 and 40.

Please find the detailed job output for both errors below.

4/3/2013 4:09:07 AM - Info bpbkar(pid=9680) 60000 entries sent to bpdbm       
4/3/2013 4:09:07 AM - Error bpbrm(pid=9636) from client sz0066a-util.westchester.pa.mail.comcast.net:  1364951613 755297/      
4/3/2013 4:09:22 AM - Error bpbrm(pid=9636) db_FLISTsend failed: file read failed (13)      
4/3/2013 4:09:24 AM - Error bptm(pid=9694) media manager terminated by parent process      
4/3/2013 4:09:28 AM - Info bpbkar(pid=0) done. status: 13: file read failed      
4/3/2013 4:09:28 AM - end writing; write time: 03:06:07
file read failed(13)

4/3/2013 4:04:14 AM - Info bpbkar(pid=16939) 100000 entries sent to bpdbm       
4/3/2013 4:04:42 AM - Info bpbkar(pid=16939) 105000 entries sent to bpdbm       
4/3/2013 4:05:19 AM - Error bptm(pid=16950) media manager exiting because bpbrm is no longer active   
4/3/2013 4:05:19 AM - Info bpbkar(pid=16939) 110000 entries sent to bpdbm       
4/3/2013 4:05:20 AM - Info bptm(pid=16950) EXITING with status 174 <----------       
network connection broken(40)
 

I suspect it's something to do with bpbrm and the media manager causing these errors.

If it's out of your scope, could you please refer me to someone who can help with this?

Thanks for your support, by the way.

 

 

 

Mark_Solutions's picture

After how long do they fail?

As for "referring you" - this is an open forum. We are all here to help and do it in our own time, so hopefully someone will see this and can assist further. My only referral would be to advise you to open a support case with Symantec.

We need to see the full log (please attach it as a text file rather than pasting it into the thread).


Anand Avaala's picture

Hi Mark/Wiriadi Wangsa,

Sorry for the long delay in replying to this forum post.

We are working with Symantec on this. However, we have one particular client where the incremental backup fails with error 13 or 14 while the full backup completes without any issues.

I have checked both of the values below and they are set to the max.

/usr/openv/netbackup/MAX_FILES_PER_ADD = 100000
/usr/openv/netbackup/bin/DBMto = 30
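
A minimal sketch for confirming what those two files contain on a given host, assuming each file simply holds its value as text. The helper name show_touch_file is hypothetical; the paths are the ones quoted above.

```shell
#!/bin/sh
# Print the value stored in a settings file, or note that the file is absent.
show_touch_file() {
    if [ -r "$1" ]; then
        printf '%s = %s\n' "$1" "$(cat "$1")"
    else
        printf '%s is not set\n' "$1"
    fi
}

show_touch_file /usr/openv/netbackup/MAX_FILES_PER_ADD
show_touch_file /usr/openv/netbackup/bin/DBMto
```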
 
However, the incremental backup ran without any errors after we uninstalled NBU on the media server and reinstalled it. The issue then recurs after some time.
 
Please let me know if you need any logs related to this.
 
Thank you.