
Slow Filesystem Backup Performance after Upgrading to 7.1

Created: 22 Mar 2011 • Updated: 25 Mar 2011 | 18 comments
trahn
This issue has been solved.

 

Ladies and Gents,
 
I am facing strange filesystem backup performance issues after upgrading to NetBackup 7.1 and wonder whether anyone has seen similar behavior.
 
The customer setup is:
- master on RHEL 5.4 x86_64, upgraded from 6.5.5 to 7.1
- a couple of media servers on RHEL 5.4 x86_64, upgraded from 6.5.5 to 7.1
- a couple of media servers on SPARC Solaris 10, mixed versions 6.5.5 and 7.1
- Solaris, Linux and Windows clients with version 6.5.5 and only a few with 7.1
 
After upgrading the master and most of the media servers, filesystem backup performance over LAN dropped below 1MB/sec for full and differential incremental backups. Before the upgrade we were seeing write rates of 10 to 30MB/sec.
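To put the drop in perspective, here is a rough sketch of the backup-window arithmetic. The 100 GB data set is a hypothetical figure, not a size from this environment, and the calculation assumes a constant sustained rate with 1 GB = 1024 MB.

```python
def backup_hours(size_gb: float, rate_mb_per_sec: float) -> float:
    """Hours needed to move size_gb of data at a constant rate (1 GB = 1024 MB)."""
    return size_gb * 1024 / rate_mb_per_sec / 3600

# Hypothetical 100 GB client at the old rates and at the new, degraded rate.
for rate in (30, 10, 1):
    print(f"{rate:>2} MB/sec -> {backup_hours(100, rate):.1f} h")
```

At 1 MB/sec the same hypothetical job needs roughly 28 hours instead of under 3.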
 
Now comes the interesting part:
- On the same client that is showing the poor filesystem backup performance:
  - Restore performance is good at around 30MB/sec
  - Oracle RMAN backups (archive jobs) write at around 30MB/sec (on the same client!)
 
Extensive testing showed
 - SAN Backups are doing fine.
 - NDMP backups with LAN transport are doing fine as well.
 - The poor filesystem LAN backup performance is not limited to a single client or platform (Linux, Solaris and Windows clients on both 6.5.5 and 7.1 are all affected)
 - The same poor filesystem backup performance can be observed when writing to a 7.1 media server as well as to a 6.5.5 media server
 - The same poor filesystem backup performance can be observed when writing to disk (Advanced Disk) or tape
 - We've tested the network throughput and didn't find any glitches
 - We didn't test client disk read performance, but the Oracle backup jobs are doing fine on the same system anyway (see above)
 - Changing the NET_BUFFER_SZ values didn't change anything at all
 - NUMBER_DATA_BUFFERS and SIZE_DATA_BUFFERS are not set and hence on their default values. Changing those parameters for testing purposes didn't change anything either.
 - I can't see any hints, let alone errors, in the bpbkar and bptm logs.
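For reference when testing NUMBER_DATA_BUFFERS and SIZE_DATA_BUFFERS: as I understand the NetBackup tuning guidance, the shared memory bptm uses for data buffers grows roughly as NUMBER_DATA_BUFFERS * SIZE_DATA_BUFFERS * drives * MPX. On UNIX media servers these values are set as touch files under /usr/openv/netbackup/db/config/. The sketch below only does that multiplication; the function name and the values are hypothetical test settings, not the customer's configuration.

```python
def data_buffer_memory(number_data_buffers: int, size_data_buffers: int,
                       drives: int = 1, mpx: int = 1) -> int:
    """Approximate bytes of shared memory used for bptm data buffers (rule of thumb)."""
    return number_data_buffers * size_data_buffers * drives * mpx

# Hypothetical test values: 32 buffers of 256 KB, 2 drives, MPX 4.
total = data_buffer_memory(32, 262144, drives=2, mpx=4)
print(f"{total // 2**20} MB")  # 64 MB
```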
 
 Other customers that we've upgraded to 7.1 don't see any performance issues, but they are running Windows master servers.
 
 Is there anyone here with a RHEL master on 7.1 who can confirm (or rule out) similar behavior?
 
 Am I missing something obvious?
 
 Any help is greatly appreciated.
 
 Kind regards,
 Thomas

Comments

Marianne

This statement surely excludes 7.1 as the cause:

"The same poor filesystem backup performance can be observed when writing to a  7.1 media server as well as to a 6.5.5 media server"

Can we assume that the 6.5 media servers are only backing up 6.5 clients?

Supporting Storage Foundation and VCS on Unix and Windows as well as NetBackup on Unix and Windows
Handy NBU Links

trahn

Hello Marianne,

You are right: it is probably not the NetBackup 7.1 media servers causing the behavior, as writing to a storage unit on a 6.5 media server is also slow.

And yes we tested with a 6.5 client when writing to a 6.5 media server.

Could it be the 7.1 master? I suspected it could be something about writing the meta data into the image catalog. But then all other backup types should be slow as well, shouldn't they?

Thomas

Marianne

Backup performance is determined by the data transfer between bpbkar on the client and bptm on the media server. Check the bptm log at the end of the backup for 'waited for full/empty buffers'. bpbrm on the media server sends metadata to bpdbm on the master; verbose bpbrm logging should tell you how often metadata is sent.

I have found that a lot of users look at their backups much more closely after an upgrade and only notice performance issues then. If you search this forum or Google, you will find performance complaints after EVERY new NetBackup release...

Another thought: can we safely assume that filesystem backups are reading from different filesystems than the database backups? Please do yourself a favour and perform all the 'usual' performance tests: use bpbkar on the client to back up to /dev/null, and if read performance is acceptable, ftp a fairly large file from the same filesystem to the media server.
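The bpbkar-to-/dev/null test is the proper NetBackup way to measure client read speed. As a rough, NetBackup-independent cross-check, a script like the following reads every regular file under a directory, discards the data and reports MB/sec. It is a sketch only: the function name is hypothetical, it does not reproduce bpbkar's per-file overhead, and repeated runs will be flattered by the OS page cache.

```python
import os
import time
from typing import Tuple

def read_throughput(root: str, chunk_size: int = 1 << 20) -> Tuple[int, int, float]:
    """Read every regular file under root and discard the data.

    Returns (files_read, bytes_read, MB_per_sec). Symlinks and special
    files (devices, FIFOs) are skipped so the walk cannot block on them.
    """
    files = total = 0
    start = time.monotonic()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            size = 0
            try:
                with open(path, "rb") as fh:
                    while chunk := fh.read(chunk_size):
                        size += len(chunk)
            except OSError:
                continue  # unreadable or vanished file: leave it out entirely
            files += 1
            total += size
    elapsed = max(time.monotonic() - start, 1e-9)
    return files, total, total / 2**20 / elapsed
```

For example, `read_throughput("/usr")` on the affected client gives a read rate to compare with the 1MB/sec seen in the backups.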


trahn

I have looked at the empty/full buffer waits and delays, and I can influence those by tuning NET_BUFFER_SZ. However, this doesn't seem to have any impact on backup performance in my case, so I assume something else is throttling the filesystem backups.

You are right about the filesystems. Oracle backups are reading data from SAN LUNs while all the regular filesystem backups get their data from internal disks.

One more thing to note: performance drops even further if more than one client is running a standard filesystem backup. Backup rates drop to a ridiculous 6KB/sec.

And yes, I've checked the backup performance of standard policies that have run in the past. As I mentioned we have never ever seen rates below 10M/sec. It is not just a vague feeling. :)

We are now upgrading the remaining media servers to 7.1. I have heard of similar cases opened with Symantec support without a definite solution, but it seems the problems vanished into thin air once all the media servers had been upgraded.

I will keep you updated.

Cheers,

Thomas

 

Marianne

What are you seeing in bptm? Waiting for full or waiting for empty buffers? Tweaking buffer sizes will only help if bptm is waiting for empty buffers (i.e. data is received fast enough but cannot be written away quick enough). If bptm is waiting for full buffers, the data is simply not getting to bptm fast enough.
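The rule of thumb above can be written down as a tiny decision function. This is only a sketch: the function name and the idea of comparing the two counters directly are a simplification assumed here, and the inputs are the "waited for full buffer" and "waited for empty buffer" counts you would copy from the bptm log.

```python
def bptm_bottleneck(waited_for_full: int, waited_for_empty: int) -> str:
    """Rule-of-thumb reading of bptm buffer-wait counters.

    waited_for_full: bptm waited for data to arrive (full buffers).
    waited_for_empty: bptm waited for buffers to be written away (empty buffers).
    """
    if waited_for_full == 0 and waited_for_empty == 0:
        return "no buffer waits reported"
    if waited_for_full > waited_for_empty:
        # Data is not reaching bptm fast enough: look at client read or network.
        return "data not arriving fast enough (client or network side)"
    if waited_for_empty > waited_for_full:
        # Data arrives fine but cannot be written away: buffer tuning may help.
        return "data not written away fast enough (storage side); buffer tuning may help"
    return "inconclusive: both counters are equal"
```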

Please do the bpbkar test to back up a filesystem to /dev/null. See this TechNote for the Windows test procedure: http://www.symantec.com/docs/TECH17541, or else the Planning and Performance Guide: http://www.symantec.com/docs/TECH62317. If acceptable read speed is seen, ftp to the media server.


Dikdi

Hi,

 

I don't think this is a NetBackup problem. Performance tuning parameters may not help here; the problem may be with the file system you are backing up.

 

I have seen file system backups running at more than 50 MB/sec with NBU 6.5...

 

Check the file system: how many files are there, and what is the average file size?

If you are backing up millions of small files, you will never get good performance.

Check for fragmentation on the file system and defragment it; this can also help increase backup speed.

Check whether compression or encryption is enabled on the file system.
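A quick way to answer the "how many files, what average size" question on a client is a short script like this one. It is a hypothetical helper, not part of NetBackup; it counts only regular files, does not follow symlinked directories, and may walk into other filesystems mounted under the starting directory.

```python
import os
import stat
from typing import Tuple

def file_stats(root: str) -> Tuple[int, float]:
    """Count regular files under root; return (count, average size in KB)."""
    count = total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # vanished or unreadable entry
            if stat.S_ISREG(st.st_mode):  # skip symlinks, devices, FIFOs
                count += 1
                total_bytes += st.st_size
    average_kb = total_bytes / count / 1024 if count else 0.0
    return count, average_kb
```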

 

Also look at other methods of backing up the file system, such as raw disk backup; Snapshot Client with FlashBackup may be a good option.

 

Deepak

trahn

Please keep in mind that we never had filesystem performance problems before upgrading to 7.1.

We are seeing performance problems for all(!) clients and filesystems when doing standard backups.

For testing purposes we were backing up "/usr", which I think is not really a candidate for FlashBackup.

Marianne, I have not tested bpbkar throughput locally (/dev/null) as all the clients are showing the same weak performance. If the problem still persists after upgrading all media servers I will give it a try though.

Best regards,

Thomas

therockyb

Hi,

Today I upgraded my 3 media servers and my master server from 7.0.1 to 7.1; they are all running W2K8R2 x64.

I have the exact same problem that trahn is having: all my backups are now running at around 150 KB/sec (down from 20K to 50K).

I made no other modifications to my systems or networks. It is affecting my 3 sites (on 3 different continents), and they all use different LTO drives (LTO 2-4 and 5) with different interfaces (SCSI, SAS, FC).

Something is going on with this update... I will call technical support first thing tomorrow morning and will follow up if they have any idea what is going on.

trahn

Hello therockyb,

if you open a case with Symantec, would you mind passing me the case ID? I would give it to the case owner working on my incident and try to get the two cases linked. This might speed up progress on the case.

If you'd rather send me the case ID by PM, please send it to trahn[at]anykey[dot]de.

BTW: Upgrading all Media Servers to 7.1 didn't help.

Cheers,

Thomas

therockyb

Thomas,

I opened a case more than 10 hours ago and I still can't get technical support to call me back... apparently 7.1 generated a storm of support calls.

I will send you my case number in PM.

trahn

It seems that I've found the cause, probably a bug. I need to check a little further before I post it here. It is still possible that I am mistaken. I will keep you updated.

Regards,
Thomas

therockyb

Hi Thomas,

Any developments in your investigation?

Today I tried a different mix of clients: Windows and Linux, versions 6.5, 7.0, 7.0.1 and 7.1.

I have the same problem with every client version.

jsouza

I'm having performance problems with version 7.0 using FlashBackup-Windows with VMware 4.1.

I'm getting 30 to 40 MB/sec over FC, and these messages in bpbkar and bptm. I'm using an HP LTO-4 FC drive (MSL8048).

 

BPTM

11:48:56.282 [6004.4904] <2> write_data: waited for full buffer 174 times, delayed 10959 times

BPBKAR

11:48:56.266 AM: [5884.5472] <4> BufferManagerLegacySharedMemory::~BufferManagerLegacySharedMemory(): INF - bpbkar waited 1324 times for empty buffer, delayed 1559 times.

Is this a problem, or a matter of tuning?
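The two summary lines quoted above are the buffer-wait counters discussed earlier in this thread. A small parser such as the one below pulls out the counts so they can be compared across runs; it is a hypothetical helper whose regular expressions match only the two message formats shown here. Whether 174 full-buffer waits and 1324 empty-buffer waits are a problem depends on how they compare with the total number of buffers transferred, which this sketch does not try to judge.

```python
import re
from typing import Optional, Tuple

# Patterns match only the two summary messages quoted in this thread.
BPTM_FULL = re.compile(r"waited for full buffer (\d+) times, delayed (\d+) times")
BPBKAR_EMPTY = re.compile(r"waited (\d+) times for empty buffer, delayed (\d+) times")

def buffer_waits(line: str) -> Optional[Tuple[str, int, int]]:
    """Return (counter, waited, delayed) for a recognised summary line, else None."""
    for label, pattern in (("bptm waited for full", BPTM_FULL),
                           ("bpbkar waited for empty", BPBKAR_EMPTY)):
        match = pattern.search(line)
        if match:
            return label, int(match.group(1)), int(match.group(2))
    return None
```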

trahn

Hi therockyb,

please check whether "Bandwidth" or "Throttle Bandwidth" settings are configured under "Host Properties -> Master Server".

These are two separate entries in the master server properties, and they result in two different options in bp.conf or the Windows registry.

1.) LIMIT_BANDWIDTH (for IP ranges)

2.) THROTTLE_BANDWIDTH (for hosts and networks)

I'm not sure whether THROTTLE_BANDWIDTH existed prior to 7.1.

If LIMIT_BANDWIDTH is used to throttle bandwidth for a single host

e.g.

LIMIT_BANDWIDTH = 192.168.235.107 192.168.235.107 100

try to use

THROTTLE_BANDWIDTH = 192.168.235.107 100

Either use the NetBackup Admin Console or the Windows registry. I was testing on Linux though.

Do a

bprdreq -rereadconfig

and rerun a policy to check whether the performance problems persist.
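The rewrite suggested above only applies when the LIMIT_BANDWIDTH range covers a single host (identical start and end address). A minimal sketch of that conversion, assuming the "KEY = value" bp.conf line format shown in the examples; the function name is hypothetical.

```python
from typing import Optional

def limit_to_throttle(line: str) -> Optional[str]:
    """Rewrite a single-host LIMIT_BANDWIDTH line as a THROTTLE_BANDWIDTH line.

    Returns None for other keys, malformed values, or real address ranges.
    """
    key, sep, value = line.partition("=")
    if not sep or key.strip() != "LIMIT_BANDWIDTH":
        return None
    fields = value.split()
    if len(fields) != 3:
        return None
    start_ip, end_ip, bandwidth = fields
    if start_ip != end_ip:
        return None  # a range of hosts has no one-to-one host entry in this sketch
    return f"THROTTLE_BANDWIDTH = {start_ip} {bandwidth}"
```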

Please let me know if this helped.

Regards,

Thomas

trahn

In the NB7.1 release notes on page 74:

"Do not use a LIMIT_BANDWIDTH (IPv4) configuration or a
THROTTLE_BANDWIDTH (IPv6) configuration in the initial release of
NetBackup 7.1. If you use either of these configuration options, you can
encounter a severe performance degradation.
An EEB will be available to fix the issues."

Damn.
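Following that release note, the practical check is whether bp.conf contains any LIMIT_BANDWIDTH or THROTTLE_BANDWIDTH entries at all. A minimal sketch that lists them, assuming the usual "KEY = value" bp.conf layout; the function name is hypothetical. On a UNIX master you would feed it the contents of /usr/openv/netbackup/bp.conf, while Windows keeps these settings in the registry and is not covered here.

```python
from typing import List, Tuple

# Keys named in the 7.1 release note quoted above.
FLAGGED_KEYS = {"LIMIT_BANDWIDTH", "THROTTLE_BANDWIDTH"}

def bandwidth_entries(conf_text: str) -> List[Tuple[int, str]]:
    """Return (line_number, line) for every bandwidth-limiting entry in bp.conf text."""
    hits = []
    for number, raw in enumerate(conf_text.splitlines(), start=1):
        key = raw.partition("=")[0].strip()
        if key in FLAGGED_KEYS:
            hits.append((number, raw.strip()))
    return hits
```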

SOLUTION
therockyb

 

Thomas,

It's awesome that you found the problem. Thank you so much!

You were right: I had two entries in my LIMIT_BANDWIDTH configuration. I removed them and everything is back to normal.

You have no idea how many problems you just saved me from. I really appreciate your help!

Yan

trahn

Yan,

glad it worked for you, too.

I just got the reply from support that an EEB is not yet available.

Regards,

Thomas

Marianne

I have marked your post as Solution - thanks for the feedback!
