Video Screencast Help

Slow backup performance

Created: 11 Oct 2012 • Updated: 15 Oct 2012 | 14 comments
This issue has been solved. See solution.

Hi

 

We have extremely slow performance in our new netbackup environment. It cannot do more than 8mb/sec. We don't have many clients running backups on this except few NDMP shares and nbu servers backing up itself.

Here is the info about our environment

Netbackup 7.5

OS : linux 6.2

Data domain

1 master and 2 media servers.

NDMP shares are direct attached to tape drives.

Filer: Sun7410.

We have mounted nfs filesystem from data domain on media server and tried copying 2 GB file and it works fine.

Physically mounted a tape using robtest and tried dd command to send the data to tape and that works great as well.

Tried copying tar files using scp between nbu servers, that is good also.

The network team said everything is good from switch point of view.

I am clueless whereelse to look. The server network settings seems to be good also.

Both san (ndmp) and networks backups gives the same performance which is even more confusing. Any help would be appreciated.

 

Thanks

Sai

Comments 14 CommentsJump to latest comment

Nicolai's picture

Did you configure NUMBER_DATA_BUFFERS and SIZE_DATA_BUFFERS ?

http://www.mass.dk/netbackup/guides/49-netbackup-buffer-tuning.html

Assumption is the mother of all mess ups.

If this post answered your'e qustion -  Please mark as a soloution.

spenukon's picture

Yes I do have that settings.

 

NUMBER_DATA_BUFFERS - 16

SIZE_DATA_BUFFERS - 262144

mph999's picture

Can you post up the details of a slow job that completes successfully from activity monitor (details tab)

Martin

 

Regards,  Martin
 
Setting Logs in NetBackup:
http://www.symantec.com/docs/TECH75805
 
Marianne's picture

Please also post bptm log on media server that contains start and and of slow job.
(Post log file as file attachment.)

Supporting Storage Foundation and VCS on Unix and Windows as well as NetBackup on Unix and Windows
Handy NBU Links

spenukon's picture

Hi Mariann

 

Please check the attachment.

 

Thanks

Sai

AttachmentSize
log.101212.doc 401.98 KB
spenukon's picture

Here is the successfull job completion of NDMP backup. The throughput was only 143kb/sec.

 

 

10/12/2012 04:50:39 - Info nbjm (pid=21227) starting backup job (jobid=476) for client mtprdsan1002.pr.mdwr.s.nokia.com, policy sun-ndmp-lbsp_splunk-monthly-prod, schedule daily-differential
10/12/2012 04:50:39 - Info nbjm (pid=21227) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=476, request id:{FCB8BD98-141F-11E2-89EF-96D0EC7673EE})
10/12/2012 04:50:39 - requesting resource mtprdsan6010-hcart-robot-tld-0-mtprdsan1002
10/12/2012 04:50:39 - requesting resource mtprdsan6009.atl.stor.s.nokia.com.NBU_CLIENT.MAXJOBS.mtprdsan1002.pr.mdwr.s.nokia.com
10/12/2012 04:50:39 - requesting resource mtprdsan6009.atl.stor.s.nokia.com.NBU_POLICY.MAXJOBS.sun-ndmp-lbsp_splunk-monthly-prod
10/12/2012 04:50:39 - Waiting for scan drive stop HP.ULTRIUM4-SCSI.000, Media server: mtprdsan6010.atl.stor.s.nokia.com
10/12/2012 04:51:10 - Info bpbrm (pid=7525) mtprdsan1002.pr.mdwr.s.nokia.com is the host to backup data from
10/12/2012 04:51:10 - Info bpbrm (pid=7525) reading file list from client
10/12/2012 04:51:10 - Info bpbrm (pid=7525) starting ndmpagent on client
10/12/2012 04:51:10 - granted resource  mtprdsan6009.atl.stor.s.nokia.com.NBU_CLIENT.MAXJOBS.mtprdsan1002.pr.mdwr.s.nokia.com
10/12/2012 04:51:10 - granted resource  mtprdsan6009.atl.stor.s.nokia.com.NBU_POLICY.MAXJOBS.sun-ndmp-lbsp_splunk-monthly-prod
10/12/2012 04:51:10 - granted resource  200197
10/12/2012 04:51:10 - granted resource  HP.ULTRIUM4-SCSI.000
10/12/2012 04:51:10 - granted resource  mtprdsan6010-hcart-robot-tld-0-mtprdsan1002
10/12/2012 04:51:10 - estimated 1349718373 kbytes needed
10/12/2012 04:51:10 - Info nbjm (pid=21227) started backup (backupid=mtprdsan1002.pr.mdwr.s.nokia.com_1350013870) job for client mtprdsan1002.pr.mdwr.s.nokia.com, policy sun-ndmp-lbsp_splunk-monthly-prod, schedule daily-differential on storage unit mtprdsan6010-hcart-robot-tld-0-mtprdsan1002
10/12/2012 04:51:10 - started process bpbrm (pid=7525)
10/12/2012 04:51:10 - connecting
10/12/2012 04:51:10 - connected; connect time: 0:00:00
10/12/2012 04:51:11 - Info ndmpagent (pid=7527) Backup started
10/12/2012 04:51:11 - Info bpbrm (pid=7525) bptm pid: 7528
10/12/2012 04:51:11 - Info bptm (pid=7528) start
10/12/2012 04:51:11 - Info bptm (pid=7528) using 30 data buffers
10/12/2012 04:51:11 - Info bptm (pid=7528) using 1024 data buffer size
10/12/2012 04:51:11 - Info bptm (pid=7528) start backup
10/12/2012 04:51:11 - Info bptm (pid=7528) Waiting for mount of media id 200197 (copy 1) on server mtprdsan6010.atl.stor.s.nokia.com.
10/12/2012 04:51:11 - mounting 200197
10/12/2012 04:52:05 - Info bptm (pid=7528) media id 200197 mounted on drive index 0, drivepath /dev/rmt/0n, drivename HP.ULTRIUM4-SCSI.000, copy 1
10/12/2012 04:52:05 - mounted 200197; mount time: 0:00:54
10/12/2012 04:52:10 - positioning 200197 to file 11
10/12/2012 04:54:08 - Info ndmpagent (pid=7527) mtprdsan1002.pr.mdwr.s.nokia.com: Direct Access Restore information is supported
10/12/2012 04:54:08 - positioned 200197; position time: 0:01:58
10/12/2012 04:54:08 - begin writing
10/12/2012 04:54:09 - Info ndmpagent (pid=7527) mtprdsan1002.pr.mdwr.s.nokia.com: Backing up "/export/Splunk".
10/12/2012 04:54:09 - Info ndmpagent (pid=7527) mtprdsan1002.pr.mdwr.s.nokia.com: Tape record size: 1024.
10/12/2012 04:54:09 - Info ndmpagent (pid=7527) mtprdsan1002.pr.mdwr.s.nokia.com: File history: N.
10/12/2012 04:54:09 - Info ndmpagent (pid=7527) mtprdsan1002.pr.mdwr.s.nokia.com: Date of the last level '3': Fri Oct 12 01:10:24 2012.
10/12/2012 04:54:09 - Info ndmpagent (pid=7527) mtprdsan1002.pr.mdwr.s.nokia.com: Date of this level '4': Fri Oct 12 03:54:08 2012.
10/12/2012 04:54:09 - Info ndmpagent (pid=7527) mtprdsan1002.pr.mdwr.s.nokia.com: Update: TRUE.
10/12/2012 06:58:49 - Info ndmpagent (pid=7527) mtprdsan1002.pr.mdwr.s.nokia.com: Runtime [/export/Splunk] 1102577664 bytes (1102577664): 134 seconds
10/12/2012 06:58:53 - Info ndmpagent (pid=7527) NDMP backup successfully completed, path = /export/Splunk
10/12/2012 06:59:11 - Info bptm (pid=7528) EXITING with status 0 <----------
10/12/2012 06:59:11 - Info bpbrm (pid=7525) validating image for client mtprdsan1002.pr.mdwr.s.nokia.com
10/12/2012 06:59:11 - Info ndmpagent (pid=7527) done. status: 0: the requested operation was successfully completed
10/12/2012 06:59:11 - end writing; write time: 2:05:03
the requested operation was successfully completed  (0)
 

spenukon's picture

Here is the job completion of normal backups that go to data domain.  Througput was only 2mb/sec.

 

10/12/2012 03:10:48 - Info nbjm (pid=21227) starting backup job (jobid=475) for client atlnagios01.pr.mdwr.s.nokia.com, policy unix-lan-backupservices-monthly-all, schedule daily-differential
10/12/2012 03:10:48 - Info bpbrm (pid=3065) atlnagios01.pr.mdwr.s.nokia.com is the host to backup data from
10/12/2012 03:10:48 - Info bpbrm (pid=3065) reading file list from client
10/12/2012 03:10:48 - Info nbjm (pid=21227) requesting STANDARD_RESOURCE resources from RB for backup job (jobid=475, request id:{09C0298A-1412-11E2-8706-54DF7F705CA4})
10/12/2012 03:10:48 - requesting resource mtprdsan3002lsu-ost-sug
10/12/2012 03:10:48 - requesting resource mtprdsan6009.atl.stor.s.nokia.com.NBU_CLIENT.MAXJOBS.atlnagios01.pr.mdwr.s.nokia.com
10/12/2012 03:10:48 - requesting resource mtprdsan6009.atl.stor.s.nokia.com.NBU_POLICY.MAXJOBS.unix-lan-backupservices-monthly-all
10/12/2012 03:10:48 - granted resource  mtprdsan6009.atl.stor.s.nokia.com.NBU_CLIENT.MAXJOBS.atlnagios01.pr.mdwr.s.nokia.com
10/12/2012 03:10:48 - granted resource  mtprdsan6009.atl.stor.s.nokia.com.NBU_POLICY.MAXJOBS.unix-lan-backupservices-monthly-all
10/12/2012 03:10:48 - granted resource  MediaID=@aaaab;DiskVolume=mtprdsan3002lsu;DiskPool=mtprdsan3002lsu-dp;Path=mtprdsan3002lsu;StorageServer=mtprdsan3002.atl.stor.s.nokia.com;MediaServer=mtprdsan6010.atl.stor.s.nokia.com
10/12/2012 03:10:48 - granted resource  mtprdsan6010-ost-mtprdsan3002lsu-su
10/12/2012 03:10:48 - estimated 0 kbytes needed
10/12/2012 03:10:48 - Info nbjm (pid=21227) started backup (backupid=atlnagios01.pr.mdwr.s.nokia.com_1350007848) job for client atlnagios01.pr.mdwr.s.nokia.com, policy unix-lan-backupservices-monthly-all, schedule daily-differential on storage unit mtprdsan6010-ost-mtprdsan3002lsu-su
10/12/2012 03:10:48 - started process bpbrm (pid=3065)
10/12/2012 03:10:48 - connecting
10/12/2012 03:10:49 - Info bpbrm (pid=3065) starting bpbkar on client
10/12/2012 03:10:49 - Info bpbkar (pid=16631) Backup started
10/12/2012 03:10:49 - Info bpbrm (pid=3065) bptm pid: 3066
10/12/2012 03:10:49 - Info bptm (pid=3066) start
10/12/2012 03:10:49 - Info bptm (pid=3066) using 1024 data buffer size
10/12/2012 03:10:49 - Info bptm (pid=3066) using 30 data buffers
10/12/2012 03:10:49 - connected; connect time: 0:00:00
10/12/2012 03:10:50 - Info bptm (pid=3066) start backup
10/12/2012 03:10:51 - Info bptm (pid=3066) backup child process is pid 3083
10/12/2012 03:10:51 - begin writing
10/12/2012 03:16:34 - Info bpbrm (pid=3065) from client atlnagios01.pr.mdwr.s.nokia.com: TRV - [/vz/private/7/var/run/saslauthd/mux] is a socket special file. Skipping
10/12/2012 03:22:05 - Info bpbrm (pid=3065) from client atlnagios01.pr.mdwr.s.nokia.com: TRV - [/vz/private/5/var/log/nagios/rw/live] is a socket special file. Skipping
10/12/2012 03:46:54 - Info bpbrm (pid=3065) from client atlnagios01.pr.mdwr.s.nokia.com: TRV - [/vz/private/3/var/run/saslauthd/mux] is a socket special file. Skipping
10/12/2012 03:48:27 - Info bpbrm (pid=3065) from client atlnagios01.pr.mdwr.s.nokia.com: TRV - [/vz/private/3/var/log/nagios/rw/live] is a socket special file. Skipping
10/12/2012 04:16:36 - Info bpbrm (pid=3065) from client atlnagios01.pr.mdwr.s.nokia.com: TRV - [/vz/private/1/var/run/saslauthd/mux] is a socket special file. Skipping
10/12/2012 05:09:08 - Info bpbrm (pid=3065) from client atlnagios01.pr.mdwr.s.nokia.com: TRV - [/vz/root/1] is in a different file system from [/vz]. Skipping
10/12/2012 05:09:08 - Info bpbrm (pid=3065) from client atlnagios01.pr.mdwr.s.nokia.com: TRV - [/vz/root/7] is in a different file system from [/vz]. Skipping
10/12/2012 05:09:08 - Info bpbrm (pid=3065) from client atlnagios01.pr.mdwr.s.nokia.com: TRV - [/vz/root/3] is in a different file system from [/vz]. Skipping
10/12/2012 05:54:23 - Info bpbrm (pid=3065) from client atlnagios01.pr.mdwr.s.nokia.com: TRV - [/vz/private/2] is in a different file system from [/vz]. Skipping
10/12/2012 05:54:23 - Info bpbrm (pid=3065) from client atlnagios01.pr.mdwr.s.nokia.com: TRV - [/vz/private/1] is in a different file system from [/vz]. Skipping
10/12/2012 05:54:23 - Info bpbrm (pid=3065) from client atlnagios01.pr.mdwr.s.nokia.com: TRV - [/vz/private/6] is in a different file system from [/vz]. Skipping
10/12/2012 05:54:23 - Info bpbrm (pid=3065) from client atlnagios01.pr.mdwr.s.nokia.com: TRV - [/vz/private/4] is in a different file system from [/vz]. Skipping
10/12/2012 05:54:23 - Info bpbrm (pid=3065) from client atlnagios01.pr.mdwr.s.nokia.com: TRV - [/vz/private/7] is in a different file system from [/vz]. Skipping
10/12/2012 05:54:23 - Info bpbrm (pid=3065) from client atlnagios01.pr.mdwr.s.nokia.com: TRV - [/vz/private/5] is in a different file system from [/vz]. Skipping
10/12/2012 05:54:23 - Info bpbrm (pid=3065) from client atlnagios01.pr.mdwr.s.nokia.com: TRV - [/vz/private/8] is in a different file system from [/vz]. Skipping
10/12/2012 05:54:23 - Info bpbrm (pid=3065) from client atlnagios01.pr.mdwr.s.nokia.com: TRV - [/vz/private/3] is in a different file system from [/vz]. Skipping
10/12/2012 07:28:53 - Info bpbrm (pid=3065) from client atlnagios01.pr.mdwr.s.nokia.com: TRV - [/var/run/acpid.socket] is a socket special file. Skipping
10/12/2012 07:28:53 - Info bpbrm (pid=3065) from client atlnagios01.pr.mdwr.s.nokia.com: TRV - [/var/run/sdp] is a socket special file. Skipping
10/12/2012 07:28:53 - Info bpbrm (pid=3065) from client atlnagios01.pr.mdwr.s.nokia.com: TRV - [/var/run/audispd_events] is a socket special file. Skipping
10/12/2012 07:28:53 - Info bpbrm (pid=3065) from client atlnagios01.pr.mdwr.s.nokia.com: TRV - [/var/run/pcscd.comm] is a socket special file. Skipping
10/12/2012 07:28:53 - Info bpbrm (pid=3065) from client atlnagios01.pr.mdwr.s.nokia.com: TRV - [/var/run/avahi-daemon/socket] is a socket special file. Skipping
10/12/2012 07:28:53 - Info bpbrm (pid=3065) from client atlnagios01.pr.mdwr.s.nokia.com: TRV - [/var/run/dbus/system_bus_socket] is a socket special file. Skipping
10/12/2012 07:28:53 - Info bpbrm (pid=3065) from client atlnagios01.pr.mdwr.s.nokia.com: TRV - [/var/run/cups/cups.sock] is a socket special file. Skipping
10/12/2012 07:28:57 - Info bpbrm (pid=3065) from client atlnagios01.pr.mdwr.s.nokia.com: TRV - [/var/lib/nfs/rpc_pipefs] is in a different file system from [/var]. Skipping
10/12/2012 07:29:11 - Info bpbrm (pid=3065) from client atlnagios01.pr.mdwr.s.nokia.com: TRV - [/var] is in a different file system from [/]. Skipping
10/12/2012 07:29:51 - Info bpbrm (pid=3065) from client atlnagios01.pr.mdwr.s.nokia.com: TRV - [/home] is in a different file system from [/]. Skipping
10/12/2012 07:30:07 - Info bpbrm (pid=3065) from client atlnagios01.pr.mdwr.s.nokia.com: TRV - [/sys] is on file system type sysfs. Skipping
10/12/2012 07:30:07 - Info bpbrm (pid=3065) from client atlnagios01.pr.mdwr.s.nokia.com: TRV - [/boot] is in a different file system from [/]. Skipping
10/12/2012 07:36:14 - Info bpbrm (pid=3065) from client atlnagios01.pr.mdwr.s.nokia.com: TRV - [/vz] is in a different file system from [/]. Skipping
10/12/2012 07:38:43 - Info bpbrm (pid=3065) from client atlnagios01.pr.mdwr.s.nokia.com: TRV - [/proc] is on file system type PROC. Skipping
10/12/2012 07:38:43 - Info bpbrm (pid=3065) from client atlnagios01.pr.mdwr.s.nokia.com: TRV - [/dev] is in a different file system from [/]. Skipping
10/12/2012 07:38:43 - Info bpbrm (pid=3065) from client atlnagios01.pr.mdwr.s.nokia.com: TRV - [/tmp/.font-unix/fs7100] is a socket special file. Skipping
10/12/2012 07:38:52 - Info bpbrm (pid=3065) from client atlnagios01.pr.mdwr.s.nokia.com: TRV - [/misc] is in a different file system from [/]. Skipping
10/12/2012 07:38:52 - Info bpbrm (pid=3065) from client atlnagios01.pr.mdwr.s.nokia.com: TRV - [/net] is in a different file system from [/]. Skipping
10/12/2012 07:39:50 - Info bpbrm (pid=3065) from client atlnagios01.pr.mdwr.s.nokia.com: TRV - [/usr/openv/var/vnetd/terminate_bpcd.uds] is a socket special file. Skipping
10/12/2012 07:39:50 - Info bpbrm (pid=3065) from client atlnagios01.pr.mdwr.s.nokia.com: TRV - [/usr/openv/var/vnetd/terminate_vnetd.uds] is a socket special file. Skipping
10/12/2012 07:39:50 - Info bpbrm (pid=3065) from client atlnagios01.pr.mdwr.s.nokia.com: TRV - [/usr/openv/var/vnetd/bpcd.uds] is a socket special file. Skipping
10/12/2012 07:51:14 - Info bptm (pid=3066) waited for full buffer 974900 times, delayed 975018 times
10/12/2012 07:51:14 - Info bptm (pid=3066) EXITING with status 0 <----------
10/12/2012 07:51:14 - Info bpbrm (pid=3065) validating image for client atlnagios01.pr.mdwr.s.nokia.com
10/12/2012 07:51:14 - Info bpbkar (pid=16631) done. status: 0: the requested operation was successfully completed
10/12/2012 07:51:14 - end writing; write time: 4:40:23
the requested operation was successfully completed  (0)
 

Marianne's picture

 

Sure about your buffer settings? This is what is reported in bptm log:

03:51:11.454 [7528] <2> read_config_file: using 256 value from /usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS_NDMP
03:51:11.454 [7528] <2> io_init: using 30 data buffers
03:51:11.454 [7528] <2> io_init: using 1024 data buffer size

Please double-check?

Supporting Storage Foundation and VCS on Unix and Windows as well as NetBackup on Unix and Windows
Handy NBU Links

spenukon's picture

Marianne

 

You are right, the settings was wrong on media server. I changed the SIZE_DATA_BUFFERS_NDMP settings to 262144 as per Nicolai and the backups are running as faster as 90mb/sec. Thanks a lot.

Nicolai's picture

0/12/2012 04:51:11 - Info bptm (pid=7528) using 1024 data buffer size

You need to check this value  - it shoud be 262144. If the backup really use 1K block size it will be very very slow. Check SIZE_DATA_BUFFERS_NDMP on the media server that control the NDMP backup.

more info on SIZE_DATA_BUFFERS_NDMP http://www.symantec.com/docs/TECH147938

Assumption is the mother of all mess ups.

If this post answered your'e qustion -  Please mark as a soloution.

SOLUTION
mph999's picture

Here we are:

10/12/2012 07:51:14 - Info bptm (pid=3066) waited for full buffer 974900 times, delayed 975018 times

There are a lot of 'delays', we see that bptm is waiting for a full bufer, which means the data isn't being sent from the client very quickly.

The most likely causes are :

1.  Data read speed of the client

2. Network (this can includenetwork settings in the os and possibly network buffer size setting)

But ...

10/12/2012 03:10:49 - Info bptm (pid=3066) using 1024 data buffer size

10/12/2012 03:10:49 - Info bptm (pid=3066) using 30 data buffers

On the media servers create these files :

/usr/openv/netbackup/db/config/SIZE_DATA_BUFFERS

Put the value 262144 in this file

/usr/openv/netbackup/db/config/NUMBER_DATA_BUFFERS

Put the value 128 in this file

(Nicolia has covered the ndmp tuning file )

Re-run a backup that uses this media server - does it go any quicker.

Martin

 

Regards,  Martin
 
Setting Logs in NetBackup:
http://www.symantec.com/docs/TECH75805
 
spenukon's picture

Hi Nicolai and mph99

Thanks a lot for your help. The NDMP backups are running faster after I changed the NDMP settings but the data domain backups are still slow. The data domain backups are little better than before. Its doing 14mb/sec now but it should be faster than this.  We have same setup in other region and it's doing much faster there.

spenukon's picture

Hi

I have changed the SIZE_DATA_BUFFERS to 1048576 and the data domain backups are running faster.

It is doing 120mb/sec. Thank you all for your time.

 

Thanks

Sai

Marianne's picture

Please select the post that has helped you most and Mark as Solution?
(My vote is for Nicolai.)

Supporting Storage Foundation and VCS on Unix and Windows as well as NetBackup on Unix and Windows
Handy NBU Links