Video Screencast Help

Buffer Issue - Netbackup 7.5.0.3

Created: 04 Sep 2012 • Updated: 23 Jan 2013 | 8 comments
This issue has been solved. See solution.

Hi,

I'm new to NetBackup. I've noticed  there is a buffer issue in our environment. I’ve read the tuning guide and articles but still very confused on how to fix this issue.

I've set the below on all 3 of our media servers but still have the same issue. Anyone have any suggestions?

 

NET_BUFFER_SZ buffer value to 262144,

NUMBER_DATA_BUFFERS to 16

SIZE_DATA_BUFFERS to 262144

 

Log

09/2012 7:00:19 p.m. - connected; connect time: 00:00:00
4/09/2012 7:00:20 p.m. - Info bpbrm(pid=3508) starting bpbkar32 on client
4/09/2012 7:00:53 p.m. - Info bpbkar32(pid=5396) Backup started
4/09/2012 7:00:53 p.m. - Info bptm(pid=4792) start
4/09/2012 7:00:54 p.m. - Info bptm(pid=4792) using 262144 data buffer size
4/09/2012 7:00:54 p.m. - Info bptm(pid=4792) setting receive network buffer to 262144 bytes
4/09/2012 7:00:54 p.m. - Info bptm(pid=4792) using 30 data buffers
4/09/2012 7:00:56 p.m. - Info bptm(pid=4792) start backup
4/09/2012 7:00:57 p.m. - begin writing
4/09/2012 7:03:10 p.m. - Info bpbkar32(pid=5396) change journal NOT enabled for <C:\>
4/09/2012 7:04:27 p.m. - Info bpbkar32(pid=5396) change journal NOT enabled for <D:\>
4/09/2012 7:11:36 p.m. - Info bptm(pid=4792) waited for full buffer 10024 times, delayed 38654 times
4/09/2012 7:11:40 p.m. - Info bpbkar32(pid=5396) bpbkar waited 66 times for empty buffer, delayed 66 times.
4/09/2012 7:12:11 p.m. - Info bptm(pid=4792) EXITING with status 0 <----------

Comments 8 CommentsJump to latest comment

mph999's picture

4/09/2012 7:11:36 p.m. - Info bptm(pid=4792) waited for full buffer 10024 times, delayed 38654 times

Your backup looked quite small (10 mins) ao not a massive amount of time to get accurate results, a backup that runs a bit longer is sometimes better but no matter ...

4/09/2012 7:11:36 p.m. - Info bptm(pid=4792) waited for full buffer 10024 times, delayed 38654 times

This shows that bptm was delayed 38654 times during this backup, which is not a massive number if the backup was over several hours, but in 10 0mins, I guess it is quite large.

I do not know if this was a backup of the media server itself, or a client over a network.

Nicolai is very good at explaining this, and has a good webpage on it - I'm sure he will be along soon.

anyways ...

Think of the data as water and the buffers as buckets.  A buckets full of water is only tipped out to the drive when it is full.  A bucket can only be refilled by the 'clilent' when it is empty.

So, if bptm is waiting for full buckets, it doesn't matter if you have more of less you still can't fill them quick enough, so I would suggest that in this case, the number of buffers (buckets) makes no difference.

I know from experience, that a data buffer size of 262144 should will give good performance.

Ideally, the number of delays should be zero, if so, it shows the bukets are filling and empting at a good rate, performance my then be further improved by adding more, or increasing the size, but you are not at that stage.

Simply, it looks to me like the client cannot supply the data quick enough, so the performance bottleneck  is the speed the data is read from the disk, or if a remote client the network.

The best way to try and test this, is to backup from the media server itself (removes the network from being a possible cause) and try and use a disk that has good performance.

Without further details of exactly what you are backing up and from where, I would say this issue is outside of NBU.

Martin

 

 

Regards,  Martin
 
Setting Logs in NetBackup:
http://www.symantec.com/docs/TECH75805
 
NZData's picture

Thanks for your comment Martin.

This backup above was backing up a media server to disk. I'm running a dedupe job to tape and getting the below too

 

5/09/2012 10:39:10 a.m. - Info bptm(pid=6320) media id P105L5 mounted on drive index 2, drivepath {5,0,2,0}, drivename IBM.ULT3580-HH5.002, copy 2
5/09/2012 10:39:14 a.m. - Info bptm(pid=6320) INF - Waiting for positioning of media id P105L5 on server wdcntbkp05.corporate.nzpost.co.nz for writing.
5/09/2012 10:40:44 a.m. - Info bptm(pid=6320) waited for full buffer 1228 times, delayed 2262 times   
5/09/2012 10:40:48 a.m. - end reading; read time: 00:02:20
5/09/2012 10:40:48 a.m. - Info bpdm(pid=4700) completed reading backup image        
5/09/2012 10:40:50 a.m. - Info bpdm(pid=4700) using 30 data buffers        
5/09/2012 10:40:54 a.m. - begin reading
5/09/2012 10:42:02 a.m. - Info bptm(pid=6320) waited for full buffer 1653 times, delayed 2914 times   
5/09/2012 10:42:06 a.m. - end reading; read time: 00:01:12
5/09/2012 10:42:06 a.m. - Info bpdm(pid=4700) completed reading backup image        
5/09/2012 10:42:08 a.m. - Info bpdm(pid=4700) using 30 data buffers        
5/09/2012 10:42:12 a.m. - begin reading
5/09/2012 10:43:44 a.m. - Info bptm(pid=6320) waited for full buffer 2171 times, delayed 3868 times   
5/09/2012 10:43:46 a.m. - end reading; read time: 00:01:34
5/09/2012 10:43:46 a.m. - Info bpdm(pid=4700) completed reading backup image        
5/09/2012 10:43:47 a.m. - Info bpdm(pid=4700) using 30 data buffers        
5/09/2012 10:43:52 a.m. - begin reading
5/09/2012 10:48:13 a.m. - Info bptm(pid=6320) waited for full buffer 7036 times, delayed 11941 times   
5/09/2012 10:48:14 a.m. - end reading; read time: 00:04:22
5/09/2012 10:48:14 a.m. - Info bptm(pid=6320) setting receive network buffer to 262144 bytes     
5/09/2012 10:48:18 a.m. - Info bpdm(pid=4700) completed reading backup image        
5/09/2012 10:48:22 a.m. - Info bpdm(pid=4700) using 30 data buffers        
5/09/2012 10:48:26 a.m. - begin reading
5/09/2012 10:53:19 a.m. - Info bptm(pid=6320) waited for full buffer 7789 times, delayed 13581 times   
5/09/2012 10:53:20 a.m. - end reading; read time: 00:04:54
5/09/2012 10:53:20 a.m. - Info bptm(pid=6320) setting receive network buffer to 262144 bytes     
5/09/2012 10:53:25 a.m. - Info bpdm(pid=4700) completed reading backup image        
5/09/2012 10:53:28 a.m. - Info bpdm(pid=4700) using 30 data buffers        
5/09/2012 10:53:32 a.m. - begin reading

captain jack sparrow's picture

you cannot run Dedupe job to tape media. It has to be on Disk (iSCSI ,  FC or native) can you reconfim on this.

Also for buffers it differs from environment to environment. You may try alternate values and check if it improves

But make sure you don't do it on your production backup. Also make note of values being changed pre-post production backups. changes in values to next backup and previous backup would cause issues during restores.

 Cheers !!!

CJS

 

mph999's picture

I'd be surprised, the value of the buffer size used for backup is always that which is used for restore, even if the value is changed.  So in other works, SIZE_DATA_BUFFERS has no effect on restores.

Perhaps number of  buffers does though, I've never had to play about with this, once set, should be left alone.

Martin

 

Regards,  Martin
 
Setting Logs in NetBackup:
http://www.symantec.com/docs/TECH75805
 
Mick Scott's picture

Can we take a step back ... what I/O throughput are you getting to tape and disk ?

Can you explain your environment in a bit more detail. All I can see from above is 3 media servers, some dedup disk and an LTO5 tape drive. what NBU and OS versions ? 

Are you backing up to disk then duplicating to tape?

 

Nicolai's picture

http://www.mass.dk/netbackup/guides/49-netbackup-buffer-tuning.html

My recommendation for buffer tuning:

NUMBER_DATA_BUFFERS = 256

NUMBER_DATA_BUFFERS_RESTORE = 512

SIZE_DATA_BUFFERS = 262144

I always recommend to do tape drive performance testing local (not over network) and keep in mind local disk also has a performance limit that's most often slower than a LTO5 tape drive. If the media server run Linux/UNIX the GEN_DATA file directive is a absolute brilliant tool.

http://www.symantec.com/docs/TECH75213

It also worth to notice tape drive will de-generate over time. This will reduce both read and write speeds, therefore using a well known good tape and tape drive is paramount. If you suspect a tape drive is below speed use a cleaning tape. You may also try the sg_logs command in Linux to retrieve the drive internal error  correction counters. "-p 2" is the write error counters - "-p 3" is the read counter. I am not aware of a tool that can do the same in Windows-

 

# sg_logs -p=2  /dev/nst0
  HP        Ultrium 3-SCSI    L29S
Write error counter page
  Errors corrected without substantial delay = 52
  Errors corrected with possible delays = 0
  Total operations = 0
  Total errors corrected = 0
  Total times correction algorithm processed = 58
  Total bytes processed = 723
  Total uncorrected errors = 0
 

 

Assumption is the mother of all mess ups.

If this post answered your'e qustion -  Please mark as a soloution.

SOLUTION
Stavros41's picture

Its looks as though there is some mistake in your configuration.

You are listing a number of data buffers as 16 but your attached log shows clearly 30 are being used?

 

NET_BUFFER_SZ buffer value to 262144,

NUMBER_DATA_BUFFERS to 16

SIZE_DATA_BUFFERS to 262144

 

Log

09/2012 7:00:19 p.m. - connected; connect time: 00:00:00
4/09/2012 7:00:20 p.m. - Info bpbrm(pid=3508) starting bpbkar32 on client
4/09/2012 7:00:53 p.m. - Info bpbkar32(pid=5396) Backup started
4/09/2012 7:00:53 p.m. - Info bptm(pid=4792) start
4/09/2012 7:00:54 p.m. - Info bptm(pid=4792) using 262144 data buffer size
4/09/2012 7:00:54 p.m. - Info bptm(pid=4792) setting receive network buffer to 262144 bytes
4/09/2012 7:00:54 p.m. - Info bptm(pid=4792) using 30 data buffers

 

As the others have noted without additional details it will be hard to reccomend a setting and there won't be a magic one size fits all reccomendation.

Ganu's picture

Hi ,

I have backup failing with below error.. Can any one help me troubleshoot the issue.

Logs:

10/25/2012 08:42:41 - Info bpbkar (pid=6968) change journal NOT enabled for <C:\>
10/25/2012 08:45:37 - Info bpbkar (pid=6968) change journal NOT enabled for <D:\>
10/25/2012 08:46:11 - Info bpbkar (pid=6968) change journal NOT enabled for <R:\>
10/25/2012 08:46:16 - Error bpbrm (pid=21150) socket read failed: errno = 131 - Connection reset by peer
10/25/2012 08:46:16 - Error bpbrm (pid=21150) could not send server status message
10/25/2012 08:46:16 - Error bptm (pid=21149) system call failed - Connection reset by peer (at child.c.1298)
10/25/2012 08:46:16 - Error bptm (pid=21149) unable to perform read from client socket, connection may have been broken
10/25/2012 08:46:16 - Info bpbrm (pid=3513) sending message to media manager: STOP BACKUP <Client Name>_1351168924
10/25/2012 08:46:16 - Info bpbrm (pid=3513) media manager for backup id matlksrt1sqs001.nam.pwcinternal.com_1351168924 exited with status 150: termination requested by administrator