Number_Data_Buffers Question

Created: 06 Jan 2014 • Updated: 21 Feb 2014 | 19 comments
This issue has been solved. See solution.

So recently we made the move to SLPs. We write to a virtual tape via a VTL, and in the same SLP we have duplications to two physical tapes.

Everything has been going fine and all the SLP duplications finish, but today we noticed one ran far longer than it has in the past. The job details showed:

1/6/2014 9:58:59 AM - Info bptm(pid=8972) media id FC4076 mounted on drive index 44, drivepath /dev/alias/nst/F0020EC025, drivename IBM.ULTRIUM-TD4.060, copy 3
1/6/2014 1:01:11 PM - Info bptm(pid=9556) waited for empty buffer 199 times, delayed 382394 times   
1/6/2014 1:01:11 PM - end reading; read time: 03:19:08
1/6/2014 1:01:11 PM - positioning FC5520 to file 2
1/6/2014 1:01:11 PM - positioned FC5520; position time: 00:00:00
1/6/2014 1:01:11 PM - begin reading
1/6/2014 1:56:17 PM - Info bptm(pid=9556) waited for empty buffer 56 times, delayed 105814 times   
1/6/2014 1:56:17 PM - end reading; read time: 00:55:06
1/6/2014 1:56:17 PM - Info bptm(pid=8972) waited for full buffer 605 times, delayed 1159589 times   
1/6/2014 1:56:19 PM - Info bptm(pid=8972) setting receive network buffer to 262144 bytes     
1/6/2014 1:56:19 PM - positioning FC5520 to file 3
1/6/2014 1:56:19 PM - positioned FC5520; position time: 00:00:00
1/6/2014 1:56:28 PM - Info bptm(pid=9556) completed reading backup image        
1/6/2014 1:56:28 PM - Info bptm(pid=9556) EXITING with status 0 <----------       
1/6/2014 1:56:36 PM - Info bptm(pid=8972) EXITING with status 0 <----------       
1/6/2014 1:56:36 PM - end Duplicate; elapsed time: 10:07:24
the requested operation was successfully completed(0)

 

We then looked deeper, and a large amount of our SLPs have similar waiting-for-empty/full-buffer lines.

 

So after some searching, it looks like we need to raise our buffer count. Right now NUMBER_DATA_BUFFERS is set to 32, while SIZE_DATA_BUFFERS is set to 262144 (support had us set it to that value to get path-to-tape working).

 

Should changing from 32 to 64 or 128 help with this?

 

Thanks!

 


revaroo:

Try changing it to 64. The bptm reading process is the one filling the buffers, so when it waits for an empty buffer it has nowhere to put the data it has read.

It may just be that the data is not being sent to the tape drive fast enough, so it could be a tape drive issue. Try upping the number of data buffers.

Nicolai:

Increase NUMBER_DATA_BUFFERS to 128 or 256. Also create NUMBER_DATA_BUFFERS_RESTORE and set it to 128 or 256 as well.

It costs nothing to change NUMBER_DATA_BUFFERS as long as you have enough memory. You won't break anything by increasing the buffers; the worst case is wasted memory.
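Nicolai's "worst case is wasted memory" point is easy to put a number on. A rough sketch (assuming one buffer ring per active backup/duplication stream; the exact accounting inside bptm may differ):

```python
SIZE_DATA_BUFFERS = 262144  # bytes, the poster's current setting

def shm_per_stream(number_data_buffers, size_data_buffers=SIZE_DATA_BUFFERS):
    """Approximate shared memory consumed by one stream's buffer ring."""
    return number_data_buffers * size_data_buffers

current = shm_per_stream(32)     # 8388608 bytes  (8 MiB)
proposed = shm_per_stream(128)   # 33554432 bytes (32 MiB)
```

Even at 256 buffers that is only 64 MiB per stream, which is why raising the count is normally cheap on a modern media server.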

 

Assumption is the mother of all mess ups.

If this post answered your question, please mark it as a solution.

mph999:

he he ...  Nicolai - the master of the 'number of data buffers'

He speaks wise words ....

 

Regards,  Martin
 
Setting Logs in NetBackup:
http://www.symantec.com/docs/TECH75805
 
Hanzo581:

Just to verify, there is no harm in changing the buffers? I thought I had read that data backed up in the past with lower buffer settings needs to be test-restored after the buffer setting changes, or could I have been reading old guides?

 

Thanks again for the info.

mph999:

There is a separate setting, NUMBER_DATA_BUFFERS_RESTORE, for the number of data buffers used for a restore.

Provided a backup was successful, there is no way the number of data buffers used for the backup could affect the restore. Impossible.

It is also a misunderstanding that the buffer size used for the backup can be changed for the restore; it cannot. Whatever buffer size was used for the backup will be used for the restore.

M

 

Nicolai:

Just to comment on MPH999's wise words. He speaks them too. :-)

It's adjusting SIZE_DATA_BUFFERS that is the "dangerous" part. SIZE_DATA_BUFFERS controls how large a SCSI block (32K, 64K, 128K or 256K) is used. On Windows 2003 pre-SP1, increasing it above 64K caused issues, but those problems are long gone and NetBackup today uses 256K as the default value. However, the default number of buffers per backup stream is still way too low (32). Increase them and you are good.

Imagine this picture:

SIZE_DATA_BUFFERS = size of bucket (5, 10 or 20 liters)

NUMBER_DATA_BUFFERS = number of bucket available for water

Tape drives = receivers of water. Bigger buckets (SIZE_DATA_BUFFERS) are better, and more of them (NUMBER_DATA_BUFFERS) means more water moved (backup speed).
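The bucket analogy can be played out in a few lines. This is only a toy model (fixed fill and drain rates per tick; all names are invented here, not bptm internals), but it reproduces the two wait counters seen in the job logs:

```python
def simulate(capacity, fill_per_tick, drain_per_tick, ticks=1000):
    """Toy model of the bptm buffer ring: a producer fills buckets,
    the tape drive empties them; count who has to wait."""
    full = 0
    waited_for_empty = 0   # producer blocked: no empty bucket to fill
    waited_for_full = 0    # consumer blocked: no full bucket to empty
    for _ in range(ticks):
        if full + fill_per_tick > capacity:
            waited_for_empty += 1
        full = min(capacity, full + fill_per_tick)
        if full < drain_per_tick:
            waited_for_full += 1
        full = max(0, full - drain_per_tick)
    return waited_for_empty, waited_for_full

# Slow client, fast drive: the drive starves, "waited for full buffer"
slow_client = simulate(capacity=32, fill_per_tick=1, drain_per_tick=4)
# Fast client, slow drive: the producer stalls, "waited for empty buffer"
slow_drive = simulate(capacity=32, fill_per_tick=4, drain_per_tick=1)
```

Which side racks up the waits tells you which end of the pipe is the bottleneck, which is exactly how the log lines in this thread are read.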


mph999:

It is a little difficult to explain all the possible combinations.

Generally, you need to look at whether the delays are waiting-for-full or waiting-for-empty buffers.

Think of the buffer as a bucket.

Oh, and the rules are that a bucket can only be filled when it is empty, or emptied when it is full.

If the data is slow from the client (e.g. a bad network), then the buckets fill up slowly, but when full they get emptied to the tape drives very quickly. Then there are no full buckets left, so bptm has to wait.

Hence, waiting for full buffer.

If the issue is the tape drive side - then the buckets fill up real quick, but take ages to empty, so the process that fills the buckets has to wait for one to be empty (and thus available to refill).

Hence, waiting for empty buffer.

The process that fills the buckets is different for a local backup (media server backing itself up) and a remote client (over the network).

For a local backup, bpbkar fills the buffer directly, so the waiting-for-empty line would be seen in the bpbkar log.

For a remote client, bpbkar sends data to the TCP port, and a child bptm process takes it from there and sends it to the buffer; hence the bptm log would have the 'waiting for empty' lines.

There is a touch file, NOshm. It is a big misunderstanding that this turns off shared memory (buffers); it doesn't. It makes a local backup behave like a remote backup, so bpbkar sends the data to a port and a child bptm process takes it from there.

As Nicolai says, more buffers shouldn't cause an issue; if they can't be filled (not enough data) they just sit there empty, and the only cost is more memory used.

Ideally, you want the total delays for waiting-for-full and waiting-for-empty to be 0. This is not likely to happen, but if it did, it would mean there is a perfect balance of buckets being filled and emptied: there are always buckets that can be emptied, and always some that can be filled, a constant stream. In that case, adding more buffers may help increase performance, as there may be spare capacity in the client's ability to send data. If the number is increased and no performance gains are made, then you had already achieved the maximum possible.

Other points to consider,

How many delays are 'bad'? Each delay is 15 ms, so the total delay in seconds can be worked out.

If the backup would take a few hours (when running well) and the total delays only add up to a few minutes, then there is probably no real issue. If, however, the backup should take say 1 hour but the delays add up to 10 minutes, then that is quite a percentage of the total time.

Generally, a few thousand delays on a backup that takes a couple of hours or more could be considered acceptable, but 100,000 would be an issue. It's impossible to say, really, without considering each backup separately.
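Using Martin's 15 ms figure, the delay counters in the job details convert directly into wall-clock time. A quick sketch (the 15 ms granularity is taken from the comment above; the actual bptm value may vary by platform):

```python
DELAY_MS = 15  # per the comment above; an assumption, not a measured value

def delay_seconds(delay_count, delay_ms=DELAY_MS):
    """Total time a bptm process spent stalled, given its 'delayed N times' count."""
    return delay_count * delay_ms / 1000.0

# The 10-hour duplicate above logged "waited for full buffer 605 times,
# delayed 1159589 times":
stalled = delay_seconds(1159589)   # about 17394 s, i.e. nearly 5 hours of the job
```

Compared against the job's elapsed time, that percentage is what decides whether the delay count is a real problem.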

If the buffers are tuned well (so for LTO drives that would be size 262144 and number 128 or 256), then generally the two common causes of waiting-for-full-buffer issues are the read speed of the client disks, or network problems.

Waiting-for-empty issues are rarer, and in my experience have come down to faulty tape drives.

Hope this provides a little insight.

Martin

 

 

Hanzo581:

Ok, I've changed NUMBER_DATA_BUFFERS and NUMBER_DATA_BUFFERS_RESTORE to 128, and from the looks of it most of the SLP dupe jobs now have at most a few hundred delays. Some are still a little high, and a few still seem to wait for a full buffer a lot, like this one:

 

1/10/2014 4:19:37 AM - positioning AA1774 to file 1
1/10/2014 4:19:37 AM - positioned AA1774; position time: 00:00:00
1/10/2014 4:19:38 AM - begin reading
1/10/2014 4:28:04 AM - Info bptm(pid=22576) waited for empty buffer 9 times, delayed 12640 times   
1/10/2014 4:28:04 AM - Info bptm(pid=10032) waited for full buffer 116 times, delayed 222273 times   
1/10/2014 4:28:05 AM - end reading; read time: 00:08:27
1/10/2014 4:28:05 AM - Info bptm(pid=22576) completed reading backup image        
1/10/2014 4:28:05 AM - Info bptm(pid=22576) EXITING with status 0 <----------       
1/10/2014 4:28:12 AM - Info bptm(pid=10032) EXITING with status 0 <----------       
1/10/2014 4:28:13 AM - end Duplicate; elapsed time: 01:57:46
the requested operation was successfully completed(0)

 

So should I bump the buffers to 256, or is it possible my LTO4 drives are just too slow?

Also, are these data buffer settings only changed on the master server, or am I supposed to set them on the media servers as well?

mph999:

Settings go on the media servers only.

Sure, try 256; if it makes no difference, put it back.

I presume you have SIZE_DATA_BUFFERS set to 262144?

 

Hanzo581:

My master server pulls double duty and has some media server roles; perhaps that is why I am seeing the issue reduced. I did not think to set this on all the media servers. I know where to create/change the file on Windows, but most of our media servers are Linux/Unix. Where would my Linux team create the files?

 

Thanks!

Hanzo581:

Sorry, forgot to answer the question.  Yes, SIZE_DATA_BUFFERS is set to 262144 on all servers, master and media in the environment.

mph999:

The files go in :

/usr/openv/netbackup/db/config
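For the Linux team, the settings are just single-integer touch files in that directory. A minimal sketch (the helper name is mine; an admin would more likely use `echo`, but the effect is the same):

```python
from pathlib import Path

def write_buffer_settings(config_dir, settings):
    """Create NetBackup buffer touch files: one integer per file."""
    for name, value in settings.items():
        (Path(config_dir) / name).write_text(f"{value}\n")

# On a Linux/Unix media server this would target the directory above:
# write_buffer_settings("/usr/openv/netbackup/db/config",
#                       {"NUMBER_DATA_BUFFERS": 128,
#                        "NUMBER_DATA_BUFFERS_RESTORE": 128,
#                        "SIZE_DATA_BUFFERS": 262144})
```

The files take effect for newly started jobs; no daemon restart is needed for bptm to read them.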

 

Marianne:

Seems you have deleted that post in the meantime....

Supporting Storage Foundation and VCS on Unix and Windows as well as NetBackup on Unix and Windows
Handy NBU Links

Sat-Chit-ananda:

Hmm, I am new to this forum.

I tried to create it as a blog, but somehow it doesn't go to the published state, so I posted it as a forum discussion. Someone had an objection to certain content.

I edited it and posted my content only. Below is the blog link.

https://www-secure.symantec.com/connect/blogs/disk...

Andrea Bolongaro:

Hi.

Following is a good technote to read on tuning buffers etc.

Best practices for NET_BUFFER_SZ, why can't NetBackup change the TCP send/receive space

http://www.symantec.com/docs/TECH28339

 

 

KDob:

Hmm.

 

"waited for full buffer 116 times, delayed 222273 times"

It seems like it is waiting for FULL buffers. Shouldn't that indicate that the buffer size should be smaller, not larger? With fewer buckets of smaller size, the wait for a bucket to get full would be shorter, wouldn't it?

Curious,

Nicolai:

No, no, no.

Smaller buffers will only add to the tape drive effect called "shoe-shining".

It is better to send one 256K buffer (bucket) to the tape drive than to send 4 x 64K. Worst case, you need four start/stop operations at a 64K block size instead of just one at 256K.
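Nicolai's start/stop arithmetic is easy to verify. A quick sketch (block counts only; it ignores filemarks and the drive's own buffering):

```python
def tape_blocks(image_bytes, block_size):
    """Number of SCSI blocks needed to write an image at a given block size."""
    return -(-image_bytes // block_size)   # ceiling division

one_gib = 1 << 30
blocks_256k = tape_blocks(one_gib, 256 * 1024)   # 4096 blocks
blocks_64k = tape_blocks(one_gib, 64 * 1024)     # 16384 blocks: 4x the drive operations
```

Four times the block count means four times as many chances for the drive to stall and reposition, which is exactly the shoe-shining effect described above.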

When counting buffer waits, you need to look at the average value across all jobs. Some jobs will always misbehave ....


SOLUTION
mph999:

Nicolai is right, but your idea of smaller buckets is logical; a good question.

The problem with smaller buffers is that yes, they fill more quickly, but they also empty more quickly.

It's a fine balancing act and, as you can see, virtually impossible to explain all the possibilities in a reasonable space.

 
