Troubleshooting restore speeds
Hi Guys,
I have been doing some restore testing and have found that my restores are quite slow compared to my backup speeds. I have a client that I am using for testing and am backing up a 34GB .bkf file. The client is a Windows 2003 server with a 4GB fibre MSA1000 attached, which is where the file resides, running RAID 5 with about 10 disks. I am restoring from tape to the MSA drive and it is running at an average of between 10000kbs and 15000kbs. My backup from this drive is around 90000kbs to 100000kbs. This is an LTO3 drive that I'm using for testing, with no multiplexing.
I have tweaked the following
NUMBER_DATA_BUFFERS = 64
NUMBER_DATA_BUFFERS_RESTORE = 64
SIZE_DATA_BUFFERS = 262144
Communication buffers are all set to 256 on clients and media server.
I originally didn't have the NUMBER_DATA_BUFFERS_RESTORE set and thought this may have been the issue but adding this file in has made no difference to the speed of the restore.
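For reference, here is a minimal sketch of the shared memory these settings imply. It assumes the usual rule of thumb of number of buffers x buffer size x drives x multiplexing factor; the function name and the `drives`/`mpx` parameters are illustrative and are not NetBackup settings.

```python
def shared_memory_bytes(number_data_buffers, size_data_buffers, drives=1, mpx=1):
    """Approximate shared memory used for data buffers (assumed rule of thumb)."""
    return number_data_buffers * size_data_buffers * drives * mpx


# Values from the post: 64 buffers of 262144 bytes, one drive, no multiplexing.
restore_bytes = shared_memory_bytes(64, 262144)
print(restore_bytes // (1024 * 1024), "MiB for restores")

# The later experiment in the thread with 256 buffers, same buffer size.
bigger_bytes = shared_memory_bytes(256, 262144)
print(bigger_bytes // (1024 * 1024), "MiB with 256 buffers")
```

With one drive and no multiplexing this comes to a small amount of memory either way, so memory pressure on the media server seems an unlikely explanation on its own.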
I have looked in the bptm log on the media server and can see the below
mpx_read_data: waited for empty buffer 3568 times, delayed 30953 times.
Both of the servers are on the same LAN and are connected at 1000 full
I have run some restores on other servers as well and they run at a similar speed, so I'm thinking it's the media server but I'm not sure why.
Can anyone help troubleshoot this?
Thanks in advance!
Master/media server = Windows 2003 x64 SP2 R2, NetBackup 6.5.4
Comments
Were the backups run as multiplexed backups? If so, then this is expected behavior. With multiplexed backups, the data cannot be read from tape in one continuous stream, due to the nature of multiplexing. One chunk of data is read at a time, then the drive moves the tape to the next location for that client before reading more data to send over for the restore.
See the following excerpt from the admin guide II: exftpp.symantec.com/pub/support/products/NetBackup_Enterprise_Server/290204.pdf
When to use multiplexing
Multiplexing is generally used to reduce the amount of time that is required to complete backups. The performance in the following situations would be improved by using multiplexing:
■ Slow clients. Instances in which NetBackup uses software compression, which normally reduces client performance, are also improved.
■ Multiple slow networks. The parallel data streams take advantage of whatever network capacity is available.
■ Many short backups (for example, incremental backups). In addition to providing parallel data streams, multiplexing reduces the time each job waits for a device to become available. Therefore, the storage device transfer rate is maximized.
Multiplexing reduces performance on restores because it uses extra time to read the images.
I fully understand multiplexing and as I have said in my original post there is no multiplexing on this tape that I am restoring from.
Thanks
What about the bpbkar log?
What does it show?
I hope you have already read this article (DOCUMENTATION: How to configure buffers for NetBackup in a Windows NT/2000 environment to improve performance; http://seer.entsupport.symantec.com/docs/244652.htm). You can find the reason for the delay by working out which process is waiting on the buffers.
bpbkar is not used
I have read the performance tuning guide but I can't seem to make sense of the data I'm getting. As it's a remote restore it does not use the bpbkar process, only bptm. I believe bpbkar is only used on a backup, not a restore. From the results in my original post I believe the media server is at fault, but I can't make it any faster and the server is doing nothing most of the time.
Sorry, missed the point that you are talking about recovery.
You are right, bpbkar is not involved there, so the corresponding lines would be in the tar logs.
I will try and run some more tests today with the tar log enabled on the client to see what it produces.
Here are the results from the latest restore
BPTM - read_data: waited for empty buffer 5481 times, delayed 20727 times
and the TAR log didn't tell me anything about wait times. This restore was done from disk so it should have been even faster, but it still shows the same slow speeds!
Anyone else seen anything like this before?
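To compare the two bptm lines quoted in this thread, a small parser like the sketch below can pull out the counters. The regular expression is based only on the lines as posted here, and the names `BufferWaits`, `parse_buffer_waits` and `delays_per_wait` are made up for illustration. It assumes, as I understand the tuning guide, that "waited for empty buffer" is logged by the side filling the buffers, which would mean the side draining them is the slower one.

```python
import re
from typing import NamedTuple, Optional


class BufferWaits(NamedTuple):
    kind: str      # "empty" or "full"
    waited: int    # how many times the process had to wait
    delayed: int   # how many delay intervals it slept in total


# Matches the counter text as quoted in this thread (format is an assumption).
PATTERN = re.compile(
    r"waited for (empty|full) buffer (\d+) times, delayed (\d+) times"
)


def parse_buffer_waits(line: str) -> Optional[BufferWaits]:
    """Return the wait counters from one bptm log line, or None if absent."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return BufferWaits(match.group(1), int(match.group(2)), int(match.group(3)))


def delays_per_wait(waits: BufferWaits) -> float:
    """Average number of delay intervals per wait (0.0 if there were no waits)."""
    return waits.delayed / waits.waited if waits.waited else 0.0


lines = [
    "mpx_read_data: waited for empty buffer 3568 times, delayed 30953 times.",
    "BPTM - read_data: waited for empty buffer 5481 times, delayed 20727 times",
]
for line in lines:
    waits = parse_buffer_waits(line)
    if waits is not None:
        print(waits, round(delays_per_wait(waits), 2))
```

Both runs show bptm waiting for empty buffers thousands of times, whether the source was tape or disk.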
Couple of assumptions.
Your tape read performance is much faster than the write to the destination. That can mean the buffers are too large, or that some element in the disk-NIC-network chain is causing the slowdown.
It seems that NBU read speed is not the problem here. Maybe it's a good idea to measure the baseline performance of your network and disk on the client side?
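As a rough way to get a disk baseline on the client, something like the sketch below writes a test file in 256 KB blocks and reports KB/s. It is only a crude stand-in for a proper benchmarking tool; the function name, the default sizes and the use of the temp directory are all assumptions for illustration, and the target directory should be pointed at the restore destination to be meaningful.

```python
import os
import tempfile
import time


def measure_write_kb_per_s(directory, total_bytes=16 * 1024 * 1024,
                           block_size=256 * 1024):
    """Write a temporary file in fixed-size blocks and return the rate in KB/s."""
    # Random data so filesystem compression does not inflate the result.
    block = os.urandom(block_size)
    fd, path = tempfile.mkstemp(dir=directory)
    written = 0
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as handle:
            while written < total_bytes:
                handle.write(block)
                written += block_size
            handle.flush()
            os.fsync(handle.fileno())  # include the time to reach the disk
        elapsed = max(time.perf_counter() - start, 1e-9)
    finally:
        os.remove(path)  # always clean up the test file
    return (written / 1024) / elapsed


if __name__ == "__main__":
    # Replace the temp directory with the restore target to test that volume.
    rate = measure_write_kb_per_s(tempfile.gettempdir())
    print(f"sequential write baseline: {rate:.0f} KB/s")
```

A small test file mostly measures cache, so a much larger `total_bytes` is needed before the figure can be compared with the restore rate.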
According to the performance tuning guide, the bptm parent is the data producer, and this is where the delays are pointing. I just can't see why it's so slow.
I increased the number of buffers to 256 just to see what it did, and it made it slower.
I have just run a Spotlight monitor on both servers and neither of them is doing much at all.
Network = 15mbs similar on both
CPU = 2% similar on both
Disk writes on the client 15 -30 per second
Disk reads on the media server 100 - 150 per second
There is no disk queuing on either server
I don't think it's a disk issue as the same speeds occur when running from tape.
Will support troubleshoot restore speed issues?
Yes, they will
But first make sure that you have TOE (TCP Offload Engine) disabled, and if you have a Broadcom NIC on any of the servers in the chain, install the latest driver from the vendor.
Here is network tuning guide for TOE issues:
seer.entsupport.symantec.com/docs/304578.htm
One more:
seer.entsupport.symantec.com/docs/296341.htm
Very frequent issue with Broadcom NICs http://seer.entsupport.symantec.com/docs/316182.htm
TOE is enabled on all 5 of my NICs and I have 1 Broadcom/HP NIC in the team, so I will install the Microsoft SNP and, if that doesn't help, turn off TCP Offload at OS and NIC level.
I'll let you know if that helps!
Yes, please let us know.
I don't know the exact reason why Symantec recommends turning off TOE, but sometimes it really helps to increase performance. Maybe a buggy implementation in the OS or drivers stops this generally good technology from working properly.
I have turned off TOE on the media server and the backups are running faster, but the restore has not got any quicker. I just don't get this!
For another test I restored the same data to the media server locally rather than to the remote client, and it restored 12.8GB in 1 minute 5 seconds at a speed of 246137kbps, so it's something to do with the remote restores.
I have now tried the same file to a different client and am getting the same sort of slow 10000kbs speeds, so I assume there is either some network config in NetBackup that needs to be changed or server network configuration causing the issues.
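To put the figures from this thread side by side, the sketch below converts the reported rates to megabits per second and estimates how long the 34GB test file would take at each. It assumes the job rates are kilobytes per second (1 KB = 1024 bytes) and 1 GB = 1024 x 1024 KB; the helper names are made up for illustration.

```python
def kb_per_s_to_mbit_per_s(kb_per_s):
    """Convert KB/s (1024-byte kilobytes) to megabits per second (10^6 bits)."""
    return kb_per_s * 1024 * 8 / 1_000_000


def transfer_seconds(size_gb, kb_per_s):
    """Seconds needed to move size_gb gigabytes at the given KB/s rate."""
    return size_gb * 1024 * 1024 / kb_per_s


# Rates quoted in the thread, in KB/s (unit is an assumption).
rates = {
    "remote restore (low)": 10000,
    "remote restore (high)": 15000,
    "backup": 90000,
    "local restore on media server": 246137,
}

for label, rate in rates.items():
    mbit = kb_per_s_to_mbit_per_s(rate)
    minutes = transfer_seconds(34, rate) / 60
    print(f"{label}: {mbit:.0f} Mbit/s, 34GB in about {minutes:.1f} min")
```

Under those assumptions the remote restore rates sit well below what a link running at 1000 full can carry, while the backup rate shows the same path can move far more data in the other direction.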