Troubleshooting restore speeds

Updated: 21 May 2010 | 15 comments
Giroevolver's picture

Hi Guys,

I have been doing some restore testing and have found that my restores are quite slow compared to my backup speeds. I have a client that I am using for testing and am backing up a 34GB .bkf file. The client is a Windows 2003 server with a 4GB fibre MSA1000 attached, which is where the file resides, running RAID 5 with about 10 disks. I am restoring from tape to the MSA drive and it is running at an average of between 10000kbs and 15000kbs. My backup from this drive runs at around 90000kbs to 100000kbs. This is an LTO3 drive that I'm using for testing, with no multiplexing.

I have tweaked the following

NUMBER_DATA_BUFFERS = 64
NUMBER_DATA_BUFFERS_RESTORE = 64
SIZE_DATA_BUFFERS = 262144
Communication buffers are all set to 256 on clients and media server.

I originally didn't have NUMBER_DATA_BUFFERS_RESTORE set and thought this may have been the issue, but adding this file has made no difference to the speed of the restore.
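
For reference, here is a rough sketch of how much shared memory those buffer settings imply. It assumes the usual rule of thumb (number of buffers x buffer size x drives x multiplexing level); the function name and the defaults of one drive and no multiplexing are illustrative assumptions, not something stated in the post.

```python
def shared_memory_bytes(number_data_buffers, size_data_buffers, drives=1, mpx=1):
    """Approximate shared memory used for data buffers.

    Assumption: memory = buffers * buffer size * drives * multiplexing level.
    """
    return number_data_buffers * size_data_buffers * drives * mpx


if __name__ == "__main__":
    # Values from the post: 64 buffers of 262144 bytes, one drive, no multiplexing.
    current = shared_memory_bytes(64, 262144)
    print(f"64 buffers:  {current // (1024 * 1024)} MiB")

    # The 256-buffer experiment mentioned later in the thread.
    larger = shared_memory_bytes(256, 262144)
    print(f"256 buffers: {larger // (1024 * 1024)} MiB")
```

Under that assumption the settings above come to 16 MiB per drive, which is small for a server of this class, so memory pressure is unlikely to be what is limiting the restore.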

I have looked in the bptm log on the media server and can see the below

mpx_read_data: waited for empty buffer 3568 times, delayed 30953 times.
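
If it helps to compare runs, a small script can pull the counters out of lines like the one above. This is only a sketch: the regular expression is written against the line format quoted in this thread, and the "delays per wait" ratio is my own convenience figure, not a NetBackup metric.

```python
import re

# Pattern based on the bptm line quoted in this thread (an assumption about
# the general format, not taken from NetBackup documentation).
WAIT_LINE = re.compile(
    r"(?P<func>\w+): waited for (?P<kind>empty|full) buffer "
    r"(?P<waited>\d+) times, delayed (?P<delayed>\d+) times"
)


def parse_wait_line(line):
    """Return the wait/delay counters from a bptm log line, or None."""
    match = WAIT_LINE.search(line)
    if match is None:
        return None
    waited = int(match.group("waited"))
    delayed = int(match.group("delayed"))
    return {
        "func": match.group("func"),
        "kind": match.group("kind"),
        "waited": waited,
        "delayed": delayed,
        # Hypothetical helper figure: average delays per wait event.
        "delays_per_wait": delayed / waited if waited else 0.0,
    }


if __name__ == "__main__":
    sample = "mpx_read_data: waited for empty buffer 3568 times, delayed 30953 times."
    print(parse_wait_line(sample))
```

On a restore, bptm reading from tape is the side filling the buffers, so a reader that keeps waiting for empty buffers generally suggests that whatever drains them (network or client) is the slow part. Treat that reading as an assumption until it is checked against the tuning guide.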

Both of the servers are on the same LAN and are connected at 1000 full

I have run some restores on other servers as well and they run at a similar speed, so I'm thinking it's the media server, but I'm not sure why.

Can anyone help troubleshoot this?

Thanks in advance!

Master/media server = Windows 2003 x64 SP2 R2, NetBackup 6.5.4

Comments

Android's picture
17
Aug
2009

Were the backups run as multiplexed backups? If so, then this is expected behavior. With multiplexed backups, the data cannot be read from tape in one continuous stream because the blocks from different clients are interleaved. One chunk of data is read at a time, then the drive moves the tape to the next location for that client before reading more data to send over for the restore.

See the following excerpt from the Admin Guide II: exftpp.symantec.com/pub/support/products/NetBackup_Enterprise_Server/290204.pdf

When to use multiplexing
Multiplexing is generally used to reduce the amount of time that is required to complete backups. The performance in the following situations would be improved by using multiplexing:

Slow clients. Instances in which NetBackup uses software compression, which normally reduces client performance, are also improved.

Multiple slow networks. The parallel data streams take advantage of whatever network capacity is available.

Many short backups (for example, incremental backups). In addition to providing parallel data streams, multiplexing reduces the time each job waits for a device to become available. Therefore, the storage device transfer rate is maximized.
Multiplexing reduces performance on restores because it uses extra time to read the images.
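
To make the restore penalty concrete, here is a toy model of a multiplexed tape. It assumes a simple round-robin interleave of equal-sized blocks from each client, which is a simplification and not how NetBackup actually lays out its images; all names are made up for illustration.

```python
from itertools import zip_longest


def interleave(streams):
    """Round-robin interleave of per-client block lists into one 'tape'."""
    tape = []
    for group in zip_longest(*streams):
        tape.extend(block for block in group if block is not None)
    return tape


def blocks_passed_for_restore(tape, client):
    """Return (blocks the drive must pass over, blocks actually wanted)."""
    wanted = [i for i, (owner, _) in enumerate(tape) if owner == client]
    if not wanted:
        return 0, 0
    return wanted[-1] + 1, len(wanted)


if __name__ == "__main__":
    # Three hypothetical clients, four blocks each.
    streams = [[(name, n) for n in range(4)] for name in ("A", "B", "C")]
    tape = interleave(streams)
    passed, wanted = blocks_passed_for_restore(tape, "A")
    print(f"restore of A passes {passed} blocks to recover {wanted}")
```

In this model, restoring one client out of three means the drive has to get past roughly three blocks for every block it needs, which is why a multiplexed restore can be much slower than the backup that wrote it.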

Giroevolver's picture
17
Aug
2009

I fully understand multiplexing and, as I said in my original post, there is no multiplexing on the tape that I am restoring from.

Thanks

Mouse's picture
17
Aug
2009

What about the bpbkar log?

What does it show?

I hope you have already read this article (DOCUMENTATION: How to configure buffers for NetBackup in a Windows NT/2000 environment to improve performance; http://seer.entsupport.symantec.com/docs/244652.htm). You can find the cause of the delay by working out which process is waiting for data in the buffer.

Giroevolver's picture
18
Aug
2009

bpbkar is not used

I have read the performance tuning guide but I can't seem to make sense of the data I'm getting. As it's a remote restore, it does not use the bpbkar process, only bptm. I believe bpbkar is only used on a backup, not a restore. As in my original post, I believe the media server is at fault, but I can't make it any faster and the server is doing nothing most of the time.

Mouse's picture
18
Aug
2009

Sorry, I missed the point that you are talking about recovery.
You are right, bpbkar is not involved there, so those lines may be in the tar logs instead.

Giroevolver's picture
19
Aug
2009

I will try and run some more tests today with the tar log enabled on the client to see what it produces.

Giroevolver's picture
20
Aug
2009

Here are the results from the latest restore

BPTM - read_data: waited for empty buffer 5481 times, delayed 20727 times

and the TAR log didn't tell me anything about wait times. This restore was done from disk, so it should have been even faster, but it still shows the same slow speeds!

Anyone else seen anything like this before?

Mouse's picture
20
Aug
2009

Couple of assumptions.

Your tape read performance is much faster than writing to the destination. That can mean the buffers are too large, or that some element in the disk-NIC-network chain is causing the slowdown.

It seems that NBU read speed is not the problem here. Maybe it's a good idea to measure the baseline performance of your network and disk on the client side?
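
As a starting point for the disk half of that baseline, something like the sketch below could be run on the client against the restore target volume. It is a crude sequential-write timer, not a replacement for a proper benchmark tool; the function name, the 64 MB default and the 256 KB block size (chosen to match SIZE_DATA_BUFFERS in the original post) are all assumptions for illustration.

```python
import os
import tempfile
import time


def measure_write_kbps(directory, total_mb=64, block_kb=256):
    """Write total_mb of data to a temp file in `directory`; return KB/s.

    Crude baseline only: zero-filled blocks may be compressed or cached
    by some storage, which would inflate the figure.
    """
    block = b"\0" * (block_kb * 1024)
    blocks = (total_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as handle:
            for _ in range(blocks):
                handle.write(block)
            handle.flush()
            os.fsync(handle.fileno())  # include the time to reach the disk
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return (blocks * block_kb) / elapsed


if __name__ == "__main__":
    # Point this at the restore destination; the temp directory is a placeholder.
    rate = measure_write_kbps(tempfile.gettempdir())
    print(f"sequential write: {rate:.0f} KB/s")
```

If the figure comes out far above the 10000kbs to 15000kbs seen on restores, the destination disk is probably not the limiting factor and the network path deserves the closer look.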

Giroevolver's picture
20
Aug
2009

According to the performance tuning guide, the bptm parent is the data producer, and this is where the delays are pointing. I just can't see why it's so slow.

I increased the number of buffers to 256 just to see what it did, and it made the restore slower.

Giroevolver's picture
20
Aug
2009

I have just run a Spotlight monitor on both servers and neither of them is doing much at all.

Network = 15mbs similar on both
CPU = 2% similar on both
Disk writes on the client 15 -30 per second
Disk reads on the media server 100 - 150 per second
There is no disk queuing on either server

I don't think it's a disk issue, as the same speeds occur when running from tape.

Will support troubleshoot restore speed issues?

Mouse's picture
20
Aug
2009

Yes, they will

But first make sure that you have TOE (TCP Offload Engine) disabled, and if you have a Broadcom NIC on any of the servers in the chain, install the latest driver from the vendor.

Here is the network tuning guide for TOE issues:
seer.entsupport.symantec.com/docs/304578.htm

One more:
seer.entsupport.symantec.com/docs/296341.htm

Very frequent issue with Broadcom NICs http://seer.entsupport.symantec.com/docs/316182.htm

Giroevolver's picture
21
Aug
2009

TOE is enabled on all 5 of my NICs and I have 1 Broadcom/HP NIC in the team, so I will install the Microsoft SNP and, if that doesn't help, turn off TCP Offload at OS and NIC level.

I'll let you know if that helps!

Mouse's picture
21
Aug
2009

Yes, please let us know.

I don't know the exact reason why Symantec recommends turning off TOE, but sometimes it really helps to increase performance. Maybe a buggy implementation in the OS/drivers stops this generally good technology from working properly.

Giroevolver's picture
21
Aug
2009

I have turned off TOE on the media server and the backups are running faster, but the restore has not got any quicker. I just don't get this!

Giroevolver's picture
21
Aug
2009

For another test I restored the same data to the media server locally rather than to the remote client, and it restored 12.8GB in 1 minute 5 seconds at a speed of 246137kbps, so it's something to do with the remote restores.

I have now tried restoring the same file to a different client and am getting the same sort of slow 10000kbs speeds, so I assume there is either some network config in NetBackup that needs to be changed, or server network configuration causing the issue.
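
To put the numbers from this thread side by side, here is a quick back-of-the-envelope calculation. It assumes the "kbs" and "kbps" figures quoted above are all kilobytes per second and that 1 GB is 1024 x 1024 KB; neither is confirmed in the thread, and the function names are just for illustration.

```python
def restore_seconds(size_gb, rate_kb_per_s):
    """Time to move size_gb at rate_kb_per_s (assumes 1 GB = 1024 * 1024 KB)."""
    return size_gb * 1024 * 1024 / rate_kb_per_s


def slowdown(fast_rate, slow_rate):
    """How many times slower slow_rate is than fast_rate."""
    return fast_rate / slow_rate


if __name__ == "__main__":
    # Rates quoted in the thread, assumed to be KB/s.
    local_restore = 246137
    remote_restore = 10000
    backup = 90000

    print(f"local vs remote restore: {slowdown(local_restore, remote_restore):.1f}x")
    print(f"backup vs remote restore: {slowdown(backup, remote_restore):.1f}x")

    # The 34GB test file from the original post.
    for label, rate in (("remote restore", remote_restore), ("backup", backup)):
        minutes = restore_seconds(34, rate) / 60
        print(f"34GB at {label} speed: {minutes:.1f} min")
```

On those assumptions the local restore is well over twenty times faster than the remote one, which fits the conclusion that the tape drive and media server can deliver the data and the loss is somewhere on the path to the client.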