
Slow Backups on deduplication storage

Created: 02 Aug 2012 | 13 comments

Hi

I was previously using Backup Exec 2010 with an LTO-3 tape library.

Job rates were fairly good, around 3000 MB/minute.

 

I switched to Backup Exec 2012 with deduplication and disk-based storage.

I am now using a brand new server with 24GB RAM, a few processors, and an HP StorageWorks array with sixteen 10K RPM disks in RAID 5.

Job rates are now mostly around 500 MB/minute, which is really slow.

Sometimes a single job may skyrocket to 2000 MB/minute, and sometimes I have 3 jobs running at 100 MB/minute each.

 

It is really strange, as a file copy from the network to the array drive runs at up to 6000 or 7000 MB per minute.

 

Why is it so slow?

I tried disabling checkpoint restart in each job, as I read it may affect backup performance, but it did not change anything.

 

 

 


CraigV's picture

Hi,

 

I edited this post... Check that any AV running on that media server isn't scanning the dedupe folder, and if it is, exclude it.

Thanks!

Alternative ways to access Backup Exec Technical Support:

https://www-secure.symantec.com/connect/blogs/alte...

unarcher's picture

There is an antivirus running on the server (ESET NOD32), but the deduplication folder is excluded from the file scanner (and so is the Symantec Backup Exec installation folder).

villeah's picture

When I had a similar problem with dedupe, creating antivirus exceptions didn't help. I had to uninstall the AV program from the server. I'd recommend you rule that option out.

unarcher's picture

Any other ideas?

The job rate is really awful. It can take days just to back up a single server :(

 

 

CraigV's picture

What speed are your NICs and switch ports set to?


unarcher's picture

1 Gb on both.

I already tried fixing the speed and setting the ports to auto, same result.

I am currently working on this with Symantec support. The engineer told me that the split IO/sec value seems to be really high on my system, between 100 and 600 most of the time.

It seems to be related to the number of disks or the stripe size (currently 256K on the disk storage). He wants me to do additional tests copying a lot of small files from the network, but I already know what the result would be: really fast compared to the Backup Exec job rate :(
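To put the job rates quoted in this thread next to the 1 Gb link speed, here is a small Python sketch of the unit conversion. It assumes decimal units (1 MB = 8 Mbit) and ignores protocol overhead, so the percentages are only a rough guide.

```python
# Rough conversion of backup job rates (MB/min) to link utilisation.
# Assumption: decimal units, no protocol overhead.

LINK_MBIT_PER_S = 1000  # 1 Gb NIC and switch port


def mb_per_min_to_mbit_per_s(mb_per_min: float) -> float:
    """Convert megabytes per minute to megabits per second."""
    return mb_per_min * 8 / 60


# Rates quoted in this thread
rates = {
    "old LTO-3 jobs": 3000,
    "typical dedupe job": 500,
    "plain file copy": 7000,
}

for label, rate in rates.items():
    mbit = mb_per_min_to_mbit_per_s(rate)
    share = mbit / LINK_MBIT_PER_S
    print(f"{label}: {rate} MB/min = {mbit:.0f} Mbit/s ({share:.0%} of the link)")
```

By this estimate the plain file copy is close to saturating the link, while the dedupe jobs use only a small fraction of it.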

 

 

 

A.W's picture

I ran into the same issue with my Dell R510 server.

32GB memory, 2 processors and an array of twelve 7.5k hard disks in RAID 5. 1 Gb network.

Job rates are now mostly around 400 MB/minute, which is really poor.

 

Keep moving

David Palmerston's picture

A.W. -

This is not a solution for all of the slow job rates, but we have recently identified that our R510 BE server from October 2011 was misconfigured from the factory. The misconfiguration is more likely to happen with a single processor, but it might be good for you to review your riser card configuration against where your controller for the hard disks and the NIC(s) are placed. It is possible to get severely degraded performance if the add-on cards are not placed into the correctly rated slot (x4, x8, x16).

 

robnicholson's picture

"It is possible to get severely degraded performance if the add-on cards are not placed into the correctly rated slot (x4, x8, x16)."

This can indeed reduce the throughput of a disk system connected through a PCIe (PCI Express) based controller if the card is placed in a slot with lower capacity than the card can support, e.g. an x8 card placed in an x4 slot (most high-speed disk controllers appear to be x8). It is especially complicated by the fact that some Dell systems have what look like x8 slots but only half the pins are wired, so it's really an x4. You need a torch and a sharp pair of eyes to read the motherboard.

http://en.wikipedia.org/wiki/PCI_Express

But I'm inferring that the disk system of the original poster is using internal disks connected to the pre-installed PERC controller, which will therefore be in a slot of the correct capacity. The basic throughput of copying a test file semi-confirms that it's unlikely to be the disk system at either end or the network that is causing the bottleneck. Not completely eliminated though.

The fact that it's using RAID-5 won't be helping compared to RAID-10. RAID-5 imposes quite a considerable overhead per write (x3 in some cases):

http://en.wikipedia.org/wiki/Standard_RAID_levels#RAID_5_performance

It appears to get worse with random writes, which I suspect happen a lot in a deduplication system database.
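As a rough illustration of why the RAID level matters for random writes, here is a back-of-the-envelope sketch in Python. The write penalties (4 back-end I/Os per small random write for RAID-5, 2 for RAID-10) are the commonly cited textbook figures, and the 140 IOPS per disk is purely an assumed number for a 10K RPM drive, not something measured on the systems in this thread.

```python
# Back-of-the-envelope estimate of random-write IOPS for a disk array.
# Assumptions: textbook write penalties, no controller cache, and a
# guessed per-disk IOPS figure.

WRITE_PENALTY = {
    "RAID-5": 4,   # read data, read parity, write data, write parity
    "RAID-10": 2,  # write to both mirrors
}

ASSUMED_IOPS_PER_DISK = 140  # hypothetical figure for a 10K RPM disk


def random_write_iops(disks: int, iops_per_disk: float, level: str) -> float:
    """Estimate the front-end random-write IOPS the array can sustain."""
    return disks * iops_per_disk / WRITE_PENALTY[level]


for level in WRITE_PENALTY:
    estimate = random_write_iops(16, ASSUMED_IOPS_PER_DISK, level)
    print(f"{level}, 16 disks: about {estimate:.0f} random-write IOPS")
```

Under these assumptions the same sixteen disks give roughly twice the random-write IOPS in RAID-10 as in RAID-5, at the cost of usable capacity.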

We're using a SATA-II RAID-10 array, which potentially introduces more failures (SATA disks have a lower MTBF/duty cycle than SAS), lower spin speeds and 3 Gb/s transfer. However, the improved performance of RAID-10 goes a long way to offset the lower performance of the disks. We think the higher failure rate is acceptable, especially as RAID-10 is more resilient to multiple disk failures than RAID-5. When we upgrade the disk enclosure, we'll be putting 3TB SATA-III disks in there, which takes away the performance angle.

But given that BE can only manage 1000MB/min (see screenshot) on a good day on a disk system that can manage a peak copy speed of 6000MB/min, overly worrying about disk performance isn't really going to get us anywhere.

TIP: we get *much* improved disk speeds to the same disk system when we use a B2D target - can peak at around 3,000MB/min which is better than our aging LTO-3 tape system. Try doing some speed tests with a temporary B2D target and see what you get. If it's still rather low, then it does point to something outside of BE.

Cheers, Rob.

[Attached screenshot: speed1.png]

A.W's picture

Hi Rob,

In my environment, performance with B2D is good: about 1600MB/min over a 1 Gb network. But the performance of the dedupe store is poor...


robnicholson's picture

Anyone expecting really exceptional throughput with BE and deduplication is going to be disappointed IMO. We get around 800MB/min on a disk system that can easily handle 6,000MB/min.

This is due IMO to two fundamental bottlenecks in BE:

  1. The speed at which the agent (or media server) can calculate the hash values; MD5 (a guess) is never a fast operation
  2. Potential performance issues with the PostgreSQL database that holds the deduplication data, which is used to look up the hash values and write modified blocks

The remote agent on BE is single-threaded, and some non-scientific tests I carried out showed that our file server managed about 1000MB/min calculating MD5 hash values. So that is an immediate bottleneck on top of your file server's disk subsystem. You could have solid state disks in there, but the throughput would still be limited by the hash algorithm.
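A non-scientific test of this kind can be sketched in a few lines of Python. This is not the exact test described above, nor anything BE is known to do internally: MD5 and the 64 KB block size are assumptions taken from the guesses in this thread, and the result depends entirely on the CPU it runs on.

```python
import hashlib
import os
import time

BLOCK_SIZE = 64 * 1024  # assumed block size, 64 KB


def md5_rate_mb_per_min(total_mb: int = 64) -> float:
    """Hash total_mb of in-memory data on a single thread, return MB/min."""
    block = os.urandom(BLOCK_SIZE)  # one random block, hashed repeatedly
    block_count = total_mb * 1024 * 1024 // BLOCK_SIZE
    start = time.perf_counter()
    for _ in range(block_count):
        hashlib.md5(block).digest()
    elapsed = time.perf_counter() - start
    return total_mb / elapsed * 60


if __name__ == "__main__":
    print(f"Single-threaded MD5: {md5_rate_mb_per_min():.0f} MB/min")
```

Because the data is already in memory, this measures only the hashing cost and leaves disk and network out of the picture.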

A solution is to make the agent multi-threaded. It's an ideal candidate if the disk system can supply data faster than a single core can hash it, since the data is naturally split into 64k blocks.
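To show what hashing blocks in parallel could look like, here is a minimal Python sketch. It is only an illustration of the idea, not how the BE agent works: MD5 and the 64 KB block size are assumptions from this thread, and the function names are made up. It relies on the fact that Python's hashlib releases the interpreter lock while hashing larger buffers, so threads can use several cores.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 64 * 1024  # assumed block size, 64 KB


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list:
    """Cut the data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def hash_block(block: bytes) -> str:
    """Return the MD5 hex digest of one block."""
    return hashlib.md5(block).hexdigest()


def hash_blocks_parallel(data: bytes, workers: int = 4) -> list:
    """Hash all blocks on a thread pool, keeping the original block order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(hash_block, split_blocks(data)))


if __name__ == "__main__":
    digests = hash_blocks_parallel(bytes(1024 * 1024))
    print(f"Hashed {len(digests)} blocks")
```

Each block is hashed independently, which is why the work divides so easily across cores.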

The bit about PostgreSQL performance is a bit of a guess, but it's based upon the fact that you need 1GB of RAM per TB of data being backed up. I can't believe it needs so much for what is a very simple database requirement at its heart: look up the hash value and, if it's not there, write the block. The overriding requirement here is performance.
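The "look up the hash, write the block if it's missing" requirement can be sketched in Python with a plain dictionary standing in for the database. This is a toy model under that assumption, with a made-up class name, and says nothing about how BE or PostgreSQL actually store the data.

```python
import hashlib


class DedupStore:
    """Toy deduplication store: a dict keyed by block hash."""

    def __init__(self) -> None:
        self.blocks = {}  # hash -> block data (stand-in for the database)

    def write(self, block: bytes) -> bool:
        """Store the block if its hash is new; return True if it was stored."""
        key = hashlib.md5(block).hexdigest()  # MD5 is an assumption here
        if key in self.blocks:
            return False  # duplicate, nothing to write
        self.blocks[key] = block
        return True


store = DedupStore()
print(store.write(b"x" * 65536))  # first copy of the block is stored
print(store.write(b"x" * 65536))  # identical block is skipped
print(len(store.blocks))          # number of unique blocks held
```

Every incoming block costs one lookup, so the speed of that lookup sets an upper limit on the job rate.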

Most competitors to BE make a big story of how much more efficient their deduplication engine is.

Cheers, Rob.

 

robnicholson's picture

PS. All of the above is guesswork, as only the BE developers know where the real bottlenecks are. But I really don't think it's in our hardware. I'm guessing that it's the MD5 hash. There could be a more efficient hash algorithm, but I doubt Symantec will divulge that.

Cheers, Rob.

SSR89's picture

Backup Exec Deduplication is pure garbage! I just upgraded to 2012 from 2010 R3 and it's not any better. Speeds are horrible. A B2D folder on the same drive array gets almost double the speed of the dedupe folder.

Also, I keep getting corrupt OST files in my deduplication folder, which cause the jobs to stay queued, and Symantec tech support has been almost useless. They tell me there is some kind of communication error to the storage device, but it's direct attached storage that works just fine with B2D!! Then they tell me to run an inventory to fix it, but an inventory takes 22 hours to complete! How am I supposed to back up in the meantime?

I'm probably going to dump deduplication altogether, buy more disks and just use regular B2D.