Enterprise Vault Backups, Collections and You
Created: 29 Nov 2012 | Updated: 10 Dec 2012
An issue which many small to medium companies struggle with is Enterprise Vault data on disk growing over time. There are two aspects to the issue. Firstly, there is the overall size: over time even small companies generate sizeable quantities of data, and with limited budgets that becomes hard to manage. Secondly, and really the underlying problem, there is the quantity of files rather than the volume of data. Enterprise Vault creates many, many small files on the data partitions, along with many different folders, and that high quantity of small files makes backing them up slow.
Backups taking a long time is the issue. One or two years after implementing Enterprise Vault, environments begin to struggle because the backups take longer and longer, and daily backups become almost impossible (unless budget is available for faster, bigger, better backup devices!).
In this article I'll show you a small, worked-through example of what is happening, and offer a potential solution. The solution comes with some caveats, which I'll go into towards the end of the article.
In my sample environment I'm running Enterprise Vault 10.0.2 with a number of small archives, including some mailbox archives and some FSA archives. The net result is that I have:
2.48 GB of data
These are all stored locally on the EV server, on a single drive (spread across a number of different Enterprise Vault partitions).
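To see how many files sit behind a given volume of data, a short script can walk a partition folder and report the totals. This is only a minimal sketch: the path `E:\EVPartitions` is a hypothetical placeholder, and the helper name `summarize_tree` is made up for illustration (it is not part of Enterprise Vault).

```python
import os


def summarize_tree(root):
    """Return (file_count, folder_count, total_bytes) for everything under root."""
    files = 0
    folders = 0
    total_bytes = 0
    for dirpath, dirnames, filenames in os.walk(root):
        folders += len(dirnames)
        for name in filenames:
            try:
                total_bytes += os.path.getsize(os.path.join(dirpath, name))
                files += 1
            except OSError:
                # File disappeared or is unreadable; leave it out of the totals.
                pass
    return files, folders, total_bytes


if __name__ == "__main__":
    # Hypothetical partition root; substitute the real location.
    root = r"E:\EVPartitions"
    files, folders, total_bytes = summarize_tree(root)
    print(f"{files} files in {folders} folders")
    print(f"{total_bytes / 1024 ** 3:.2f} GB in total")
    if files:
        print(f"average file size: {total_bytes / files / 1024:.1f} KB")
```

Running it before and after any change to the partitions gives a quick way to compare file counts against the amount of data, which is the figure that matters for backup duration here.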
I am going to use Windows Server Backup from Windows 2008 R2 to back up the folder to another drive. I know that some 3rd party (i.e. non-Microsoft) products can perform better in certain environments, but I'm going with the 'free' option, and using what is built in to the Operating System. The source server is running in VMware Workstation 8, and the two drives which Windows sees are both on one dedicated physical Solid State Disk.
Granted, the setup is not as large as I would like, nor is the underlying configuration as tuned as I would like, but you'll see it will demonstrate the fundamental principle.
Windows Server Backup Testing
I ran Windows Server Backup three different times, when there was little to no activity on the Enterprise Vault environment.
Run 1 = 35 minutes
Run 2 = 42 minutes
Run 3 = 41 minutes
Average across 3 runs = 39.3 minutes
The 'solution' I offer is using Enterprise Vault collections. This is something that is enabled on each of the partitions, and has a number of configuration options such as:
Start and End Times - for the collector process (part of StorageFileWatch) to run
Maximum Size of Collection Files - default is 10 MB, and I have not changed that
Age at which files will be eligible for collections - default is 10 days, and I have not changed that
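To make the effect of the size and age settings concrete, here is a toy model of how eligible files might be grouped into collections. It is an illustrative assumption only: the greedy packing, the `SavesetFile` type and the `plan_collections` function are all hypothetical and are not taken from Enterprise Vault, whose actual collector logic is not described here. Treating a file as eligible when its age is at least the configured number of days is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class SavesetFile:
    """Hypothetical stand-in for one small file on a partition."""
    name: str
    size_bytes: int
    age_days: int


def plan_collections(files, max_collection_bytes=10 * 1024 * 1024, min_age_days=10):
    """Greedily group eligible files into collections no larger than the limit.

    Files younger than min_age_days are left alone (assumed rule).
    A single file larger than the limit ends up in a collection on its own.
    """
    eligible = [f for f in files if f.age_days >= min_age_days]
    collections = []
    current = []
    current_size = 0
    for f in eligible:
        # Start a new collection when adding this file would exceed the limit.
        if current and current_size + f.size_bytes > max_collection_bytes:
            collections.append(current)
            current = []
            current_size = 0
        current.append(f)
        current_size += f.size_bytes
    if current:
        collections.append(current)
    return collections


if __name__ == "__main__":
    mb = 1024 * 1024
    sample = [
        SavesetFile("a", 4 * mb, 30),
        SavesetFile("b", 4 * mb, 30),
        SavesetFile("c", 4 * mb, 30),
        SavesetFile("d", 1 * mb, 2),  # too young, stays as a loose file
    ]
    for i, group in enumerate(plan_collections(sample), start=1):
        print(i, [f.name for f in group])
```

The point of the model is simply that many small, old files end up represented by far fewer, larger files, while recent files stay as they are.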
Once collections were configured on each of my five partitions, I then issued a 'Run Now' for each of the partitions. 'Run Now' is another option on the properties of each partition. There is no harm in running collections on multiple partitions, but as with many aspects of Enterprise Vault, if you deploy this in a real environment you may want to stagger the running of the collections to balance the load.
Further Windows Server Backup Testing
Once collections were enabled, and the collection run finished, I then observed:
2.48 GB of data
You'll notice that it's the same amount of data, but spread across a vastly reduced quantity of files.
On a subsequent set of three backup runs I saw the following:
Run 1 = 23 minutes
Run 2 = 23 minutes
Run 3 = 22 minutes
Average across 3 runs = 22.7 minutes
Solved? Not quite - the Caveats
This appears to have solved the problem! My backups are now 'super' fast; each run takes at least a third less time for the SAME amount of data. Unfortunately this all comes with a price.
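The arithmetic behind that comparison can be checked with a few lines of Python, using only the six run times listed above. The function name `percent_reduction` is just a label chosen for this sketch.

```python
from statistics import mean

# Backup durations in minutes, as measured in the two test rounds.
before = [35, 42, 41]  # without collections
after = [23, 23, 22]   # with collections


def percent_reduction(old, new):
    """Percentage by which the duration dropped going from old to new."""
    return (old - new) / old * 100


avg_before = mean(before)
avg_after = mean(after)

print(f"average before: {avg_before:.1f} minutes")
print(f"average after:  {avg_after:.1f} minutes")
print(f"reduction in average duration: {percent_reduction(avg_before, avg_after):.1f}%")

# Most pessimistic pairing: fastest run before against slowest run after.
print(f"worst-case reduction: {percent_reduction(min(before), max(after)):.1f}%")
```

On average the backup window shrinks by a little over 40%, and even the least favourable pairing of runs still shows a saving of just over a third.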
First of all, you have to fit the running of the collection processes into your busy server schedule. You then have to take into account what Enterprise Vault needs to do in order to retrieve an item. It needs to locate the CAB file which holds the collection, and extract the item from the CAB file, before it can be delivered to a client (or client process). This extra hop obviously takes a little bit of time, and it is a little bit of time for every retrieval. This could become more pronounced if you need to rebuild an index for an archive.
Deletions also have to be taken into consideration, if they're allowed, as does Storage Expiry, if that is enabled. Deletions will lead to items being dereferenced from CAB files. This may lead to what Enterprise Vault calls 'Sparse Collections'. These are collections which are taking up space (in terms of the CAB file) but don't actually contain much data that is still referenced.
Note: When items are deleted 'from CAB files', what actually happens is that Enterprise Vault reduces a reference count in the database. The CAB file itself is not touched.
These 'sparse CAB files' can be restructured, and StorageFileWatch will do so as part of its scheduled running - but it's another thing that adds a little bit of time, and processing overhead.
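The reference counting idea can be pictured with a small toy model. Everything in it is an assumption made for illustration: the `CollectionRecord` class, its fields and the 50% sparseness threshold are hypothetical and do not reflect Enterprise Vault's real database schema or the rule it uses to decide that a collection is sparse.

```python
class CollectionRecord:
    """Toy stand-in for the database record that tracks one CAB file."""

    def __init__(self, cab_name, item_count):
        if item_count <= 0:
            raise ValueError("a collection must contain at least one item")
        self.cab_name = cab_name
        self.total_items = item_count  # items physically inside the CAB
        self.ref_count = item_count    # items still referenced

    def delete_item(self):
        """Deleting an item only lowers the count; the CAB file is untouched."""
        if self.ref_count == 0:
            raise ValueError("no referenced items left in this collection")
        self.ref_count -= 1

    def is_sparse(self, threshold=0.5):
        """True when the referenced fraction falls below the (assumed) threshold."""
        return self.ref_count / self.total_items < threshold


if __name__ == "__main__":
    record = CollectionRecord("Collection1.cab", item_count=10)
    for _ in range(6):
        record.delete_item()
    # The CAB still holds 10 items on disk, but only 4 are referenced.
    print(record.ref_count, record.total_items, record.is_sparse())
```

In this picture the disk space is only reclaimed when the sparse collection is later restructured, which is the extra processing mentioned above.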
In the end it is the same approach as with many aspects of Enterprise Vault: it is necessary to carefully weigh this option's benefit (faster backups) against the downsides. Each environment is unique, so it's not really something that can be universally recommended, or not. It is definitely worth considering though... but one final thought: once enabled, there is no easy way back. You can create new 'non-collected' partitions, but uncollecting the already collected partitions is not to be taken lightly, and will likely need involvement from Symantec Support.