Centera and collections

Created: 09 May 2006 • Updated: 21 May 2010 | 3 comments

I am looking for any information on Centera collections and Enterprise Vault. Does anyone have any documentation, or can anyone describe the pluses and minuses of enabling collections?


Centera and "collections" are really two different beasts. I'm sure you understand this already, but for those who are playing at home, here are the really simple explanations:

"Centera" is a hardware-based archival solution (an EMC storage product) which uses an SDK (EMC's proprietary interface) to take raw flat-file data and compress it using large-block single-instance storage. It does other kewl stuff, like making lotsa continuous copies of your data across several volumes so you know you'll never suffer from data loss. Shiny and kewl if you can invest in it.

"Collections", on the other hand, is a software solution that Enterprise Vault applies to a partition in order to squish many small dvs files into larger collection (cab) files. You get a lil space saving from this (although I can't give a %), but the real win is that your backups start sucking less (since backing up 1000 small dvs files sucks, whereas backing up 100 cab files is better). Collections is configurable, but it usually incurs a performance hit on the server when it's crushing up dvs files (a small hit), and when a user wants to pull something from a cab'd archive (also small).

So for the points on the home game: collections is really more about streamlining your backups (for those of you who didn't follow a smart vault and partition layout and are now stuck with huge vaults), while Centera is more about hardware-based (additional) compression and data continuance (especially if you replicate between Centera sites).

Your vault sizes (# of dvs files), your backup windows, the backup product you use, etc. should tell you whether you want to enable collections on an open partition. Enabling collections on a Centera gets more complicated, especially if you've misconfigured the Centera and have it in WORM mode.

But that's enough lecturing for now.


Centera collections allow two or more (up to 100) savesets to share a single CDF. Attachments and the like are still stored as separate blobs to allow for sharing etc.

The advantage of Centera collections is that throughput increases compared to having to create and store a CDF for each saveset.
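To put rough numbers on the CDF savings, here is a back-of-the-envelope sketch in Python. It assumes every collection is filled to the 100-saveset maximum, which real systems may not manage, and it ignores the attachment blobs, which are stored separately either way. The function name and the one-million-saveset figure are made up for illustration; none of this is an Enterprise Vault or Centera API.

```python
import math


def cdf_count(savesets: int, savesets_per_collection: int = 1) -> int:
    """Return how many CDFs get written for a given number of savesets.

    savesets_per_collection=1 models collections being disabled
    (one CDF per saveset). The upper bound quoted in the thread is 100.
    """
    if savesets < 0 or savesets_per_collection < 1:
        raise ValueError("savesets must be >= 0 and collection size >= 1")
    return math.ceil(savesets / savesets_per_collection)


# Hypothetical workload: one million archived savesets.
without_collections = cdf_count(1_000_000)
with_collections = cdf_count(1_000_000, savesets_per_collection=100)

print(f"CDFs without collections: {without_collections:,}")
print(f"CDFs with full collections: {with_collections:,}")
```

In the best case that is a hundred times fewer CDFs to create and store, which is where the throughput gain comes from.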

NTFS collections are used to speed up backups. But there is a large difference between NTFS collections and Centera collections, which are two different beasts, just as a Centera with collections enabled differs from a Centera without them.

A Centera storage node has a limit on the number of objects it can store (I believe 25 million). Depending on the average size of your items, if you don't use collections on a Centera you could actually run out of objects before you run out of disk space on a storage node. If the average object size is > 34 KB, you will never run out of objects before space.
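The arithmetic behind that break-even point can be sketched in a few lines of Python. Big caveat: the node capacity below is not a published figure. It is simply back-calculated from the two numbers in this thread (25 million objects times 34 KB), and the 25 million itself is an "I believe" number. The function name is hypothetical.

```python
MAX_OBJECTS = 25_000_000   # per-node object limit quoted in this thread (unverified)
BREAK_EVEN_KB = 34         # break-even average object size quoted in this thread

# Assumption: usable capacity implied by the two figures above,
# i.e. 850,000,000 KB (roughly 850 GB in decimal units).
IMPLIED_CAPACITY_KB = MAX_OBJECTS * BREAK_EVEN_KB


def first_limit_hit(avg_object_kb: float,
                    capacity_kb: float = IMPLIED_CAPACITY_KB,
                    max_objects: int = MAX_OBJECTS) -> str:
    """Say whether a node runs out of objects or of space first."""
    if avg_object_kb <= 0:
        raise ValueError("average object size must be positive")
    # How many objects of this size it would take to fill the disk.
    objects_to_fill_disk = capacity_kb / avg_object_kb
    return "objects" if objects_to_fill_disk > max_objects else "space"


for size_kb in (10, 20, 35, 64):
    print(f"avg {size_kb} KB -> runs out of {first_limit_hit(size_kb)} first")
```

With small items (10 or 20 KB on average) the object limit bites first, which is exactly the case where collections help by cutting down the number of CDFs.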