Basic Centera Collections Question

Created: 29 Jun 2009 • Updated: 26 Nov 2012 | 17 comments
I have a very basic question and hopefully someone can put my mind at rest.

We have deployed FSA and archived all the DOCs from one file server.  The Vault usage page shows that we have 10952 items in about 11GB.

The Centera shows us 11GB but only 3924 files and 550 clips.

We use collections, and I understand this rolls up files into fewer clips, which explains the number of clips perfectly.  But why so few files?  Surely it can't be dedupe, as I wouldn't have thought there would be that much duplication on this server?

I have checked all logs and there are no errors reported so I'm fairly sure all is well, but hope someone can put my mind at rest please and explain the number of files shown on Centera?

Centera does single instancing at the file level. This is why the number of files is different from what you were expecting.

Check your collections folder to ensure that you only have directories in there and no files. As long as the files are gone from the collections folder, they have been collected and placed on the Centera.

There aren't any files in the collection folder.  So you are saying this is down to deduplication?

On average we see a 16% difference between what's in our Centera and what EV reports as being in the archives.

That worries me a little more then because ours is approx 60% in just 10000 files.  Would you happen to know what determines the duplicates?

I'm going to configure FSA reporting today too.  Does that provide any more helpful information?

Bear in mind I'm doing my storage on 30TB of mixed files for many divisions, including engineering, finance, HR and normal office admins. In my environment I would not expect much single instancing, seeing as it is a highly managed environment.

Have you run the usage report to see what EV sees as archived?

The duplicates on a Centera are determined by an algorithm on the Centera. When a file is stored on Centera, the system creates an imprint of the item. If any other item is stored and matches the imprint, it does not save the file again but returns the first imprint's details back to EV. In EV this is stored as the SavesetID.
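To illustrate the general idea, here is a toy sketch of imprint-based single instancing. It is not Centera's actual implementation: SHA-256 is only a stand-in for whatever imprint algorithm Centera uses, and the class and method names are made up for the example.

```python
import hashlib


class SingleInstanceStore:
    """Toy content-addressed store: identical content is kept only once."""

    def __init__(self):
        self.blobs = {}   # imprint -> content actually kept
        self.writes = 0   # number of store requests received

    def store(self, content: bytes) -> str:
        """Store content and return its imprint (hypothetical stand-in: SHA-256)."""
        self.writes += 1
        imprint = hashlib.sha256(content).hexdigest()
        if imprint not in self.blobs:
            # First time this content is seen: keep it.
            self.blobs[imprint] = content
        # Duplicate content: nothing new is saved, the existing imprint is returned.
        return imprint


store = SingleInstanceStore()
ids = [
    store.store(b"quarterly report"),
    store.store(b"meeting notes"),
    store.store(b"quarterly report"),  # duplicate of the first item
]
print(store.writes, len(store.blobs))  # 3 2
```

Three items were sent but only two are kept, and the first and third calls return the same imprint. That is the kind of gap you would see between an item count on the archiving side and a file count on the storage side.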

Are you using EV8? If yes, then most of the SIS will happen on the EV8 side because of the Fingerprint database. This is new to EV8 and provides a high level of SIS. If you are using EV8, the FSA reporting will provide a lot of info on the space savings. If you are not using EV8, the reporting tool may not provide the info you require, because the SIS is happening on the Centera and not in EV, and the report only covers what EV does, not what is single instanced on the Centera.

Thanks for the info.  The figure I quoted earlier was from the usage report.  I've just totalled up the TotalCount column in the Collections table in the database and reach the magic number.  Should this pretty much confirm that all is well?
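For anyone wanting to repeat that check, the total is just a SUM over the column. Below is a minimal sketch using an in-memory SQLite table as a stand-in for the real database. Only the Collections table and TotalCount column names come from this thread; the CollectionId column and the sample rows are invented for illustration.

```python
import sqlite3

# In-memory stand-in for the real database (assumption for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Collections (CollectionId INTEGER PRIMARY KEY, TotalCount INTEGER)"
)
# Made-up sample rows: each row is one collection holding some number of items.
conn.executemany(
    "INSERT INTO Collections (TotalCount) VALUES (?)", [(20,), (18,), (22,)]
)

# Total number of items across all collections.
(total,) = conn.execute("SELECT SUM(TotalCount) FROM Collections").fetchone()
print(total)  # 60
```

If the sum matches the item count in the usage report, every archived item is accounted for in a collection.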

Are you running v8?  Just curious as to whether your 16% was based on v7 or v8 or a bit of both.

I'm still on EV7.5 SP3. I can't upgrade due to some other issues I'm working on with Symantec at the moment, but I do plan on moving there once my issues are ironed out.

I've been looking through the database tables and at the archiving reports for the files in question, and I cannot come to any conclusion as to why the number of files on Centera is so low.  The database tables all look how I expect, including the number of clips and number of savesets.  The analysis I've done on the reports suggests to me that the level of duplication on this share wouldn't result in anything like a 60% decrease in the number of files, so I really don't know what the Centera does under the hood to calculate this number of files.  The fact that Centera is reporting exactly the 11GB that EV is showing is the only reassurance I can find that all is well.  I'm going to have to trust that it is just doing something I don't fully understand.

Centera stores files in C-Clips. Each C-Clip represents any number of files up to 10MB, so many files end up being stored as one on the Centera.

If you have a file larger than 10MB, the Centera splits the file into many parts, but if most of your files are less than 10MB then chances are they are stored in many C-Clips.

This would explain the difference.

Centera does not see files. It only sees BLOBs (Binary Large OBjects). As far as Centera is concerned, it has a pile of binary objects stored inside its C-Clips; it does not count files.

Yeah, I know about clips, it's the files I'm uncertain about.  There are only 550 clips so far, which fits exactly with what is in the database.  For review, the stats are:

Actual Data (DOC & RTF files only):  10952 files equaling 14.61GB
Enterprise Vault Reports:   10952 files equaling 11.3GB (Compressed figure I guess but expected more of a reduction)
Centera Reports:  3924 files in 550 clips equaling 11GB (Snippet below from show pool detail command)

Pool Quota: 2,048 GB
Used Pool Capacity: 11 GB
Free Pool Capacity: 2,037 GB
Number of C-Clips: 550
Number of Files: 3924
Number of scheduled tasks: 0
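Putting those figures side by side, here is a quick back-of-the-envelope calculation using only the numbers quoted above (the variable names are just labels for this sketch):

```python
# Figures quoted above.
actual_files = 10952      # DOC and RTF files on the file server
actual_gb = 14.61         # size on the file server
ev_gb = 11.3              # size reported by Enterprise Vault
centera_files = 3924      # "Number of Files" from show pool detail
centera_clips = 550       # "Number of C-Clips" from show pool detail

file_reduction = 1 - centera_files / actual_files   # drop in file count
size_reduction = 1 - ev_gb / actual_gb              # drop in size after archiving
items_per_clip = actual_files / centera_clips       # archived items per clip
files_per_clip = centera_files / centera_clips      # Centera "files" per clip

print(f"File count reduction: {file_reduction:.1%}")      # 64.2%
print(f"Size reduction:       {size_reduction:.1%}")      # 22.7%
print(f"Archived items/clip:  {items_per_clip:.1f}")      # 19.9
print(f"Centera files/clip:   {files_per_clip:.1f}")      # 7.1
```

So the file count drops by roughly 64% while the size only drops by about 23%, which is why the file count is the figure that looks odd to me.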

The only explanation I have is that EV8 is doing the SIS at a lower level than Centera, and so the actual number of stored items is far lower because of this.

Seeing as I'm not using EV8 yet, I can't give you a way to identify this other than FSA reporting.

If you are worried, I recommend opening a Sev 3 call with Symantec and having them put your mind at rest. I know that in my environment the data stored on Centera has never failed me. It has always returned every item I have expected it to, without fail.

I forgot to mention this:

EVCenterachecker will give you an item-by-item check between what's in EV and what's on the Centera.

It does take some time to run; last time I ran it on one of my larger vault stores it took 30 days. This will set your mind at rest.

Hi there

