Enterprise Vault and Deduplication

Created: 27 Mar 2013 • Updated: 01 Apr 2013 | 2 comments
Alex Zn
This issue has been solved. See solution.

Hi

Does Enterprise Vault perform deduplication at the file level only, or can it also do block-level deduplication?

2 Comments

Rob.Wilcox

De-duplication is handled at the file level by OSIS (introduced with EV 8). With OSIS, if the appropriate sharing level is enabled and you and I both receive the same PDF file (for example), it is archived (stored on disk) only once. The 'other copy' is simply a database reference.

SOLUTION
Arjun Shelke

Enterprise Vault archives an item using single instance storage if both of the following conditions apply:

  • The target vault store has a sharing level of "Share within vault store" or "Share within group".

  • The current open partition is not hosted on an EMC Centera device.
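As a rough illustration, the two conditions above can be written as a single check. This is only a sketch: the `SharingLevel`, `VaultStore`, and `uses_single_instance_storage` names are made up for this example and are not part of any Enterprise Vault API.

```python
from dataclasses import dataclass
from enum import Enum


class SharingLevel(Enum):
    # Hypothetical enum mirroring the three sharing levels named in this thread.
    NO_SHARING = "No sharing"
    WITHIN_VAULT_STORE = "Share within vault store"
    WITHIN_GROUP = "Share within group"


@dataclass
class VaultStore:
    # Hypothetical model of the two settings that matter for the decision.
    sharing_level: SharingLevel
    open_partition_on_centera: bool


def uses_single_instance_storage(store: VaultStore) -> bool:
    """Return True only if both conditions for SIS archiving hold."""
    sharing_enabled = store.sharing_level in (
        SharingLevel.WITHIN_VAULT_STORE,
        SharingLevel.WITHIN_GROUP,
    )
    return sharing_enabled and not store.open_partition_on_centera
```

For example, a store set to "Share within group" whose open partition sits on a Centera device would return `False` here, because both conditions must apply.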

Enterprise Vault archives an item for single instance storage as follows:

  • It identifies the parts of an item that are suitable for sharing, such as large message attachments. These parts are referred to as SIS parts. Enterprise Vault uses a minimum size threshold for SIS parts, to balance the likely storage savings against the resources that are required to create, archive, and retrieve them.

  • It generates a digital fingerprint for each SIS part. The fingerprint is a cryptographic, hash-based identifier that is determined by the contents of the SIS part.

  • For each SIS part, Enterprise Vault accesses the vault store group's fingerprint database to determine whether a SIS part with the same fingerprint is already stored within the vault store's sharing boundary. A SIS part with the same fingerprint indicates an identical SIS part.

    • If an identical SIS part is not already stored within the sharing boundary, Enterprise Vault stores the SIS part and saves the SIS part's fingerprint information in the fingerprint database.

    • If an identical SIS part is already stored within the sharing boundary, Enterprise Vault references the stored SIS part. It does not store the SIS part again.

  • It stores the remainder of the item (the item minus any SIS parts) as the residual saveset file. The residual saveset file holds Enterprise Vault metadata about the item and unique information about it, such as the file name if it is a document or attachment, and follow-up flags if it is a message.

When Enterprise Vault receives a request to restore an archived item, it reconstitutes the item from the item's residual saveset file and SIS part files.
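The archiving steps and the restore path described above can be sketched in a few lines of Python. This is a simplified model, not Enterprise Vault's actual implementation: SHA-256 stands in for the unspecified fingerprint algorithm, `SIS_MIN_SIZE` is an invented threshold, and `SharingBoundary`, `ResidualSaveset`, `archive`, and `restore` are hypothetical names. An in-memory dictionary plays the role of both the fingerprint database and the SIS part storage.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical minimum size for a SIS part; the real threshold is not given here.
SIS_MIN_SIZE = 20 * 1024


@dataclass
class SharingBoundary:
    """Stand-in for the fingerprint database plus the stored SIS parts."""
    sis_parts: dict = field(default_factory=dict)  # fingerprint -> part contents
    writes: int = 0  # how many SIS parts were actually stored

    def store_if_new(self, part: bytes) -> str:
        # The fingerprint depends only on the contents of the part.
        fingerprint = hashlib.sha256(part).hexdigest()
        if fingerprint not in self.sis_parts:
            # No identical part within the boundary: store it and record it.
            self.sis_parts[fingerprint] = part
            self.writes += 1
        # Otherwise the existing part is referenced and nothing is stored again.
        return fingerprint


@dataclass
class ResidualSaveset:
    metadata: dict      # item-specific information, e.g. file name or flags
    inline_parts: list  # (name, bytes) pairs too small to be SIS parts
    sis_refs: list      # (name, fingerprint) references to shared SIS parts


def archive(metadata: dict, parts: list, boundary: SharingBoundary) -> ResidualSaveset:
    """Split an item into shared SIS parts and a residual saveset."""
    inline, refs = [], []
    for name, data in parts:
        if len(data) >= SIS_MIN_SIZE:
            refs.append((name, boundary.store_if_new(data)))
        else:
            inline.append((name, data))
    return ResidualSaveset(metadata, inline, refs)


def restore(saveset: ResidualSaveset, boundary: SharingBoundary):
    """Reconstitute an item from its residual saveset and its SIS parts."""
    parts = dict(saveset.inline_parts)
    for name, fingerprint in saveset.sis_refs:
        parts[name] = boundary.sis_parts[fingerprint]
    return saveset.metadata, parts
```

In this model, archiving two messages that carry the same large PDF attachment into one `SharingBoundary` stores the attachment once. Both residual savesets hold the same fingerprint reference, and `restore` returns the full attachment for either message.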

If an item's target vault store has a sharing level of "no sharing" or the target partition is hosted on an EMC Centera device, then Enterprise Vault does not use single instance storage. It archives the item with its Enterprise Vault metadata as a single saveset file.