This white paper describes the deduplication feature that was introduced in version 9.0 of Compliance Accelerator and Discovery Accelerator. This feature provides the following methods for identifying similar items and duplicate items before they are accepted into the review set:
- Metadata analysis, which provides a quick way to identify similar items by assessing their metadata properties only (author display name, number of attachments, and so on). This method is available in both Compliance Accelerator and Discovery Accelerator.
- Content and metadata analysis, which provides a more robust way to identify duplicate items by assessing both their metadata and their full HTML content. This method is available only in Discovery Accelerator cases where you have enabled analytics.
The white paper describes these two methods in detail and explains the advantages and disadvantages of using them. It also provides some examples to illustrate the differences between the two types of analysis.
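As a rough illustration of the distinction, the following Python sketch compares a key built from metadata only with a key built from metadata plus content. It is not the algorithm that Compliance Accelerator or Discovery Accelerator uses. The `Item` class, its fields, the sample values, and the use of SHA-256 are all assumptions chosen for the example.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    # Hypothetical fields; the products compare their own set of properties.
    author: str
    subject: str
    attachment_count: int
    html_content: str


def _digest(*parts: str) -> str:
    """Hash the parts with a separator so field boundaries stay unambiguous."""
    h = hashlib.sha256()
    for part in parts:
        h.update(part.encode("utf-8"))
        h.update(b"\x00")
    return h.hexdigest()


def metadata_key(item: Item) -> str:
    """Key built from metadata only: cheap, but blind to differences in the body."""
    return _digest(item.author, item.subject, str(item.attachment_count))


def content_and_metadata_key(item: Item) -> str:
    """Key built from metadata plus the full content: stricter, but more work."""
    return _digest(
        item.author, item.subject, str(item.attachment_count), item.html_content
    )


# Illustrative sample items (made up for this sketch).
a = Item("J. Smith", "Q3 report", 1, "<p>Draft figures attached.</p>")
b = Item("J. Smith", "Q3 report", 1, "<p>Final figures attached.</p>")
c = Item("J. Smith", "Q3 report", 1, "<p>Draft figures attached.</p>")

same_by_metadata = metadata_key(a) == metadata_key(b)
same_by_content = content_and_metadata_key(a) == content_and_metadata_key(b)
exact_duplicate = content_and_metadata_key(a) == content_and_metadata_key(c)
```

In this sketch, items `a` and `b` share a metadata key even though their bodies differ, so a metadata-only comparison would group them as similar. Only `a` and `c` share a content and metadata key, so only they would be treated as duplicates.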