In the data protection world, one number we frequently see and hear is the deduplication rate, with claims ranging from 10:1 to 20:1 to 50:1. Recently, I heard someone say that 50:1 is five times better than 10:1. That fuzzy math made me cringe, and I knew it was time to address it.
To make sense of deduplication rates, we need to examine two things: 1) the factors that influence them and 2) the math behind them.
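One way to make the arithmetic concrete is to convert a ratio into the share of capacity it avoids, using the standard relation space saved = 1 - 1/ratio. The sketch below is illustrative only: the function names are hypothetical, and the 100 TB of pre-dedupe backup data is an assumed example, not a measured figure.

```python
def space_saved_percent(ratio: float) -> float:
    """Percentage of capacity avoided by an N:1 deduplication ratio."""
    if ratio < 1:
        raise ValueError("deduplication ratio must be >= 1")
    return (1 - 1 / ratio) * 100


def stored_tb(logical_tb: float, ratio: float) -> float:
    """Physical capacity needed to hold logical_tb of data at an N:1 ratio."""
    if ratio < 1:
        raise ValueError("deduplication ratio must be >= 1")
    return logical_tb / ratio


if __name__ == "__main__":
    logical = 100.0  # hypothetical: 100 TB of backup data before dedupe
    for ratio in (10, 20, 50):
        print(f"{ratio}:1 -> {space_saved_percent(ratio):.1f}% saved, "
              f"{stored_tb(logical, ratio):.1f} TB stored")
```

With these inputs, moving from 10:1 to 50:1 raises the space saved from 90% to 98%. The remaining footprint does shrink from 10 TB to 2 TB, but the capacity saved grows only from 90 TB to 98 TB.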
Deduplication Factors
Deduplication rates are like automobile miles per gallon (mpg): Your Results Will Vary. The factors that affect deduplication results are:
- Types of data (unstructured versus structured data)
- Change rate of data (what percentage of the data changes between backups)
- Frequency and type of backup (how often you back up the data, e.g. daily or weekly, and whether the backups are full or incremental)
- Retention (how long you keep the deduplicated data)
...
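To see how change rate, backup type, and retention interact, here is a deliberately simplified model. It is an assumption for illustration, not how any particular product computes its ratio. The model assumes one backup per day of a fixed-size data set, a constant fraction of new data each day, and every backup in the retention window kept. It ignores compression, redundancy within a single backup, and the type of data. The function name `estimated_dedupe_ratio` is hypothetical.

```python
def estimated_dedupe_ratio(retention_days: int, daily_change: float,
                           full_backups: bool = True) -> float:
    """Rough dedupe ratio under a simplified, hypothetical model.

    Assumes one backup per day of a data set of size 1.0, where a fraction
    `daily_change` of the data is new each day, and all backups inside the
    retention window are kept. Ignores compression and data type.
    """
    if retention_days < 1 or not 0 <= daily_change <= 1:
        raise ValueError("need retention_days >= 1 and 0 <= daily_change <= 1")
    # Unique data actually stored: the first full copy plus each day's changes.
    unique = 1.0 + (retention_days - 1) * daily_change
    if full_backups:
        # Every day sends a complete copy of the data set.
        logical = float(retention_days)
    else:
        # One full, then incrementals that send only the changed data,
        # so almost nothing is left for deduplication to remove.
        logical = unique
    return logical / unique


if __name__ == "__main__":
    for days in (7, 30, 90):
        for change in (0.01, 0.05):
            ratio = estimated_dedupe_ratio(days, change)
            print(f"{days:>2} days retained, {change:.0%} daily change -> "
                  f"{ratio:.1f}:1")
    incremental = estimated_dedupe_ratio(30, 0.01, full_backups=False)
    print(f"incremental, 30 days, 1% change -> {incremental:.1f}:1")
```

In this model, 30 days of daily fulls at 1% change yields roughly 23:1, while 5% change drops it to about 12:1, and daily incrementals stay near 1:1. The same environment can therefore report very different ratios depending on how it is backed up and how long the backups are kept.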