Demystifying Data Deduplication
Everyone knows that enterprises today are awash in data, which has created a staggering storage growth problem. That, in turn, has spawned some serious backup and recovery challenges. So far, efforts to tackle these challenges have fallen short.
This Tech Brief looks at how Symantec’s approach to data deduplication enables organizations to reduce backup storage while providing rapid recovery in the event of a disaster.
Shrinking the footprint
Although deduplication technology has existed for some time now, many organizations have yet to fully realize the operational and storage efficiencies to be gained through it.
Deduplication is a method of retaining only one unique instance of backup data on storage media. Redundant data is replaced with a pointer to the unique data copy. Deduplication occurs at both the file level and the file-segment level. When two or more files are identical, deduplication stores only one copy of the file. When two or more files share only part of their content, deduplication breaks the files into segments and stores only one copy of each unique file segment.
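To make the segment-level idea concrete, here is a minimal sketch in Python. It is illustrative only and does not describe how any Symantec product is implemented: the fixed 4-byte segment size, the use of SHA-256 fingerprints as "pointers", and the names store_file and restore_file are all assumptions chosen for the example.

```python
import hashlib
from typing import Dict, List

SEGMENT_SIZE = 4  # bytes; unrealistically small so the example is easy to follow


def store_file(data: bytes, segment_store: Dict[str, bytes]) -> List[str]:
    """Split data into fixed-size segments, keep one copy of each unique
    segment, and return the fingerprints that act as pointers to them."""
    pointers = []
    for offset in range(0, len(data), SEGMENT_SIZE):
        segment = data[offset:offset + SEGMENT_SIZE]
        fingerprint = hashlib.sha256(segment).hexdigest()
        # Only the first occurrence of a segment is actually stored.
        segment_store.setdefault(fingerprint, segment)
        pointers.append(fingerprint)
    return pointers


def restore_file(pointers: List[str], segment_store: Dict[str, bytes]) -> bytes:
    """Rebuild a file by following its pointers back to the stored segments."""
    return b"".join(segment_store[fp] for fp in pointers)


store: Dict[str, bytes] = {}
file_a = b"AAAABBBBCCCC"
file_b = b"AAAABBBBDDDD"  # shares its first two segments with file_a

ptrs_a = store_file(file_a, store)
ptrs_b = store_file(file_b, store)

logical_bytes = len(file_a) + len(file_b)
stored_bytes = sum(len(segment) for segment in store.values())
print(logical_bytes, stored_bytes)  # 24 16
```

The two files contain 24 bytes between them, but only four unique segments (16 bytes) are kept, and either file can still be rebuilt from its list of pointers.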
Here’s why data deduplication matters: Say 500 people in your company receive an email with a 1 MB attachment. If each recipient saves that attachment locally, it’s replicated 500 times on desktops around the network. During backup, a system without data deduplication would store the data in that one attachment 500 times, consuming 499 MB more backup space than necessary.
In contrast, data deduplication backs up just one instance of the attachment’s data and replaces the other 499 instances with pointers back to that copy.
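The arithmetic in this example can be checked with a short Python sketch of file-level deduplication. The 1 MB of zero bytes standing in for the attachment, the SHA-256 fingerprint, and the unique_files and catalog names are assumptions made for illustration, not part of any product.

```python
import hashlib

MB = 1024 * 1024
attachment = bytes(MB)  # 1 MB of zero bytes standing in for the attachment

unique_files = {}  # fingerprint -> the single stored copy
catalog = []       # one (recipient, pointer) entry per saved copy

for recipient in range(500):
    fingerprint = hashlib.sha256(attachment).hexdigest()
    if fingerprint not in unique_files:
        unique_files[fingerprint] = attachment  # stored once, on first sight
    catalog.append((recipient, fingerprint))    # every copy gets a pointer

logical_mb = len(catalog) * len(attachment) // MB
stored_mb = sum(len(content) for content in unique_files.values()) // MB
saved_mb = logical_mb - stored_mb
print(logical_mb, stored_mb, saved_mb)  # 500 1 499
```

All 500 catalog entries point at the same stored copy, so the backup holds 1 MB of attachment data instead of 500 MB.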
Symantec’s approach to deduplication
Symantec’s deduplication strategy is based on the idea that deduplication needs to be everywhere. That’s why Symantec has built deduplication technology into its information management platforms: NetBackup, Backup Exec, and Enterprise Vault.
Deduplication everywhere lets you choose at which point in the backup process to perform deduplication:
- Client deduplication. Data is deduplicated at the client/source before being sent across the network.
- Media server deduplication. Data is deduplicated at the media server/target before being sent to disk or tape.
- Integration with deduplication appliances. NetBackup integrates with deduplication appliances via the OpenStorage API.
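The practical difference between the first two options is how much data crosses the network. The Python sketch below compares them under simplifying assumptions: the MediaServer class, its missing and upload methods, and the fingerprint exchange are hypothetical and are not the NetBackup, Backup Exec, or OpenStorage interfaces. The small cost of sending the fingerprints themselves is ignored.

```python
import hashlib


def fingerprint(segment: bytes) -> str:
    return hashlib.sha256(segment).hexdigest()


class MediaServer:
    """Hypothetical backup target that keeps one copy of each unique segment."""

    def __init__(self):
        self.segments = {}

    def missing(self, fingerprints):
        """Return the fingerprints the server has not stored yet."""
        return [fp for fp in fingerprints if fp not in self.segments]

    def upload(self, fp, segment):
        self.segments[fp] = segment


def client_side_backup(segments, server):
    """Deduplicate at the source: send only segments the server lacks.
    Returns the number of segment bytes sent across the network."""
    by_fingerprint = {fingerprint(s): s for s in segments}
    sent = 0
    for fp in server.missing(list(by_fingerprint)):
        server.upload(fp, by_fingerprint[fp])
        sent += len(by_fingerprint[fp])
    return sent


def media_server_backup(segments, server):
    """Deduplicate at the target: every segment crosses the network,
    and the server discards duplicates before writing to storage."""
    sent = 0
    for s in segments:
        sent += len(s)
        server.segments.setdefault(fingerprint(s), s)
    return sent


segments = [b"alpha", b"beta", b"alpha", b"alpha", b"gamma"]
client_server, target_server = MediaServer(), MediaServer()
client_sent = client_side_backup(segments, client_server)
target_sent = media_server_backup(segments, target_server)
print(client_sent, target_sent)  # 14 24
```

Both approaches end up storing the same three unique segments. Only the amount of data sent over the network differs, which is why deduplicating at the client reduces bandwidth.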
Deduplication everywhere provides significant return on investment. Symantec has found that organizations can reduce storage consumption by up to 80% with deduplication across physical and virtual backups while still providing rapid recovery of applications in the event of a disaster. Additional benefits include:
- Optimized search/discovery
- Reduced bandwidth
- Faster backup
- Faster recovery
Data deduplication changes the economics of backing up to disk by reducing the amount of data retained on disk. The Enterprise Strategy Group [1] has confirmed that deduplication can be used to reduce disk capacity by a factor of 26:1 for a daily full backup policy retained for 30 days with a 1% daily change rate.
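A back-of-envelope model shows why ratios of this order are plausible for such a policy. The Python sketch below is a simplification and not a reproduction of the ESG test: it assumes the first full backup is stored in its entirety, each later daily full adds only the changed 1% as new data, and nothing else (such as compression) is modeled. The function name dedup_ratio is hypothetical.

```python
def dedup_ratio(retention_days: int, daily_change_rate: float) -> float:
    """Ratio of logical backup data to stored data under a simple model:
    one full copy, plus only the changed fraction for each later day."""
    logical = float(retention_days)  # one full backup per retained day
    stored = 1.0 + (retention_days - 1) * daily_change_rate
    return logical / stored


ratio = dedup_ratio(30, 0.01)
print(f"{ratio:.1f}:1")  # 23.3:1
```

Under these assumptions, 30 full backups occupy the space of about 1.29 full backups, or roughly 23:1. That is in the same range as the measured 26:1, and the simplified model is not expected to match a lab result exactly.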
Symantec provides deduplication options that let you deduplicate data everywhere, as close to the source of data as you require. For today’s information-driven enterprises, that means being able to protect completely and recover anywhere.
For more information, see Symantec Is Deduplication.
[1] Lab Validation Report: NetBackup from Symantec, Enterprise Strategy Group, June 2008