Deduplication Everywhere is a Beautiful Thing
Having personally witnessed the evolution of data deduplication technologies over the years, I can say that deduplication has come a long way, both in the maturity of vendor offerings and in end user sophistication. Until about two years ago, I used to start my discussions with customers by explaining what deduplication is, what it can do for them, how the deduplication algorithm works, and so on. Fast forward to 2011, and I rarely have to touch upon these topics. Most users now have a fairly good understanding of the basics of deduplication. What I still find missing, however, is a deeper understanding of how the power of deduplication can be leveraged in multiple places to solve data protection problems. As an example, many users still swear by virtual tape libraries or other target-based approaches for implementing deduplication. Some users have been sold on the concept of deduplicating data at the source using backup clients. Both of these methods work well in their respective use cases. However, backup environments today are rarely homogeneous enough to be covered by a single deduplication approach. There are different data types, as well as different backup and recovery SLAs, that IT managers need to be mindful of.
A few years back, the NetBackup team coined the term “Deduplication Everywhere”, which seemed odd to many users at the time. Many wondered, what exactly is deduplication everywhere? The answer is simple. Deduplication everywhere means empowering the backup admin to choose where in the backup stack they want to deduplicate data, regardless of the form factor.
While many vendors continue to struggle with providing a truly flexible deduplication solution, Symantec has lived up to the promise of providing deduplication everywhere via its information management portfolio.
A good deduplication strategy involves deduplicating data both in archive and backup stores. Symantec’s Enterprise Vault product allows users to eliminate duplicate data on the primary storage which can then be sent to low cost storage for longer term retention. Symantec’s message is simple—backup is for recovery, archiving is for discovery. If you don’t need the data for immediate recovery, it does not belong in your backup store.
Once you have removed data from your primary storage, you need to think about your options for deduplicating data with backups—this is the data you truly need to protect for disaster recovery purposes.
A good portion of the data in most environments consists of file and folder data. This data lends itself really well to deduplication at the source. The backup client eliminates the duplicate data before it leaves the machine, which reduces not just the storage footprint but also bandwidth utilization, and makes backups considerably faster. Of course, backup environments also contain many other data types, such as databases and applications, which may not lend themselves well to deduplication at the source. There could be several reasons for this: the data change rate could be high, it may not be possible to install a backup client on the machine, and so on. Deduplication at the target has historically been a very attractive option for these data types. For a long time, NetBackup has been tightly integrated with several leading storage vendors via the OpenStorage (OST) program. With the release of NetBackup 7 last year, the NetBackup platform also started offering an intermediate location to deduplicate data, which can help reduce utilization of infrastructure resources. This is referred to as Media Server deduplication. Data that cannot be deduplicated at the source can be moved to the media server layer, where it gets deduplicated before being stored on disk.
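To make the bandwidth argument for source deduplication concrete, here is a minimal sketch in Python. It is not NetBackup's actual algorithm or API. It assumes fixed-size chunks and SHA-256 fingerprints, and every name in it (`DedupStore`, `source_side_backup`, `restore`) is hypothetical. The client fingerprints its data, asks the store which fingerprints it has never seen, and sends only those chunks.

```python
import hashlib


class DedupStore:
    """Hypothetical deduplicating store: keeps one copy of each unique chunk."""

    def __init__(self):
        self.chunks = {}  # fingerprint -> chunk bytes

    def missing(self, fingerprints):
        """Return the set of fingerprints the store does not hold yet."""
        return {fp for fp in fingerprints if fp not in self.chunks}

    def put(self, fp, chunk):
        self.chunks[fp] = chunk


def fingerprint(chunk):
    """Identify a chunk by its SHA-256 digest (an assumption for this sketch)."""
    return hashlib.sha256(chunk).hexdigest()


def split_chunks(data, chunk_size):
    """Fixed-size chunking; real products may use other chunking schemes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def source_side_backup(data, store, chunk_size=4096):
    """Deduplicate on the client: only chunks unknown to the store are sent.

    Returns (recipe, bytes_sent), where recipe is the ordered list of
    fingerprints needed to rebuild the data.
    """
    chunks = split_chunks(data, chunk_size)
    recipe = [fingerprint(c) for c in chunks]
    needed = store.missing(recipe)
    bytes_sent = 0
    for fp, chunk in zip(recipe, chunks):
        if fp in needed:
            store.put(fp, chunk)
            needed.discard(fp)  # a chunk repeated within this backup is sent once
            bytes_sent += len(chunk)
    return recipe, bytes_sent


def restore(recipe, store):
    """Rebuild the original data from its ordered list of fingerprints."""
    return b"".join(store.chunks[fp] for fp in recipe)


if __name__ == "__main__":
    store = DedupStore()
    monday = b"AAAABBBBAAAACCCC"
    tuesday = b"AAAABBBBDDDDCCCC"  # one 4-byte chunk changed since Monday
    recipe1, sent1 = source_side_backup(monday, store, chunk_size=4)
    recipe2, sent2 = source_side_backup(tuesday, store, chunk_size=4)
    print(sent1, sent2)
    assert restore(recipe2, store) == tuesday
```

In this toy model, media server or target deduplication would run the same fingerprint lookup after the full data stream has already left the client. The storage savings are the same, but the client-side bandwidth savings are not, which is the trade-off described above.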
The beauty of it all is that all of these options are available in a single integrated solution: the NetBackup platform.
As a parting thought, I’ll just leave you with one suggestion. Before making a buying decision on a deduplication solution, think through what kind of data you are backing up, what kind of backup and recovery SLAs you need to meet, and what kind of flexibility you are getting from the solution you are considering. Deduplication is certainly not one size fits all, but having the ability to deduplicate data wherever you want via a single solution is a beautiful thing.