
Deduplication everywhere—not data everywhere

Created: 28 Aug 2012 • Updated: 25 Jun 2013 • 2 comments

Keyboard finger trouble again! I just sent around a 10MB slide presentation to 25 people on cc when I meant to only send it to one person. That’s another 250MB added to data storage somewhere down the line. Still, it’s a drop in the ocean compared with the amount of data growth worldwide. Pick up any analyst report, magazine, or survey and even the most conservative estimates put the rate of growth at around 60 percent per annum; and it is even higher for companies with data-intensive applications, in high-growth markets, or with widely distributed data centres or staff.

As I travel across EMEA talking to customers—right the way from top 20 global organisations down to SMBs—more and more of them think they have found the ‘holy grail’ to controlling this data growth. They think they’ve finally found a mechanism to stop buying storage for all that data emanating from their physical and virtual platforms, mobile, and Cloud. It’s called deduplication.

“Deduplication puts our data on a diet,” they argue. “Drop in some deduplication appliances and we’re looking at around 90 percent reduction in the volume of data being stored. With budgets under pressure like never before, it’s a no brainer.”

If only it were that simple. The fact is that most of today’s deduplication appliances have limited impact, because they address only the end of the information management lifecycle. Deduplication has traditionally been delivered as an appliance that sits behind the intelligence of the backup application and alleviates backup storage issues. Its principal focus is to reduce storage needs by eliminating redundant data through intelligent compression techniques, applied solely at the target. The downside is that all of your backup data must still traverse the backup infrastructure before it can be deduplicated at that final storage target.

The real answer to deduplication isn’t just target deduplication: it’s deduplication everywhere. It’s our mantra at Symantec. Deduplication everywhere lets you choose at which point in the backup process to perform deduplication. It might be client deduplication, where data is deduplicated at the client/source before being sent across the network. Or it might be media server deduplication, where the data is deduplicated at the media server/target before being saved to disk or tape.
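The mechanics of client deduplication can be illustrated with a minimal sketch: fingerprint each chunk of data, then send only the chunks the target has not seen before. This is a simplified illustration using fixed-size chunks and an in-memory fingerprint set; real products typically use variable-size (content-defined) chunking and a dedicated fingerprint index, and the names here are hypothetical, not any vendor’s API.

```python
import hashlib

# Fixed-size chunks for simplicity; real appliances often use
# variable-size, content-defined chunking instead.
CHUNK_SIZE = 4 * 1024 * 1024  # 4 MB

def chunk_hashes(data: bytes):
    """Split data into fixed-size chunks and fingerprint each with SHA-256."""
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        yield hashlib.sha256(chunk).hexdigest(), chunk

def client_side_dedup(data: bytes, target_index: set):
    """Return only the chunks whose fingerprints the target has not seen.

    In client/source deduplication, only these chunks (plus the
    fingerprints) cross the network; duplicates are suppressed before
    they consume bandwidth or storage.
    """
    new_chunks = []
    for digest, chunk in chunk_hashes(data):
        if digest not in target_index:
            target_index.add(digest)
            new_chunks.append((digest, chunk))
    return new_chunks

# The same attachment mailed to 25 recipients stores (and ships) one copy:
index = set()
attachment = b"slide deck bytes" * 100_000
first_send = client_side_dedup(attachment, index)   # chunks actually sent
repeats = [client_side_dedup(attachment, index) for _ in range(24)]
```

The same fingerprint-matching logic can run at the media server instead: the client ships everything and the target suppresses duplicates before writing to disk, which is exactly the trade-off between network load and client CPU that “deduplication everywhere” lets you choose per workload.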

End-to-end deduplication really can mean an end to those data growth woes. By backing data up only once, you can stop buying storage, recover information faster, and improve your return on virtualisation. The latest integrated backup deduplication appliances, for example, can help build out a comprehensive backup solution. They offer enterprise backup in a box and can act as a standalone backup server or as part of a larger chain of media servers.

Before choosing a deduplication solution, think through what kind of workload you are backing up, what kind of backup and recovery SLAs you need to meet, and what kind of flexibility you are getting from the solution you are considering. Don’t think of deduplication in terms of ‘one size fits all’; choose the right integrated solution and you’ll end up with deduplication everywhere—not data everywhere.

Comments (2)

StorageWonk:

Neal –

You raise useful points here, but if you do a follow-up to this blog you might want to point out that there are fundamental differences in de-duplication technology that make some implementations wholly inappropriate for certain use cases (most obvious of these is the issue of in-line versus postprocessing, but there certainly are others).

A more basic issue however – understandable because of your interest in de-dupe as an appliance – is the blog's inherent assumption that de-dupe (and eventually, rehydration) is exclusively a part of the backup and recovery process. I'm sure you appreciate that this is not necessarily the case, and that de-dupe at the source provides plenty of value in itself irrespective of the backup process. There are of course trade-offs with this as with just about every other technology decision, and your mileage will vary.

If you plan on following up this blog, it would be interesting to understand Symantec's position regarding special requirements for Exchange stores, and the relative value of de-duping versus single-instancing there in backup and archiving environments.

Cheers.

Mike

Neal Watkins:

Mike,

Thanks for following the blog and for your additional comments. I completely agree with your points around use cases. I typically think about workloads in terms of applications, databases, and data types, whether structured or unstructured. When I refer to deduplication at the source, I mean dedup built into the backup and recovery client, allowing the processing and matching to be managed efficiently on the front end. Most client systems today are not resource-constrained in terms of memory and CPU; one of the greatest constraints is the network infrastructure required to move all the data to a target-based deduplication appliance. This is why we speak about deduplication everywhere, covering both the client/source and the target, as the best possible approach.

Because dedup is built into the backup and restore process, any complexity associated with rehydration is masked from the backup administrator; it is handled seamlessly within the overall process.

In terms of Exchange stores, application-centric single instancing does provide great benefit, but it applies to a single data repository rather than to distributed client source data and appliances holding unstructured data. There is great benefit in both technologies, which takes us back to the workload discussion.

Thanks for following the blog!

Best regards,

Neal
