Backup and Recovery Community Blog

A Tangled Web of De-Dupe We Weave

Created: 17 Jul 2009
Alex Vasquez

It was the best of times and the worst of times... In data centers the world over, de-duplication is pretty much a "check box" item these days.  That is, it's standard functionality in most data protection platforms available today... or it's the next exit on a product's roadmap.  So it's generally agreed that de-dupe technology is a good thing, with a strong usage base in the storage and data protection areas of any enterprise.  Great, we agree.

The issue, though, is HOW do you deploy a de-dupe technology?  Rather, what method do you go with?  Do you go appliance-based, e.g. Data Domain or Quantum's DXi offering (I sure hope Symantec doesn't swat this post)?  Or do you go with something software-based that is also "storage agnostic," such as PureDisk or EMC's Avamar product?

That's what I'm asking myself now: What is the best solution for us?  As it stands, we are experiencing serious growing pains with DataDomain.  The answer, of course, is more storage, and more storage really just means buying more DataDomain units (we use the Gateway product).  So we're at a fork in the road: we could buy more DataDomain, but we're also looking at other options, namely PureDisk.  Before coming to the company I'm at now, I successfully deployed PureDisk at my previous place of employment.  I was quite happy with it.

There are pros and cons to each.  What are they?
In my opinion, here are the pros and cons of a DataDomain solution versus PureDisk:


DataDomain pros:
  • Out-of-the-box functionality. Plug it in, make configuration changes and go!  Quick to deploy.
  • Simple management of each DataDomain unit, from the command line.
  • Nice integration with NetBackup via OST.
  • Very strong de-dupe algorithm; industry-leading, I'd say.


DataDomain cons:
  • If you need more storage, you must commit to another DataDomain unit, which can be pretty costly.
  • That simple management I spoke of previously is really nice... IF you're only managing one DataDomain.  But each unit must be managed separately, so if you need to throttle back throughput on your DataDomain Restorers (DDRs), you must do so on each DDR individually.  I realize there's a workaround in NetBackup, but still, that's pretty weak.  In other words, there's no central management of multiple DDRs.
  • Not ideal for scaling with your environment.  See the first bullet point.


PureDisk pros:
  • Great integration with NetBackup; no need for OST when using PureDisk.  Everything communicates well, and replication of the data is managed through Storage Lifecycle Policies.
  • Central point of management.  You may have one Storage Pool Authority (SPA) and five Content Routers (CRs), but you have only one place to go to manage your PureDisk environment.  I believe you can run up to 32 CRs off of a single SPA, and each CR can manage up to 16TB of protected data in the next version of PureDisk, which as of this writing will be 6.6.
  • Ability to configure PureDisk jobs so that they de-dupe either on the client side (that is, on the server being backed up) or post-process, as the data hits the media servers.
  • Nice scalability.  If you need more storage, you simply allocate more storage and mount the new storage to your PureDisk servers. Be sure you get your licensing.  I'm watching you. =)
  • Storage agnostic. PureDisk doesn't care if you're using NetApp, EMC or Hitachi or your Mama's WD MyBook. (Though I would opine that MyBooks are unsupported, and my inclusion of it on this list doesn't and shouldn't suggest otherwise.  It's merely there for the comedy.)
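
To put the Content Router numbers above in perspective, here is a minimal back-of-the-envelope sketch in Python. The two limits (32 CRs per SPA, 16TB per CR) are just the figures I quoted with an "I believe" for PureDisk 6.6, so treat them as assumptions rather than confirmed product limits. The function names are made up for illustration and are not part of any PureDisk tooling.

```python
import math

# Figures quoted in the post ("I believe...") for PureDisk 6.6.
# These are assumptions, not vendor-confirmed limits.
MAX_CRS_PER_SPA = 32   # Content Routers per Storage Pool Authority
TB_PER_CR = 16         # protected data each Content Router can manage, in TB


def max_protected_tb(crs: int = MAX_CRS_PER_SPA, tb_per_cr: int = TB_PER_CR) -> int:
    """Upper bound on protected data behind a single SPA, in TB."""
    return crs * tb_per_cr


def content_routers_needed(protected_tb: float, tb_per_cr: int = TB_PER_CR) -> int:
    """Smallest number of CRs able to hold protected_tb of data (at least one)."""
    if protected_tb < 0:
        raise ValueError("protected_tb must be non-negative")
    return max(1, math.ceil(protected_tb / tb_per_cr))


def fits_single_spa(protected_tb: float) -> bool:
    """True if the required CR count stays within one SPA's assumed limit."""
    return content_routers_needed(protected_tb) <= MAX_CRS_PER_SPA


if __name__ == "__main__":
    print(max_protected_tb())            # ceiling for one SPA under these assumptions
    print(content_routers_needed(80))    # CRs needed for 80TB
    print(fits_single_spa(600))          # does 600TB fit behind one SPA?
```

Under those assumed limits a single SPA tops out at 32 x 16 = 512TB, and the one-SPA-plus-five-CRs example covers up to 80TB. This says nothing about throughput or licensing, only raw capacity.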


PureDisk cons:
  • Initial setup is not friendly.  It requires a lot of hands-on configuration before you can do anything with it.  Lots of install steps.
  • Costs.  Licensing is priced based on front-end storage presented to PureDisk.  If you need more storage, you may have to procure it from your favorite vendor, NetApp or EMC.  Server costs also work their way in here, should you need more Content Routers.
  • The web GUI.  If the PureDisk web GUI were a person and I saw it at the bar, I would not buy the GUI a drink. If we passed each other on the street, I would not look at the GUI.  That's how I feel about the web GUI in PureDisk 6.5 and lower.  It's not intuitive, and it angers me as each minute action causes a page refresh.  Needless to say, it's not a RESTful application.  (It should be noted that eventually the web GUI is going away and should be gone by NetBackup 7.0, when there will be full integration between NetBackup and PureDisk.)

In the end, what factors would make you pick one solution over the other?  As I ponder the question before me (buy more DataDomain, or look to a different de-dupe technology?), I have run through my personal pros and cons for each and made my case for both.

What about your environment? What makes sense for you?  Why? 
Post your comments and discuss.