Storage Informer

Deduplication Technologies

Jun. 03, 2009, under Storage

Putting The Pieces Together: Deduplication Technologies



Disclaimer: everything that follows is my opinion. In no way should it be mistaken for any sort of official EMC position. Any resemblance to that is purely coincidental.

Amidst the chaos and confusion of the EMC offer to buy Data Domain, I think the biggest unanswered question has been: why would EMC want to have three or four or five different deduplication technologies?

Truthfully, I think the question has a profoundly simple answer: because backup sucks.

Mmmm, irony.

Don’t believe me? Show of hands: how many people actually like their existing backup application?

Given the name of this blog, and having spent the last 15-plus years on backup and recovery, I think I can appreciate the irony as well as anyone.

Even more ironic when we remember that Data Domain’s original slogan was: tape sucks.

And by the way, there is a backhanded answer in there as to why it makes sense for Data Domain to be acquired by somebody. If you claim that tape sucks, and you try to “fix” this with another piece of hardware, all you are really doing is building a better tape drive. You are trying to be StorageTek, only better. Where better means faster and cheaper. But does a faster and cheaper tape drive make backup suck less? Maybe. But probably not nearly as much as it could if you had a broader perspective and reach with your technology roadmap, one that includes CDP, primary storage, a backup application, and some virtualization capability. More on what I would do with all that later. But one final question: if your objective is to fix backup, completely, and you think that you need access to all those components to do that, who is going to be in a better position to do this? EMC? Or NetApp?

Having said that, the biggest obstacle to fixing backup is not technology. It is inertia. It is cultural. It is fear of change. It is ingrained process. It is the fact that we have done things one way for so long that the reason we are doing them has been forgotten.

(Another aside: if you want to fix backup, and I mean really fix it, then the first thing you should ask is: what am I trying to achieve? When, where, and why do I need backup images of my data?)

An example. Many customers with defined practices say they need tape off-site. My belief is that a long time ago, the only way to get a copy of your backup data off site safely, securely, and reliably was to put it on tape. However, it is easier to say “I need a tape off site” than it is to say “I need a secure, safe, reliable image of my data off site.” Unfortunately, the words became the practice, and it is frequently the case now that even though deduplication can safely, securely, reliably (and cheaply) get an image off site, it is not on tape, so it is not good enough.

My conclusion is this: as long as the primary barriers to fixing backup are NOT technological, customers will require data deduplication at multiple places. Primary storage. Backup source. Backup target. Replication. And some backup is still best done to disk/virtual tape without deduplication. There is no one size fits all.

And even if you remove the cultural and procedural barriers to change, you still need access to all those technologies to fix backup.

You still need primary storage deduplication. At EMC that is provided natively on the Celerra platform.

You still want source deduplication for (some) backup. This is Avamar. And despite the contentions of virtually all commentary on the value of a target deduplication technology acquisition for EMC, there is a very significant set of use cases for source deduplication. It is a uniquely powerful and useful technology that will continue to have a role for the short, medium, and long term. No target deduplication solution will ever be able to make the same powerful value statement that a source deduplication solution does, so long as there is anything remotely resembling a traditional backup application in the mix. (I feel the need to qualify this in case somebody realizes just how good a thing EDM was, but that is also a different story!) Only source deduplication offers massive bandwidth savings, massive reductions in time to complete backup jobs, and the ability to increase the density of server consolidation. The more clouds you see, and the more virtualization becomes the prevalent deployment model for servers, the more source deduplication makes sense.
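The bandwidth savings claimed for source deduplication come from a simple idea: hash each chunk of data on the client, ask the target which hashes it already has, and transfer only the new chunks. The sketch below is a minimal, illustrative version of that idea in Python; the chunk size, fixed-size chunking, and function names are my own assumptions for illustration, not how Avamar (or any product) actually implements it — real systems use variable-size chunking and a networked index rather than an in-memory dict.

```python
import hashlib

CHUNK_SIZE = 4096  # illustrative fixed-size chunks; real products use variable-size chunking


def chunks(data, size=CHUNK_SIZE):
    """Split a byte stream into fixed-size chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def source_dedup_backup(data, target_index):
    """Hash each chunk at the source; transfer only chunks the target lacks.

    target_index: dict mapping chunk hash -> chunk bytes (stands in for
    the target's chunk store). Returns the backup "recipe" (ordered list
    of hashes) and the number of bytes actually sent over the wire.
    """
    recipe, sent = [], 0
    for chunk in chunks(data):
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in target_index:
            target_index[digest] = chunk   # new chunk: transfer it
            sent += len(chunk)
        recipe.append(digest)              # always record the reference
    return recipe, sent


def restore(recipe, target_index):
    """Rebuild the original data from a recipe and the target's chunk store."""
    return b"".join(target_index[d] for d in recipe)
```

A first backup pays full freight for unique chunks, but a second backup of mostly unchanged data sends only what changed — which is why backup windows and WAN usage shrink so dramatically when the deduplication happens before the data leaves the client.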

You still want target deduplication. For those people that can’t or won’t change their backup application. And for those folks that don’t meet the use case of source deduplication (their data set is too big, for example).

And you still want backup without deduplication (or with post-process deduplication). Again, despite the protestations of the few, the hard reality is that there is NO single, general-purpose deduplication device that can scale to meet the needs of the enterprise. Nothing that can meet the needs of the very large backup job that must complete within a defined backup window (where the current standard of 1.5 TB/hr per dedup appliance is off by an order of magnitude or more). In EMC terms, this need is met by the DL4x06 line.

So why do we have four different deduplication technologies? Because we need them. And we need them because customers ask for them. And because there is no other good alternative right now.

As we go forward, and the existing processes, procedures, and technology in legacy backup become more obviously broken, all of these pieces will still be required.

The difference is that those vendors that have them all as part of their portfolio will have an extraordinarily powerful way of fixing backup. And more than just backup: data protection more generally (as well as primary storage). Bringing it all together. Unifying the process, procedure, software, and infrastructure in a way that can fundamentally fix things. A fundamental fix that no single point solution can provide.

What if I could radically simplify my software? What if I could deduplicate at the source or the target transparently? What if a single device could be the repository for CDP, source, and target replication? Lots of what ifs there, but at the root is the notion that having some level of ownership in each element is an advantage to the delivery of the final vision.

Of course it also acknowledges that data deduplication will be a very important core capability across storage infrastructure.

And finally, it acknowledges that each of the pieces will have a very important role going into the future.

