Storage Informer

How Much Is Too Much?

by on Jul.24, 2009, under Storage

You can never be too thin or too rich. (Although I don’t think whoever came up with that had Nicole Richie or the Sultan of Brunei in mind.)

And in the world of backup, you can never be too fast. It is just not possible.

Rich Colbert and Daniel Budiansky have written posts on the importance of speed in backup. You can read those here and here.

The premise of their approach seems to me two-fold: one, that the DD880 is fast enough to make post-process deduplication technologies obsolete; and two, that it is fast enough that, for the first time, in-line deduplication will no longer be a bottleneck in the backup process for the vast majority of customers.

Now I don’t entirely disagree with these arguments. For a long time, when asked about the performance of the DL4206 and DL4406, I have been saying: “They are so fast that I can virtually guarantee that they will not be the bottleneck in your backup process.” At up to 2,200 MB/s, there are many places for a backup to slow down before the EDL becomes the problem: data must be read from disk, pass through a client, and cross a network (sometimes mediated by a database or application interface) to a backup server, where metadata must be processed; it must then traverse another network before it is written to a target. There are so many obstacles here (including the presence of an OS, a filesystem, different disk architectures, etc.) that there is very little chance that the EDL will be the bottleneck. Certainly not for a single backup server, storage node, or media server.
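To make the "weakest link" point concrete, here is a minimal sketch that treats the backup path as a chain of stages and reports the slowest one. Only the 2,200 MB/s target figure comes from this post; the stage names and all the other rates are hypothetical placeholders, so treat the output as an illustration rather than a measurement.

```python
# Throughput of each hop in the backup path, in MB/s.
# Only the 2,200 MB/s target figure comes from the post; every other
# number is a made-up placeholder for illustration.
STAGES = {
    "client disk read": 300,
    "client-to-server network": 110,
    "backup server metadata processing": 400,
    "server-to-target network": 800,
    "EDL backup target": 2200,
}


def find_bottleneck(stages):
    """Return the (name, MB/s) pair of the slowest stage in the chain."""
    return min(stages.items(), key=lambda item: item[1])


if __name__ == "__main__":
    name, rate = find_bottleneck(STAGES)
    print(f"End-to-end rate is capped at {rate} MB/s by: {name}")
```

With placeholder numbers like these, the target would have to be many times slower before it became the limiting stage, which is the argument being made here.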

Those customers that do find it to be a bottleneck almost certainly have many media servers or storage nodes. One customer I can think of, who is asking for 4,500 MB/s of sustained backup bandwidth, has over 40 TSM servers driving that workflow.

With the introduction of the DD880, what we see is a significant decrease in the number of backup organizations that will experience the backup target as a bottleneck. This stands in contrast to the DD690 and the DL3000 (both with a rough maximum speed of 400 MB/s for in-line deduplication), where many customers were forced to buy multiple systems in order to match their throughput requirements.

Or they considered tiering their backup infrastructure with a DL4000 with post-process deduplication.

But architecturally, those were really the only two choices: buy many slower systems to deduplicate in-line, or buy fewer, tiered backup targets. (For argument’s sake, I think we could stipulate that the ratio would be on the order of 6-8 to 1: if you wanted to be as fast as a tiered device, you would need 6-8 in-line systems.)
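The ratio can be sanity-checked with some back-of-the-envelope arithmetic. The sketch below is only that: it assumes throughput adds up perfectly across targets (optimistic), and it assumes the 2,200 MB/s figure quoted earlier for the DL4206 and DL4406 is a fair stand-in for a tiered device. The function name is made up for this example.

```python
import math


def targets_needed(required_mb_s, per_target_mb_s):
    """Smallest number of identical targets whose combined rate meets the need.

    Assumes throughput adds up perfectly across targets, which is optimistic.
    """
    if required_mb_s <= 0 or per_target_mb_s <= 0:
        raise ValueError("throughput values must be positive")
    return math.ceil(required_mb_s / per_target_mb_s)


if __name__ == "__main__":
    INLINE_MB_S = 400    # rough DD690 / DL3000 in-line maximum, per the post
    TIERED_MB_S = 2200   # DL4206 / DL4406 maximum, per the post

    # In-line systems needed to match one tiered device.
    print(targets_needed(TIERED_MB_S, INLINE_MB_S))
    # Systems needed for the 4,500 MB/s customer mentioned earlier.
    print(targets_needed(4500, INLINE_MB_S), targets_needed(4500, TIERED_MB_S))
```

Under perfect scaling, matching one tiered device takes 6 in-line systems, the low end of the 6-8 range. Real deployments rarely scale perfectly, which is presumably why the range extends upward.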

The DD880 changes this dynamic.

But it is important to note that it changes it quantitatively, rather than qualitatively. What I mean by that is this: by increasing the speed of the in-line deduplication target, we have narrowed the scope of the use case for tiered backup (the DL4000 with deduplication). Fewer customers will be interested in that approach, because the DD880 is fast enough that they can meet their requirements with a single backup target, rather than the multiple systems that would previously have been required.

Another dimension to this discussion: how many backup targets are you willing to manage? If a DL4406 is twice as fast as a DD880, you need to ask this question (assuming you need 2,000 MB/s plus of backup bandwidth): do you value a single target for management, with delayed deduplication, or do you value in-line deduplication more, even if that entails multiple targets?

Let me be clear: I don’t think there is any one right answer here! I think that different organizations are likely to weight priorities differently, and will have different answers to that question. The important thing is that it is a question you will likely have to answer for yourself if you need a very large amount of backup bandwidth: a single tiered target, or multiple in-line targets?

So, I don’t agree 100% with Rich when he writes: “speeds and feeds are no longer an inline versus post-process argument … speeds and feeds are no longer a dedupe versus non-dedupe argument either.” I think (because I am a sucker for precision) that it is the case that the DD880 has dramatically reduced the number of cases in which this decision needs to be made. The number of customers that need to think about an in-line device like the DD880, and a tiered device, like the DL4000, is much smaller than it was before the introduction of the DD880.

However, no technology stands still. The DL4000 line will continue to get bigger, faster, and better. For the time frame into which I think I have a useful amount of insight, there will continue to be a real gap (2-3x performance?) between straight VTL technologies, with tiered deduplication, like the DL4000, and pure in-line technologies, like the DD880. So for the foreseeable future, there will always be a certain number of organizations at the very high end of the market that will have to weigh these considerations: in-line versus tiered, and one versus many.

For the rest of the world, the potential complexity of your backup environment was just reduced: one device will now suffice. And for the first time in a long time, we can truthfully say to many customers that it is very likely that not only will a VTL not be a bottleneck to your backup environment, but that an inline deduplication appliance will not be a bottleneck either.

It is about time.

URL: http://emcfeeds.emc.com/rsrc/link/_/how_much_is_too_much__345611683?f=84f8d580-01de-11de-22d1-00001a1a9134
