Storage Informer

Tag: Deduplication

How Much Is Too Much?

by admin on Jul.24, 2009, under Storage

You can never be too thin or too rich. (Although I don’t think whoever came up with that had Nicole Richie or the Sultan of Brunei in mind.)

And in the world of backup, you can never be too fast. It is just not possible.

Rich Colbert and Daniel Budiansky have posts regarding the importance of speed to backup. You can read those here and here.

The premise of their approach seems to me two-fold: one, the DD880 is fast enough that it obsoletes post-process deduplication technologies; and two, that this speed is high enough that for the first time in-line deduplication will no longer be a bottleneck in the backup process for the vast majority of customers.

Now I don’t entirely disagree with these arguments. For a long time, when asked about the performance of the DL4206 and DL4406, I have been saying: “They are so fast that I can virtually guarantee that they will not be the bottleneck in your backup process.” At up to 2,200 MB/s, there are many opportunities for backups to degrade before the EDL becomes the problem: data must be read from disk, pass through a client and across a network (sometimes mediated by a database or application interface) to a backup server, where metadata must be processed, and then it must traverse another network before it is written to a target. There are so many obstacles here (including the presence of an OS, a filesystem, different disk architectures, etc.) that there is very little chance that the EDL will be a bottleneck. Certainly not for a single backup server or storage node or media server.
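To make that reasoning concrete, here is a minimal sketch that treats the backup path as a chain of stages, where the end-to-end rate can be no higher than the slowest stage. Only the 2,200 MB/s target figure comes from this post; the stage names and every other number are hypothetical placeholders, not measurements.

```python
def bottleneck(stages):
    """Return (stage_name, throughput) for the slowest stage in the chain."""
    name = min(stages, key=stages.get)
    return name, stages[name]


# Throughput of each stage in MB/s. All values except the last are
# hypothetical and chosen only for illustration.
stages = {
    "client disk read": 300,
    "client network link": 110,
    "backup server metadata processing": 800,
    "network to target": 400,
    "backup target (EDL)": 2200,  # "up to 2,200 MB/s" from the post
}

slowest_stage, slowest_rate = bottleneck(stages)
print(f"Bottleneck: {slowest_stage} at {slowest_rate} MB/s")
```

With these assumed numbers, the target would have to be slower than every other stage before it limited the backup, which is the point being made about a single backup server.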

Those customers that do find it to be a bottleneck almost certainly have many media servers, or storage nodes. One customer that I can think of that is asking for 4,500 MB/s of sustained backup bandwidth has over 40 TSM servers that are driving this workflow.

With the introduction of the DD880, what we see is a significant decrease in the number of backup organizations that will experience the backup target as a bottleneck. This stands in contrast to the DD690 and the DL3000 (both with a rough maximum speed of 400 MB/s for in-line deduplication), where many customers were forced to buy multiple systems in order to meet their throughput requirements.

Or they considered tiering their backup infrastructure with a DL4000 with post-process deduplication.

But architecturally, those were really the only two choices: buy many slower systems to deduplicate in line, or buy fewer, tiered backup targets. (For argument’s sake, I think we could stipulate that the ratio would be on the order of 6-8 to 1: if you wanted to be as fast as a tiered device, you would need 6-8 in-line systems.)
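As a rough check on that ratio, the sketch below divides a required bandwidth by a per-system rate and rounds up. It uses figures quoted in this post (roughly 400 MB/s per in-line system, up to 2,200 MB/s for the tiered device, and the 4,500 MB/s customer requirement) and assumes, optimistically, that throughput scales linearly with the number of systems. The function name is my own.

```python
import math


def systems_needed(required_mb_s, per_system_mb_s):
    """Number of systems needed to reach a required aggregate rate.

    Assumes perfect linear scaling across systems, which real
    environments are unlikely to achieve.
    """
    return math.ceil(required_mb_s / per_system_mb_s)


# In-line systems at ~400 MB/s needed to match one 2,200 MB/s tiered device.
print(systems_needed(2200, 400))   # 6

# In-line systems at ~400 MB/s needed for a 4,500 MB/s requirement.
print(systems_needed(4500, 400))   # 12
```

The first result lands at the low end of the 6-8 range; any scaling loss or headroom would push the count higher.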

The DD880 changes this dynamic.

But it is important to note that it changes it quantitatively, rather than qualitatively. And what I mean by that is this: we have narrowed the scope for tiered devices. By increasing the speed of the in-line deduplication target, we have reduced the scope of the use case for tiered backup (the DL4000 with deduplication). The number of customers that will be interested in this approach will be smaller, because the DD880 is fast enough that they can meet their requirements with a single backup target (not the multiple systems that would previously have been required).

Another dimension to this discussion: how many backup targets are you willing to manage? If a DL4406 is twice as fast as a DD880, you need to ask this question (assuming you need 2,000 MB/s plus of backup bandwidth): do you value a single target for management, with delayed deduplication, or do you value in-line deduplication more, even if that entails multiple targets?

Let me be clear: I don’t think there is any one right answer here! I think that different organizations are likely going to weight priorities differently, and will have different answers to that question. The important thing here is that it is a question you will likely have to answer for yourself if you have the requirement for a very large amount of backup bandwidth: a single tiered target? or multiple in-line targets?

So, I don’t agree 100% with Rich when he writes: “speeds and feeds are no longer an inline versus post-process argument … speeds and feeds are no longer a dedupe versus non-dedupe argument either.” I think (because I am a sucker for precision) that it is the case that the DD880 has dramatically reduced the number of cases in which this decision needs to be made. The number of customers that need to think about an in-line device like the DD880, and a tiered device, like the DL4000, is much smaller than it was before the introduction of the DD880.

However, no technology stands still. The DL4000 line will continue to get bigger, faster, and better. For the time frame into which I think I have a useful amount of insight, there will continue to be a real gap (2-3x performance?) between straight VTL technologies, with tiered deduplication, like the DL4000, and pure in-line technologies, like the DD880. So for the foreseeable future, there will always be a certain number of organizations at the very high end of the market that will have to weigh these considerations: in-line versus tiered, and one versus many.

For the rest of the world, the potential complexity of your backup environment was just reduced: one device will now suffice. And for the first time in a long time, we can truthfully say to many customers that it is very likely that not only will a VTL not be a bottleneck to your backup environment, but that an inline deduplication appliance will not be a bottleneck either.

It is about time.

URL: http://emcfeeds.emc.com/rsrc/link/_/how_much_is_too_much__345611683?f=84f8d580-01de-11de-22d1-00001a1a9134

PHD Virtual Extends esXpress Support To VMware vSphere 4

by admin on Jul.23, 2009, under Storage

PHD Virtual Technologies, provider of the esXpress data protection and recovery solution for virtual machines, today announced that esXpress has been extended to support VMware vSphere 4.

This new release of esXpress version 3.6 also includes significant enhancements for all versions of VMware’s ESX platform version 3.0.2 and above. An optimized deduplication engine dramatically increases backup speeds and fuels performance for file-level restores, as well as VMDK restores and data archival via a Windows Share.

esXpress, with new support for vSphere 4, performs backup and recovery using the virtual environment itself. By creating virtual backup appliances (VBAs) – small virtual machines – the solution can be deployed in minutes on VMware servers, and provides the most scalable environment for backing up virtual machines. New performance enhancements include:

  • File-level restores are now up to four times faster
  • Data restoration and archival via Windows shares are now up to four times faster
  • PHDD deduplication image-level restores are now up to twice as fast
  • An accelerated deduplication engine seeds initial backups at double the previous rate

esXpress continues to support up to 16 concurrent backup/restore streams per host, and all backups can be self-restored without using esXpress or other proprietary virtual machine infrastructure. esXpress’ block-level backups are deduplicated at the source, ensuring data is compressed and deduped before it ever leaves the host. This keeps network traffic to a minimum even while backing up over a WAN link.
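For readers unfamiliar with the idea, the following is a generic sketch of source-side, block-level deduplication with compression. It is not esXpress code and does not describe how esXpress works internally: the fixed 4 KB block size, the use of SHA-256 and zlib, and all names are assumptions made for illustration.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # hypothetical fixed block size


def dedupe_and_compress(data, known_hashes):
    """Split data into blocks and prepare only the unseen ones for sending.

    Returns (recipe, to_send):
      recipe  - ordered list of block hashes needed to rebuild the data
      to_send - {hash: compressed block} for blocks the target lacks
    """
    recipe = []
    to_send = {}
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        recipe.append(digest)
        # Skip blocks the target already holds or that are already queued.
        if digest not in known_hashes and digest not in to_send:
            to_send[digest] = zlib.compress(block)
    return recipe, to_send


# Four blocks, but only two distinct ones, so two blocks cross the wire.
data = b"A" * (3 * BLOCK_SIZE) + b"B" * BLOCK_SIZE
recipe, to_send = dedupe_and_compress(data, known_hashes=set())
print(len(recipe), len(to_send))

# The target can rebuild the original data from the recipe and its block store.
restored = b"".join(zlib.decompress(to_send[h]) for h in recipe)
```

A second backup of the same data, with the target's hashes passed in as `known_hashes`, would send no blocks at all, which is why this approach suits a WAN link.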

URL: http://feedproxy.google.com/~r/Virtualizationdotcom/~3/nldcfLnLt3k/

Cloud Backup?

by admin on Jul.23, 2009, under Storage

I have added a poll in the right-hand column: would you back up to the cloud? The assumption here is: the cloud in this case is a public cloud or a service provider cloud–not your own private cloud. I haven’t qualified this further with any questions about the size of organization you work for, what level of encryption or security you might want in order to be able to say “yes” or anything else of that nature. Basically, the question is: would you be willing to treat backup software as a service, and use somebody else’s infrastructure at a different site to do your backup?

For anybody that wants to add some explanation to their answer, please feel free to comment to this post.

I am going to leave the other poll (deduplication ratios) up, as I am curious to see whether the answer changes over time. As of this post, 240 respondents have answered. About 2/3 of the respondents are from the US, and the results are: 22% say 5:1, 32% say 10:1, 29% say 20:1 and 18% say 50:1. I will check the results again in 6 months or so.

URL: http://emcfeeds.emc.com/rsrc/link/_/new_poll_cloud_backup__921090660?f=84f8d580-01de-11de-22d1-00001a1a9134
