Deduplication almost single-handedly created the all-flash array (AFA) market, but does dedupe have a future? So far, in this series, we’ve made the case that you can achieve similar capacity efficiency through better storage software. Here, we’ll look at other market influences that may cause deduplication to fall out of favor entirely. To get the entire picture, please join us for our virtual whiteboard session, “Better Storage Beats Dedupe,” now available on-demand. We discuss better ways to achieve storage capacity efficiencies that are effective across all workloads and do have a future.
Does Dedupe Have a Future with 100TB Drives?
Drive density, for both flash and hard disk technologies, is increasing rapidly. 30TB flash drives are already on the market, and Micron expects to deliver 100TB SSDs next year. So, why is this a problem for deduplication? The deduplication algorithm’s overhead forces most vendors to require at least 24 drives to get good performance. That means they will be selling a system that starts at 2.4PB of raw capacity (24 × 100TB). Then, assuming the claims of 3:1 efficiency are accurate, that means 7.2PB of effective capacity, which is far more than most data centers need.
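To make that arithmetic explicit, here is a minimal sketch; the drive count, drive size, and 3:1 ratio are the round numbers cited above, not measurements of any particular system:

```python
# Capacity math for a dedupe-based AFA, using the article's round numbers.
MIN_DRIVES = 24        # typical minimum drive count for dedupe performance
DRIVE_TB = 100         # next-generation SSD capacity
DEDUPE_RATIO = 3       # commonly claimed 3:1 efficiency

raw_pb = MIN_DRIVES * DRIVE_TB / 1000
effective_pb = raw_pb * DEDUPE_RATIO

print(f"Minimum raw capacity:       {raw_pb:.1f} PB")       # 2.4 PB
print(f"Claimed effective capacity: {effective_pb:.1f} PB")  # 7.2 PB
```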
The cost per GB of these drives will be only slightly less than the cost per GB of 16TB drives. With high-density drives, the customer saves the most money when using fewer drives reduces physical data center space and power costs. Requiring a minimum of 24 drives won’t allow the customer to realize those savings. In fact, given that most of these next-generation drives will deliver more than 200,000 IOPS each, a system that only delivers 300,000 IOPS or so forces customers to waste millions of potential IOPS. AFAs that count on deduplication face this challenge today with 16TB and 30TB drives; a 100TB drive only exacerbates the situation.
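Using the same round numbers, the scale of that waste is easy to quantify; a quick sketch (the per-drive and system figures are the article’s, not benchmarks):

```python
# IOPS shortfall when dedupe caps system throughput, per the figures above.
MIN_DRIVES = 24
IOPS_PER_DRIVE = 200_000   # conservative figure for next-generation SSDs
SYSTEM_IOPS = 300_000      # typical delivered IOPS with dedupe in the data path

potential_iops = MIN_DRIVES * IOPS_PER_DRIVE
wasted_iops = potential_iops - SYSTEM_IOPS

print(f"Potential IOPS: {potential_iops:,}")  # 4,800,000
print(f"Delivered IOPS: {SYSTEM_IOPS:,}")     # 300,000
print(f"Wasted IOPS:    {wasted_iops:,}")     # 4,500,000
```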
Within the next two years, a single storage system with twelve 100TB flash drives, driven by efficient storage software, may deliver over a petabyte of capacity and more than one million IOPS, all in a 2U form factor. However, legacy software is the roadblock to that future, and deduplication hangs around its neck like an albatross.
Does Dedupe Have a Future with Optane?
Intel Optane storage, at first glance, seems perfect for deduplication because of its premium cost. The problem is that the performance of Intel Optane will expose the latency of the deduplication engine even more than flash does today. It is reasonable to assume that within the next year we will see a new version of Optane that is twice as fast as today’s, making the situation even worse. Customers choosing Optane now are those most concerned about performance, with a particular emphasis on low latency. StorONE has customers using our Optane-powered All-Flash Array.next to deliver 0.2 milliseconds of latency to their applications, even while synchronous mirror functions are active. However, there is no way to achieve that level of latency with deduplication active.
Does Dedupe Have a Future with Encryption?
Deduplication and encryption have always been at odds. Vendors that provide both have to carefully sequence when encryption occurs and when deduplication occurs. Running and managing both processes puts a greater burden on the storage system, lowering efficiency even further.
A significant challenge for vendors that have built their entire company around deduplication, as evidenced by their inability to turn it off, is application-side encryption. Most databases can encrypt data before sending it to the storage system, and even VMware is adding similar functionality. VMware encrypts I/O as it leaves the virtual disk controller in the VM: a kernel module encrypts the data immediately, before passing it to the kernel storage layer, making it almost impossible for the storage system to deduplicate.
Most combinations of deduplication and encryption work only because the same storage software performs both functions. Application-side encryption breaks that model. When the application or operating environment encrypts data before writing it to the storage system, identical data appears unique to the deduplication engine. Customers with storage systems that lean on deduplication may find their efficiency rate drops to almost zero, or they may be forced to turn off application-side encryption and let the storage system perform the function instead.
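To see why, here is a toy sketch of how a content-based deduplication engine fingerprints blocks; the “encryption” below is a deliberately simplified stand-in (a SHA-256 keystream XOR), not any real cipher or vendor implementation:

```python
import hashlib
import os

def fingerprint(block: bytes) -> str:
    """What a dedupe engine does: hash each block to find duplicates."""
    return hashlib.sha256(block).hexdigest()

def toy_encrypt(block: bytes, key: bytes, nonce: bytes) -> bytes:
    """Stand-in for application-side encryption with a fresh nonce per write."""
    keystream = hashlib.sha256(key + nonce).digest() * (len(block) // 32 + 1)
    return bytes(b ^ k for b, k in zip(block, keystream))

block = b"the same block of data, written twice" * 100
key = b"key held by the application, not the array"

# Unencrypted: both copies fingerprint identically, so dedupe stores one copy.
print(fingerprint(block) == fingerprint(block))   # True

# Encrypted by the application before the write: each copy gets a unique
# nonce, the ciphertexts differ, and the storage system sees two "unique"
# blocks. The dedupe engine saves nothing.
c1 = toy_encrypt(block, key, os.urandom(16))
c2 = toy_encrypt(block, key, os.urandom(16))
print(fingerprint(c1) == fingerprint(c2))         # False
```

The same effect applies whether the encryption happens in a database, in the guest operating system, or in the hypervisor kernel, as described above.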
Do Storage Systems With Dedupe Have a Future?
Dedupe is a technology whose time has come and gone. It made sense when available processing power far outpaced the performance of storage media. Now storage media is one of the fastest elements in the data center. These technology advancements, combined with the rapidly falling cost of storage media, mean that deduplication creates more problems than the capacity it claims to save is worth.