Better storage software can deliver superior capacity efficiency, which offers an alternative to deduplication. In this series we will examine how better storage software can deliver these efficiencies without hurting performance, raising costs, or placing data at risk. This first article looks at an unexpected source, RAID, as an alternative to deduplication. In part two, we’ll cover how capabilities like high per-drive performance, next-generation snapshot technology, and advanced tiering technology can further improve efficiency. Finally, we’ll conclude with deduplication’s future to see if it has a role in storage infrastructures over the next two years.
Understanding the Total Cost of Dedupe
The primary goal of using deduplication on primary storage systems is to make advanced storage technologies like flash SSDs more affordable. A 3:1 efficiency rate enables 100TB of storage to look like 300TB of storage. The problem is that delivering a 3:1 effective rate requires high-end CPUs and more RAM so that the algorithm can work without noticeably impacting performance. The capacity savings have to be large enough to offset the cost of the additional hardware. When flash storage was $14 per GB, justifying the cost of additional hardware resources was easy. Now that flash is $0.30 per GB, it is almost impossible. To learn about the total cost of deduplication, read our white paper, “Exposing the High Cost of Deduplication.”
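The break-even argument above can be sketched as simple arithmetic. This is a minimal illustration, not a pricing model: the function name and the `EXTRA_HW_COST` figure are hypothetical, and only the 100TB capacity, the 3:1 ratio, and the two flash prices come from the text. It also ignores power, rack space, and any performance cost.

```python
GB_PER_TB = 1000  # decimal terabytes, the way drive capacities are usually quoted


def dedupe_net_savings(raw_tb, ratio, flash_cost_per_gb, extra_hw_cost):
    """Net savings (USD) of deduplicating raw_tb at `ratio` instead of
    buying the same effective capacity as raw flash."""
    effective_tb = raw_tb * ratio
    avoided_tb = effective_tb - raw_tb  # flash you did not have to buy
    flash_savings = avoided_tb * GB_PER_TB * flash_cost_per_gb
    return flash_savings - extra_hw_cost


# Hypothetical cost of the extra CPU and RAM; this number is NOT from the article.
EXTRA_HW_COST = 75_000

then = dedupe_net_savings(100, 3, 14.00, EXTRA_HW_COST)  # flash at $14 per GB
now = dedupe_net_savings(100, 3, 0.30, EXTRA_HW_COST)    # flash at $0.30 per GB

print(f"Net savings at $14.00/GB: {then:,.0f} USD")
print(f"Net savings at $0.30/GB:  {now:,.0f} USD")
```

Under these assumptions the avoided flash is worth far more than the extra hardware at the older price and less than the extra hardware at the current price. The crossover point depends entirely on the hardware cost you plug in.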
Fast Rebuild Speed is Better than Dedupe
Most primary storage systems on the market today use traditional RAID algorithms to protect against media failure. An inefficient RAID algorithm takes away much of the capacity gain that deduplication claims to deliver. The number one reason for this is slow rebuild times, even on all-flash arrays (AFAs). While very few vendors report their rebuild times, customers repeatedly tell us that the time it takes to return to a protected state after a media failure is measured in multiple hours. With high-capacity flash drives, we’ve seen reports of competitors’ systems taking ten or more hours. Our customers measure StorONE’s vRAID rebuild times in single-digit minutes. vRAID is an ideal alternative to deduplication because it saves real capacity instead of “mathematical” capacity.
What does rebuild speed have to do with capacity efficiency? The answer lies in what slow rebuilds force you to do with your capacity. If you know you are facing a rebuild that takes double-digit hours, then you also know the chances of another drive failing during that process increase dramatically. Also, TLC flash and especially QLC flash are vulnerable to continuous write IO, which is a big part of a rebuild. These double-digit-hour rebuild times mean you are vulnerable to a second failure in your working set of drives, and your newly deployed hot spare is at particular risk. With single parity, another drive failure means total data loss and a recovery from backup copies.
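To make the link between rebuild time and exposure concrete, here is a rough sketch. It assumes drives fail independently at a constant rate, which is a simplification: it does not model the extra write stress a rebuild puts on flash, so real-world risk during a long rebuild is likely higher. The function name, the 1% annual failure rate, and the 11 surviving drives are hypothetical values chosen only for illustration.

```python
import math

HOURS_PER_YEAR = 8760


def second_failure_probability(surviving_drives, annual_failure_rate, rebuild_hours):
    """Approximate chance that at least one surviving drive fails during the
    rebuild window, assuming independent drives and a constant failure rate."""
    expected_failures = (
        surviving_drives * annual_failure_rate * rebuild_hours / HOURS_PER_YEAR
    )
    # 1 - exp(-x), computed with expm1 to stay accurate for very small x
    return -math.expm1(-expected_failures)


# Hypothetical group: 11 surviving drives, 1% annual failure rate per drive.
slow = second_failure_probability(11, 0.01, rebuild_hours=10)      # ten-hour rebuild
fast = second_failure_probability(11, 0.01, rebuild_hours=5 / 60)  # five-minute rebuild

print(f"Exposure ratio, slow vs. fast rebuild: {slow / fast:.0f}x")
```

Under this model the exposure scales almost linearly with the length of the rebuild window, which is why shortening the rebuild matters more than any other single factor here.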
To overcome the risk of a double drive failure, most IT planners use a double-parity or even triple-parity RAID technique. Because of increases in drive densities, you are now dedicating roughly 32TB to 48TB (assuming 16TB flash drives) of capacity per RAID group just to provide redundancy.
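The redundancy cost described above is straightforward to compute. In this sketch the helper names and the 12-drive group size are assumptions for illustration; only the 16TB drive size and the double-parity and triple-parity cases come from the text.

```python
def parity_reserve_tb(drive_tb, parity_drives):
    """Raw capacity (TB) one RAID group gives up to parity."""
    return drive_tb * parity_drives


def parity_overhead_fraction(group_size, parity_drives):
    """Share of a RAID group's raw capacity consumed by parity."""
    return parity_drives / group_size


GROUP_SIZE = 12  # hypothetical number of drives per RAID group

for parity in (1, 2, 3):
    reserve = parity_reserve_tb(16, parity)
    overhead = parity_overhead_fraction(GROUP_SIZE, parity)
    print(f"{parity} parity drive(s): {reserve}TB reserved, {overhead:.0%} of the group")
```

Each additional parity drive costs another full drive's worth of capacity per group, so the penalty grows in step with drive density.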
StorONE’s vRAID rebuilds volumes made up of flash drives in less than five minutes while other production IO operations continue. Our customers rarely use more than single-drive parity in their flash-based volumes because they know they will be back in a protected state in less than five minutes. Additionally, because we leverage a high-performance form of erasure coding, most of our customers see only a 15% to 20% capacity overhead for redundancy.
No Hot Spares are Better Than Dedupe
To protect you from data loss, RAID needs a replacement drive before it can even start the rebuild process. Enterprise storage systems help you make certain replacements are ready through the use of global hot spares. Most data centers allocate two hot spares per media type and size. For example, suppose a system has both 16TB flash drives and 8TB flash drives. In that case, the customer will allocate two drives for each media size, because most traditional RAID implementations won’t allow the mixing of drive capacities within a given group. In this example, the typical customer is dedicating 48TB of capacity just for hot spares.
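The hot-spare arithmetic in this example can be written out as a short sketch. The function name is hypothetical; the two-spares-per-size policy and the 16TB and 8TB drive sizes come from the text.

```python
def hot_spare_reserve_tb(drive_sizes_tb, spares_per_size=2):
    """Capacity (TB) set aside for hot spares when each distinct drive
    size in the system needs its own pool of spares."""
    return sum(size * spares_per_size for size in set(drive_sizes_tb))


reserve = hot_spare_reserve_tb([16, 8])
print(f"Capacity idle as hot spares: {reserve}TB")
```

Every additional drive size in the system adds another pair of idle drives, so the reserve grows as the array is expanded with newer, denser media.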
At StorONE, our vRAID does not require hot spares, and it can mix media capacities within volumes. If a drive fails, we simply rebalance the failed drive’s data across the remaining drives. Eliminating hot spares enables us to rebuild while still using all of the available drives’ capacity. These capabilities also make future expansion easy. With StorONE, you can add the highest-density and most cost-effective flash drives on the market and enjoy their full capacity.
Conclusion
Deduplication is no longer a must-have. In fact, it may be a “better off without it” feature. Better RAID can give back hundreds of TBs of capacity in areas where deduplication can’t. vRAID is just one example of how better storage software can provide an alternative to deduplication and improve capacity efficiency. In part two of this series, we cover how technology like high per-drive performance, advanced snapshots, and intelligent auto-tiering can fulfill all of deduplication’s promises without the performance and data integrity risks.