StorONE Blog

Volume-Level Erasure Coding to Avoid Storage Tradeoffs

We previously explored the cost and performance tradeoffs that traditional storage snapshots impose, and how StorOne designed its snapshot algorithms so that customers no longer have to choose between taking snapshots, meeting required performance levels, and staying within budget. In this blog, we evaluate a similar problem that erasure coding presents, and how StorOne’s N+K approach relieves this headache.

The Problem with Erasure Coding

Erasure coding increases data resiliency by breaking data (blocks, files or objects) into fragments, computing additional redundant (parity) fragments from them, and writing all of the fragments across multiple storage drives or nodes. Because the redundant fragments are mathematically derived from the data fragments, the original data can be reconstructed if a drive or node fails or if a fragment becomes corrupted. For example, ten fragments may be written, of which any eight are sufficient to reconstruct the data, meaning that two fragments can be lost without losing the original data. Erasure coding rebuilds are fast because only the data itself (as opposed to the entire drive) is rebuilt, and because multiple drives can work on the rebuild at the same time.
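To make the fragment idea concrete, here is a minimal sketch of the simplest possible erasure code: N data fragments plus a single XOR parity fragment (K = 1), which tolerates the loss of any one fragment. This is a generic illustration, not StorOne’s implementation; codes that survive two or more losses, such as the ten-fragment, eight-required example above, typically use Reed-Solomon arithmetic instead of plain XOR. The function names `encode` and `decode` are hypothetical.

```python
from functools import reduce
from typing import Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, n: int) -> list[bytes]:
    """Split data into n equal data fragments and append one XOR parity fragment."""
    frag_len = -(-len(data) // n)  # ceiling division
    padded = data.ljust(frag_len * n, b"\0")  # zero-pad so all fragments match in size
    fragments = [padded[i * frag_len:(i + 1) * frag_len] for i in range(n)]
    parity = reduce(xor_bytes, fragments)  # parity = XOR of every data fragment
    return fragments + [parity]


def decode(fragments: list[Optional[bytes]], original_len: int) -> bytes:
    """Rebuild the original data when at most one fragment is missing (None)."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity can only recover one lost fragment")
    if missing:
        survivors = [f for f in fragments if f is not None]
        fragments = list(fragments)
        # XOR of all surviving fragments equals the one that was lost.
        fragments[missing[0]] = reduce(xor_bytes, survivors)
    return b"".join(fragments[:-1])[:original_len]  # drop parity and padding


msg = b"erasure coding demo"
frags = encode(msg, 4)   # 4 data fragments + 1 parity fragment
frags[2] = None          # simulate a failed drive
print(decode(frags, len(msg)))
```

Only the lost fragment is recomputed, which is why rebuilds touch just the affected data rather than a whole drive image.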
The downside to erasure coding is performance. It is heavily read and write intensive: computing the redundant fragments consumes CPU cycles, updating even a small piece of data requires reading and rewriting several fragments, and serving a read from damaged or missing data requires reading most of the remaining fragments to reconstruct it. The resulting latency and lost throughput are severe enough that most enterprises do not use erasure coding for primary storage, sacrificing data resiliency instead.
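The write penalty can be sketched with the same single-parity scheme. Assuming a read-modify-write update (one common approach, used here for illustration), changing one data fragment means reading the old data and old parity, then writing the new data and new parity: four drive I/Os where an unprotected write needs one. Each additional parity fragment adds another read and write. The `small_write` helper is hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))


def small_write(old_data: bytes, new_data: bytes, old_parity: bytes) -> tuple[bytes, int]:
    """Read-modify-write update of one data fragment in a single-parity stripe.

    Returns the new parity fragment and the number of drive I/Os the update needs.
    """
    ios = 2  # read the old data fragment and the old parity fragment
    # XOR-ing out the old data and XOR-ing in the new data patches the parity
    # without reading the other data fragments in the stripe.
    new_parity = xor_bytes(xor_bytes(old_parity, old_data), new_data)
    ios += 2  # write the new data fragment and the new parity fragment
    return new_parity, ios


# A stripe of three data fragments protected by one parity fragment.
stripe = [b"\x01\x02", b"\x0f\x00", b"\xaa\x55"]
parity = reduce(xor_bytes, stripe)
new_parity, ios = small_write(stripe[1], b"\xff\xff", parity)
stripe[1] = b"\xff\xff"
print(new_parity == reduce(xor_bytes, stripe), ios)  # True 4
```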

StorOne N+K Erasure Coding

StorOne’s S1 Unified Enterprise Storage (UES) platform uses N+K erasure coding to provide data resiliency and fast drive rebuilds without the CPU overhead, and the resulting storage performance penalty, of traditional erasure coding.
StorOne’s erasure coding technology first aggregates drives into storage pools based on I/O and throughput performance as well as capacity requirements. These pools are thin provisioned and draw capacity from multiple drives. Unlike a typical Redundant Array of Independent Disks (RAID) configuration, drives of different capacities can be mixed and matched within the same pool.
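The capacity benefit of mixing drive sizes, and the basic behavior of thin provisioning, can be sketched as follows. This is a hypothetical model, not the S1 UES data structures: it assumes a classic RAID set truncates every member to the smallest drive’s size, while a pool that can place fragments anywhere can address each drive’s full capacity. Erasure-coding overhead is ignored to keep the arithmetic simple. `raid_member_capacity` and `ThinPool` are illustrative names.

```python
def raid_member_capacity(drive_sizes_tb: list[int]) -> int:
    """Classic RAID set: every member is truncated to the smallest drive's size."""
    return min(drive_sizes_tb) * len(drive_sizes_tb)


class ThinPool:
    """Hypothetical thin-provisioned pool built from drives of mixed sizes."""

    def __init__(self, drive_sizes_tb: list[int]):
        self.capacity_tb = sum(drive_sizes_tb)  # every drive contributes its full size
        self.used_tb = 0
        self.volumes: dict[str, list[int]] = {}  # name -> [provisioned_tb, written_tb]

    def create_volume(self, name: str, provisioned_tb: int) -> None:
        # Thin provisioning: creating a volume reserves no physical capacity,
        # so the total provisioned size may exceed the pool's raw capacity.
        self.volumes[name] = [provisioned_tb, 0]

    def write(self, name: str, tb: int) -> None:
        provisioned, written = self.volumes[name]
        if written + tb > provisioned:
            raise ValueError("write exceeds the volume's provisioned size")
        if self.used_tb + tb > self.capacity_tb:
            raise RuntimeError("pool is out of physical capacity")
        # Physical capacity is consumed only when data is actually written.
        self.volumes[name][1] += tb
        self.used_tb += tb


drives = [4, 8, 16]  # TB, deliberately mixed
print(raid_member_capacity(drives))  # 12
pool = ThinPool(drives)
print(pool.capacity_tb)  # 28
pool.create_volume("db", 20)
pool.create_volume("vm", 20)  # 40 TB provisioned against 28 TB raw
pool.write("db", 5)
print(pool.used_tb)  # 5
```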
The S1 UES platform then establishes virtual “N+K” arrays, which are assigned to the storage pools that have been created. “N” represents the number of physical storage drives across which the virtual array is spread; each array can span 1, 2, 4 or 8 drives, and this value ultimately determines read performance. “K” is the number of simultaneous drive failures that can be tolerated, and it affects write performance. These arrays can be created per storage volume or logical unit, per file system, or per virtual machine, to match varying performance and data resiliency requirements. A larger number of drives (a higher “N” value) increases throughput but also latency, so customers can tune the configuration to each application’s throughput and latency requirements.
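A per-volume policy of this kind could be modeled as a small value object. The sketch below is hypothetical (S1 UES’s actual configuration interface is not described here): it enforces the N values listed above and answers whether a volume survives a given number of drive failures. The overhead calculation assumes the common N+K convention of K redundancy fragments per N data fragments, which this post does not spell out.

```python
from dataclasses import dataclass

ALLOWED_N = (1, 2, 4, 8)  # array widths listed in this post


@dataclass(frozen=True)
class NKPolicy:
    """Hypothetical per-volume N+K policy; names and checks are illustrative."""

    n: int  # drives the virtual array spans (drives read performance)
    k: int  # simultaneous drive failures tolerated (affects write cost)

    def __post_init__(self) -> None:
        if self.n not in ALLOWED_N:
            raise ValueError(f"n must be one of {ALLOWED_N}, got {self.n}")
        if self.k < 0:
            raise ValueError("k cannot be negative")

    def survives(self, failed_drives: int) -> bool:
        """True if the volume's data remains readable after this many drive failures."""
        return failed_drives <= self.k

    def redundancy_overhead(self) -> float:
        # ASSUMPTION: K redundancy fragments are stored per N data fragments,
        # the usual convention for N+K codes; the post does not specify this.
        return self.k / self.n


# Each volume gets its own policy instead of inheriting one array-wide RAID level.
policies = {
    "analytics-fs": NKPolicy(n=8, k=2),  # wide array: favors throughput
    "oltp-lun": NKPolicy(n=2, k=1),      # narrow array: favors latency
}
for name, p in policies.items():
    print(name, p.survives(2), p.redundancy_overhead())
```

Keeping the policy per volume is what lets a latency-sensitive database and a throughput-heavy file system share one pool with different resiliency settings.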
In addition to boosting performance, this approach enables greater capacity utilization than RAID configurations: drives of different capacities can be deployed within the same pool, and no standby “hot spare” drive needs to be maintained. It also improves redundancy and avoids performance tradeoffs because reads and writes are not funneled to a single drive. If a drive fails, its data fragments can be rewritten to any drive that has spare capacity, which greatly accelerates the rebuild.
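The rebuild behavior described above can be sketched as a placement routine. This is a hypothetical illustration, not StorOne’s placement algorithm: it assumes every drive keeps some free fragment slots in place of a dedicated hot spare, and that a rebuilt fragment must not land on a drive already holding another fragment of the same stripe. Because replacement fragments go to whichever eligible drive has the most free space, the rebuild writes spread across several drives and can proceed in parallel. Note that `rebuild` updates `placement` in place.

```python
from collections import Counter


def rebuild(placement: dict[int, list[int]], free_slots: dict[int, int], failed: int) -> Counter:
    """Re-place every fragment that lived on `failed` onto surviving drives.

    placement:  stripe id -> list of drive ids holding that stripe's fragments
    free_slots: drive id -> number of free fragment slots on that drive
    Returns a Counter of how many rebuilt fragments each drive received.
    """
    free = dict(free_slots)
    free.pop(failed, None)  # the failed drive can no longer accept writes
    writes: Counter = Counter()
    for stripe, drives in placement.items():
        if failed not in drives:
            continue  # this stripe lost nothing
        # Eligible targets have spare space and hold no fragment of this stripe,
        # so a later drive failure still costs the stripe at most one fragment.
        candidates = [d for d, slots in free.items() if slots > 0 and d not in drives]
        if not candidates:
            raise RuntimeError(f"no spare capacity to rebuild stripe {stripe}")
        target = max(candidates, key=lambda d: free[d])  # most free space first
        drives[drives.index(failed)] = target
        free[target] -= 1
        writes[target] += 1
    return writes


placement = {0: [0, 1, 2], 1: [3, 4, 0], 2: [1, 5, 0], 3: [2, 3, 4]}
free_slots = {drive: 2 for drive in range(6)}  # spare space on every drive, no hot spare
writes = rebuild(placement, free_slots, failed=0)
print(dict(writes))  # {3: 1, 1: 1, 2: 1}
```

Three different drives each absorb one rebuilt fragment, instead of a single hot spare absorbing all of them.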

Conclusion

StorOne’s volume-based N+K erasure coding technology minimizes the CPU cycles and memory required by every I/O operation, freeing those resources to serve the application itself. It also dramatically accelerates drive rebuilds after a failure, helping the business get back online faster. The value to the customer increases further because each storage drive’s capacity and memory can be more fully utilized.
