Most storage systems create a write cache in system RAM to accelerate performance. This write cache is a crutch. Like most crutches, it creates dependencies that put data at risk and complicate system design. The motivation for a write cache is simple. Most systems perform poorly when writing directly to storage media, be it hard disk or flash. The write cache crutch’s goal is to improve IOPS performance and hide inefficient media handling.
The Problems with Write Caching
Write caching creates several problems for storage system designers and concerns for customers. These problems are what make the write cache a crutch. A write cache acknowledges a write as successful before the data reaches storage media. The problem is that RAM loses its contents when it loses power. If the storage server crashes or loses power before cached data is written to media, that data cannot be recovered, even though the application was told the write succeeded.
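The failure mode described above can be sketched with a toy model. This is a minimal illustration, not any vendor's implementation. The class `WriteBackCache` and its methods are hypothetical names invented for this example, and two Python dictionaries stand in for RAM and persistent media.

```python
class WriteBackCache:
    """Toy model of a volatile write cache in front of persistent media."""

    def __init__(self):
        self.ram = {}    # volatile: contents are lost on power failure
        self.media = {}  # persistent: contents survive power failure

    def write(self, key, value):
        # The data lands in RAM only, yet the caller is told it succeeded.
        self.ram[key] = value
        return "ACK"

    def flush(self):
        # Background destage from RAM to persistent media.
        self.media.update(self.ram)
        self.ram.clear()

    def power_loss(self):
        # RAM contents vanish; media keeps whatever was already flushed.
        self.ram.clear()

    def read(self, key):
        # Serve from the cache first, then fall back to media.
        if key in self.ram:
            return self.ram[key]
        return self.media.get(key)


cache = WriteBackCache()
cache.write("block-1", b"flushed data")
cache.flush()
ack = cache.write("block-2", b"acknowledged, not yet flushed")
cache.power_loss()

print(ack)                    # ACK
print(cache.read("block-1"))  # b'flushed data'
print(cache.read("block-2"))  # None
```

The second write was acknowledged, but after the simulated power loss it is gone. That gap between acknowledgment and persistence is the exposure the rest of this article is concerned with.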
Bandaids for The Write Cache Crutch
No vendor wants to leave a customer that exposed to data loss. To avoid data loss, vendors build elaborate redundancies that increase overall storage costs and management complexity. Vendors try to alleviate write cache weaknesses by mirroring the cache to another storage node. Under this design, a write acknowledgment won’t occur until both nodes confirm they have received the data, a process that adds overhead and erodes some of the gains of the write cache crutch. Another challenge with this approach is maintaining cache coherency. Vendors have to deal with “split-brain” scenarios to ensure they are using the right version of the cache during a failure. Managing these challenges adds cost, complexity and, again, overhead to the system.
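The mirrored acknowledgment rule can be sketched as follows. This is a simplified model under stated assumptions. `Node` and `mirrored_write` are hypothetical names, both "nodes" live in one process, and a real system would also have to handle timeouts, retries and rollback.

```python
class Node:
    """Toy storage node holding an in-memory write cache."""

    def __init__(self, name):
        self.name = name
        self.cache = {}
        self.online = True

    def receive(self, key, value):
        # An offline node cannot confirm receipt.
        if not self.online:
            return False
        self.cache[key] = value
        return True


def mirrored_write(primary, partner, key, value):
    """Acknowledge only when both nodes confirm they hold the data."""
    confirmations = [node.receive(key, value) for node in (primary, partner)]
    return all(confirmations)


node_a, node_b = Node("node-a"), Node("node-b")
first_ack = mirrored_write(node_a, node_b, "block-1", b"data")

node_b.online = False  # simulate the partner node failing
second_ack = mirrored_write(node_a, node_b, "block-2", b"data")

# node-a now holds a block that node-b never received.
diverged = node_a.cache.keys() != node_b.cache.keys()
print(first_ack, second_ack, diverged)  # True False True
```

Every write waits for two confirmations, which is where the extra latency comes from. Once one node fails, the two caches no longer agree, and the system needs additional logic to decide which copy is authoritative.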
Some vendors use non-volatile memory for the cache to get around the data loss risk. The problem with non-volatile memory is its expense. Most non-volatile memory solutions are very expensive and offer very limited capacity. A low-capacity cache means a high chance of a cache miss. The storage system also adds overhead managing the cache and searching through it, wasting much of the intended performance gain.
Solving The Write Cache Crutch Problem
The way to solve the problem is to write directly to storage media and acknowledge the write only after the data is there. Writing directly to storage media means an acknowledged write is not lost if there is a hardware failure or power outage. The challenge is doing so while still maintaining performance, which requires robust and efficient storage software.
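The same acknowledge-after-persist ordering can be shown at the operating system level. This is only an analogy, not StorONE's implementation, which this article does not describe. The helper name `durable_write` is hypothetical. It uses Python's standard `os.fsync` call, which asks the operating system to flush the file's data to the device before returning. Whether the device itself holds data in its own volatile cache depends on the hardware and is outside what this sketch can guarantee.

```python
import os
import tempfile


def durable_write(path, data):
    """Return only after the OS reports the data flushed to the device."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        written = 0
        # os.write may write fewer bytes than requested, so loop until done.
        while written < len(data):
            written += os.write(fd, data[written:])
        # Block until the OS has flushed this file's data to the device.
        os.fsync(fd)
    finally:
        os.close(fd)
    # Returning here is the "acknowledgment": it happens after the flush.
    return written


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "block.bin")
    acked_bytes = durable_write(target, b"payload")
    with open(target, "rb") as f:
        stored = f.read()

print(acked_bytes, stored)  # 7 b'payload'
```

The cost of this ordering is that every write waits on the media. This is why the text above stresses that the software doing the writing has to be efficient.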
StorONE’s S1 Enterprise Storage Platform delivers on these goals. Its DirectWrite capabilities enable writes directly to storage media in a way that doesn’t impact storage performance. DirectWrite works while important storage features like snapshots and media failure protection (vRAID) are active.
Eliminating HA Complexities of The Write Cache Crutch
DirectWrite also greatly simplifies the S1 Enterprise Storage Platform’s high availability implementation. With DirectWrite, the S1 HA configuration doesn’t need to worry about cache coherency and split-brain. Every node in the cluster has direct access to the media and does not have to synchronize cache memory. Removing this overhead from S1’s design also improves overall performance by eliminating another complexity that other vendors need to manage.
Comparing Write Caching to DirectWrite
The write cache crutch should increase storage system performance. However, its inefficiencies lead to increased costs and to a very limited return on the RAM investment. First, because of the expense of RAM, and especially non-volatile RAM, these caches are relatively small. This means the chances of them holding the right data in the cache are low.
The write cache crutch also adds overhead, as the software has to spend cycles managing the cache. Keeping the cache coherent in a high availability configuration adds further overhead. The net impact is that the potential performance improvement from a write cache is much smaller than one would expect.
DirectWrite, on the other hand, thanks to S1’s inherent efficiency, eliminates all the overhead of managing a write cache without sacrificing performance. The feature actually increases the efficiency of the S1 platform, because it doesn’t have to manage a cache that wouldn’t show an effective hit rate. Even with the data assurance of DirectWrite, S1 is still able to deliver 80% to 90% of raw drive performance. This efficiency leads to the lowest $ per IOPS in the industry.
Conclusion
DirectWrite makes S1’s HA feature much easier to implement and manage. Most importantly, it also ensures a high level of data integrity. Storage purgatory is the state in which the storage system has acknowledged writes to the application but has not yet stored them on permanent media. DirectWrite makes sure S1 doesn’t have a storage purgatory. S1 always writes data to permanent media prior to acknowledging the write to the application. If the storage software doesn’t guarantee high integrity, all other advanced data protection capabilities are useless. This is why DirectWrite is the foundation of StorONE’s data protection suite.