Flash Storage Jargon Explained – Part 1 – The Write Cliff

The Write Cliff

Having spent almost two years working in the SSD/Flash Storage market, I have had a number of people ask me the same questions over and over about the technologies involved and about some of the FUD (fear, uncertainty and doubt) that floats around the market from time to time. Over the course of this series (I have no idea how long it'll be), I'll aim to explain what the terms mean, whether they are myth or reality, and how current Flash Vendors avoid the problems that actually exist.

The first term to tackle is the Write Cliff. This is the phenomenon where an SSD's performance drops suddenly and dramatically after a period of use, as illustrated by the representative chart below.


The Write Cliff is not a myth; it exists, but it is largely a non-issue in most modern SSD storage arrays. It occurs once every cell within an individual drive has been written to at least once, leaving no fresh, pre-erased space to absorb new writes. This is down to the way that SSDs are written to. NAND flash cannot overwrite data in place, so a write to a previously used location goes through a Read-Erase-Write process: the incoming data is buffered while the block it is destined for is read, erased (“flashed”) and then rewritten with the new data, along with any existing data in that block that must be retained. It is important to clarify that while data can be written one page at a time, it can only be erased a whole block (typically containing many pages) at a time.
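To make the page/block asymmetry concrete, here is a toy Python model of an unmanaged drive. It is a simplified sketch, not how any real controller works: the ToyFlash class is hypothetical, it has no flash translation layer (each logical page maps to one fixed physical page), and PAGES_PER_BLOCK is an assumed value of 4, whereas real NAND blocks hold far more pages. A write to an erased page costs a single program operation; once every page has been used, any overwrite triggers a full Read-Erase-Write of the block.

```python
PAGES_PER_BLOCK = 4  # hypothetical value; real NAND blocks hold far more pages


class ToyFlash:
    """Toy model of an unmanaged drive with no flash translation layer.

    Each logical page maps to one fixed physical page. A page can only be
    programmed while it is erased, and erasing works on a whole block.
    """

    def __init__(self, num_blocks: int) -> None:
        # None marks an erased page that is ready to be programmed.
        self.blocks = [[None] * PAGES_PER_BLOCK for _ in range(num_blocks)]
        self.page_programs = 0
        self.block_erases = 0

    def write(self, block: int, page: int, data: str) -> None:
        cells = self.blocks[block]
        if cells[page] is None:
            # Fast path: the page is still erased, so a single program suffices.
            cells[page] = data
            self.page_programs += 1
            return
        # Slow path (Read-Erase-Write): read the whole block, erase it,
        # then program the retained pages back together with the new data.
        retained = list(cells)  # read
        retained[page] = data
        self.blocks[block] = [None] * PAGES_PER_BLOCK  # erase the whole block
        self.block_erases += 1
        for i, value in enumerate(retained):  # write back
            if value is not None:
                self.blocks[block][i] = value
                self.page_programs += 1


if __name__ == "__main__":
    drive = ToyFlash(num_blocks=2)
    # Fresh drive: every write lands on an erased page.
    for b in range(2):
        for p in range(PAGES_PER_BLOCK):
            drive.write(b, p, f"v1-{b}-{p}")
    print(drive.page_programs, drive.block_erases)  # 8 0
    # Every page has now been written once, so one overwrite costs
    # a block erase plus a program for every page in that block.
    drive.write(0, 1, "v2")
    print(drive.page_programs, drive.block_erases)  # 12 1
```

Filling the fresh drive costs one program per page; the single overwrite afterwards costs an erase plus a program for every page in the block, which is the Write Cliff in miniature.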

This degradation in write performance, and often in overall performance (depending on your queue depth), is most commonly seen in unmanaged SSDs once they've been in situ for a prolonged period of time (in my experience, anywhere from 9 to 18 months).

Modern arrays such as WHIPTAIL, Violin Memory et al. have algorithms in place to manage this process. They reserve an often significant proportion of the array's raw capacity for Garbage Collection (GC), the process by which an array is “tidied up” (and the subject of another post), and keep GC running almost continuously as a background process.
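How much capacity is held back is usually expressed as an over-provisioning percentage. The sketch below uses one common definition, spare capacity divided by user-visible capacity; the over_provisioning_pct function and the 10,000 GB / 8,000 GB figures are hypothetical and are not taken from any vendor's specification.

```python
def over_provisioning_pct(raw_gb: float, usable_gb: float) -> float:
    """Spare flash as a percentage of user-visible capacity.

    Uses one common definition: OP = (raw - usable) / usable.
    """
    if usable_gb <= 0 or raw_gb < usable_gb:
        raise ValueError("need raw_gb >= usable_gb > 0")
    return (raw_gb - usable_gb) / usable_gb * 100


if __name__ == "__main__":
    # Hypothetical array: 10,000 GB of raw flash, 8,000 GB exposed to hosts.
    print(f"{over_provisioning_pct(10_000, 8_000):.1f}%")  # 25.0%
```

The spare area never holds user data, so GC always has some erased blocks to write into, even when the user-visible capacity is nearly full.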

So how does this affect you, the consumer?

In reality, it doesn't, IF you are purchasing a recognised SSD array or PCIe flash solution and have sized it according to the vendor's specifications. However, vendors often throw it out there in competitive situations as FUD, disparaging how a competitor's array will perform over its lifespan.

It may happen, and subsequently affect performance, if you fill your array to 90%+ of its capacity and the Garbage Collection process has to work harder and more often than expected. In this situation, you will see CPU and memory usage within the array rise and both write and read performance drop until the process has completed. It is often wise to schedule maintenance time for the GC process to run in its entirety (the best way to do this is to arrange it with the vendor's support team, as they will often have a script to initiate a full GC run), as you'll extend the life of the array and maintain the sparkling performance that you purchased it for!
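A back-of-the-envelope model shows why GC has to work harder as the array fills. It assumes (a simplification, not any vendor's algorithm) that every block GC reclaims holds the same fraction u of valid pages: freeing such a block means copying u of its pages elsewhere to gain 1 - u free pages, so each page of new host data costs roughly 1 / (1 - u) page writes in total. Real collectors pick the emptiest blocks they can find and benefit from over-provisioning, so real figures are lower, but the trend is the same. The gc_write_amplification function is hypothetical.

```python
def gc_write_amplification(valid_fraction: float) -> float:
    """Approximate flash writes per host write for a simple GC model.

    Assumes (a simplification) that every block GC reclaims has the given
    fraction of valid pages. Reclaiming a block of P pages copies
    valid_fraction * P pages and frees (1 - valid_fraction) * P pages for
    new host data, so total writes per host write = 1 / (1 - valid_fraction).
    """
    if not 0 <= valid_fraction < 1:
        raise ValueError("valid_fraction must be in [0, 1)")
    return 1 / (1 - valid_fraction)


if __name__ == "__main__":
    for fill in (0.5, 0.7, 0.9, 0.95):
        print(f"{fill:.0%} valid -> ~{gc_write_amplification(fill):.1f}x flash writes")
```

On this model the flash absorbs about 2 writes per host write when reclaimed blocks are 50% valid, but about 10 when they are 90% valid; that extra background copying is what shows up as higher CPU and memory usage and lower host performance.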

It will most certainly occur if you decide that an appropriate solution is simply to purchase a number of SSDs, pop them into your server or SAN and see how it goes.

It'd be great to hear any real-life stories of falling off the edge of the Write Cliff.