Things like VSAN, PernixData FVP, XtremSW and ScaleIO from EMC, and ioVDI and ioTurbine from Fusion-io have put a lot of attention on flash in servers. The problem a lot of people are finding is that you need to be careful about exactly which flash you use. All SSDs and PCIe cards are not created equal…and it matters.
If you’ve looked at SSDs at all for server use you’ve probably seen several options. There are three main types of SSDs out there: MLC (sometimes called cMLC), eMLC, and SLC. SLC is Single-Level Cell and is the most expensive type. It’s actually falling out of favor in place of eMLC, Enterprise Multi-Level Cell. SLC SSDs were popular early on because they were fast and had a very long life, but at great expense. eMLC is quickly closing that gap at a much lower cost. Finally there is plain MLC, sometimes called cMLC (Consumer Multi-Level Cell) in the server world. Think of cMLC as the “value” grade of SSD. They work just fine for desktops and notebooks but may not hold up as well in a high-I/O server environment.
But you need to look beyond that. You also need to think about things such as over-provisioning space. As an SSD ages it wears out cells, and those cells can no longer be used. Almost all SSDs have extra capacity you don’t see that they use to replace these worn cells. Better SSDs have more of this spare space; cheaper ones have less. This extra space is also used for other processes such as Garbage Collection, the process where an SSD “cleans up” written cells so that there are more zeroed cells ready to use.
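To make “more extra space” concrete: over-provisioning is usually quoted as spare capacity relative to the user-visible capacity. Here’s a minimal sketch of that arithmetic — the numbers are illustrative, not from any real drive’s datasheet:

```python
# Sketch of how over-provisioning percentage is commonly calculated.
# Capacities below are illustrative, not tied to any specific drive.

def over_provisioning_pct(raw_gb: float, usable_gb: float) -> float:
    """Spare capacity as a percentage of the user-visible capacity."""
    return (raw_gb - usable_gb) / usable_gb * 100

# A "value" drive: 512 units of raw NAND exposed as 500 usable
print(round(over_provisioning_pct(512, 500), 1))  # 2.4

# An "enterprise" drive: same raw NAND exposed as only 400 usable
print(round(over_provisioning_pct(512, 400), 1))  # 28.0
```

The enterprise-style drive holds back far more NAND, which is exactly the spare area that absorbs worn cells and keeps Garbage Collection fed with free blocks.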
I’m not going to get into the deep inner workings of SSDs as that goes beyond the scope of the point I want to make. But if you’re interested, Anand over at AnandTech has written some great detailed articles. Here is one that’s worth looking at. Yes, that article is old, but it’s good and the information still applies even though the NAND chips have gotten better.
Different manufacturers’ SSDs also use different controllers. The controller provides the interface from the SSD to the server and also handles I/O and Garbage Collection. Controllers matter. Good ones work well under high load. Bad ones start to have issues and cause unpredictable latency spikes under high load.
My point is that you may not always know who makes your SSDs if you order them from a server manufacturer. I’m going to use Cisco as an example here simply because that’s who we (Varrow) sell and I was able to track down information on their SSDs.
Cisco breaks out their SSDs into two different tiers:
That doesn’t tell you a lot. What’s the difference? One costs more. One is probably faster. Helpful. At this time, 4/2/14, they break out as follows:
Before I compare the two drives I want to stress one thing. Often server manufacturers don’t tell you who makes the actual SSD you are ordering. The information above was not on the Cisco data sheets; I had to ask someone to track it down for me. If you plan to buy drives from your server vendor, ask the right questions and make sure you get what you need.
If you look at Micron’s site about the P400e you’ll see that they target this drive at enterprise customers. But should you use it for something like VSAN? The answer is no. The P400e is spec’d to handle 1 full drive overwrite per day. The recommendation from the VSAN team is that you should use drives spec’d for 5 full overwrites per day or more. If you use a lesser drive you may end up “killing” that drive prematurely. Plus, as SSDs start to run out of cells they get slower and latency goes up. So the drive may not just die one day — you may start seeing odd performance first.
The Samsung 1625 is spec’d for 10 full overwrites per day by Cisco, or 18 by Samsung (over 5 years), so you’d be fine. So the answer is simple: just buy the Enterprise Performance drives, right? Well…maybe…
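To put those “full overwrites per day” numbers in perspective, here’s a rough sketch of how a drive-writes-per-day rating translates into total lifetime writes. The drive figures echo the ones above; the helper function is my own back-of-the-envelope math, not a vendor formula:

```python
# Rough endurance math: drive writes per day (DWPD) -> total lifetime
# writes over a warranty period. Illustrative only, not a vendor spec.

def lifetime_writes_tb(capacity_gb: float, dwpd: float, years: float) -> float:
    """Total data writable over the warranty period, in TB."""
    return capacity_gb * dwpd * 365 * years / 1000

# 400GB drive rated 1 full overwrite/day for 5 years (P400e class)
print(lifetime_writes_tb(400, 1, 5))   # 730.0 TB

# 400GB drive rated 10 full overwrites/day for 5 years (1625 class)
print(lifetime_writes_tb(400, 10, 5))  # 7300.0 TB
```

Same capacity, roughly ten times the writable life — that’s the gap between the two tiers, and why the VSAN team’s 5-overwrites-per-day floor matters.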
One of the selling points of things like VSAN or ScaleIO is the ability to use “commodity” hardware to keep costs down. But is that correct? Again…as I love to say…it depends. This was illustrated well by The Register this week in an article they did here on VSAN Ready Node pricing. They built some example VSAN nodes (certified servers built for VSAN by major server vendors) and showed some crazy pricing. Many in the industry rolled their eyes…but it proves one point: if you go with SSDs from your server vendor, you pay a lot for them.
An example is the Samsung 1625-based SSD from Cisco. So you can check my numbers, I’ll use pricing from CDW’s public page. The 400GB Enterprise Performance SSD is $4,000. For a single SSD. Run some numbers and you quickly see that your price for “commodity” hardware goes way up.
But you can also bring your own. Be sure to follow your software vendor’s hardware compatibility list when you do this. The downside to this option is obviously that you won’t get support from your server vendor, but you can weigh that risk yourself. At Varrow we are big fans of a couple of SSDs, namely the Kingston E100 and the Intel S3700. Both are eMLC drives with great life expectancy. Our VSAN, ScaleIO, and PernixData FVP labs all use those two drives. To compare, the 400GB Kingston E100 is $1,263. That’s a huge difference.
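A quick back-of-the-envelope on those two list prices makes the gap obvious. The helper here is just illustrative arithmetic using the $4,000 and $1,263 figures quoted above:

```python
# Price-per-GB comparison using the CDW list prices quoted in the text.

def price_per_gb(price_usd: float, capacity_gb: float) -> float:
    return price_usd / capacity_gb

oem_drive = price_per_gb(4000, 400)   # Cisco's Samsung 1625-based SSD
byo_drive = price_per_gb(1263, 400)   # Kingston E100, bought directly

print(oem_drive, byo_drive)                         # $/GB for each
print(f"OEM premium: {4000 / 1263:.1f}x")           # OEM premium: 3.2x
```

Multiply that roughly 3x premium across a handful of drives per node and a handful of nodes per cluster, and the “commodity hardware” story starts to fall apart fast.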
In time I hope these options cause the server vendors to rethink their SSD pricing. In the past SSDs were the exception for server DAS…but with these new in-server flash products we’re going to see that drastically change.
The purpose of this article is to get you thinking about which SSDs you want to use if you’re looking at VSAN/ScaleIO/FVP/ioTurbine/whatever. The type of NAND…the controller…the over-provisioning…it all matters. I’ve personally killed cheap SSDs using these technologies in my home lab. You don’t want to do that in a production environment.
You also want to look at your options when building out your servers. Do you use OEM drives or bring your own? Again, that’s your decision and there are pros and cons to both.