It’s about damn time. If you’ve been playing along at home, you may remember that in May of last year EMC bought XtremIO for $430M. It’s been a long wait. EMC has been testing and shipping XtremIO arrays under DA (Directed Availability) for a while and now it’s time for it to go GA (General Availability). In this post I’m going to skip the comparison to other arrays on the market. I’m going to tell you what was released, how it works, and what it does. Check back later for my opinion on how XtremIO stacks up against the competition.
Right now there is a single model of XtremIO array. There will be others coming in the future but right now it’s a single option. Basic specifications:
- 6U for the first X-Brick (more on X-Bricks in a minute)
- 5U for each additional X-Brick
- 750W Power
- 4 x 8Gb FC Front-End Ports
- 4 x 10Gb iSCSI Front-End Ports
- 7.5TB usable capacity
An XtremIO array is built up of 1 to 4 X-Bricks right now. Support for more X-Bricks is coming later. We should see support for 8 pretty soon. The picture above shows the first X-Brick. The first X-Brick has two battery backup units. The second X-Brick will take one of those and then as you go beyond that each one gets a single battery backup. I just mention this for sizing out rack units. Within each X-Brick are two 1U x86-based controllers and a 25-slot DAE that holds the SSDs. Communication between X-Bricks is done via InfiniBand. When you add your second X-Brick you’ll also add a 1U InfiniBand switch as shown here.
Note that the controllers are x86-based 1U servers. There really isn’t any secret sauce in the hardware. It’s all good commodity gear. The magic happens in software. This is a good thing. Now you can get new features and innovations with software updates without having to wait for custom ASICs to be created to provide them.
The diagram above lays out the overall architecture. Each controller has 256GB of RAM and 16 physical CPU cores. The two controllers talk to each other via InfiniBand RDMA. In a single X-Brick solution they are just directly connected to each other. Once you expand the cluster they are connected via the InfiniBand switch. Both controllers are connected to 25 eMLC SSDs via SAS 2.0.
The controllers are Active/Active. LUNs can be accessed via any controller and there is no need for ALUA.
Surprise, surprise….. XtremIO is fast. What else would you expect from an all-flash array? What is compelling about it is that it’s always fast and as you scale it gets even faster. EMC has been very vocal about the fact they are publishing real world numbers and not “hero numbers” as they put it. Their benchmarks were run on an array that is 80% full (often a problem for flash-based storage), with mixed read/write workloads, and with latency measured end-to-end. To give you an idea, check these numbers for two 4K, 100% random profiles, 50/50 read/write and 100% read:
- Single Brick – 150K IOPS / 250K IOPS
- Two Bricks – 300K IOPS / 500K IOPS
- Three Bricks – 450K IOPS / 750K IOPS
- Four Bricks – 600K IOPS / 1M IOPS
Notice something? Yeah…linear scaling of performance. You not only gain capacity as you scale but you also get performance. Think of it this way. You just set your target IOPS that you need and buy the number of bricks that it takes to get you there. Well..and then make sure you have enough capacity….details.
Also…all of those IOPS numbers were at <1ms latency. Fast.
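If you want to play with that sizing logic, here is a minimal sketch of it. It assumes the scaling really is perfectly linear, uses only the per-brick figures quoted above, and ignores deduplication entirely. The function name `bricks_needed` and the constants are mine, not anything from EMC.

```python
import math

# Per-X-Brick figures quoted in this post (at GA).
IOPS_PER_BRICK_MIXED = 150_000   # 4K random, 50/50 read/write
IOPS_PER_BRICK_READ = 250_000    # 4K random, 100% read
USABLE_TB_PER_BRICK = 7.5
MAX_BRICKS = 4                   # current cluster limit per this post


def bricks_needed(target_iops, target_tb, iops_per_brick=IOPS_PER_BRICK_MIXED):
    """Return the X-Brick count that covers both the IOPS and capacity targets.

    Assumption: performance and capacity scale linearly with brick count,
    and no deduplication savings are counted.
    """
    for_iops = math.ceil(target_iops / iops_per_brick)
    for_capacity = math.ceil(target_tb / USABLE_TB_PER_BRICK)
    bricks = max(1, for_iops, for_capacity)
    if bricks > MAX_BRICKS:
        raise ValueError(
            f"needs {bricks} X-Bricks, more than the {MAX_BRICKS} supported today"
        )
    return bricks


if __name__ == "__main__":
    # 400K mixed IOPS needs 3 bricks even though 10TB would fit in 2.
    print(bricks_needed(400_000, 10))
    # 100K IOPS fits in 1 brick, but 20TB of capacity pushes it to 3.
    print(bricks_needed(100_000, 20))
```

Whichever requirement asks for more bricks wins, which is exactly the "details" part of the sizing exercise.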
Bells & Whistles (Features)
One huge benefit of XtremIO is that the software features are always on. The benchmarks I noted above are with all the knobs turned on. You won’t enable anything here and see a drop in performance.
There is no setup and tuning of XtremIO. No LUNs. No RAID Groups. No pools. No stripe sizes. No tiering. Nothing. You have a pool of very fast storage. How big do you want that LUN to be? That’s all you really need to do. In fact, the first time that Joe Kelly and I saw a preview of these boxes I told him that storage management just got really, really boring.
We’ll talk more about how data is deduped and written in a bit but XtremIO balances I/O across all SSDs in the cluster. You won’t find hotspots anywhere nor will you find SSDs that are almost full while others are almost empty. This sample chart shows that. Notice how little variance there is between disks?
Everything written to the XtremIO is thin provisioned. It also uses a set 4KB allocation for all IOs. No fragmentation or reclamation penalty.
Support for VAAI is included. You’ll see why it works so well in a bit. A VAAI clone is just a simple metadata update. VAAI features supported:
- Zero Blocks / Write Same
- Clone Blocks / Full Copy / XCopy
- Atomic Test & Set
- Block Delete / UNMAP / TRIM
Capacity and the Secret Sauce
As mentioned above each brick is 7.5TB usable..right now. That’s not a lot, but XtremIO has full in-line deduplication. It’s always on. It’s in-line. And it’s global across all X-Bricks in the cluster. It works by fingerprinting all data as it comes into the array. The array uses a set of hashing mechanisms that allow it to quickly find redundant data while avoiding hash collisions.
So as data flows into the array it is broken down into 4KB chunks, fingerprinted, and then compared to the metadata that the array already has. If a matching fingerprint is found then the array just notes the second copy of data in the metadata table. If it’s not found the new data is written and a new entry added to the metadata table. As data is written it is spread across all available X-Bricks and drives to evenly balance all load.
The great thing about this, and the secret sauce, is that the metadata information is held by all X-Bricks in the cluster. It’s never accessed off of disk (well..after array boot). All lookups are handled out of RAM and therefore very, very fast. This also provides for very fast VM cloning and other metadata-only operations.
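To make the fingerprint-and-metadata idea concrete, here is a toy sketch of a content-addressed store. This is not XtremIO’s implementation. The post doesn’t say which hash the array uses, so SHA-256 is my stand-in, and plain Python dicts play the role of both the in-RAM metadata and the SSDs. The class and method names are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # the array works in 4KB chunks per this post


class DedupeStore:
    """Toy in-memory model of fingerprint-based inline dedupe."""

    def __init__(self):
        self.blocks = {}    # fingerprint -> block bytes (stands in for SSD)
        self.refcount = {}  # fingerprint -> number of logical references
        self.volume = {}    # logical block address -> fingerprint (metadata)

    def _release(self, fp):
        # Drop one reference; free the block when nothing points at it.
        if fp is None:
            return
        self.refcount[fp] -= 1
        if self.refcount[fp] == 0:
            del self.refcount[fp]
            del self.blocks[fp]

    def _point(self, addr, fp):
        # Take the new reference before releasing the old one so that
        # rewriting identical data never frees the block in between.
        self.refcount[fp] += 1
        self._release(self.volume.get(addr))
        self.volume[addr] = fp

    def write(self, lba, data):
        """Write data at logical block lba; return how many new blocks were stored."""
        new_blocks = 0
        for offset in range(0, len(data), BLOCK_SIZE):
            chunk = data[offset:offset + BLOCK_SIZE].ljust(BLOCK_SIZE, b"\0")
            fp = hashlib.sha256(chunk).hexdigest()  # assumed hash, see above
            if fp not in self.blocks:
                self.blocks[fp] = chunk
                self.refcount[fp] = 0
                new_blocks += 1
            self._point(lba + offset // BLOCK_SIZE, fp)
        return new_blocks

    def clone(self, src, dst, count):
        """Copy count blocks from src to dst by updating metadata only."""
        for i in range(count):
            self._point(dst + i, self.volume[src + i])

    def read(self, lba):
        return self.blocks[self.volume[lba]]

    def dedupe_ratio(self):
        return len(self.volume) / len(self.blocks) if self.blocks else 1.0


if __name__ == "__main__":
    store = DedupeStore()
    a, b = b"A" * BLOCK_SIZE, b"B" * BLOCK_SIZE
    print(store.write(0, a + b + a))  # only two unique blocks get stored
    store.clone(0, 100, 3)            # no block data is copied
    print(store.dedupe_ratio())
```

Notice that `clone` never touches block data. That is the same reason a VAAI clone can be a simple metadata update.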
How much capacity can you expect? Well…that’s the thing. With deduplication you’re never really sure. You’ll get at least the usable capacity for the number of X-Bricks you have but beyond that it will depend greatly on the type of data you are storing. It’s common to see from 3:1 all the way to 10:1 but some data types may result in a very low dedupe ratio. During initial deployments I think it will be common to see people doing PoCs to verify what they can expect from deduplicating their data.
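For a rough feel of what those ratios mean, the arithmetic is just usable capacity times dedupe ratio. The sketch below assumes the 7.5TB usable figure per X-Brick and treats the ratio as a number you measured in your own PoC. The function name is mine.

```python
USABLE_TB_PER_BRICK = 7.5  # usable capacity per X-Brick quoted in this post


def effective_capacity_tb(bricks, dedupe_ratio):
    """Estimate logical capacity for a cluster at a given dedupe ratio.

    Assumption: the ratio holds uniformly across all stored data,
    which real workloads will only approximate.
    """
    if bricks < 1:
        raise ValueError("need at least one X-Brick")
    if dedupe_ratio < 1:
        raise ValueError("a dedupe ratio below 1:1 makes no sense here")
    return bricks * USABLE_TB_PER_BRICK * dedupe_ratio


if __name__ == "__main__":
    for ratio in (1, 3, 10):
        print(f"4 X-Bricks at {ratio}:1 -> {effective_capacity_tb(4, ratio)} TB")
```

At 1:1 you get the 30TB floor for four bricks. Anything above that depends entirely on your data.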
Can you trust XtremIO with your data? I think so. Everything in the system is redundant.
- Redundant controllers
- Backup power supplies
- Dual (redundant) power supplies
- Dual (redundant) InfiniBand ports
- Dual SAS Controller Modules
- Dual iSCSI and Fibre Channel ports on each storage controller
- Fail up to 6 SSDs per X-Brick
- N+2 row and diagonal parity
Note that you can’t fail 6 SSDs at one time. You can fail up to two SSDs in an X-Brick at one time. Each time you fail an SSD (or two) a data rebuild will be performed and when that is complete you can fail another (or two) up to a total of 6 per X-Brick, as long as you can withstand the loss of capacity from each drive loss. Think about that. With a four X-Brick cluster you could fail up to 24 SSDs..that’s almost one complete X-Brick’s worth of drives!
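Those failure rules are easy to get tangled up in, so here is a small state-machine sketch of them. It encodes only what is stated above: up to two un-rebuilt failures at a time and up to six in total per X-Brick. It deliberately ignores the capacity condition, and the class name and methods are made up for illustration.

```python
class XBrickFailureModel:
    """Toy model of the SSD failure rules described in this post."""

    MAX_CONCURRENT = 2  # failures tolerated before a rebuild finishes
    MAX_TOTAL = 6       # cumulative failures tolerated per X-Brick

    def __init__(self):
        self.unrebuilt = 0     # failed SSDs not yet rebuilt around
        self.total_failed = 0  # all failures so far

    def fail_ssd(self):
        """Register one SSD failure; return True if the rules say data survives."""
        if self.unrebuilt >= self.MAX_CONCURRENT:
            return False  # a third failure before the rebuild completes
        if self.total_failed >= self.MAX_TOTAL:
            return False  # beyond the six-drive total
        self.unrebuilt += 1
        self.total_failed += 1
        return True

    def rebuild_complete(self):
        """Mark the outstanding rebuild as finished."""
        self.unrebuilt = 0


def cluster_max_failures(bricks):
    """Best-case tolerated failures for a cluster, assuming rebuilds in between."""
    return bricks * XBrickFailureModel.MAX_TOTAL


if __name__ == "__main__":
    brick = XBrickFailureModel()
    print(brick.fail_ssd(), brick.fail_ssd(), brick.fail_ssd())
    print(cluster_max_failures(4))
```

The third failure in that example is rejected because no rebuild has completed yet.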
XtremIO’s software data protection capabilities are referred to as XDP, XtremIO Data Protection. XDP protects data by using advanced techniques, not traditional RAID. XDP has far less capacity and I/O overhead than traditional RAID. The capacity overhead is only 8%.
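One way to arrive at that 8% figure is to assume each stripe spans all 25 SSDs in the X-Brick with two of them holding parity (23 data + 2 parity). That stripe width is my assumption. The post only states N+2 parity, 25 SSDs, and 8%. The sketch below does the arithmetic and compares it with two common RAID layouts.

```python
def parity_overhead(data_drives, parity_drives):
    """Fraction of raw capacity consumed by parity or mirror copies."""
    return parity_drives / (data_drives + parity_drives)


if __name__ == "__main__":
    layouts = {
        "N+2 across 25 SSDs (assumed 23+2)": (23, 2),
        "RAID 6 (6+2)": (6, 2),
        "RAID 1 mirror (1+1)": (1, 1),
    }
    for name, (data, parity) in layouts.items():
        print(f"{name}: {parity_overhead(data, parity):.0%}")
```

The wider the stripe, the smaller the share of capacity the two parity columns take up.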
XDP is very efficient. With many all-flash arrays an SSD is “locked” when a write occurs. Meaning, you can’t also be reading data. XDP never locks an SSD which provides much more consistent performance. There is also no garbage collection and re-striping of data as with most other systems. XDP writes data to the “most empty” stripe it can find. This greatly reduces back-end read/write activity and SSD wear.
There is no configuration needed for XDP. There are no hot spares. The system just does a content aware rebuild and continues on. Once a rebuild is complete the performance impact is gone, unlike traditional RAID where performance stays degraded for as long as you have a failed drive.
Finally, you can perform non-disruptive upgrades to the system. Hosts stay connected though they might see a fast blip in I/O for the final failback. But no downtime needed for software updates.
Thoughts & Conclusion
It’s been a long wait for EMC to finally release the XtremIO product line but I think the wait is worth it. The initial offering is very compelling and checks most of the boxes that we want to see. It’s a fast purpose-built system that scales linearly while providing very efficient storage. It’s not a legacy array filled with SSDs.
What’s missing? Well… XtremIO is block only (iSCSI and FC). No NAS here so you’ll need another solution for that. It also does not currently have any native replication capability. You’d need another solution for that. Replication is coming, just not in the initial release.
Managing an XtremIO cluster is very simple. The UI is easy and very, very intuitive. It also provides a great deal of information and feedback on the health, performance, and capacity of the system. I was hoping to have a few demo videos of that available for this post but I couldn’t get access to an XtremIO in the lab in time. Check back for that soon.