Yesterday VMware officially released their DRaaS offering, vCloud Hybrid Service – Disaster Recovery (vCHS-DR). This is a simple service that allows you to replicate VMs from a vSphere environment to a virtual data center within VMware’s vCHS hybrid cloud environment and then perform DR tests or a full DR failover. Since vCHS was first announced many people thought that a DRaaS offering was going to be the killer use case that started the adoption of hybrid cloud services for many organizations.
Time will tell if that’s the case but at least there is now an option for people to consider. But is it right for you? Let’s walk through what VMware is doing.
The current vCHS-DR offering is really targeted at smaller customers or those that want to protect a few select virtual machines, but I expect the use cases to widen over time. It will be most attractive to these customers because of how vCHS-DR is technically implemented right now.
First, replication is handled via a modified implementation of vSphere Host-Based Replication (HBR). While HBR works well, it’s not nearly as robust as other replication technologies. There is no compression or deduplication, so it uses more bandwidth than alternatives such as EMC RecoverPoint or Zerto. Second, there is very little in the way of orchestration when you fail over to the hybrid cloud. A common request when failing VMs over to another site is the ability to change their IP addresses without human intervention. Right now this is not possible, as vCHS-DR doesn’t use an orchestration engine such as Site Recovery Manager. And third, failback isn’t automatic or simple. To fail VMs back from vCHS-DR to your local vSphere environment you must shut down the VMs in vCHS and then use vCloud Connector to copy the data back, so you’ll take an extended outage. That copy can also be very time consuming: there is no hashing to skip data that already exists at your site, so the entire VM is transferred.
The RPO that you can obtain varies from 15 minutes to 24 hours due to the asynchronous replication that HBR uses. Where you land in that range depends heavily on your VMs’ I/O workload profile (more writes = longer RPO) and your bandwidth to the vCHS site (less bandwidth = longer RPO). Unfortunately, it can be hard to estimate this until you actually start replicating while the VMs are operating.
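Until you have real replication numbers, a back-of-the-envelope check can at least tell you whether a target RPO is plausible. The sketch below is my own rough model, not anything VMware publishes: it assumes you know how much data changes during your busiest window, that nothing is compressed or deduplicated on the wire, and that only a fraction of the link (the `efficiency` parameter, a guess) is usable for replication. The function names are made up for illustration.

```python
def delta_transfer_minutes(changed_mb: float, link_mbps: float,
                           efficiency: float = 0.8) -> float:
    """Minutes needed to ship `changed_mb` megabytes over a `link_mbps` link.

    `efficiency` is an assumed fraction of the link that replication
    can actually use (protocol overhead, other traffic, and so on).
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be > 0 and efficiency in (0, 1]")
    usable_mb_per_s = link_mbps * efficiency / 8  # megabits -> megabytes
    return changed_mb / usable_mb_per_s / 60


def can_meet_rpo(changed_mb_in_window: float, rpo_minutes: float,
                 link_mbps: float, efficiency: float = 0.8) -> bool:
    """True if the data changed during one RPO window can be shipped
    before the next window closes (no compression or dedupe assumed)."""
    needed = delta_transfer_minutes(changed_mb_in_window, link_mbps, efficiency)
    return needed <= rpo_minutes


# Example: 1,500 MB changes during the busiest 15 minutes.
for mbps in (10, 20):
    minutes = delta_transfer_minutes(1500, mbps)
    ok = can_meet_rpo(1500, 15, mbps)
    print(f"{mbps} Mbps: {minutes:.1f} min to ship the delta, 15-min RPO ok: {ok}")
```

With these made-up inputs, a 10 Mbps link needs about 25 minutes to ship 15 minutes of changes, so the RPO keeps slipping, while a 20 Mbps link keeps up. Real workloads are burstier than this, so treat the result as a sanity check only.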
Given these limitations I don’t see someone using vCHS-DR to protect a couple hundred virtual machines. Maybe 10 or 20, but that’s probably it right now.
Buying vCHS-DR is very easy. You start with a core service that includes:
From there you can add on more capacity, IP addresses, and DR tests. The core starts at $835/month, and VMware publishes more detailed pricing. The number of VMs you can replicate up depends solely on the amount of disk space they use. The number of VMs you can power up for a failover or test is constrained by the RAM and CPU (but mostly RAM).
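To make those two constraints concrete, here is a small sizing sketch. Everything in it is hypothetical: the `VM` fields, the function names, and especially the capacity numbers, which are placeholders and not the actual vCHS-DR core allocation. Storage decides what you can replicate; RAM and CPU decide what you can actually power on, in whatever priority order you list the VMs.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    disk_gb: float
    ram_gb: float
    cpu_ghz: float


def can_replicate(vms: list[VM], storage_gb: float) -> bool:
    """Replication is limited only by the disk space the VMs consume."""
    return sum(vm.disk_gb for vm in vms) <= storage_gb


def vms_that_can_power_on(vms: list[VM], ram_gb: float, cpu_ghz: float) -> list[str]:
    """Walk the list in recovery-priority order and power on each VM
    that still fits in the remaining RAM and CPU."""
    powered = []
    ram_left, cpu_left = ram_gb, cpu_ghz
    for vm in vms:
        if vm.ram_gb <= ram_left and vm.cpu_ghz <= cpu_left:
            powered.append(vm.name)
            ram_left -= vm.ram_gb
            cpu_left -= vm.cpu_ghz
    return powered


# Placeholder inventory and capacity figures, for illustration only.
inventory = [
    VM("ad", disk_gb=40, ram_gb=4, cpu_ghz=1),
    VM("sql", disk_gb=300, ram_gb=16, cpu_ghz=4),
    VM("web", disk_gb=60, ram_gb=8, cpu_ghz=2),
]
print(can_replicate(inventory, storage_gb=500))
print(vms_that_can_power_on(inventory, ram_gb=20, cpu_ghz=10))
```

In this example all three VMs fit in storage, but only the first two can be powered on before RAM runs out, which is the "mostly RAM" constraint in practice.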
One thing that I do like is the 7-day DR tests that are available. You can also run failed-over VMs for up to 30 days with the base SKUs.
We (Varrow) were on the beta for vCHS-DR and were impressed by how well the system integrates into the local vSphere environment. It’s easy to use, manage, and monitor. The only “odd” thing is that if you have an existing VDC in vCHS you can’t fail over into that instance. You get a separate instance just for your DR target. You can connect the two if you want, but you’ll need to set up a bit of connectivity between your Edge devices. Again, not hard, but something to be aware of if you already have systems there.
One gap that you may encounter is around “pilot light” systems, such as AD or DNS. You can’t run VMs in your DR VDC so you can’t replicate AD up to it in case of a failover. If you have a regular VDC that’s not a big deal as you can just have an AD server there but if you’re only using vCHS-DR then it’s something to consider. The other option would be to use “cross connect” between the DR VDC and existing equipment in the vCHS-hosting datacenter, but that’s not a common situation.
Another common question is around seeding VMs in vCHS. Obviously the time to do the initial sync up depends on the size of the VMs and your connection (private or Internet VPN). vCHS-DR also gives you the option to ship them a seed disk to speed that process up.
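If you’re trying to decide between syncing over the wire and shipping a seed disk, a quick estimate of the initial sync time helps. This is a rough sketch under my own assumptions (decimal units, no compression, and a guessed `efficiency` factor for how much of the link replication really gets); the function name is made up for illustration.

```python
def initial_sync_hours(total_gb: float, link_mbps: float,
                       efficiency: float = 0.8) -> float:
    """Estimated hours to copy `total_gb` of VM data over a `link_mbps` link.

    Uses decimal units (1 GB = 1000 MB) and assumes only `efficiency`
    of the nominal bandwidth is available to replication.
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be > 0 and efficiency in (0, 1]")
    usable_mb_per_s = link_mbps * efficiency / 8  # megabits -> megabytes
    return total_gb * 1000 / usable_mb_per_s / 3600


# Example: 500 GB of VMs over a few hypothetical link speeds.
for mbps in (20, 100):
    print(f"{mbps} Mbps: about {initial_sync_hours(500, mbps):.0f} hours")
```

With these inputs, 500 GB over a 20 Mbps link works out to roughly 69 hours, or nearly three days of saturating the link. At that point a seed disk starts to look attractive.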
So, is it right for you? That depends greatly on what you’re looking to achieve. If you’re a small environment, or one with a select number of VMs you want to protect, then it can work well. If you’re larger and want an all-in-one solution for virtual machine DR, you may be underwhelmed until the system is more robust…and you can bet VMware is working hard on that. This is v1.0 and it does what it promises well.
In my opinion, pricing is very reasonable given that you get a DR target site without any real management. It integrates well into existing environments but has some obvious gaps around replication efficiency, orchestration, and base infrastructure VMs.
While the new offering is very much a v1.0, it’s solid and is applicable to the customers that we’re seeing ask for an off-site/hybrid DR solution. Conversations around “getting out of the datacenter business” are becoming much more common, especially with smaller customers. The first thing they want to remove is the secondary site that rarely gets used but costs them a lot of money. This is the perfect solution for that and I think it could quickly become the “gateway drug” that lets them move their primary datacenter workloads to a standard vCHS instance.
This isn’t the first DRaaS offering that’s out there. A number of service providers offer solutions built on Zerto and other tools. There are pros and cons to each. While they may provide more efficient replication and better performance, they are not always as “polished” as vCHS and pricing can be more confusing. That’s why I think that VMware priced this as reasonably as they did…to hook customers on the vCHS/hybrid model. Let them dip their toes in first with DR that won’t impact production so they get used to it, and then work to bring the rest of the workloads over.