A Sneak Peek At VMware EVO:RACK Cluster Appliances
Contrary to some of the rumors going around ahead of the “Project Marvin” EVO cluster appliances launch at VMworld this week, server virtualization juggernaut VMware does not want to be a hardware vendor. It does, however, want all of the benefits of being a hardware vendor with the EVO machines without any of the drawbacks of being a player in the cut-throat systems, storage, and switch businesses.
To that end, VMware is taking a page out of the SAP HANA playbook with the EVO family and is perhaps getting some inspiration from Oracle’s Exa family of “engineered systems” for database and middleware workloads. In both cases, the hardware and software stacks are strictly defined, which greatly simplifies the testing, certification, and support of the software stack on that hardware. With the dizzying array of processor, memory, disk, flash, and networking options available on X86 servers – even those from a single vendor – the support matrix quickly gets out of hand. This slows down the rollout of new software and also makes support operations more complex and more costly, and that is particularly true of any piece of software that is new and changing fast. This is the case with SAP’s HANA in-memory database and with VMware’s Virtual SAN, the latter of which underpins the EVO family and gives it so-called hyperconverged, SAN-style storage that runs on the same server nodes that host the ESXi hypervisor and its virtual machine guests.
Thus far, the EVO family includes two broad lines. The EVO:RAIL clusters, which EnterpriseTech described in detail earlier this week, span from four to sixteen server nodes and offer from 100 to 400 virtual machines running applications or from 250 to 1,000 virtual desktops if the end users are middle-of-the-road users who do not demand too much compute or video performance or very high I/O operations per second for their virtual PCs. At the time that VMware’s Mornay Van Der Walt, who ran the Project Marvin effort, gave EnterpriseTech a prebriefing, the company did not divulge the fact that there would be a second and more scalable cluster type, called EVO:RACK, leaving a little something for his boss, VMware CEO Pat Gelsinger, to unveil during his keynote earlier this week at the VMworld event in San Francisco.
Down in the solutions exchange expo at VMworld, some of the partners who are building EVO:RAIL machines were showing off their designs, including Dell, Supermicro, NetOne, Inspur, and Fujitsu. (VMware parent EMC will also create EVO:RAIL cluster appliances, but was not showing off its nodes at the show.) Frankly, all of these EVO:RAIL four-node enclosures looked pretty much the same, and that is absolutely so by design on the part of VMware. The company wants its server channel partners to participate in the EVO program, acting as franchisees and pushing the complete EVO solution on hardware that is further commoditized by being absolutely standardized without any room for modification and therefore differentiation that might warrant a premium price. With EVO, VMware is pitting the hardware vendors against each other for deals that will likely involve hundreds to thousands of nodes in large enterprises, and the competition will drive down hardware prices and therefore the overall price of the EVO solution. If hardware costs less than it might otherwise without such pressure, that extra margin can come from the software and support in the EVO stack.
How much of this margin will be shared with VMware’s channel partners remains to be seen, but it is helpful to remember that VMware gets the vast majority of its sales through channel partners, so it has to let them make some money or they will peddle hyperconverged software alternatives. And in many cases – such as Dell with its Nutanix partnership and Hewlett-Packard with its SimpliVity deal – EVO partners will sell multiple hyperconverged systems and let the market decide. This assumes, of course, that HP will eventually be an EVO franchisee, something it has not yet announced but could when it inevitably refreshes its ProLiant server line in the wake of the launch of the “Haswell” Xeon E5 v3 chips from Intel, expected at Intel Developer Forum in early September. This is a safe bet because it is VMware, after all, that has 500,000 customers with approximately 40 million VMs installed atop the ESXi hypervisor, and many enterprise customers will want to stick with the VMware stack, even if it does command a price premium.
That Old Time RACK And RAIL
The EVO:RACK machines are targeted at much larger workloads, and as far as EnterpriseTech has been able to determine, VMware is going to allow a little more variation in the prescribed hardware configurations than it does with EVO:RAIL cluster appliances. This is probably being done out of necessity, given that the large enterprises at which EVO:RACK is aimed have their preferred hardware partners even though they might be willing to go with a new architecture – such as Open Compute systems – for a greenfield workload. This is precisely what drove the ramp of Cisco’s Unified Computing System blade servers in their early years. Virtual desktop infrastructure had matured and become practical as well as affordable at the same time that Cisco was ramping up UCS shipments. VDI was a perfect greenfield application on which to test the new machines, and to this day, VDI is one of the key drivers of UCS revenues.
VDI is also expected to be a key driver for both the EVO:RAIL and EVO:RACK cluster appliances as well as generic private cloud setups. Sources at VMware tell EnterpriseTech that the EVO:RACK machines will be aimed at these two workloads as well to start, but eventually there will be variants that have the hardware and software tuned to support analytics workloads and to run platform cloud frameworks atop the infrastructure cloud. It is not clear if VMware will deploy the Pivotal versions of Hadoop and Cloud Foundry on these machines as part of the EVO:RACK stack, but it probably will do that and once again gear the hardware and tune up the software to all work well together. It is possible that other platform cloud alternatives could also be certified for these future EVO:RACK machines, just like other Hadoop distributions and even other analytics tools could be certified and given their own EVO:RACK variants. VMware is not saying at this point.
What we have learned about the EVO:RACK machines, which are in technical preview now among a select number of customers, is that these will be much beefier machines than the EVO:RAIL setups, and they will push to the scalability limits of the underlying Virtual SAN software developed by VMware and of the vCenter management domains. VMware is not saying if either vCenter or VSAN scalability will be extended with the impending ESXi 6.0 hypervisor, which is in a private beta now and which could be launched at VMworld Europe in Barcelona, Spain, in October. The company is not even saying if EVO:RACK will ship with ESXi 5.5 or 6.0 as its foundation, but the limits on the initial EVO:RACK specifications seem to suggest that ESXi 5.5 and 6.0 will have roughly the same scalability on vectors that matter to the EVO:RACK.
In his opening keynote, Gelsinger said that the EVO:RACK cluster appliances would put one ESXi cluster into a rack and would scale across ten racks. A Virtual SAN cluster is designed to scale across 32 server nodes, and it is no surprise that this is the target scalability for a single rack in at least one EVO:RACK setup that was being previewed in the expo, which came from Taiwanese motherboard and system maker Quanta Computer. The ten-rack scale-out of EVO:RACK is also no surprise, since the vCenter management console can federate across a maximum of ten vCenter instances, presenting a maximum of 320 nodes as a single management domain. Each 32-node cluster would have its own VSAN rather than one VSAN spanning all 320 nodes, so this is best thought of as a cluster with ten pods of storage. Moreover, ESXi does not allow for the live migration of virtual machines across vCenter clusters that are federated – it no doubt will at some point, and rumors suggest this could happen with ESXi 6.0 – so the fully loaded EVO:RACK really would operate like ten separate clusters with the exception of having a unified management console.
There is nothing wrong with this. A pod size of between 1,200 and 2,000 virtual machines is plenty big enough as a single cluster, which is what VMware sources tell EnterpriseTech is the target for the initial release of EVO:RACK. (The number of VMs per rack is based largely on how fat or skinny they are in terms of virtual compute, memory, and storage capacity.) Large enterprises are going to be plenty satisfied if they can manage federated clusters with 12,000 to 20,000 VMs. This is, by any measure, an extreme scale setup.
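For those who want to see how those ceilings stack up, here is a quick back-of-the-envelope sketch using only the figures cited above – 32 nodes per VSAN cluster, ten federated vCenter instances, and 1,200 to 2,000 VMs per rack:

```python
# Back-of-the-envelope math for the EVO:RACK scalability envelope, using
# only the figures cited above: 32 nodes per VSAN cluster (one per rack),
# ten federated vCenter instances, and 1,200 to 2,000 VMs per rack.

nodes_per_cluster = 32
federated_vcenters = 10

max_nodes = nodes_per_cluster * federated_vcenters
print(f"Nodes in one management domain: {max_nodes}")  # 320

vms_per_rack_low, vms_per_rack_high = 1_200, 2_000     # depends on how fat the VMs are
print(f"VMs across ten racks: {vms_per_rack_low * federated_vcenters:,} "
      f"to {vms_per_rack_high * federated_vcenters:,}")  # 12,000 to 20,000
```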
Gelsinger said that there would be two variants of EVO:RACK: one that is compliant with Open Compute rack and server designs, and he hinted that there would also be variants from the VCE partnership between EMC, VMware, and Cisco Systems based on its UCS platform. Gelsinger also confirmed that there would be multiple hardware partners for EVO:RACK, so it is very likely that HP, Dell, and a few of the other vendors that are also doing EVO:RAIL clusters will participate. He added that the EVO:RACK configurations would also allow for external storage to be used in the systems.
The initial OCP-compliant EVO:RACK setup is coming from Quanta, and it includes a 21-inch wide rack. The server nodes are crammed four to a chassis, just like in the EVO:RAIL machines, which fit into a 19-inch rack. Those extra two inches allow Quanta to squeeze a skinny bay to the left of the server node’s motherboard that can hold up to four 2.5-inch drives. This extra room for disks in each node is freed up because the OCP rack design pulls the power supplies out of the server node enclosures and puts them on a central power shelf. With the EVO:RACK setup aimed at VDI workloads, two of these bays will have solid state drives to be used as cache for VSAN. For more demanding virtualized workloads, PCI-Express flash cards will plug into riser cards, VMware execs explained.
The Quanta F03A nodes are based on current two-socket “Ivy Bridge” Xeon E5 v2 processors, but Quanta and VMware executives on the floor were not at liberty to say which specific chip would be used. In the Quanta product line, this server enclosure is known as the F03A enclosure and it is 2 Open Units (OU) high, which works out to 3.78 inches instead of the 3.5 inches that two standard rack units would give. The extra height allows for better air cooling and also for regular main memory sticks to be used instead of low-profile memory, which costs more. The F03A nodes can support all of the Xeon E5 chips up to and including the 130 watt high-bin parts from Intel. Each node also has two of its own SATA Disk On Module (SATADOM) units for storing virtual machine and operating system images.
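If you want to check those enclosure heights, the difference falls out of the rack specifications themselves; the 48 mm OpenU and 44.45 mm EIA rack unit figures below come from those specs, not from VMware or Quanta:

```python
# Quick unit check on the 2OU enclosure height cited above. The 48 mm OpenU
# and 44.45 mm EIA rack unit figures come from the respective rack
# specifications, not from VMware or Quanta.

MM_PER_INCH = 25.4
open_u_mm = 48.0    # Open Compute "OpenU"
eia_u_mm = 44.45    # standard 1.75-inch rack unit

print(f"2 OU: {2 * open_u_mm / MM_PER_INCH:.2f} inches")  # 3.78 inches
print(f"2 U:  {2 * eia_u_mm / MM_PER_INCH:.2f} inches")   # 3.50 inches
```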
For VSAN storage, each of the four-node enclosures is mated to two Quanta JBR disk enclosures, which have a very clever “hidden drawer” design that is arguably a better piece of engineering than Facebook’s own OpenVault disk enclosure. The JBR array has fourteen 3.5-inch drives in a front bay, and when you pop the catches and pull that front bay out, it reveals a second bay that is attached to the chassis and sits behind the fans in the front set of bays. The JBR array has room for 28 disks in total in its 2OU enclosure, and you can lash up to 84 drives into a single array by daisy chaining multiple units together. Each unit has two SAS controller modules. In any event, for the EVO:RACK, the JBR array is partitioned into four arrays, one dedicated to each compute node.
The idea is to use 900 GB or 1 TB SAS drives in the JBR enclosures, which delivers around 160 TB of usable VSAN capacity per rack. This is nowhere near the upper capacity limit of VSAN, which tops out at around 4.4 PB across 32 nodes using 4 TB SATA drives. But given the nature of the workloads and the size of the flash drives used in the nodes, this is an appropriate balance of disk and flash storage and compute capacity, VMware’s techies explained. Using 4 TB or 6 TB drives is too much capacity given the flash in the nodes and will hurt VSAN performance. (Of course, the flash could be increased on the nodes with the addition of both flash SSDs and PCI-Express flash cards if need be for a fatter VSAN configuration.)
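For readers who want to check the math, here is a rough reconstruction of that 160 TB figure. The drive counts come from the JBR partitioning described above (two 28-drive shelves split across each four-node enclosure) and the 24-node rack layout described below; the assumption that VSAN keeps two copies of every object – its default “failures to tolerate” setting of one – is ours, not something VMware specified for EVO:RACK:

```python
# A rough reconstruction of the roughly 160 TB usable figure cited above.
# Drive counts are from the JBR partitioning described earlier; the two-copy
# mirroring overhead (VSAN's default "failures to tolerate = 1" policy) is
# an assumption on our part.

drives_per_node = (2 * 28) // 4   # two 28-drive JBR shelves split across four nodes
nodes_per_rack = 24               # Quanta's rack layout, described below
drive_tb = 1.0                    # 1 TB SAS drives (900 GB is the other option)
copies = 2                        # assumed FTT=1 mirroring

raw_tb = drives_per_node * nodes_per_rack * drive_tb
usable_tb = raw_tb / copies
print(f"Raw disk per rack:    {raw_tb:.0f} TB")     # 336 TB
print(f"Usable VSAN per rack: {usable_tb:.0f} TB")  # 168 TB, close to the 160 TB cited
```

Run with 900 GB drives instead, the same math works out to about 150 TB usable, so either drive size lands in the neighborhood of the capacity VMware is quoting.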
The Rackgo X700 OCP rack is 42OU high and it has room enough to get 24 server nodes and their associated storage into a single rack. The machines have one array on the bottom, then the server nodes and one array on top, and you keep layering this way like a cake or a parfait until the rack is full. The power shelf is in the center of the rack, as are two Quanta T3048-LY2 switches. This switch is based on the Broadcom Trident+ ASIC and has 48 downlinks, running at 10 Gb/sec, to the servers as well as four 40 Gb/sec uplinks that will feed into aggregation or spine switches that will link the multiple EVO:RACK OCP cabinets to each other.
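As for the network, the arithmetic on those port counts is straightforward; the 3:1 result below is our own back-of-the-envelope figuring, not a ratio VMware or Quanta quoted:

```python
# Oversubscription math for the Quanta T3048-LY2 top-of-rack switches
# described above; the 3:1 result is our own arithmetic, not a figure
# supplied by VMware or Quanta.

downlink_gbps = 48 * 10   # 48 server-facing ports at 10 Gb/sec
uplink_gbps = 4 * 40      # four 40 Gb/sec uplinks toward the aggregation/spine layer

print(f"Downlink bandwidth: {downlink_gbps} Gb/sec")            # 480 Gb/sec
print(f"Uplink bandwidth:   {uplink_gbps} Gb/sec")              # 160 Gb/sec
print(f"Oversubscription:   {downlink_gbps // uplink_gbps}:1")  # 3:1
```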
Mike Yang, general manager of the Quanta Cloud Technology server and storage sales subsidiary of the Taiwanese manufacturing giant, tells EnterpriseTech that the EVO:RACK system his company has created will support around 1,400 VMs per rack. It will be available worldwide, with initial interest particularly strong in the United States but also in China, where VMware’s software is used at the core of applications in banking, manufacturing, and other key industries just as it is in the rest of the developed world. Quanta expects that the uptake among service providers wanting to build out VMware-compatible clouds will be particularly strong. “Service providers used to buy their machines from tier one suppliers, but now they have figured out that coming to us is more beneficial,” says Yang.
Enterprise customers are no doubt also paying attention to Quanta, which is hard to ignore given its dominant position manufacturing servers and storage for hyperscale datacenter operators.
Yang said that Quanta is looking at doing a version of EVO:RACK based on standard-sized 19-inch racks, but has not made its mind up on this yet. He also said that Quanta had no interest at this time in forging the smaller EVO:RAIL setup because Quanta had already designed its own VDI cluster based on its own Stratos microservers.
The exact feeds and speeds of what VMware called the standard EVO:RACK machine were not available at VMworld. VMware did not reveal who was building that cluster, but its techies told us that the machine did not make it to the show on time because of some unspecified delay. What they could tell us is that this so-called standard configuration would have 32 nodes in a rack and its storage would use a mix of flash and 1.2 TB SATA drives to deliver around the same 160 TB of usable VSAN capacity. There will be 33 percent more nodes in this setup, but don’t jump to the conclusion that there will be 33 percent more compute. The machine no doubt has denser packaging but could be using Xeon E5 chips with fewer cores. And in fact, that is probably exactly what will happen.
EVO:RACK will ship sometime in the first half of 2015, and the basic infrastructure cloud version will have the ESXi hypervisor, which has the VSAN software embedded in it, as well as the vCenter management console bundled on. VMware’s vCloud Automation Center tool is also loaded up on a virtual machine on the box, and is tweaked specifically for infrastructure self-service portals. vCenter Operations Manager, which monitors the health, performance, and capacity of VMs across the cluster, is also loaded up, and so is the Log Insight log management tool, which chews on data from applications, operating systems, virtual machines, and the hypervisors to proactively monitor the performance of these layers in the stack. With the VDI variant of EVO:RACK, VMware’s Horizon View desktop image broker is loaded on and vCloud Automation Center has self-service profiles for provisioning virtual desktops instead of raw virtual machines. The EVO:RACK console interface is coded completely in HTML5 and brings the management aspects of all of these tools into a seamless whole that minimizes the pointing and clicking. There is a single sign-on for all of the federated vCenter consoles, and it all looks like a single management domain across those ten racks.
Two EVO:RACK setups from Quanta were running in the VMworld datacenter and were used to support about half of the hands-on lab demonstration virtual images that attendees fired up to do their training sessions. Each rack was set up with two vCenter consoles, which each handled somewhere between 600 and 800 concurrent VMs, depending on the day of the event, which ran from Sunday through Thursday. That wasn’t pushing the upper limits of EVO:RACK performance or capacity, but it did stress the machinery a little.