Cavium ThunderX ARM Chip Rumbles Into Hyperscale
It is not a coincidence that Cavium, a maker of MIPS processors aimed at networking equipment and other embedded uses, chose the Computex 2014 conference in Taipei, Taiwan to announce its entry into the ARM server chip space. Many of Cavium's customers for its ThunderX processors will come from Taiwan, where a large percentage of the world's servers are designed and manufactured these days – usually for the hyperscale datacenter operators who will be on the cutting edge of the ARM adoption curve for servers.
With the ThunderX line, Cavium is looking to leapfrog the 64-bit X-Gene 1 processors from Applied Micro and the "Seattle" Opteron A-Series from AMD, both of which have eight ARM cores, and put the memory of now-defunct Calxeda, the original upstart in the ARM server space, to rest. And considering the specifications of the ThunderX line, which will include a family of devices that will span from networking and storage equipment up to two-socket servers with a maximum of 96 ARM cores per system, Cavium has as good a shot as any of the ARM vendors at competing with Intel in the datacenter and taking a bite out of the $128 billion market for servers, storage, and switching that the company sees developing in 2014.
Cavium is, like Applied Micro was originally and as AMD has just become, a full licensee of the ARM architecture. What that means is that these three companies can create custom cores for their processors and do not have to stick with the Cortex-A53 and Cortex-A57 designs from ARM Holdings, which licenses the ARM intellectual property. Cavium, Applied Micro, and AMD can add features to the cores so long as they maintain compatibility with the ARMv8 instruction set. The ThunderX chips will be fully compliant with the ARMv8 spec and will also adhere to the Server Base System Architecture guidelines announced in January. The SBSA is designed to maintain a baseline of compatibility across diverse ARM processors, so operating systems can more easily work across these devices without a lot of customization.
It is a delicate balance, encouraging both diversity and compatibility.
As you can see from the block diagram above, the ThunderX chip will put a lot of different components on the die, and this is made possible by the adoption of 28 nanometer chip etching processes. (Interestingly, GlobalFoundries has the job of making the chips, not Taiwan Semiconductor Manufacturing Corp.) The design includes 78 KB of L1 instruction cache and 32 KB of L1 data cache per core. The cores share up to 16 MB of L2 cache, which is fed by up to four memory controllers that can support either DDR3 or DDR4 main memory. These are 72-bit memory controllers that can run memory sticks at up to 2.4 GHz and support up to 512 GB of memory per socket. The chip maxes out at a whopping 48 cores per die, and the cores can spin up to 2.5 GHz. The design includes a set of circuitry called virtSOC that virtualizes compute, memory, and I/O on the chip, exposing the virtualized components to hypervisors like KVM and Xen. The chip will have "hundreds of integrated hardware accelerators" on the die that provide specific functions for security, storage, networking, and other aspects of a system.
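To put the memory controller figures in perspective, here is a rough back-of-the-envelope sketch of theoretical peak memory bandwidth per socket. It rests on two assumptions that Cavium has not spelled out: that "2.4 GHz" means 2,400 million transfers per second (as in DDR4-2400), and that each 72-bit controller carries 64 data bits plus 8 ECC bits, as is conventional for server memory. The function name is ours, and real sustained bandwidth would be lower than this ceiling.

```python
def peak_memory_bandwidth_gbs(controllers: int, mega_transfers_per_sec: int,
                              data_bits: int = 64) -> float:
    """Theoretical peak memory bandwidth in GB/s (decimal gigabytes).

    Assumption: only the data bits count toward bandwidth; the extra
    8 bits of a 72-bit controller are taken to be ECC.
    """
    bytes_per_transfer = data_bits // 8
    megabytes_per_sec = controllers * mega_transfers_per_sec * bytes_per_transfer
    return megabytes_per_sec / 1000


if __name__ == "__main__":
    # Four controllers at an assumed 2,400 MT/s (top-end CN88xx configuration)
    print(peak_memory_bandwidth_gbs(4, 2400))
    # Two controllers, as on the low-end CN87xx part (same speed assumed)
    print(peak_memory_bandwidth_gbs(2, 2400))
```

Under those assumptions, a fully populated four-controller socket would top out at roughly 77 GB/sec, and a two-controller part at about half that.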
The ThunderX chips will also sport NUMA clustering, based on the Cavium Coherent Processor Interconnect (CCPI), which will allow two processors to be lashed together to create a shared memory space. The two-socket machine will span 96 cores and 1 TB of memory. For workloads that go beyond a single system, the ThunderX chips will sport a low-latency Ethernet fabric that allows for thousands of sockets to be linked in 2D and 3D torus configurations. The fabric will have hundreds of gigabits per second of aggregate I/O bandwidth, and the networking will be configurable in 100 Gb/sec, 40 Gb/sec, and 10 Gb/sec chunks. ThunderX chips will also have SATA 3.0 and PCI-Express 3.0 peripheral controllers on the die.
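For readers unfamiliar with torus interconnects, the following is a minimal, generic sketch of how nodes are linked in a 2D or 3D torus: each node connects to its two neighbors along every axis, and the edges wrap around. This is not a description of Cavium's fabric or its routing, which the company has not detailed; the function names and grid sizes are purely illustrative.

```python
def torus_neighbors(coord, dims):
    """Return the nodes directly linked to `coord` in a torus of shape `dims`.

    Each axis contributes a -1 and a +1 neighbor, with wraparound at the edges.
    """
    neighbors = set()
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % size
            neighbors.add(tuple(n))
    # On very small rings (size 1 or 2) a wrapped neighbor can coincide
    # with the node itself or with the opposite-direction neighbor.
    neighbors.discard(tuple(coord))
    return sorted(neighbors)


def torus_hops(a, b, dims):
    """Minimum hop count between nodes a and b, taking the shorter way around each ring."""
    return sum(min(abs(x - y), size - abs(x - y))
               for x, y, size in zip(a, b, dims))


if __name__ == "__main__":
    print(torus_neighbors((0, 0), (4, 4)))           # 2D: four neighbors
    print(len(torus_neighbors((0, 0, 0), (4, 4, 4))))  # 3D: six neighbors
    print(torus_hops((0, 0, 0), (3, 3, 3), (4, 4, 4)))
```

The wraparound links are what keep worst-case hop counts low as the node count grows, which is the usual reason for choosing a torus over a plain mesh.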
Here is what the two-socket machine based on ThunderX will look like:
The ThunderX line actually comprises two different chips that come in four possible variations, with different features turned on and off depending on the targeted workloads. This is what the lineup looks like for the high-end CN88xx processors:
All four will reach up to 48 cores on a die, and presumably the price of the chips will reflect which features are enabled. The ThunderX_CP will have all the cores fired up, but not all of the Ethernet bandwidth and only a limited number of SATA ports. The ThunderX_ST will include all of the SATA ports as well as support for Remote Direct Memory Access (RDMA) over Converged Ethernet and various accelerators for data compression. This one is aimed at Hadoop big data munching and servers that implement block and object storage. The ThunderX_SC includes the hardware accelerators for network and security workloads that are in Cavium's Nitrox family of embedded processors; these chips are aimed at security appliances for equipment makers who want to move away from MIPS and towards ARM. The ThunderX_NT is for high-bandwidth network virtualization appliances and media servers and has the full-on 100 Gb/sec fabric on the die turned on.
And here is how the features map on the two different processor types aimed at servers:
The low-end server CN87xx variant will not have the NUMA clustering, 100 Gb/sec Ethernet, or Ethernet fabric features, and it will have 8 or 16 custom ARM cores. Cavium says the CN87xx chip will be aimed at cold storage, distributed content delivery, dedicated hosting, and distributed memory caching. The chip will have multiple 10 Gb/sec ports, multiple SATA 3.0 and PCI-Express 3.0 interfaces, and only two memory controllers.
Both of the ThunderX chips will start sampling early in the fourth quarter. Motherboard and system maker Gigabyte has committed to making system boards to support the ThunderX chips, and was talking up a multi-node chassis that would put four two-socket ThunderX nodes in a 2U enclosure, plus two dozen 2.5-inch disk drives, four 10 Gb/sec internal chassis links between the nodes, and four 10 Gb/sec uplinks to the top-of-rack switch. Gigabyte will also have a Micro ATX motherboard aimed at 1U rack servers designed for the ThunderX chips. Hewlett-Packard will be putting the ThunderX chips inside of its Moonshot hyperscale systems, too. And just to keep things interesting, Cavium has joined the Facebook-backed Open Compute Project that has fostered an ecosystem of minimalist hyperscale server designs; the company said it would be contributing its own two-socket motherboard design to the OCP, allowing others to take it and remake it as they see fit.
Canonical, SUSE Linux, Red Hat, and MontaVista are all working to make their respective Linuxes compatible with the ThunderX chips, and the Xen and KVM hypervisors will also work on them. The GNU compiler set has been ported to them as well, and Oracle and the OpenJDK community are bringing optimized Java virtual machines to the chip, too. The open source OpenStack cloud controller will also be tuned to run on them.
It looks like AMD and Applied Micro just got some serious competition. Your move, Marvell and Samsung and Nvidia.