Supermicro Revs Up Low Latency HFT Servers
If you have the need for speed, Supermicro has a server for you.
Supermicro is known to many as a motherboard supplier, but the company has a large and growing systems business aimed at a variety of markets. Supermicro has several system designs aimed at hyperscale datacenter operators, including its FatTwin and MicroCloud, which are all about compute density. While density is important in the financial services market, high performance and low latency are often more important and that means making different hardware component and design choices. For these financial services companies, which are often engaged in high-frequency trading or proprietary trading, Supermicro has engineered a family of machines known as Hyper-Speed, and the latest iterations of these servers were on display this week at The Trading Show, a financial services technology event hosted in New York.
Bert Shen, product manager for technical services at Supermicro, gave a presentation that explained why many financial services firms stay on the cutting edge of X86 hardware these days. The reason is simple: by doing so, they can get a performance advantage over their peers, and in the HFT world, lower latency and higher throughput translate into more money. A few of the top investment banks and a number of the largest hedge funds and proprietary trading firms are using these Hyper-Speed systems, according to Shen, and the base is growing as an arms race between companies has taken off.
So just how much better is each successive generation of X86 server hardware? The answer, Shen demonstrated, is enough to make a big difference. To prove the point, Supermicro ran the sfnt-pingpong benchmark test, which generates traffic and measures the round-trip latency of a system, on the three most recent generations of two-socket Intel Xeon servers. This test used a 64-byte packet size.
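To make the idea of a ping-pong latency test concrete, here is a minimal sketch in Python. It is not sfnt-pingpong and says nothing about how that tool is implemented; it simply bounces 64-byte UDP datagrams off an echo thread over the loopback interface and times each round trip. The function names (echo_server, measure_round_trips) and the loopback setup are assumptions made for illustration.

```python
import socket
import statistics
import threading
import time


def echo_server(sock, count):
    """Echo `count` datagrams back to their sender, then return."""
    for _ in range(count):
        data, addr = sock.recvfrom(2048)
        sock.sendto(data, addr)


def measure_round_trips(count=1000, payload_size=64):
    """Return `count` round-trip times in microseconds over loopback UDP."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # let the OS pick a free port
    thread = threading.Thread(target=echo_server, args=(server, count), daemon=True)
    thread.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(5.0)  # fail loudly instead of hanging on a lost packet
    payload = b"x" * payload_size
    target = server.getsockname()

    samples = []
    for _ in range(count):
        start = time.perf_counter_ns()
        client.sendto(payload, target)  # ping
        client.recvfrom(2048)           # pong
        samples.append((time.perf_counter_ns() - start) / 1000.0)

    thread.join()
    client.close()
    server.close()
    return samples


if __name__ == "__main__":
    rtts = measure_round_trips()
    print(f"median round trip: {statistics.median(rtts):.1f} microseconds")
    print(f"jitter (std dev):  {statistics.stdev(rtts):.1f} microseconds")
```

Numbers from a script like this reflect the interpreter and the kernel's loopback path far more than the hardware, so they are not comparable to the figures Supermicro reported. The point is only the shape of the measurement: one small packet out, one back, and a distribution of round-trip times whose median is the latency and whose spread is the jitter.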
The baseline two-socket machine used 3.46 GHz "Westmere-EP" Xeon X5690 processors, which have six cores. The server had a Solarflare SFN6122F two-port 10 Gb/sec Ethernet adapter card that slides into its PCI-Express 2.0 slot. This was the fastest of the Westmere Xeons and precisely the CPU that HFT shops go for. Jumping to a two-socket machine using "Sandy Bridge-EP" Xeon E5-2600 v1 processors did two things. First, there were more cores in the box, and second, the server moved to PCI-Express 3.0 peripherals and an on-die PCI-Express controller. Configured with the Xeon E5-2687W workstation processor, which has eight cores running at 3.1 GHz, and a Solarflare SFN7122F card that was designed to push the PCI-Express 3.0 bus, the Sandy Bridge machine had a 43 percent lower median latency on the sfnt-pingpong test compared to the Westmere box. Equally important, the jitter of the machine, a measure of the irregularity of the latency due to signaling issues inside the electronics of the box, was also reduced by 43 percent.
You might not think that moving up to the brand new "Ivy Bridge-EP" E5-2687W v2 processor, which came out in September, would change all that much. But Shen tells EnterpriseTech that the Sandy Bridge parts had some jitter issues, which Intel was able to fix with Ivy Bridge. The E5-2687W v2 has eight cores running at 3.4 GHz, which is a little more CPU oomph, and the system has the same Solarflare SFN7122F card. And still it delivered 25 percent lower latency and 30 percent less jitter running the sfnt-pingpong test.
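For a sense of what the two generational jumps add up to, the percentages can be compounded. The sketch below assumes that the Ivy Bridge figures are measured relative to the Sandy Bridge machine, which is how the presentation reads but is not stated outright; the function name compound_reduction is made up for this example.

```python
def compound_reduction(*reductions):
    """Combine successive fractional reductions into one overall reduction.

    Each step removes a fraction of whatever remained after the previous
    step, so the remaining fractions multiply rather than add.
    """
    remaining = 1.0
    for r in reductions:
        remaining *= 1.0 - r
    return 1.0 - remaining


# Westmere -> Sandy Bridge -> Ivy Bridge, using the figures quoted above.
latency_total = compound_reduction(0.43, 0.25)
jitter_total = compound_reduction(0.43, 0.30)

print(f"latency vs. Westmere: {latency_total:.2%} lower")
print(f"jitter vs. Westmere:  {jitter_total:.2%} lower")
```

Under that assumption, the Ivy Bridge box comes in at roughly 57 percent lower latency and roughly 60 percent less jitter than the Westmere baseline, not the 68 and 73 percent that simply adding the percentages would suggest.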
In another test, Supermicro compared a standard Sandy Bridge Xeon E5 server to its prior HFT design, running the Netperf benchmark. These tests were run on untuned operating systems, and the intent was to show the effect of having a motherboard designed to eliminate crosstalk in the wire traces and having traces with matched lengths and running on as few layers as possible to make the jitter as low as possible. The HFT machines also have customized BIOS firmware that allows for Turbo Mode to be elevated a little on the Xeon chips and to cut down on the number and length of interrupts in the system. These also cut down on jitter, reduce latency, and boost performance.
So here are the results for the Netperf test on the two machines running the UDP and TCP protocols:
On the unoptimized server on the left, with the TCP protocol the round trip latency is 26.324 microseconds with a transaction rate of 37,987 per second. The optimized HFT setup on the right, with the exact same processors but with a custom motherboard and BIOS, could do 49,807 transactions per second and had a round trip latency of 18.316 microseconds. That is a 30.4 percent reduction in latency. And if you look at the standard deviation in the latency of the machines, you will see a 70 percent drop, and that is a measure of the reduction in jitter. (Admittedly, this is not a great photo.)
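The arithmetic behind those TCP comparisons is easy to check. This short sketch uses only the four figures quoted above; the helper name percent_change is made up for the example.

```python
def percent_change(before, after):
    """Signed percent change from `before` to `after` (negative means a drop)."""
    return (after - before) / before * 100.0


# TCP figures quoted in the article.
standard_latency_us = 26.324
hft_latency_us = 18.316
standard_rate = 37_987
hft_rate = 49_807

latency_change = percent_change(standard_latency_us, hft_latency_us)
rate_change = percent_change(standard_rate, hft_rate)

print(f"round trip latency: {latency_change:.1f}%")  # -30.4%
print(f"transaction rate:   {rate_change:+.1f}%")    # +31.1%
```

So the custom motherboard and BIOS buy roughly 30 percent less latency and roughly 31 percent more transactions on the same processors.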
Shen said that it will cost somewhere between $10,000 and $15,000 to upgrade each HFT server, depending on the generation and including Solarflare networking cards, which are not cheap. The Intel "tick" upgrades, which are socket compatible, cost less than the "tock" major upgrades, where the microarchitecture and sockets change. It takes on the order of weeks to deploy and validate the new machines, which is relatively zippy, but in a world where a processor generation lasts about a year to a year and a half, this is still a relatively large amount of time to spend on testing.
"It is very important for you to start testing very early so you can deploy your trading solution at the same time as the product introduction," explained Shen. "Or even better yet, if you work with Supermicro, you can even do that before the Intel product launch." Depending on the processor cycle, traders might be able to get a three to six month lead time over their competitors. "You are going to have a tremendous advantage."
There's one other thing to consider as well. In the past two generations of Xeon chips, there was about a 30 percent jump in power efficiency between generations, which cuts down on co-location costs by allowing for more servers to be crammed into the racks.
The fastest Hyper-Speed system available from Supermicro is the SYS-6027AX-72RF-HFT3, and it is aimed not just at HFT shops but at any computational finance, electronic design automation, seismic processing, or simulation workload where high thread performance and low latency in the system are important.
This machine comes in a 2U chassis and supports two of the Xeon E5-2687W v2 processors, which have huge heat sinks on them so they can be air cooled. The system board has eight RDIMM DDR3 memory slots, and comes with a base 64 GB installed, expandable to 512 GB. The system has ten hot-swap 3.5-inch drive bays, and can be equipped with a variety of SAS and SATA drives as well as flash drives. The server has five PCI-Express 3.0 slots and one PCI-Express 2.0 slot for peripheral expansion, and typically customers put in two network cards from either Solarflare or Mellanox Technologies. The server is powered by two 1,280 watt, platinum level (95 percent efficiency) power supplies.
Shen estimates that a barebones Hyper-Speed machine would sell for around $2,000 to $3,000, depending on the memory and not including the processors, network cards, or drives. Add those in, and that is where the price goes from $10,000 to $15,000.
Two-Socket Xeon Versus One-Socket Core i7
Some HFT machines are based on souped-up desktop Core i7 processors from Intel, which clock faster than the Xeons and, importantly, can be overclocked, unlike the Xeons. But Shen says there are issues with these machines, and that is why Supermicro's Hyper-Speed machines are based on two-socket Xeons.
The desktop components do not get the same rigorous testing that server components do, and someone has to qualify and tune them for servers. Moreover, they do not support ECC scrubbing on main memory, and according to studies done by Google and the University of Toronto back in 2008, there is a probability of a bit being flipped in one out of twelve memory sticks in a machine per year. The bit flip could change a stock price or crash the system, and while there are software patches to scrub and check data, these eat up some of the extra computation that comes from the higher clock speeds. Server motherboards also have baseboard management controllers, which allow for remote management of servers, and that is obviously important for machines running in co-location facilities. You can't drive from Wall Street to the datacenter out in New Jersey to hit reboot.
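To put the one-in-twelve figure in per-machine terms, here is a back-of-the-envelope sketch. It rests on two assumptions that are not in the cited studies: that the figure can be read as a 1/12 chance per memory stick per year, and that sticks fail independently of one another. The function name chance_of_any_flip is made up for the example.

```python
def chance_of_any_flip(sticks, per_stick=1.0 / 12.0):
    """Probability that at least one of `sticks` memory sticks sees a bit flip
    in a year, assuming each has an independent `per_stick` chance."""
    return 1.0 - (1.0 - per_stick) ** sticks


# Stick counts chosen for illustration; eight matches the number of memory
# slots on the Hyper-Speed system board described earlier.
for sticks in (2, 4, 8):
    print(f"{sticks} sticks: {chance_of_any_flip(sticks):.1%} chance per year")
```

Under those assumptions, a machine with all eight slots filled has about even odds of at least one bit flip in a year, which goes some way to explaining why the lack of ECC weighs so heavily against the desktop parts.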