
How the OCP NIC 3.0 is Making Hyperscale Data Centers Cooler 

Managing and storing more data has forced hyperscale data center companies to reexamine not just how they build a data center but also the thousands of servers housed inside – including the connectors for server components.

Hyperscale data centers have the purchasing power and platform to work directly with manufacturers and engineer optimized servers. Rather than figuring out how to use the latest widget designed for the mainstream data center market, hyperscale data center companies are designing the hardware themselves to meet their demanding specifications, and many of those designs have been released as open source hardware through the Open Compute Project (OCP).

OCP, in conjunction with the University of New Hampshire InterOperability Lab, has projects underway to integrate and validate new form factors for using PCIe in servers, not just as a path towards PCIe 4.0 and 5.0 data rates but also to address server needs. Let’s review the limitations, such as thermal considerations and data rates, typically faced by server manufacturers.

A main concern of hyperscale data centers is keeping costs down while scaling to meet exponential traffic growth. One method could be using mature (slower, cheaper) technologies, but massive data growth doesn't allow for that. Instead, bleeding-edge technologies like 400G Ethernet and PCIe 4.0 and 5.0 are necessary, forcing engineers to think outside the box. One example of this is the OCP NIC 3.0 connector for add-in cards.
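
To put those data rates in perspective, a rough back-of-the-envelope comparison shows why a 400G port pushes hosts toward the newer PCIe generations. The Python below is purely illustrative and not part of any OCP tooling; it ignores protocol overhead beyond line encoding.

```python
# Back-of-the-envelope, per-direction bandwidth of a x16 PCIe link versus a
# 400G Ethernet port. Illustrative only; real throughput also depends on
# packet overhead, flow-control credits, and platform limits.

ENCODING = 128 / 130  # 128b/130b line encoding used by PCIe 3.0 and later

def pcie_x16_gbps(gt_per_s: float) -> float:
    """Usable line rate of a x16 link in Gbit/s, before protocol overhead."""
    return gt_per_s * 16 * ENCODING

gen4 = pcie_x16_gbps(16.0)   # PCIe 4.0: 16 GT/s per lane -> ~252 Gbit/s
gen5 = pcie_x16_gbps(32.0)   # PCIe 5.0: 32 GT/s per lane -> ~504 Gbit/s

print(f"PCIe 4.0 x16: ~{gen4:.0f} Gbit/s per direction")
print(f"PCIe 5.0 x16: ~{gen5:.0f} Gbit/s per direction")
print(f"400G Ethernet fits in a Gen4 x16 link: {gen4 >= 400}")  # False
print(f"400G Ethernet fits in a Gen5 x16 link: {gen5 >= 400}")  # True
```

In other words, a single PCIe 4.0 x16 slot cannot feed a 400G port at line rate, which is one reason the bleeding edge matters to this community.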

The OCP NIC 3.0 connector and specification design began years ago and has evolved into a form factor that will likely be ubiquitous in the hyperscale data center. The design is mainly attributed to OCP NIC 3.0 contributors Amphenol, Broadcom, Dell, Facebook, HPE, Intel, Lenovo, Mellanox, Microsoft and others.

Source: Open Compute Project

Two primary considerations for which these companies created the new specification were improvements to the mechanical and thermal aspects of the NIC. The OCP NIC supports up to a x16 link, configurable as up to four x4 links, which opens up a variety of potential implementations. NIC designs must work in both the straddle-mount and right-angle configurations, allowing more flexibility for the integrator. The OCP NIC can also communicate via the NC-SI sideband, which allows out-of-band management to further unify and control a whole data center. The ability to communicate with any NIC enables control of traffic as well as power management.
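
As a concrete illustration of the x16-versus-four-x4 flexibility, a host OS can report how a given slot actually trained. The snippet below is a minimal sketch for a Linux host using the standard PCIe sysfs attributes; the device address is a placeholder, and bifurcation itself is set in platform firmware, not from this script.

```python
# Minimal sketch: read how a PCIe device's link trained on a Linux host.
# The bus/device/function address below is a placeholder; substitute the
# NIC's address as reported by `lspci`.
from pathlib import Path

def link_status(bdf: str) -> dict:
    dev = Path("/sys/bus/pci/devices") / bdf
    return {
        "current_link_speed": (dev / "current_link_speed").read_text().strip(),
        "current_link_width": (dev / "current_link_width").read_text().strip(),
        "max_link_width": (dev / "max_link_width").read_text().strip(),
    }

# A x16-capable OCP NIC that trained as a single x16 link reports width 16;
# a bifurcated design shows several devices, each reporting width 4.
print(link_status("0000:17:00.0"))
```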

OCP NICs were designed to accommodate larger heat sinks, with the intent of creating more efficient data centers through control of airflow; based on data and feedback from the NC-SI sideband, cooling can be applied more precisely or turned down to save power. It's also possible to get an extra slot per system with this connector type, allowing for a greater density of ports on each server. All of these factors are designed to drive down data center operating costs.
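
A simplified view of how that sideband telemetry can translate into operating savings: the control loop below is a hypothetical sketch, with read_nic_temperature() and set_fan_duty() standing in for whatever BMC or NC-SI interface a given platform exposes; it is not part of the OCP specification.

```python
# Hypothetical fan-control loop driven by NIC telemetry. The two helper
# functions are placeholders for a platform's BMC / NC-SI interface.
import time

TARGET_C = 70.0            # illustrative target die temperature
MIN_DUTY, MAX_DUTY = 20, 100

def read_nic_temperature() -> float:
    """Placeholder: return NIC temperature in Celsius via the sideband."""
    raise NotImplementedError

def set_fan_duty(percent: int) -> None:
    """Placeholder: set fan PWM duty for the NIC's airflow zone."""
    raise NotImplementedError

def control_step(duty: int) -> int:
    temp = read_nic_temperature()
    # Simple proportional nudge: spin up when hot, spin down (and save
    # power) when the card runs cool.
    duty = int(max(MIN_DUTY, min(MAX_DUTY, duty + 2 * (temp - TARGET_C))))
    set_fan_duty(duty)
    return duty

if __name__ == "__main__":
    duty = 50
    while True:
        duty = control_step(duty)
        time.sleep(5)
```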

The thermal and mechanical changes to the typical NIC should not change the electrical characteristics of the signal received by the server chips. The OCP group still requires these cards and root complexes to meet the relevant PCIe transmitter and receiver conformance specifications, using a test fixture specifically designed to break out the new connector type with loss characteristics similar to those of the PCIe Compliance Load Board (CLB) and Compliance Base Board (CBB). This allows vendors to keep moving toward PCIe 4.0 and 5.0 speeds over both the standard PCIe connectors and now the OCP connector.
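
In practice, that conformance check comes down to comparing measured transmitter and receiver metrics against the limits in the relevant PCIe base and CEM specifications. The sketch below only illustrates the bookkeeping; the limit values are placeholders rather than actual PCI-SIG numbers, and the measurements themselves come from scope or BERT tooling outside this snippet.

```python
# Illustrative pass/fail bookkeeping for electrical conformance results.
# The limits below are placeholders; real values come from the PCIe base
# and CEM specifications for the data rate under test.
from dataclasses import dataclass

@dataclass
class Limit:
    name: str
    minimum: float
    unit: str

PLACEHOLDER_LIMITS = [
    Limit("eye_height", 0.015, "V"),   # placeholder, not the spec value
    Limit("eye_width", 0.30, "UI"),    # placeholder, not the spec value
]

def check(measurements: dict[str, float]) -> bool:
    ok = True
    for limit in PLACEHOLDER_LIMITS:
        value = measurements[limit.name]
        passed = value >= limit.minimum
        print(f"{limit.name}: {value} {limit.unit} "
              f"(min {limit.minimum}) -> {'PASS' if passed else 'FAIL'}")
        ok &= passed
    return ok

# Example with made-up measurements from a fixture-based capture:
check({"eye_height": 0.021, "eye_width": 0.34})
```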

While this form factor has benefits for particular use cases, there are others for which this connector type doesn't make sense. The two form factors are not cross-compatible; the NC-SI out-of-band signaling requires extra pins, which prevents a user from falling back on the familiar solution of plugging the card into a standard, broadly used PCIe CEM connector. One wouldn't expect to see this form factor in something like a home desktop, but for applications and data centers with a high density and volume of ports, it makes more sense. PCIe and the CEM connector are not going anywhere, but the OCP NIC 3.0 connector won't be just a flash in the pan either.

There is a lot of interest in the OCP NIC standard and form factor because the hyperscale data center community collaborated in creating this connector. The OCP NIC working group delivered the specification through the combined effort of many members of the hyperscale data center industry. Their purchasing power allowed for a server design that is highly tuned for mass deployment, optimization and automation, and it should be something to keep an eye on as data centers and data rates continue to grow.

Michael Klempa is Ethernet and storage technical manager covering PCIe, OCP and Ethernet testing services at the University of New Hampshire InterOperability Laboratory.
