Covering Scientific & Technical AI | Wednesday, November 27, 2024

Going Green in the AI Era: an Imperative for Performance-Intensive Applications 

As leading-edge AI and IT workloads push computational demands ever higher, sustainable data center strategies can save costs and boost performance, according to this contributed perspective piece.

Across the globe, the same pattern is seen – data centers are slowly becoming a detriment to the environment. While energy usage has stabilized around 2-3 percent [i] [ii] of the world’s electricity, thanks to innovations in cooling methods, the amount of e-waste is growing to over 50 million metric tons per year according to the UN [iii]. Furthermore, only 20 percent of this hardware is formally recycled.

As computer technology continues to evolve, computers are becoming more powerful than ever, but unfortunately also more power-hungry. The semiconductor industry is delivering impressive improvements in processing performance, but with two significant negative consequences: the performance gains have come at the cost of increased power draw and heat generation, and older hardware is becoming outdated faster than ever. Both contribute to concerns over the power usage and e-waste of data centers.

So, what can data centers do to offset these adverse effects? Businesses need to prioritize green choices within their data centers. With new advancements, the push for greener data centers has moved from an environmentally conscious choice to a cost-effective one. Looking at data center power usage and costs today, we can see that even a 10 percent efficiency gain would have a massive effect on companies’ financial and ecological bottom lines. If global data center energy consumption stays around 300 TWh over the next year [iv] [v], then at the global average business electricity price of $0.127 per kWh [vi], a 10 percent gain in energy efficiency would save enterprises worldwide about $3.8 billion in data center energy costs.
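The savings figure above can be checked with a few lines of arithmetic. The sketch below uses only the three numbers quoted in the text (300 TWh, $0.127 per kWh, 10 percent); the function and constant names are illustrative and not taken from any cited source.

```python
# Back-of-the-envelope check of the savings estimate quoted in the text.
GLOBAL_DC_ENERGY_TWH = 300      # global data center consumption, per the article
PRICE_USD_PER_KWH = 0.127       # global average business electricity price, per the article
EFFICIENCY_GAIN = 0.10          # the 10 percent scenario discussed in the article

KWH_PER_TWH = 1_000_000_000     # unit conversion: 1 TWh = 1e9 kWh


def annual_savings_usd(energy_twh: float, price_per_kwh: float, gain: float) -> float:
    """Return the yearly electricity cost avoided by using `gain` less energy."""
    total_cost = energy_twh * KWH_PER_TWH * price_per_kwh
    return total_cost * gain


savings = annual_savings_usd(GLOBAL_DC_ENERGY_TWH, PRICE_USD_PER_KWH, EFFICIENCY_GAIN)
print(f"Estimated savings: ${savings / 1e9:.2f} billion per year")
```

The total bill works out to roughly $38.1 billion, so a tenth of it is about $3.8 billion, in line with the estimate in the text.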

Perhaps the most significant way to reduce the environmental damage from fossil fuels used for data center electricity is to simply not use fossil fuels. Renewable energy programs for commercial customers include generation through the utility itself, third-party power purchase agreements (PPA), or renewable energy credits (REC). Thus, data center operators can select fossil-free power for their needs, reducing any greenhouse gas released into the atmosphere.

Enterprises are starting to realize improved efficiencies and cost savings while committing to preserving the environment. In fact, simply reducing power consumption increases profits by lowering operating expenses. By reviewing their facilities’ Power Usage Effectiveness (PUE), server density, inlet temperatures, and amount of e-waste, these companies begin to see how suboptimal data center construction, operations, and maintenance affect their bottom line and the environment. Many data centers run with inlet temperatures, that is, the temperature of the air pulled into the servers, lower than the manufacturer specifies. Simply raising the data center temperature can cut air-cooling power usage significantly and reduce costs.
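For readers unfamiliar with the metric, PUE is commonly defined as total facility energy divided by the energy delivered to IT equipment, so a value of 1.0 would mean no overhead at all. The sketch below applies that standard definition; the meter readings are hypothetical values chosen for illustration, not measurements from the article.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy / IT equipment energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


# Hypothetical monthly readings (assumed for illustration only).
IT_LOAD_KWH = 1_000_000

# Same IT load, with less cooling energy after raising the inlet temperature.
baseline = pue(total_facility_kwh=1_800_000, it_equipment_kwh=IT_LOAD_KWH)
warmer_inlet = pue(total_facility_kwh=1_600_000, it_equipment_kwh=IT_LOAD_KWH)

print(f"Baseline PUE: {baseline:.2f}")
print(f"PUE after raising inlet temperature: {warmer_inlet:.2f}")
```

Because the IT load is unchanged in both cases, any drop in PUE here comes entirely from overhead such as cooling, which is the effect the inlet-temperature change targets.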

By looking into longer-term considerations and upgrading only critical components for their facilities, data center operators realize improved performance, lower costs and a decrease in environmental footprint.

Keeping Cool with Liquid

One solution for these enterprises to consider is upgrading their cooling systems by transitioning from air to liquid cooling. The latest hardware and compute systems are causing significant cooling issues for most data centers. Air-cooling methods have limits to how much and how fast they can dissipate heat, and those limits are starting to be reached. The problem cannot be solved simply by speeding up the cooling fans.

Liquids are simply more effective and efficient at removing heat than air, thanks to their higher thermal conductivity. Liquid cooling designs are also better able to move heat outside of the system, rather than relocating it inside the server case as air cooling often does. This is due to their contained designs, where heat is transferred into the liquid in one area and then ultimately rejected from the liquid (and the system) in a separate location. Although retrofitting liquid cooling onto existing air-cooled servers is possible, it is often not practical. By working with server vendors, operators can have many, but not all, servers modified to use liquid cooling technologies, which reduce power usage in two ways: 1) fans run at a slower speed, and 2) the data center needs significantly less air-conditioning.

With most data centers not equipped for liquid cooling deployments, the initial capital expenditure to install the infrastructure can be high. But the operational costs are far lower. Traditional air-cooled setups tend to have very poor PUE, spending immense amounts of energy powering the air-conditioning and ventilation systems to expel the hot air. From a power-cost-to-processing-performance perspective, they’re inefficient. But with direct-to-chip liquid cooling, for example, the power consumption of a data center can be reduced by 40 percent.
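The trade-off between high upfront cost and lower operating cost can be framed as a simple payback period. The sketch below is only a rough model: the 40 percent reduction and the $0.127 per kWh price come from the article, while the capital cost and annual energy use are hypothetical placeholders that an operator would replace with real figures.

```python
def simple_payback_years(capex_usd: float, annual_energy_kwh: float,
                         price_per_kwh: float, reduction: float) -> float:
    """Years for energy savings to repay the upfront cost (ignores financing and maintenance)."""
    annual_savings = annual_energy_kwh * price_per_kwh * reduction
    if annual_savings <= 0:
        raise ValueError("Savings must be positive to compute a payback period")
    return capex_usd / annual_savings


# Hypothetical facility (assumed values, not from the article).
CAPEX_USD = 1_000_000            # assumed cost of the liquid cooling retrofit
ANNUAL_ENERGY_KWH = 10_000_000   # assumed yearly facility consumption

# Figures quoted in the article.
PRICE_USD_PER_KWH = 0.127
REDUCTION = 0.40

years = simple_payback_years(CAPEX_USD, ANNUAL_ENERGY_KWH, PRICE_USD_PER_KWH, REDUCTION)
print(f"Simple payback period: {years:.1f} years")
```

A fuller analysis would also account for maintenance, financing, and any performance gains, so this should be read as a first-pass screening calculation.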

Beyond being cost-effective and reducing the greenhouse gas emissions associated with electricity use, liquid cooling also offers performance improvements. With improved cooling, CPUs, GPUs, memory, and other components won’t hit thermal limits and throttle as quickly. This is especially important for enterprises focused on cutting-edge HPC and AI applications that need ever-higher performance levels.

Density, Disaggregation & Data Center Design

[Image: data center cold aisle seen through transparent curtains]

The next level of consideration is to look at the design of the racks and the data center itself. A modern data center consists of hundreds to thousands of racks of servers and storage systems, so finding small efficiencies per rack can add up significantly over the entire data center.

One of the easiest improvements for enterprises to invest in is increasing the density of their racks. More servers per rack means more overall compute, storage, and memory for business applications, and it also allows for cost efficiencies. In a smaller footprint, it becomes easier to operate hot/cold aisles, with the hot exhaust air concentrated into a smaller space where it can be vented and cooled faster. This enables the HVAC system to operate more efficiently, reducing power usage while improving effectiveness. A data center should be designed with hot and cold aisles, which minimize the mixing of the exhaust (hot) air and the inlet (cold) air. Such mixing increases overall air-conditioning power usage.

The mechanical design of the servers themselves also should not be overlooked. If there are any blockages of airflow, then the fans will have to work harder to do the same amount of cooling – increasing electricity usage while potentially impacting performance. Moving components around within the server chassis can minimize obstructions and maximize airflow throughout the system. In addition, keeping the fan speed as low as possible while keeping the microprocessors in a safe operating state can save energy on a server-by-server basis.
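One reason fan speed matters so much is the fan affinity law, an idealized engineering relation (not cited in the article) under which fan power scales with the cube of fan speed. The sketch below applies that relation; real server fans deviate from it, so the output should be treated as a rough indication only.

```python
def relative_fan_power(speed_fraction: float) -> float:
    """Idealized fan affinity law: power scales with the cube of speed."""
    if not 0.0 <= speed_fraction <= 1.0:
        raise ValueError("speed_fraction must be between 0 and 1")
    return speed_fraction ** 3


# Compare a few fan speeds against running at full speed.
for speed in (1.0, 0.9, 0.8, 0.7):
    power = relative_fan_power(speed)
    print(f"{speed:.0%} speed -> about {power:.0%} of full-speed fan power")
```

Under this idealization, even a modest drop in fan speed removes a disproportionate share of fan power, which is why unobstructed airflow pays off on a server-by-server basis.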

New disaggregated server system designs contribute significantly in this respect. Such servers have a modular design, composed of independently upgradeable sub-systems such as memory, storage, compute, and power. This allows enterprises to identify which sub-systems of a server are the bottleneck and replace just those, while preserving hardware that doesn’t need to be replaced. Cabling matters as well: reducing the number of cables, both inside a system and outside it, reduces airflow blockage. Pay attention to the network and power cables typically found at the rear of the system so that airflow is not blocked, and investigate systems that do not depend on cables or that have significantly fewer cables than standard servers.

This more efficient approach to hardware resources breaks the 3-5 year “forklift upgrade” data center model, enabling a more sustainable infrastructure in which only the lacking elements of servers and systems are upgraded or replaced. The approach yields significant cost efficiencies and is also environmentally friendly: lower capital expenditures and more hardware repurposing and recycling are accompanied by a substantial reduction in e-waste from discarded servers. Because disaggregated servers allow sub-systems to be upgraded individually, the entire server does not need to be replaced. By replacing only the necessary sub-systems (CPU, memory, storage, or networking), the amount of e-waste will be reduced significantly. Based on workload requirements, investigate and determine when to perform different types of upgrades on each sub-system.

Environmental Education

It’s possible to simultaneously do what’s right for the planet and the bottom line. Still, to achieve a greener future, we must work together to make intelligent choices and execute methods that will significantly affect the industry’s footprint. In addition, there is still a lot of education that needs to be done in the industry, helping companies realize the importance and benefits of more eco-friendly data centers.

There are numerous technologies and solutions available to enterprises that counter the adverse environmental effects of data centers, and they deliver the double advantage of optimizing performance while doing so. If we take the proper actions today, our data centers don’t have to harm the environment.

[i] Green Data Centers are Imperative for Enterprise Success | Blog | Digital Realty

[ii] https://www.iea.org/reports/data-centres-and-data-transmission-networks

[iii] https://www.unep.org/news-and-stories/press-release/un-report-time-seize-opportunity-tackle-challenge-e-waste

[iv] https://www.science.org/doi/10.1126/science.aba3758

[v] Data centers 2018. Efficiency gains are not enough: Data center energy consumption continues to rise significantly

[vi] Pricing of Electricity by Country (Updated February 2022)

About the Author

Michael McNerney is VP of Marketing and Network Security at Supermicro. Michael has over two decades of experience working in the enterprise hardware industry, with a proven track record of leading product strategy and software design. Prior to Supermicro, he held leadership roles at Sun Microsystems and Hewlett-Packard.

AIwire