How To Win The Green500
“We hit a new world record. We got 2.499 gigaflops per watt, and trust me, I tried everything I could to break 2500. I could not do it without taking the network out. Which, interestingly, did take us up to 2503.”
That was the summary by Glenn Brook of the National Institute for Computational Sciences (NICS) at the University of Tennessee. Brook, who headed the project that created Green500 winner Beacon, described how it snagged the number one slot during the SC12 Supercomputing conference in Salt Lake City on November 14.
That win beat out one of the industry favorites, the new Titan, a Cray XK7 system at Oak Ridge National Laboratory, which won the number one position in the Top500 list of most powerful supercomputers earlier in the week. Titan, in fact, ended up in the number three slot on the Green500, behind the SANAM system at the King Abdulaziz City for Science and Technology in Riyadh, Saudi Arabia. (Titan scored 2,143 MFLOPS/W, while SANAM measured 2,351 MFLOPS/W.)
Why did a relatively small cluster system (112 teraflops) using Intel Xeon CPUs and Xeon Phi co-processors beat out the world's most powerful supercomputer (17,590 teraflops with NVIDIA GPUs and AMD Opteron processors) in energy efficiency? Because Beacon was created to do just that.
In 2011, NICS established the Application Acceleration Center of Excellence to look at emerging technologies and how they can be applied to scientific simulations. The Center started experimenting with Intel's Xeon Phi co-processors, then in development, and provided feedback to Intel on how the chips worked with scientific applications, “some of which actually helped shape part of the Xeon Phi 5110P that we're currently using today in Beacon,” said Brook.
The Center then got an award from the National Science Foundation to create a small cluster of Intel MICs and to port and optimize code to that cluster. The original cluster was just 16 nodes. The project was labeled Beacon. Nine months later, its success attracted state funding to expand the system.
“One of the things we wanted to do was to look at energy efficiency,” said Brook. So why not see how well their design could do on the Green500? However, he adds, “At the very beginning we weren't entirely positive if this was a good idea. So we did some back of the envelope calculations, and convinced ourselves that this was kind of a good idea to try out.”
With a presumed power efficiency of about 70%, the team calculated that it could build a system that would operate at about 2.1 gigaflops/watt. “That put us on par with the best system on the last Green500 list,” said Brook. “That's what motivated us to get behind this and move it.”
“But the hardware is just one piece of it,” said Brook, “and in fact it turns out that it's not necessarily the largest piece. Our team had a lot of talented and dedicated people working tirelessly for weeks to make this happen.” That team included people working in India, Germany, and all four U.S. time zones. “We literally passed the baton around and worked around the clock.”
This is the first time that an HPC system using Intel chips has won the Green500 top position. That was partly due to the 5110P, which Intel describes as not only a high-performance co-processor but also, at 225 watts TDP and passively cooled, one of the most energy-efficient chips on the market.
Appro joined in to help build the computer, making this a three-organization effort: NICS, Intel and Appro. The system is based on the Appro GreenBlade platform. Brook described the design as a “balanced co-design effort”: the team considered the hardware and the software load together when creating the machine. That included a custom implementation of HPL for Xeon Phi clusters, designed specifically for energy efficiency. The team applied dynamic power management techniques wherever possible on both memory and processors, and minimized power by turning off any unused devices, such as USB ports.
The end result: 71.4% efficiency. Beacon reached a peak performance of 112,200 gigaflops running the LINPACK benchmark while consuming 44.89 kW of power, falling just short of 2.5 gigaflops per watt.
Unless, of course, you don't mind crashing the network.
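For readers who want to check the arithmetic, the Green500 figure is simply the LINPACK result divided by the power drawn during the run. The short Python sketch below redoes that division using only the numbers quoted in this article; the function and variable names are illustrative and do not come from NICS or the Green500.

```python
# Figures as quoted in the article.
BEACON_LINPACK_GFLOPS = 112_200.0  # LINPACK performance, gigaflops
BEACON_POWER_KW = 44.89            # power consumed during the run, kilowatts

# Green500 scores quoted for the runners-up, in megaflops per watt.
SANAM_MFLOPS_PER_WATT = 2_351.0
TITAN_MFLOPS_PER_WATT = 2_143.0


def gflops_per_watt(gflops: float, kilowatts: float) -> float:
    """Energy efficiency: gigaflops divided by watts (1 kW = 1,000 W)."""
    return gflops / (kilowatts * 1_000.0)


beacon_gflops_per_watt = gflops_per_watt(BEACON_LINPACK_GFLOPS, BEACON_POWER_KW)
beacon_mflops_per_watt = beacon_gflops_per_watt * 1_000.0

print(f"Beacon: {beacon_gflops_per_watt:.3f} gigaflops per watt")
print(f"Lead over SANAM: {beacon_mflops_per_watt - SANAM_MFLOPS_PER_WATT:.0f} MFLOPS/W")
print(f"Lead over Titan: {beacon_mflops_per_watt - TITAN_MFLOPS_PER_WATT:.0f} MFLOPS/W")
```

The result lands at roughly 2.499 gigaflops per watt, which matches the figure Brook quoted and sits just under the 2,500 MFLOPS/W mark he was chasing.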
You can see the complete list, starting with the first 100, at Green500.org.