With Launch of Xeon ‘Broadwell’ Intel’s Advanced Scale Framework Coming into Focus
Intel has talked publicly for more than a year about its next-gen Scalable System Framework (SSF), an integrated collection of processor, fabric, storage and memory technologies that Intel says will deliver more balanced high-performance systems and serve as the platform for exascale levels of computing. With the unveiling last summer of the Omni-Path Fabric interconnect, SSF started to take tangible form. Now comes the Intel Xeon processor E5-2600 v4 (code named “Broadwell”), the first processor within SSF, which Intel said today will deliver performance improvements of up to 44 percent in advanced scale, HPC and data center application environments.
Delivered in more than 27 SKUs (all of them available now), the E5-2600 v4 family was designed for two-socket servers to address the workload requirements of workstations, clusters and data centers. The "workhorse" processor covers a wide range of applications, with the high-core-count variants being particularly valuable to the financial services, life science and geosciences industries.
With an estimated 98 percent of HPC servers running on Intel chips, according to industry watcher IDC, there is a massive infrastructure of user organizations, application developers, server vendors and service providers watching Intel’s every step, and none more closely than a processor launch. One Intel insider said “there are probably hundreds of thousands of back orders already in place” for the new Xeon processor.
Rob Enderle, principal at industry analyst Enderle Group, agrees, noting that service providers such as Google and Amazon “are quite literally chomping at the bit.”
“It often takes Intel some time to understand the unique nature of a new market and then design and fabricate a set of tools for that market,” he said. “For HPC that time has passed and the result is a uniquely tuned product for HPC loads. This part is really a showcase for Intel’s massive design capabilities which are largely unique in the market given their level of processor R&D funding. The end result should be a massive performance advantage in this segment once the products are validated by the various system builders. If the parts perform as expected, the end result should be significant improvements in and advantages to those services that can implement the technology quickly.”
Built on 14nm process technology, the E5-2600 v4 delivers improved performance over the prior generation processor through a combination of a larger per-socket core count (up from 18 to 22 cores), faster memory speeds (up to DDR4-2400), and improved performance-per-core, Intel said. The per-core gain is due in part to an enhancement to the core of these processors that reduces vector floating point multiply latency by 40 percent (from five cycles to three cycles). Intel said that with its integration within SSF, the Omni-Path Fabric delivers up to 24 percent higher messaging rate when used in combination with the new processor.
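The arithmetic behind two of those figures can be checked with a short Python sketch. The helper name `percent_change` is an illustration only, not anything published by Intel, and the inputs are the cycle and core counts quoted above.

```python
def percent_change(old, new):
    """Return the signed percentage change going from old to new."""
    return (new - old) / old * 100.0


# Vector floating point multiply latency: five cycles down to three.
latency_change = percent_change(5, 3)

# Maximum per-socket core count: 18 cores up to 22.
core_change = percent_change(18, 22)

print(f"multiply latency change: {latency_change:.1f} percent")
print(f"core count change: {core_change:.1f} percent")
```

The latency change works out to the 40 percent reduction Intel cites, and the core count increase to roughly 22 percent.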
“With new users coming onboard using high-performance computing to solve different kinds of problems,” said Hugo Saleh, director of marketing for Intel’s High Performance Computing Platform Group, “we know we need to simplify the procurement, the deployment and the maintenance of these systems. Our solution for that is the Intel Scalable System Framework. We're working with our ecosystem and our partners to bring systems small and large, for both on-prem and off-prem, cloud type deployments, to really build out these contours of supercomputers that combine the right ratio of compute, fabric, storage and memory for optimized solutions for different workloads.”
Charles King, principal analyst at Pund-IT, Inc., said the introduction of new Intel manufacturing processes typically results in feature and performance improvements, and noted that the Broadwell architecture should result in systems that are faster, more flexible, and more power efficient and secure than previous solutions.
“That will impact a wide range of business applications, including traditional high throughput HPC solutions and speed-sensitive high frequency trading processes,” King said. “If previous Intel iterations offer a guide, I expect to see a variety of new performance records announced by Intel OEMs developing Xeon-based solutions. More broadly speaking, these new chips should both enhance and extend Intel's market leadership position in data center-bound silicon.”
According to Saleh, Intel will soon publish a variety of reference architectures and reference designs for the Scalable System Framework, based on the Broadwell microarchitecture as well as Omni-Path fabric and the Lustre parallel file system. These will detail baseline HPC system hardware and software requirements to help enable software application portability.
The new advances in the Xeon microarchitecture, coupled with the FMA (Fused Multiply-Add) instruction introduced in the previous generation of the chip family, bring per-core vector performance improvements. These have implications for the scalability of mod/sim, deep learning and other applications, a major concern in the commercial/industrial HPC user community.
“This is important for applications that cannot be recompiled as well as those that do not scale well or that have hit a scaling bottleneck,” said Saleh. “For example, a vector floating-point multiply now only takes three clock cycles as opposed to the five clock cycles required by the previous generation microarchitecture. For HPC, that is going to be a pretty big improvement.”
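One way to see why a latency cut can help code that is not recompiled is a deliberately simplified model of a chain of dependent multiplies, where each operation must wait for the previous result. The sketch below assumes the chain is purely latency-bound with no other overhead. The function name and the chain length are hypothetical, and real processors overlap independent operations, so actual gains would typically be smaller.

```python
def chain_cycles(n_ops, latency_cycles):
    """Cycles for n_ops multiplies when each one needs the previous result."""
    return n_ops * latency_cycles


N_OPS = 1_000_000  # hypothetical chain length, chosen only for illustration

old_cycles = chain_cycles(N_OPS, 5)  # previous generation: five-cycle multiply
new_cycles = chain_cycles(N_OPS, 3)  # Broadwell: three-cycle multiply

speedup = old_cycles / new_cycles
print(f"latency-bound speedup: {speedup:.2f}x")
```

Under these assumptions the same binary finishes the chain in three-fifths of the cycles, which is the best case for the change Saleh describes.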
Similarly, the performance of radix-1024 and scalar divides has been improved, and the ADC (Add with Carry), SBB (Subtract with Borrow) and PCLMULQDQ (Carry-Less Multiplication Quadword) instructions now complete in one clock cycle, according to Intel.
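For readers unfamiliar with these instructions, the following Python sketch models what two of them compute: a carry-less multiply of two 64-bit values, and an add-with-carry used to chain a 128-bit addition from 64-bit pieces. It is a software illustration of the arithmetic only, not how the hardware implements it, and the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def clmul64(a, b):
    """Carry-less product of two 64-bit values: shift and XOR, never add."""
    a &= MASK64
    b &= MASK64
    result = 0
    while b:
        if b & 1:
            result ^= a  # XOR replaces addition, so no carries propagate
        a <<= 1
        b >>= 1
    return result


def add_with_carry(x, y, carry_in=0):
    """64-bit add that returns (sum, carry_out), in the style of ADC."""
    total = (x & MASK64) + (y & MASK64) + carry_in
    return total & MASK64, total >> 64


def add_128(a_lo, a_hi, b_lo, b_hi):
    """Add two 128-bit numbers held as 64-bit halves by chaining the carry."""
    lo, carry = add_with_carry(a_lo, b_lo)
    hi, carry = add_with_carry(a_hi, b_hi, carry)
    return lo, hi, carry


print(bin(clmul64(0b11, 0b11)))
print(add_128(MASK64, 0, 1, 0))
```

Carry-less multiplication is widely used in cryptographic and checksum code, and chained add-with-carry is the basic step in multi-word integer arithmetic, so both kinds of workload stand to benefit when these instructions get faster.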