Inside SAP’s Own HANA Accelerated Systems
How much iron does it take to run enterprise application giant SAP? Less than you might think, and even less now that the company has moved all of its internal systems to its own HANA in-memory database. The setup at SAP may also say a lot about how little iron it might take to run the key analytics and back-end systems of the world's major corporations if companies follow SAP's lead into in-memory processing.
The move to in-memory has also brought cultural and operational changes, some of them unexpected, that SAP executives say have made better information more readily available.
A decade ago, about the time that Oracle started getting serious about enterprise application software, its main rival, SAP, made the switch from Oracle databases for its back-end systems to IBM's DB2. The company has had a variety of systems over the years, including back-end databases running atop HP-UX systems from Hewlett-Packard. Mike Golz, who is chief information officer for SAP's North American operations, did not want to be more specific about past systems when he talked to EnterpriseTech about the company's current setup.
This is no surprise. Most public companies are hesitant to talk specifics about the hardware models and software versions they choose, or the companies that supply them, and SAP is no different, given that it has to partner with all of the key operating system, database, server, storage, and networking gear providers to get its code certified to run atop these components of a modern system.
But Golz was happy to talk about the current SAP systems now that the company has moved its entire customer relationship management (CRM) and enterprise resource planning (ERP) stack to HANA. These systems support over 67,000 end users, who in turn support the company's 233,500 customers and tens of thousands of partners worldwide. They counted the €16.8 billion that SAP brought in during 2013, including the €664 million from HANA, and they also pay the bills and the employees. (That HANA business, incidentally, grew 69 percent last year and is the fastest-growing product in SAP's 40-year history, even surpassing the R/3 system that started the ERP craze back in the early 1990s.) The Business Warehouse parallel data warehouse at SAP has also gone in-memory, and ironically, because so many complex questions can now be answered directly out of the in-memory ERP system, without affecting transaction processing rates, the role of that data warehouse will be changing.
SAP's own HANA use started three years ago, when it added a few standalone machines running HANA to accelerate specific CRM and BusinessObjects analytics routines. Then a little more than two years ago, SAP created a "side-car system" called CO-PA, an accelerator that did calculations across all of the company's invoices to analyze the profitability of each deal. Such calculations were too complex and heavy to run in the relational database system without impacting the company's transaction processing response times, so the invoice data was sucked out of the production systems, analyzed in HANA, and then summary results were dumped back into the DB2-backed ERP system.
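To make that data flow concrete, here is a minimal sketch of the side-car pattern, written in Python with SQLite standing in for both the DB2-backed ERP system and the HANA accelerator; the invoice tables and column names are hypothetical, not SAP's actual CO-PA objects.

```python
# Minimal sketch of the side-car accelerator pattern: raw invoice rows are
# pulled out of the production system, the heavy profitability calculation
# runs in a separate engine, and only the summary is written back. SQLite
# stands in for both DB2 and HANA; the schema is hypothetical.
import sqlite3

source = sqlite3.connect(":memory:")   # stand-in for the DB2-backed ERP system
sidecar = sqlite3.connect(":memory:")  # stand-in for the HANA accelerator

# Seed the "production" system with a few invoice lines.
source.execute("CREATE TABLE invoices (deal_id TEXT, revenue REAL, cost REAL)")
source.executemany("INSERT INTO invoices VALUES (?, ?, ?)",
                   [("D1", 1200.0, 900.0), ("D1", 800.0, 650.0), ("D2", 500.0, 480.0)])

# 1. Extract the raw invoice data from the transactional system.
rows = source.execute("SELECT deal_id, revenue, cost FROM invoices").fetchall()

# 2. Run the profitability calculation in the side-car, off the critical path
#    of transaction processing.
sidecar.execute("CREATE TABLE invoices (deal_id TEXT, revenue REAL, cost REAL)")
sidecar.executemany("INSERT INTO invoices VALUES (?, ?, ?)", rows)
summary = sidecar.execute(
    "SELECT deal_id, SUM(revenue) - SUM(cost) AS margin "
    "FROM invoices GROUP BY deal_id").fetchall()

# 3. Push only the summarized results back into the ERP system.
source.execute("CREATE TABLE deal_profitability (deal_id TEXT, margin REAL)")
source.executemany("INSERT INTO deal_profitability VALUES (?, ?)", summary)
print(summary)  # e.g. [('D1', 450.0), ('D2', 20.0)]
```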
As HANA matured, SAP got serious about it. In January 2012, the company moved the entire Business Warehouse data warehouse off DB2 and onto HANA. The Business Warehouse cluster has a total of eight server nodes based on Intel's Xeon E7 processors, plus one spare for failover. These four-socket machines in the HANA cluster use ten-core "Westmere-EX" Xeon E7 v1 processors and each have 512 GB of main memory. All HANA systems require Xeon E7 processors and also require SUSE Linux Enterprise Server 11, and SAP makes no exceptions for itself. The nodes have a total of 4 TB of memory and 320 cores for processing. The HANA database server has four SLES-based application servers that interface with customers and that run NetWeaver BW 7.30. The company uses 10 Gb/sec Ethernet links between the nodes, but doesn't want to talk about what mix of server, switch, and storage brands it uses.
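The totals quoted above are straightforward multiplication; the quick check below uses only the figures reported in this article, with the spare failover node left out of the usable capacity.

```python
# Quick check on the Business Warehouse cluster totals, using only the figures
# reported in this article; the spare failover node is not counted.
active_nodes = 8
sockets_per_node = 4
cores_per_socket = 10          # ten-core "Westmere-EX" Xeon E7 v1
memory_per_node_gb = 512

print(active_nodes * sockets_per_node * cores_per_socket)   # 320 cores
print(active_nodes * memory_per_node_gb / 1024)             # 4.0 TB of memory
```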
In March 2013, SAP took the next step and ported its CRM 7.0 suite from relational databases to HANA in-memory systems. The database server behind the CRM system is a single eight-way Xeon E7 v1 server with 4 TB of main memory and 80 cores. (The redundant system in the hot backup site is configured the same, for a total of 160 cores.) This system runs SUSE Linux Enterprise Server 11, as do the ten application servers that hang off of each CRM database machine and run the actual CRM code. Golz says that SAP could have gotten by with a smaller memory footprint if it wanted to, since the CRM database weighs in at only 1.1 TB after compression, but it is better to have spare capacity.
In August last year, SAP moved the ERP 6.0 system running DB2 to a similar HANA setup, again with an eight-socket Xeon E7 v1 machine running SLES and with 4 TB of main memory. The net database for ERP ranges in size from 1.8 TB to 2.1 TB after it is compressed by HANA. The ERP database system has seven application servers wrapped around it that run Microsoft's Windows Server.
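Put the CRM and ERP numbers side by side and the headroom Golz mentions is easy to see. The rough check below uses only the compressed database sizes reported here; treating the remainder as room for query workspace and growth is an inference, not an SAP-published sizing rule.

```python
# Rough headroom check on the single-node CRM and ERP systems, using the
# compressed database sizes reported in the article (ERP at its upper bound).
node_memory_tb = 4.0
compressed_db_tb = {"CRM": 1.1, "ERP": 2.1}

for name, size_tb in compressed_db_tb.items():
    spare_tb = node_memory_tb - size_tb
    print(f"{name}: {size_tb} TB of data in {node_memory_tb} TB of memory, "
          f"about {spare_tb:.1f} TB left for query workspace and growth")
```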
SAP has a wide area network link from its primary datacenter in Sankt Leon-Rot, Germany, which is south of Heidelberg, to a backup datacenter in an undisclosed location that has fully replicated, hot backup ERP and CRM systems ready to go if the primaries fail for any reason. SAP does not have regional datacenters for internal systems in its various geographies, a practice that was common many years ago but is becoming less so these days. All of SAP's systems are in one place for the entire SAP global community, with spares in a hot backup site. This strategy is possible because of the relatively cheap and widespread Internet access that is available globally these days.
The move from DB2 to HANA is changing SAP's operations in ways that go beyond swapping a database created by an outsider for one created and supported by SAP itself. And it is about more than "killing batch and going real-time," as Golz puts it.
Here is an example. In the past, as each week came to a close, SAP's managers and salespeople had a small window in which to enter their data about deals that were in process and deals that had just closed. The window for the reports to be discussed on Monday morning closed on the prior Thursday morning.
And there was a lag of a few days in the data, because it took time to load it into the systems and then into the data warehouse for summary reports. This is no longer the case. Because of the speed of HANA, the whole discussion of the currency of the data in reports is "going away." Now, SAP's managers can query the actual sales data in the ERP and CRM systems, and perhaps more importantly, salespeople know this. They are much more careful about clarifying the state of their deals, because there are no more excuses about the lag between when data was entered into the systems and when it was pushed through to the data warehouse to generate reports.
"Now, there is a continuous process of clarifying things and not trying to do it at quarter end, but rather spreading it out and clarifying things as they happen," explains Golz. While speed is important, and many key functions are running dramatically faster in HANA as they were in DB2, the immediacy of asking questions in HANA is what is transformational.
"The whole mechanics of ETL and getting data out of the transaction systems and putting it into a business warehouse and running reports against the warehouse because your transaction systems are not fast enough – that whole topic of building cubes and logic compilers and reports in the warehouse, over time basically goes away as more and more reporting and analytics move back to the transactional system. So the BW system that we have is now really limited to information where BW has a functional advantage of processing the data."
For example, if you look at financial reports, the data warehouse is great at reporting where there are time-dependent hierarchies, currency conversions, and certain kinds of simulations, says Golz. You do other kinds of queries directly on the transaction systems.
"This is where the big savings lie, because over time, these business warehouse systems get massive. And the amount of effort that goes into building the models, building the reports, extracting the data, checking the data – it takes whole armies of people to get the data from A to B – all of that goes away if you go straight against the transactional system and the limitation of data volume and response time goes away. This is a massive simplification in the landscape."
The thing to remember is that most enterprise data warehouses are not on the same scale as the 300 PB at Facebook or the 250 PB at Yahoo. These are two of the largest ones that EnterpriseTech knows about, and they are so large that relational databases cannot handle them and they certainly cannot be crammed into main memory even across a massive cluster of servers.
Most production databases at large enterprises weigh in at tens of terabytes, and data warehouses are on the order of tens to hundreds of terabytes. This is now well within the capacity of a small cluster of HANA machines – or one big one once Hewlett-Packard and SGI get their HANA-tuned, high-end NUMA machines out the door later this year.
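As a rough illustration of that claim, the arithmetic below assumes 512 GB nodes like SAP's own BW machines and a 4X raw-to-columnar compression ratio; the compression figure is an assumption for illustration, not a measured number.

```python
# Ballpark node-count arithmetic behind that claim, assuming 512 GB nodes like
# SAP's own BW machines and a 4X raw-to-columnar compression ratio (an assumed
# figure, not a measured one).
import math

memory_per_node_gb = 512
compression_factor = 4

for raw_db_tb in (20, 60):      # "tens of terabytes" of raw warehouse data
    compressed_gb = raw_db_tb * 1024 / compression_factor
    nodes = math.ceil(compressed_gb / memory_per_node_gb)
    print(f"{raw_db_tb} TB raw -> ~{compressed_gb / 1024:.0f} TB compressed "
          f"-> {nodes} nodes of {memory_per_node_gb} GB")
```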