Covering Scientific & Technical AI | Monday, December 2, 2024

Gauging Resource Management as Hadoop Clusters Grow 

Among the challenges to broader adoption of Hadoop in the enterprise are supporting multiple users and departments on a single cluster and handling diverse workloads on shared systems. Other hurdles include efficiently allocating resources and costs across departments.

Pepperdata, Sunnyvale, Calif., a specialist in real-time cluster optimization, proposes to tackle these barriers to Hadoop adoption with a new platform designed to measure the actual cost of analytics and other workloads across distributed systems. The approach is said to boost Hadoop adoption in the enterprise by improving utilization and predictability. It also includes a "chargeback" feature designed to measure those workload costs and allocate them to the departments that incur them.

“We’ve reached an inflection point where more Hadoop jobs from various departments are stomping on each other,” Sean Suchter, Pepperdata CEO and co-founder, asserted. The company's solution released on Monday (Sept. 28) targets the "software gap" in Hadoop with the goal of scaling cluster performance.

The company further argues that with Hadoop accounting for more of the enterprise IT budget, efficiency must be improved for this shared resource. The trick is to ensure that one department's use of Hadoop on shared infrastructure does not slow down other departments. Hence, Pepperdata argues that "detailed visibility" is critical as enterprises seek to offer "internal Hadoop-as-a-service."

The company's visibility approach centers on its chargeback feature, which allows IT administrators to track Hadoop usage in real time by user, workflow, or department. That capability is growing in importance as Hadoop adoption shifts from single-instance applications in production environments to multi-tenant deployments.

The measurement and resource allocation tool addresses what the company considers the "third phase" of Hadoop deployment, characterized by enterprises running more workloads on a single cluster. At that stage, Pepperdata argues, enterprises need greater predictability and performance than out-of-the-box versions of Hadoop provide.

Pepperdata said its chargeback feature adds reporting of company-wide and department-specific usage of Hadoop on IT infrastructure. That, it says, allows IT departments to allocate the costs of running multi-tenant clusters. The functionality also can be used to "measure infrastructure investment and build internal service offerings that can be charged back to relevant business groups."
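Pepperdata has not published the mechanics of its chargeback reporting, but the underlying idea can be sketched simply: sum per-job resource consumption by department, then split the cluster's cost in proportion to each department's share. The record fields, function names, and proportional-allocation formula below are illustrative assumptions, not Pepperdata's actual implementation.

```python
from collections import defaultdict

# Hypothetical per-job usage records, as might be exported from cluster logs.
# "cpu_hours" stands in for whatever resource metric the chargeback is based on.
jobs = [
    {"dept": "marketing", "cpu_hours": 120.0},
    {"dept": "finance",   "cpu_hours": 300.0},
    {"dept": "marketing", "cpu_hours":  80.0},
]

def chargeback(jobs, monthly_cluster_cost):
    """Allocate cluster cost to departments in proportion to CPU-hours used."""
    usage = defaultdict(float)
    for job in jobs:
        usage[job["dept"]] += job["cpu_hours"]
    total = sum(usage.values())
    return {dept: monthly_cluster_cost * hours / total
            for dept, hours in usage.items()}

print(chargeback(jobs, 10_000.0))
# → {'marketing': 4000.0, 'finance': 6000.0}
```

A production system would track multiple dimensions (CPU, memory, I/O) and roll them up per user and workflow as well as per department, but the proportional split shown here is the essence of allocating a shared cluster's cost to its tenants.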

Hadoop cluster sizes have been growing steadily in enterprises as measured by the soaring number of server nodes. Scaling of Hadoop clusters in the enterprise has roughly paralleled that of early adopters like Internet companies and other hyperscale datacenter deployments. Hence, vendors like Pepperdata have identified a growing requirement for new features designed to remove some of the pain points in scaling Hadoop across the enterprise.

Enterprise customers typically start out with dozens of server nodes for prototype projects. As these projects move to production, Hadoop clusters often grow to hundreds of nodes as the datasets expand. As enterprise users find more data to correlate and more application use cases for that data, the clusters continue to grow and can reach 1,000 to 2,000 nodes.

Those numbers are expected to grow as Hadoop services scale across organizations.

About the author: George Leopold

George Leopold has written about science and technology for more than 30 years, focusing on electronics and aerospace technology. He previously served as executive editor of Electronic Engineering Times. Leopold is the author of "Calculated Risk: The Supersonic Life and Times of Gus Grissom" (Purdue University Press, 2016).

AIwire