Covering Scientific & Technical AI | Sunday, December 22, 2024

Apache Unveils Hadoop 2 

The Apache Software Foundation, which oversees the 150 or so open source projects under the famous Apache umbrella, this week announced Hadoop 2, the latest version of its popular software framework for distributed computing.

Apache Hadoop enables data-intensive distributed applications to work with thousands of nodes and exabytes of data, providing the foundation for many of the world's big data analytics applications. The framework connects thousands of servers to process and analyze data at supercomputing speed.
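Hadoop MapReduce, the framework's original processing engine, splits a job into map tasks that each handle one slice of the input and reduce tasks that combine the intermediate results. The following is a minimal, single-process Python sketch of the classic word-count pattern, written only to illustrate the idea. It does not use Hadoop's Java API, and the function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) are hypothetical. On a real cluster the map and reduce tasks run in parallel on many nodes, and the framework performs the shuffle over the network.

```python
from collections import defaultdict
from typing import Iterable, Iterator


# Map phase: emit a (word, 1) pair for every word in one input split.
def map_words(split: str) -> Iterator[tuple[str, int]]:
    for word in split.lower().split():
        yield word, 1


# Shuffle phase: group all emitted values by key, as the framework
# does between the map and reduce stages.
def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


# Reduce phase: combine all values collected for a single key.
def reduce_counts(key: str, values: list[int]) -> tuple[str, int]:
    return key, sum(values)


def word_count(splits: list[str]) -> dict[str, int]:
    # On a cluster each split would go to a separate map task;
    # here the splits are processed sequentially in one process.
    mapped = (pair for split in splits for pair in map_words(split))
    grouped = shuffle(mapped)
    return dict(reduce_counts(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big clusters", "data at scale"]))
    # {'big': 2, 'data': 2, 'clusters': 1, 'at': 1, 'scale': 1}
```

Because each map call sees only its own split and each reduce call sees only one key's values, the work can be spread across many machines, which is what allows this style of processing to scale out to large clusters.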

The project's latest release has been more than four years in the making and has now achieved the stability and enterprise readiness needed to earn the General Availability designation, according to a foundation statement.

Apache Hadoop VP Chris Douglas said, “With the release of stable Hadoop 2, the community celebrates not only an iteration of the software, but an inflection point in the project's development. We believe this platform is capable of supporting new applications and research in large-scale, commodity computing.”

Many companies, such as Microsoft, IBM, Teradata and SAP, have integrated Hadoop into their services. Yahoo!, an early pioneer, hosts the world’s largest known Hadoop production environment to date, spanning more than 35,000 nodes.

New in Hadoop 2 is YARN (Yet Another Resource Negotiator), which sits on top of HDFS and serves as a large-scale, distributed operating system for big data applications. YARN lets multiple applications run simultaneously on the same cluster and share its resources, supporting data more efficiently throughout its entire lifecycle.
Features of the release include:
 - Apache Hadoop YARN for running both data-processing applications (e.g., Apache Hadoop MapReduce and Apache Storm) and services (e.g., Apache HBase)
 - High availability for Apache Hadoop HDFS
 - Federation for Apache Hadoop HDFS, which scales significantly beyond Apache Hadoop 1.x
 - Binary compatibility for existing Apache Hadoop MapReduce applications built for Apache Hadoop 1.x
 - Support for Microsoft Windows
 - Snapshots for data in Apache Hadoop HDFS
 - NFSv3 access for Apache Hadoop HDFS
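YARN's core idea, many applications drawing containers from one shared pool of cluster resources, can be illustrated with a toy scheduler. The Python sketch below is a simplification under stated assumptions: the `Node`, `App` and `schedule` names are hypothetical, memory is the only resource modeled, and a simple round-robin policy stands in for YARN's real pluggable schedulers (such as the Capacity and Fair schedulers). It does not use YARN's actual API, in which a ResourceManager grants containers on NodeManagers to a per-application ApplicationMaster.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_mb: int  # memory still available on this node (hypothetical model)


@dataclass
class App:
    name: str
    container_mb: int       # memory requested per container
    containers_wanted: int  # how many containers the app asks for
    granted: list[str] = field(default_factory=list)  # nodes hosting its containers


def schedule(nodes: list[Node], apps: list[App]) -> None:
    """Grant one container per app per round, so several applications
    run side by side instead of one monopolizing the cluster.
    This round-robin rule is a crude stand-in for real YARN schedulers."""
    progress = True
    while progress:
        progress = False
        for app in apps:
            if len(app.granted) >= app.containers_wanted:
                continue  # this app already has everything it asked for
            # Place the next container on the first node with enough free memory.
            for node in nodes:
                if node.free_mb >= app.container_mb:
                    node.free_mb -= app.container_mb
                    app.granted.append(node.name)
                    progress = True
                    break


if __name__ == "__main__":
    cluster = [Node("node1", 4096), Node("node2", 4096)]
    jobs = [App("mapreduce-job", 1024, 4), App("storm-topology", 2048, 2)]
    schedule(cluster, jobs)
    for app in jobs:
        print(app.name, app.granted)
    # mapreduce-job ['node1', 'node1', 'node2', 'node2']
    # storm-topology ['node1', 'node2']
```

In this example the two applications interleave: the batch job receives four 1,024 MB containers and the streaming job two 2,048 MB containers, together filling both nodes. The loop stops once no remaining request fits anywhere, so an oversized request is simply left unserved rather than blocking the others.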
AIwire