Covering Scientific & Technical AI | Wednesday, January 8, 2025

NSF ACCESS: Open Storage Network Welcomes New Campus Computing Partners 

Jan. 6, 2025 – Several institutions involved with the U.S. National Science Foundation (NSF) ACCESS program are leading efforts to expand the Open Storage Network (OSN), which strives to provide a low-cost, high-quality, sustainable distributed storage cloud for the research community.

Specifically, the OSN is enabling more pods – storage nodes – at additional sites throughout the U.S. and welcomes new campus computing partners.

This map illustrates the current 17 active OSN pod sites. Source: Open Storage Network

“The past decade has seen rapid growth of data sets from scientific instruments, simulations, internet postings and other sources – allowing new insights through big data analytics and more recently training of AI models,” said Johns Hopkins University Bloomberg Distinguished Computer Science Professor Alex Szalay, who is the founder of the OSN project. “This torrent of data has created a need for a platform that supports simple and cost-efficient storage and sharing of large volumes of data.”

In 2018, the OSN set out to address that need with a pilot that was sponsored by the NSF. Since then, the platform has evolved into a large-scale production storage system that supports allocations of up to 50 terabytes at no charge to researchers via ACCESS, as well as paid participation for entities with larger needs.

“While the OSN started as a way to store and share data across geographically distributed sites, it has also been useful as a way to share data between different groups on individual campuses,” said Christine Kirkpatrick, who is director of the Research Data Services Division at the San Diego Supercomputer Center (SDSC) at UC San Diego. “OSN continues to expand with several new pod sites for projects by the principal investigators writing campus computing storage into their proposals.”

She explained that data on the OSN is accessed via the S3 RESTful API – a de facto standard that supports easy access to shared data across geographic and administrative boundaries for both open and protected data sets. Equally important is the variety of software utilities that support different types of access, including high-speed data transfer via utilities such as Rclone or Globus, gateways that map OSN storage to a local network file system API, application libraries that provide direct access to OSN storage for R, Python, Julia and other programming environments – as well as research data management platforms that use the network’s back-end storage.
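To make the S3-style access described above more concrete, here is a minimal Python sketch of how a client might address data on a pod. It is an illustration under stated assumptions, not OSN's documented tooling: the endpoint host, bucket name and object key are hypothetical placeholders, and the helper functions are invented for this example. What it relies on from the S3 REST API itself is the path-style object URL, the `list-type=2` query used by ListObjectsV2, and the XML namespace of the listing response.

```python
from urllib.parse import quote, urlencode
import xml.etree.ElementTree as ET

# XML namespace used in S3 ListObjectsV2 responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def object_url(endpoint, bucket, key):
    """Build a path-style URL for an S3 GetObject request."""
    return f"{endpoint.rstrip('/')}/{quote(bucket)}/{quote(key)}"


def list_url(endpoint, bucket, prefix=""):
    """Build the URL for an S3 ListObjectsV2 request (list-type=2)."""
    query = urlencode({"list-type": "2", "prefix": prefix})
    return f"{endpoint.rstrip('/')}/{quote(bucket)}?{query}"


def parse_listing(xml_text):
    """Return (key, size_in_bytes) pairs from a ListObjectsV2 XML body."""
    root = ET.fromstring(xml_text)
    return [
        (
            item.findtext("s3:Key", namespaces=S3_NS),
            int(item.findtext("s3:Size", namespaces=S3_NS)),
        )
        for item in root.findall("s3:Contents", S3_NS)
    ]


# Hypothetical values, for illustration only.
ENDPOINT = "https://pod.example.org"
BUCKET = "demo-bucket"

# A hand-written sample of what a listing response looks like.
SAMPLE_RESPONSE = """\
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>demo-bucket</Name>
  <Contents><Key>sky/tile-001.fits</Key><Size>1048576</Size></Contents>
</ListBucketResult>"""

if __name__ == "__main__":
    print(list_url(ENDPOINT, BUCKET, "sky/"))
    for key, size in parse_listing(SAMPLE_RESPONSE):
        print(object_url(ENDPOINT, BUCKET, key), size)
```

For an openly readable data set, the URLs built here could be fetched with an ordinary HTTP GET (for example via `urllib.request.urlopen`). Protected data sets require signed requests, which in practice are usually handled by an S3 client library or by transfer tools such as the Rclone and Globus utilities mentioned above.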

“Use of the S3 API makes it possible to structure the OSN as a distributed network of storage Pods, where each participating site houses one or more pods containing one or two petabytes (PB) of storage, connected to Internet2 at speeds ranging from 10 to 100 gigabits per second,” Goodhue said. “While every pod conforms to a standard system design and runs that same software stack, the OSN also supports ‘virtual pods’ where the backing store is a subset of a larger system.”
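As a rough sense of scale for those link speeds, the short Python sketch below estimates how long it would take to move a full 50-terabyte no-charge allocation at the low and high ends of the quoted range. It is a back-of-the-envelope calculation only: it assumes decimal units (1 TB = 10^12 bytes), a link running at its full nominal rate, and no protocol or storage overhead, so real transfers would take longer. The function name is invented for this example.

```python
def transfer_hours(terabytes, gigabits_per_second):
    """Idealized transfer time in hours at full line rate (decimal units)."""
    bits = terabytes * 1e12 * 8                 # terabytes -> bits
    seconds = bits / (gigabits_per_second * 1e9)
    return seconds / 3600


if __name__ == "__main__":
    # 50 TB allocation over the 10 and 100 Gb/s connections quoted above.
    for rate in (10, 100):
        print(f"{rate} Gb/s: {transfer_hours(50, rate):.1f} hours")
```

Under these assumptions, 50 terabytes takes about 11 hours at 10 gigabits per second and a little over an hour at 100 gigabits per second.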

Over the past few years, the OSN has grown to 17 sites housing more than 35 petabytes of storage and continues to grow. The OSN Pod hardware design and software stack are operated, maintained and enhanced by a distributed engineering and operations team drawn from participating sites. This collaborative approach allows the pooling of expertise and avoids dependence on any single individual or institution for continued success. Oversight and governance are provided by a leadership team that also draws from participating sites.

“We built the Open Storage Network on the premise that a collaborative effort, rooted in the research computing community, could address an acute need for simple and cost-efficient storage and sharing of scientific data. Results so far have exceeded expectations,” Szalay said.

SDSC, which is part of the School of Computing, Information and Data Sciences at UC San Diego, has been leading the OSN program for years along with collaborators at the Massachusetts Green High Performance Computing Center, the National Center for Supercomputing Applications (NCSA), the Renaissance Computing Institute, Johns Hopkins University and Rice University.

To learn more, please see the detailed OSN webinar slides regarding becoming a campus computing storage partner.


Source: Kimberly Mann Bruch, SDSC

AIwire