Covering Scientific & Technical AI | Monday, January 27, 2025

Citus Data Adds Fast Data to Big Data on PostgreSQL 

Citus Data today announced that one of the most popular databases in the world, PostgreSQL, is able to deliver simple, elastic scale for operational workloads across large datasets. The company is releasing an open-source transparent-sharding extension, “pg_shard,” which can be used with both standard PostgreSQL and CitusDB. This brings elastic scale on PostgreSQL to more workloads than ever before. The CitusDB database already enables massive parallelism for analytic workloads; pg_shard extends that scalability to low-latency writes and short requests.

“PostgreSQL continues to grow as one of the most popular databases in the world. It is the preferred relational database in many organizations and therefore, while its strategic importance and popularity grows, so must its ability to tackle ever larger datasets,” said Umur Cubukcu, CEO and cofounder of Citus Data. “With pg_shard and CitusDB, Citus Data is making PostgreSQL scale horizontally in a simple way, while keeping the benefits of a mature, open ecosystem and taking its relational advantages to the next level.”

With the world of data ever growing, organizations are swiftly realizing that the key to competitive advantage is harnessing and leveraging the value of diverse data types. Organizations that use a horizontally scalable PostgreSQL infrastructure benefit from the key advantages of a modern data architecture, including:

  • Working with multi-structured data, including native, first-class support for JSON;
  • Massively parallel, high-performance analytics using SQL;
  • Real-time data ingestion;
  • High availability through built-in replication;
  • Elastic scalability on cost-effective, standard hardware;
  • Mature and function-rich software, with familiar tooling and relational semantics; and
  • Open, extensible ecosystem.

An elastically scalable, open RDBMS that runs on commodity machines is a natural complement to any commodity scale-out infrastructure, whether it is in a public, private, or hybrid cloud. CitusDB and the new open-source transparent-sharding extension, pg_shard, allow users to tackle heavy mixed analytic and short-request workloads using PostgreSQL without breaking the bank and without sacrificing relational power. The pg_shard extension to PostgreSQL is easy to use; in particular, it requires no changes to the application layer, no middleware for users to manage, and no additional training.

Pg_shard overcomes difficulties found in other approaches to database sharding and elastic scaling for real-time workloads. Two of the most common approaches are manually splitting the RDBMS into horizontal partitions and moving to a NoSQL database. Both require new skill sets in the organization, significant effort to reengineer the database setup, and notable changes to the application layer. Even after all of the manual or migration work, they do not provide an acceptable solution for running interactive analytic queries across shards. Pg_shard and CitusDB are built to address all of these issues.

“This is unique,” says Cubukcu. “When dealing with large data volumes – especially machine-generated data such as clickstream data and user event logs, customers want to maintain their relational semantics for massively parallel analytics. They also want to ingest data in real-time and keep the reliability of a familiar, open-source RDBMS ecosystem – and now they can. We are making it easy and simple to bring fast data together with big data on PostgreSQL.”

Pg_shard is built as an open-source PostgreSQL extension for transparent sharding, allowing users to simply run it on existing PostgreSQL instances without being forced to switch to a new database backend or make changes on the application side. In addition, it is simple to add more machines as the user’s data grows. With built-in replication, users automatically benefit from high availability should any part of the network fail. By staying with an open ecosystem, users can leverage the powerful open framework and functionality around a major project like PostgreSQL.
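
For readers unfamiliar with the idea, the general technique behind sharding with replication can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not pg_shard's implementation: the class names, the SHA-256 hash, and the round-robin placement scheme are all hypothetical choices made for the example.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field


@dataclass
class Shard:
    shard_id: int
    placements: list[str]  # worker nodes that would each hold a copy of this shard
    # Simplification: rows are stored once here; a real system writes to every placement.
    rows: dict[str, dict] = field(default_factory=dict)


class ShardedTable:
    """Toy hash-sharded table. All names are hypothetical, for illustration only."""

    def __init__(self, workers: list[str], shard_count: int, replication_factor: int):
        if replication_factor > len(workers):
            raise ValueError("replication factor cannot exceed the number of workers")
        self.shards: list[Shard] = []
        for shard_id in range(shard_count):
            # Round-robin placement puts the copies of one shard on distinct workers.
            placements = [
                workers[(shard_id + r) % len(workers)]
                for r in range(replication_factor)
            ]
            self.shards.append(Shard(shard_id, placements))

    def shard_for(self, key: str) -> Shard:
        # Hash the partition key so the same key always maps to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)]

    def insert(self, key: str, row: dict) -> None:
        self.shard_for(key).rows[key] = row

    def select(self, key: str, failed_workers: frozenset[str] = frozenset()):
        # Route the read to the first placement that is still reachable.
        shard = self.shard_for(key)
        live = [w for w in shard.placements if w not in failed_workers]
        if not live:
            raise RuntimeError(f"no live placement for shard {shard.shard_id}")
        return live[0], shard.rows.get(key)


table = ShardedTable(["worker-1", "worker-2", "worker-3"], shard_count=8, replication_factor=2)
table.insert("user-42", {"event": "click"})
primary = table.shard_for("user-42").placements[0]
# Even with the first placement down, the read is served from the replica.
served_by, row = table.select("user-42", failed_workers=frozenset({primary}))
```

The application only ever calls `insert` and `select` with a key; which worker serves the request is decided inside the table, which is the sense in which sharding of this kind is "transparent" to the caller.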

AIwire