NoSQL vs. NewSQL: Choosing the Right Database for Your Business
For decades, relational database management systems (RDBMSs) have been the go-to solution for storing and retrieving data. But nearly every business is now experiencing rapid growth in data volume, and scaling up traditional relational databases such as Oracle, IBM DB2, and MySQL (i.e., moving to increasingly larger, specialized servers, which can easily cost millions of dollars) is expensive. The cost of managing this big data has driven many businesses to seek affordable scale-out solutions, which scale horizontally by adding low-cost commodity servers one at a time.
There are two broad categories of scale-out databases that serve as alternatives to traditional RDBMSs: NoSQL and NewSQL. To better understand the pros and cons of each category, I’ll review four popular platforms, two from each category.
NoSQL Databases
NoSQL or “Not only SQL” refers to any database that stores and models data in a format other than relational tables. These databases are popular because they cost-effectively scale out on commodity hardware and can handle a greater variety of data formats (such as hierarchical and graph).
NoSQL solutions will appeal to businesses that want to store large volumes of semi-structured data (including social media feeds, emails, and text documents). Popular NoSQL databases include MongoDB, a document store, and Apache Cassandra, a partitioned row store.
MongoDB
MongoDB has become a popular database because it’s easy for developers to use. However, MongoDB’s strength in flexible schema can also be a curse: because nothing forces documents to share a structure, developers can introduce inconsistent or corrupt data that only surfaces when the data is accessed or aggregated.
Good For: MongoDB would be a good choice for simple web applications (e.g., profiles, content management systems, form data, messaging data, log data) that do not share data across documents (MongoDB’s analogue of rows).
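To make the flexible-schema risk concrete, here is a minimal sketch in plain Python. It does not talk to MongoDB at all; the documents are ordinary dicts, and the collection contents, field names, and both helper functions are hypothetical. The point is only that documents with drifting shapes can make an aggregate quietly wrong unless the application validates them.

```python
# Hypothetical order documents, modeled as plain dicts rather than a live collection.
orders = [
    {"_id": 1, "customer": "ana", "total": 25.0},
    {"_id": 2, "customer": "ben", "total": "40.00"},  # total stored as a string
    {"_id": 3, "customer": "cy", "amount": 15.0},     # field renamed in a later release
]


def naive_revenue(docs):
    """Sum 'total', skipping any document where it is missing or not numeric."""
    return sum(d["total"] for d in docs if isinstance(d.get("total"), (int, float)))


def validated_revenue(docs):
    """Refuse to aggregate when any document does not match the expected shape."""
    bad = [d["_id"] for d in docs if not isinstance(d.get("total"), (int, float))]
    if bad:
        raise ValueError(f"documents with missing or non-numeric 'total': {bad}")
    return sum(d["total"] for d in docs)


# The naive sum succeeds but counts only one of the three orders.
print(naive_revenue(orders))

try:
    validated_revenue(orders)
except ValueError as err:
    print(err)
```

The naive version returns a plausible-looking number, which is what makes this kind of problem hard to spot. The validating version fails loudly instead, at the cost of pushing schema enforcement into application code.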
Not Good For: Without support for transactions, joins, and SQL, MongoDB would not be good for existing SQL or OLTP applications, or more complex web applications that share data across documents.
Apache Cassandra
Apache Cassandra is an open-source, partitioned row store inspired by Google’s Bigtable and Amazon’s Dynamo, and originally developed at Facebook. Cassandra is best known for its fast ingest ability and replication across multiple datacenters.
Good For: Cassandra would be a good option for simpler web applications (for example, those that do not share data across rows) that require high ingest rates and high availability across datacenters.
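As a rough illustration of what “partitioned” means here, the sketch below assigns each row to a node by hashing its partition key. This is a deliberate simplification: the node names and the `owner` function are hypothetical, and Cassandra itself places data on a token ring rather than using a hash-modulo scheme like this one.

```python
import hashlib

# Hypothetical three-node cluster.
NODES = ["node-a", "node-b", "node-c"]


def owner(partition_key, nodes=NODES):
    """Pick the node responsible for a partition key via a stable hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then map it to a node.
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


# Every row with the same partition key lands on the same node, so a
# single-key read or write touches one partition instead of the whole cluster.
for key in ["user:1", "user:2", "user:3"]:
    print(key, "->", owner(key))
```

This is also why work that spans many partition keys is harder in such a store: the rows involved may live on different servers.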
Not Good For: While Cassandra theoretically provides strong consistency with its “tunable” consistency framework, in practice almost everyone uses eventual consistency for performance reasons. It’s not a good choice for applications (e.g., enterprise, web, or mobile) that need strong consistency, SQL, or more complex functionality.
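The “tunable” trade-off can be sketched with the usual quorum rule: with N replicas, a read is guaranteed to see the latest acknowledged write only when the number of replicas read (R) plus the number written (W) exceeds N. The function names below are my own; ONE, QUORUM, and ALL are Cassandra consistency levels, and a replication factor of 3 is assumed purely for illustration.

```python
def quorum(n):
    """Size of a majority quorum for n replicas."""
    return n // 2 + 1


def replicas_overlap(n, w, r):
    """True when every read set must intersect every write set (R + W > N)."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    return r + w > n


N = 3  # assumed replication factor
settings = {
    "write ONE / read ONE": (1, 1),
    "write QUORUM / read QUORUM": (quorum(N), quorum(N)),
    "write ALL / read ONE": (N, 1),
}
for name, (w, r) in settings.items():
    label = "strong" if replicas_overlap(N, w, r) else "eventual"
    print(f"{name}: {label}")
```

Reading and writing at ONE touches the fewest replicas and is therefore the fastest, which is why eventual consistency tends to win in practice. The overlapping settings pay for their guarantee with extra replica round trips on every operation.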
NewSQL Databases
NewSQL refers to a new scale-out architecture and class of modern RDBMSs that seek to provide the scalability of NoSQL systems for operational workloads, while remaining ACID (Atomicity, Consistency, Isolation, Durability) compliant. They can differ greatly in their architectural approach.
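To show what ACID compliance buys an application, here is a small sketch of atomicity across two rows. It uses Python’s built-in sqlite3 module purely as a stand-in SQL engine; SQLite is not a NewSQL database, and the `accounts` table and `transfer` helper are hypothetical. The behavior to notice is that a multi-row change either fully commits or fully rolls back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move funds between two rows; both updates commit together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was rolled back too.
        return False


def balances(conn):
    """Return the current balances as a dict keyed by account id."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


print(transfer(conn, "a", "b", 30), balances(conn))   # succeeds
print(transfer(conn, "a", "b", 500), balances(conn))  # fails, balances unchanged
```

In the failing transfer the credit to `b` is applied first and then undone when the debit from `a` violates the constraint. Without transactions, the application would have to detect and repair that half-finished state itself.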
NuoDB
NuoDB is a NewSQL database designed to switch between on-premise deployments and the cloud. Its three-tiered approach splits its architecture into administrative, transactional, and storage layers. NuoDB also has a proprietary scale-out and storage layer, as well as an unusual, complex transactional model.
Good For: As a NewSQL solution, it’s optimal for powering real-time, operational applications requiring joins and transactions. It also excels at geographical distribution of computing nodes.
Not Good For: NuoDB is not the right choice for applications with non-relational data or for a company that does not want to maintain another non-Hadoop infrastructure, especially one where the distributed computing functionality is proprietary and opaque.
Splice Machine
Splice Machine is a Hadoop RDBMS built on top of HBase/Hadoop and Apache Derby, a popular, Java-based ANSI SQL database. Splice Machine differentiates itself from NoSQL solutions with its ability to support full ACID transactions, allowing for thousands of concurrent users to access and alter data simultaneously. Unlike NuoDB, it uses the open-source (instead of proprietary) Hadoop platform as its scale-out layer.
Good For: Splice Machine would be a good choice for existing SQL applications and emerging digital applications that have data dependencies across multiple rows or documents. It works well for any company that wants to leverage an existing Hadoop deployment.
Not Good For: Splice Machine would not be a good choice for applications with non-relational data.
Summary
The question of whether to use a NoSQL or NewSQL solution may seem overwhelming at first, but after closer evaluation of each solution's strengths and weaknesses, businesses can confidently select the right database to fit their needs.
About the Author:
Monte Zweben is co-founder and CEO of Splice Machine. A technology industry veteran, Monte’s early career was spent with the NASA Ames Research Center as the deputy chief of the Artificial Intelligence Branch, where he won the prestigious Space Act Award for his work on the Space Shuttle program. Monte then founded and was chairman and CEO of Red Pepper Software, which merged in 1996 with PeopleSoft, where he was VP and general manager, Manufacturing Business Unit. In 1998, Monte was the founder and CEO of Blue Martini Software. Blue Martini went public on NASDAQ in 2000, and is now part of JDA. Following Blue Martini, he was the chairman of SeeSaw Networks, a digital, place-based media company. Monte is also the co-author of Intelligent Scheduling and has published articles in the Harvard Business Review and various computer science journals and conference proceedings. Monte currently serves on the Board of Directors of Rocket Fuel as well as the Dean’s Advisory Board for Carnegie-Mellon’s School of Computer Science.