1. Introduction
As software developers, we are living in an exciting era of innovation, particularly in the areas of data storage and data processing. The landscape of available solutions, historically dominated by relational database management systems, has been changed dramatically by the so-called NoSQL and, later, NewSQL movements.
These new kinds of data stores were born to address the needs of modern distributed application architectures, which required scalability and availability guarantees that were unprecedented at the time. Vertical scaling reached the point where the cost of hardware and/or supporting infrastructure became unreasonably high, so the switch to horizontal scaling was inevitable.
In this article we are going to look at the general problem of managing data in distributed architectures, briefly discuss traditional relational database management systems, and then switch gears to NoSQL / NewSQL solutions.
By no means is the intention of this article to pick winners or to dwell on how good or bad a particular data storage solution is. Rather, it is to outline more options which could help you build complex distributed systems and pick the right tools for the job, knowing ahead of time that no universal solution exists.
2. Distributed systems: the CAP theorem
In 2000, Eric Brewer came up with what is known these days as the CAP theorem. It asserts that a distributed system cannot provide all three of the following desirable characteristics at the same time:
- Consistency: A read operation sees all previously completed write operations. In the context of a distributed system, it means that all nodes see the same data at the same time.
- Availability: Read and write operations always succeed. In the context of a distributed system, it means that every request receives a response about whether it succeeded or failed.
- Partition tolerance: The system continues to operate despite arbitrary partitioning caused by network failures which prevent some nodes from communicating with others.
We are not going to discuss the CAP theorem in detail, as it is worth its own article, but rather emphasize its importance in distributed systems design. Although it has caused quite a lot of debate over the years (eventually leading to a follow-up article from Eric Brewer in 2012 titled CAP Twelve Years Later), most if not all NoSQL / NewSQL solutions are built on the trade-offs outlined by the CAP theorem.
Throughout this article we are going to highlight how each data store accounts for the CAP theorem, whenever it is appropriate and relevant details are available.
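To make the trade-off tangible, here is a minimal Python sketch of a toy two-replica register. It is an illustration under simplifying assumptions, not a model of any real data store: the `Replica` and `TinyCluster` classes and the `mode` flag are hypothetical names invented for this example. When the network is partitioned, the "CP" variant rejects writes to stay consistent, while the "AP" variant accepts them and lets the replicas diverge.

```python
class Replica:
    """One node holding a single register value."""

    def __init__(self, name):
        self.name = name
        self.value = None


class TinyCluster:
    """Two replicas that normally copy every write to each other."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica("a")
        self.b = Replica("b")
        self.partitioned = False

    def write(self, replica, value):
        other = self.b if replica is self.a else self.a
        if not self.partitioned:
            # Healthy network: both replicas receive the write.
            replica.value = value
            other.value = value
            return True
        if self.mode == "CP":
            # Refuse the write: consistency is kept, availability is lost.
            return False
        # AP: accept the write locally; the replicas may now disagree.
        replica.value = value
        return True


cp = TinyCluster("CP")
cp.write(cp.a, "v1")
cp.partitioned = True
cp_accepted = cp.write(cp.a, "v2")

ap = TinyCluster("AP")
ap.write(ap.a, "v1")
ap.partitioned = True
ap_accepted = ap.write(ap.a, "v2")

print(cp_accepted, cp.a.value, cp.b.value)  # False v1 v1
print(ap_accepted, ap.a.value, ap.b.value)  # True v2 v1
```

Real systems sit somewhere between these two extremes, for example by reconciling diverged replicas once the partition heals.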
3. Relational data stores
For quite a long time, relational database management systems were the de facto standard with respect to data store solutions. From time to time new players tried to challenge that (for example, object databases); however, none of them were able to really disrupt the market. SQL (and its vendor-specific dialects) was the de facto standard for querying relational data stores and essentially a must-know language for any software developer out there.
But things started to change radically a few years ago, when the growth of data volumes became exponential, hitting the limits of relational database management system design.
3.1. MySQL / MariaDB
MySQL is one of the oldest open-source relational database management systems and is also one of the most widely used data stores at the moment. MySQL was and is a great choice as a data storage engine for most modern applications due to its stability and reliability.
However, the road of MySQL took a sharp turn in 2009, when Oracle Corporation entered into an agreement to purchase Sun Microsystems, the owner of the MySQL copyright and trademark. This acquisition raised a lot of concerns about the future of MySQL and, as a result, it was forked into what we know today as MariaDB.
To address the scalability and availability demands of modern distributed systems, both MySQL and MariaDB offer clustered editions, MySQL Cluster and MariaDB Galera Cluster respectively. Referring to the CAP theorem description, a single MySQL Cluster or MariaDB Galera Cluster is a CP system.
To finish up with MySQL, it is worth mentioning a few case studies:
- MySQL at Twitter, Another look at MySQL at Twitter and incubating Mysos
- How Twitter Stores 250 Million Tweets a Day Using MySQL
- WebScaleSQL: A collaboration to build upon the MySQL upstream
3.2. PostgreSQL
PostgreSQL is as widely used as MySQL, and together these two data stores are the most popular open-source relational database management systems. Along with many similarities to MySQL, PostgreSQL has traditionally been a few steps ahead, offering better extensibility and a lot of advanced features.
One of the features which PostgreSQL lacks at the moment is out-of-the-box clustering support, although there are a couple of options available.
3.3. Others
Although we are going to talk mostly about open-source solutions in this article, it is worth mentioning the big players in the space of commercial relational database management systems, led by Oracle Database, Microsoft SQL Server and IBM DB2. Depending on the price you are willing to pay, those data stores offer quite decent performance, scalability and reliability characteristics, satisfying the requirements of almost any distributed system.
4. Why NoSQL?
I think it is fair to say that the NoSQL / NewSQL movement emerged from an urgent need to fill the gaps in the architecture of modern, highly scalable, responsive and available distributed systems which deal with massive data volumes. At some point, the “one size fits all” approach backed by relational database management systems became a limiting factor (or even a blocker) for systems trying to meet their requirements.
Despite the huge differences between NoSQL / NewSQL solutions, there are a few things they have in common: simpler design, awareness of the CAP theorem trade-offs and support for horizontal scalability. This comes at a price: it is tough, or nearly impossible, to find a general-purpose NoSQL / NewSQL data store which perfectly fits all the needs of your applications. The choice rather depends on the problem the NoSQL / NewSQL data store must solve. That is why the combination of a relational database management system with one or more NoSQL / NewSQL data stores is quite common these days.
Over the rest of the article we are going to take a look at many of the most popular NoSQL / NewSQL solutions, split into seven categories: Key/Value data stores, Columnar data stores, Graph data stores, Document data stores, Time series data stores, Hybrid data stores and Specialized data stores. Each of those data stores is worth at least one dedicated book, so we are going to focus on key design decisions and use cases.
5. Key/Value data stores
Key/Value data stores are designed for storing, querying and managing data as associative arrays (also known as dictionaries or hashes). They are a perfect fit for distributed (or replicated) caching solutions; however, many of those data stores go beyond simple key/value manipulations.
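As a rough illustration of the associative-array model, the following Python sketch wraps a plain dictionary behind the three operations most key/value data stores expose in some form. The `KeyValueStore` class and its method names are hypothetical and do not correspond to the API of any product discussed below.

```python
class KeyValueStore:
    """Hypothetical in-memory store: one dictionary, three operations."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        # Report whether the key existed so callers can tell the difference.
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore()
store.put("session:42", {"user": "ann", "cart": ["book"]})

session = store.get("session:42")
missing = store.get("session:99")
removed = store.delete("session:42")
removed_again = store.delete("session:42")

print(session, missing, removed, removed_again)
```

The value is opaque to the store: it is looked up by key only, which is what makes this model easy to partition across many nodes.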
5.1. DynamoDB
The intention of this article is to cover only open source solutions in the NoSQL / NewSQL space, but DynamoDB is going to be an exception. The reason for that is the great influence of its underlying principles (described in the paper Dynamo: Amazon’s Highly Available Key-value Store) which have been an inspiration for many open source alternatives.
Amazon DynamoDB is a fully managed NoSQL data store service that provides fast and predictable performance with seamless scalability. Although it is most widely known as a key/value data store, it is also able to manage data in document formats (JSON, XML and HTML). Referring to the CAP theorem, DynamoDB is an AP system.
Recommendations on when to use:
- High volume special events
- Social media applications
- Batch-oriented processing
- Reporting
- Real-time analytics
5.2. Memcached
Memcached is an in-memory key/value data store for small chunks of arbitrary data (strings, objects) which could represent the results of database calls, API calls, or rendered pages. In many respects Memcached is one of the pioneers among key/value data stores, very widely used primarily as a caching solution intended to speed up dynamic web applications by alleviating database load. Memcached does not have any out-of-the-box clustering support. Because of this design simplification, Memcached is very fast and efficient but not very reliable, which could be unacceptable for many distributed applications these days.
Recommendations on when to use:
- Caching Layer
- In-memory Session Store
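The caching-layer use case usually follows the cache-aside pattern: look in the cache first, fall back to the database on a miss, then store the result with an expiry time. Here is a minimal Python sketch of that pattern. The `TtlCache` class is a hypothetical in-process stand-in for a real cache server, `load_user_from_db` is an invented placeholder for a slow database call, and the clock is injected only so the expiry can be demonstrated without waiting.

```python
import time


class TtlCache:
    """Hypothetical in-process cache whose entries expire after a fixed TTL."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired entries are dropped lazily, on access.
            del self._entries[key]
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (self._clock() + self._ttl, value)


db_calls = []


def load_user_from_db(user_id):
    """Placeholder for a slow database query (assumption for this sketch)."""
    db_calls.append(user_id)
    return {"id": user_id, "name": f"user-{user_id}"}


def get_user(cache, user_id):
    """Cache-aside: try the cache, fall back to the database, then populate."""
    key = f"user:{user_id}"
    user = cache.get(key)
    if user is None:
        user = load_user_from_db(user_id)
        cache.set(key, user)
    return user


now = [0.0]  # fake clock, advanced by hand below
cache = TtlCache(ttl_seconds=60, clock=lambda: now[0])

get_user(cache, 42)
get_user(cache, 42)
calls_before_expiry = len(db_calls)

now[0] = 61.0  # the cached entry is now older than its TTL
get_user(cache, 42)
calls_after_expiry = len(db_calls)

print(calls_before_expiry, calls_after_expiry)  # 1 2
```

Because losing a cached entry only costs an extra database call, this pattern tolerates the limited reliability mentioned above.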
5.3. Redis
Along with Memcached, Redis is one of the most widely known and used key/value data stores. Although thinking about Redis as a key/value store is a valid assumption, Redis does much more, putting the power of complex data structures into the hands of developers. To quote http://redis.io:
“Redis is an open source, BSD licensed, advanced key-value store, used as database, cache and message broker. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets.”
For distributed deployments, Redis supports clustering (also known as Redis Cluster). Interestingly, from the CAP theorem perspective, Redis Cluster is neither a CP nor an AP system but something in between.
Recommendations on when to use (for more details please refer to How to take advantage of Redis just adding it to your stack):
- Caching Layer
- In-memory Session Store
- Distributed locking
- Real-time analytics
- Publish/Subscribe & Queues
- Counters
- Leader boards
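To show why a "data structure server" suits counters and leader boards, here is a small Python sketch of the idea behind a sorted set: members mapped to scores and read back in ranked order. It is a pure-Python emulation for illustration only and does not use the Redis API; the `Leaderboard` class and its methods are hypothetical, loosely mirroring what the Redis commands ZADD, ZINCRBY and ZREVRANGE provide on the server side. Ties are broken by member name here, which is an arbitrary choice of this sketch.

```python
class Leaderboard:
    """Hypothetical in-memory emulation of a score-ordered set."""

    def __init__(self):
        self._scores = {}  # member -> score

    def add(self, member, score):
        self._scores[member] = score

    def increment(self, member, amount=1):
        # Missing members start from zero, which also makes this a counter.
        self._scores[member] = self._scores.get(member, 0) + amount
        return self._scores[member]

    def top(self, n):
        # Highest score first; ties ordered by member name.
        ranked = sorted(self._scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[:n]


board = Leaderboard()
board.add("alice", 120)
board.add("bob", 95)
board.increment("carol", 130)
board.increment("bob", 40)

top_two = board.top(2)
print(top_two)  # [('bob', 135), ('carol', 130)]
```

A real sorted set keeps its members ordered as they are written, so reading the top entries does not require sorting everything on each call as this sketch does.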
5.4. Riak
Riak KV is a distributed, scalable and fault-tolerant key/value data store. It supports advanced local and multi-cluster replication that guarantees reads and writes even in the event of hardware failures or network partitions. One of the distinguishing features of Riak KV is complex querying support, which is built on top of full-text search, secondary indexes and map/reduce. From the CAP theorem point of view, Riak KV is an AP system which supports tunable consistency and availability levels.
Recommendations on when to use (for more details please refer to Riak: Use Cases):
- Content Management
- Session Storage
- Serving Advertisements
- Log Data
- Sensor Data
- User Profile / Settings / Preference Store
5.5. Aerospike
Aerospike is a flash-optimized, in-memory key/value data store for mission-critical applications requiring high speed, cost-efficient scaling and no downtime. Aerospike’s clustered architecture is designed to reliably store terabytes of data with automatic fail-over, replication, cross-datacenter synchronization and linear scalability. In terms of the CAP theorem, Aerospike is by and large an AP system which additionally tries to provide high consistency guarantees.
Recommendations on when to use (for more details please refer to Aerospike: Use Cases):
- Caching Layer
- User Profile Store
- Recommendation Engine
- Fraud Detection and Intervention
5.6. FoundationDB
FoundationDB is a member of the NoSQL / NewSQL family which combines the traditional scalability of key/value data stores with true multi-key ACID transactions. The presence of a SQL layer to access the stored data makes FoundationDB quite different from all the other key/value data stores we have covered so far. Sadly, although FoundationDB used to be an open-source project, it suddenly ceased offering downloads and removed its public repositories after reportedly being bought by Apple in March 2015.
6. Columnar data stores
Columnar data stores are designed for storing, querying and managing structured data. Much like relational database management systems, they use tables, rows, and columns; however, the names and format of the columns may vary from row to row within the same table. Many columnar data stores are built on the principles described by Google in the paper Bigtable: A Distributed Storage System for Structured Data.
Columnar data stores have a very wide range of applicability, so it is very hard to come up with a precise list of recommendations. However, each of the data stores discussed in this section has a few unique characteristics which make it a good choice for certain scenarios.
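The phrase "columns may vary from row to row" is easier to picture with a small example. Below is a minimal Python sketch, assuming a much simplified model in which each row key maps to its own set of named columns and row keys are kept sorted so that range scans are cheap. The `WideTable` class is hypothetical and leaves out everything that makes the real systems interesting (column families, timestamps, distribution).

```python
import bisect


class WideTable:
    """Hypothetical sparse table: every row carries its own columns."""

    def __init__(self):
        self._rows = {}  # row key -> {column name: value}
        self._keys = []  # row keys, kept sorted for range scans

    def put(self, row_key, column, value):
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        return dict(self._rows.get(row_key, {}))

    def scan(self, start, stop):
        """Yield (row key, columns) for start <= row key < stop."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, dict(self._rows[key])


table = WideTable()
table.put("user#001", "name", "Ann")
table.put("user#001", "email", "ann@example.com")
table.put("user#002", "name", "Bob")
table.put("user#002", "last_login", "2015-06-01")
table.put("video#001", "title", "Intro")

# All rows whose key starts with "user#" ("~" sorts after the digits).
users = list(table.scan("user#", "user#~"))
for key, columns in users:
    print(key, sorted(columns))
```

Note that the two user rows do not share the same columns, and that the choice of row key decides which rows can be read together in one scan.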
6.1. Accumulo
Apache Accumulo is a highly scalable structured NoSQL data store based on Google’s BigTable design. Apache Accumulo supports efficient storage and retrieval of structured data, including range queries, supports map/reduce processing, and features automatic load-balancing and partitioning, data compression and fine-grained security labels. It is worth mentioning that Apache Accumulo operates over the Hadoop Distributed File System (HDFS). Referring to the CAP theorem trade-offs, Apache Accumulo is a CP system.
Recommendations on when to use:
- Data access requires fine-grained security
- Fast read/write access is required for Apache Hadoop integration
- And many, many more …
6.2. Cassandra
Like almost all data stores in this category, Apache Cassandra is built on the foundations of Google’s BigTable (and Amazon’s Dynamo) and is the right choice when you need scalability and high availability without compromising performance. It offers linear scalability and proven fault tolerance on commodity hardware or cloud infrastructure, which makes it a strong platform for mission-critical data. It is fully decentralized, with no single point of failure.
One of the killer features of Apache Cassandra is support for replication across multiple datacenters, a much-needed capability for lowering latency and for disaster recovery. From the CAP theorem standpoint, Apache Cassandra is an AP system with tunable consistency levels.
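To make "tunable consistency" a little more concrete, here is a minimal Python sketch of the quorum arithmetic commonly used to reason about Dynamo-style replication: with N replicas, a read that contacts R of them is guaranteed to overlap a write acknowledged by W of them when R + W > N. The function names are hypothetical, and the labels ONE, QUORUM and ALL only mirror the names of Cassandra consistency levels; nothing here talks to a real cluster.

```python
def quorum(replicas):
    """Smallest majority of the given number of replicas."""
    return replicas // 2 + 1


def read_sees_latest_write(replicas, write_acks, read_acks):
    """True when every read set must intersect every write set."""
    return read_acks + write_acks > replicas


n = 3  # replication factor assumed for this example

# (write acknowledgements, read acknowledgements) per configuration
settings = {
    "write ONE / read ONE": (1, 1),
    "write QUORUM / read QUORUM": (quorum(n), quorum(n)),
    "write ALL / read ONE": (n, 1),
}

results = {
    name: read_sees_latest_write(n, w, r)
    for name, (w, r) in settings.items()
}

for name, overlapping in results.items():
    print(f"{name}: overlap guaranteed = {overlapping}")
```

The first configuration favours latency and availability, while the other two trade some of that for reads that are guaranteed to reach at least one replica holding the latest acknowledged write.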
Recommendations on when to use (for more details please refer to Apache Cassandra: Use Cases):
- Write-heavy workloads
- Product Catalog / Playlist
- Recommendation / Personalization
- Fraud Detection
- Messaging
- IOT / Sensor Data
- Real-time analytics
- And many, many more …
To finish up with Apache Cassandra, it is worth mentioning a few case studies:
- Cassandra In Production: Things We Learned
- Tuning Cassandra @ Target
- Cassandra Data Modeling Best Practices, part 1 and part 2
6.3. HBase
Apache HBase is a great option when you need random, real-time read and write access to very large volumes of data. Heavily inspired by Google’s BigTable, it is built on top of the Hadoop Distributed File System (HDFS), much like Accumulo. Linear scalability, strictly consistent reads and writes, automatic and configurable sharding, automatic failover and integration with Apache Hadoop are only a subset of the most important features to highlight with respect to Apache HBase design. In CAP theorem terms, Apache HBase is a CP system.
Recommendations on when to use (for more details please refer to Apache HBase: Use Cases):
- Real-time analytics
- Batch Processing
- Time Series Data
- And many, many more …
To finish up with Apache HBase, it is worth mentioning a few case studies:
- Why we’re using HBase, part 1 and part 2
- Facebook’s New Real-time Messaging System
7. Graph data stores
The family of graph data stores is based on graph theory and, as such, its data model is built around nodes (vertices), edges and properties to represent and store linked data. This kind of data store is particularly powerful in solving graph-like problems, for example finding the shortest path between two nodes.
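As an illustration of the shortest-path example, the following Python sketch runs a breadth-first search over a small undirected, unweighted graph held in memory. It is only meant to show the kind of traversal a graph data store performs for you; the `shortest_path` function and the sample edges are made up for this example and say nothing about how any particular product implements it.

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Breadth-first search over an undirected, unweighted graph."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    previous = {start: None}  # node -> node we reached it from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the chain of predecessors back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in neighbours.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None  # goal is not reachable from start


friendships = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("carol", "dave"),
    ("alice", "erin"),
    ("erin", "dave"),
]

path = shortest_path(friendships, "alice", "dave")
print(path)  # ['alice', 'erin', 'dave']
```

In a relational schema the same question typically requires a chain of self-joins whose depth is not known in advance, which is where a graph model tends to pay off.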
7.1. Neo4j
Neo4j is an open-source NoSQL graph data store which has been publicly available since 2007. Neo4j provides a full set of scalable and reliable data store characteristics, including ACID transaction compliance, cluster support, and runtime failover, making it suitable for managing graph data in real production scenarios. The commercial offering of Neo4j complements this set of features with high availability, live backups, and comprehensive monitoring. From the standpoint of the CAP theorem, Neo4j is a CA system.
Recommendations on when to use (for more details please refer to Neo4j: Use Cases):
- Master Data Management
- Network and IT Operations
- Real-Time Recommendations
- Fraud Detection
- Social Network
- Identity and Access Management
- Graph-Based Search
7.2. Titan
Titan is a scalable graph data store optimized for storing and querying graphs containing hundreds of billions of vertices and edges distributed across a multi-node cluster. Titan is a transactional database that can support thousands of concurrent complex graph traversals in real time. In addition, Titan provides linear scalability, data distribution and replication, fault tolerance, multi-datacenter high availability and hot backups. Titan does not implement its own storage mechanism but instead can be backed by Apache HBase, which makes it a CP system, or by Apache Cassandra, which makes it an AP system.
Recommendations on when to use:
- Batch Processing and Analytics
- Network and IT Operations
- Fraud Detection
- Social Network
- Graph-Based Search
7.3. FlockDB
FlockDB is a distributed, fault-tolerant graph database which came out of Twitter. By design, FlockDB is much simpler than other graph databases because it tries to solve fewer problems. It scales horizontally and is designed for real-time, low-latency, high-throughput environments. FlockDB is built on top of a MySQL storage layer and uses another Twitter framework, Gizzard, for sharding and replication support. Sadly, it seems that development of FlockDB has been on hold since 2012.
7.4. GraphBase
GraphBase is a commercial, distributed NoSQL graph data store intended to store massive, highly structured data and manage large graphs. It claims to provide a tiny memory and storage footprint, support for built-in traversal heuristics and graph-based transactions. The vendor does not publish enough detail to say with confidence where GraphBase stands in terms of the CAP theorem.
Recommendations on when to use (for more details please refer to GraphBase: Use Cases):
- Master Data Management
- Interactions within large networks of people and/or things
- Social Network
- Complex natural models (biological, economic, environmental…)
- Large-scale Intelligence gathering
7.5. InfiniteGraph
InfiniteGraph, another player in the niche of commercial graph data stores, is a distributed and scalable graph database designed specifically to traverse complex relationships and provide the foundation for making real-time business decisions. InfiniteGraph naturally supports partitioning and distribution while preserving the ability to query data across node boundaries. Due to the limited information available, it is tough to classify InfiniteGraph in terms of the CAP theorem, so it is better not to make any guesses here.
Recommendations on when to use:
- Fraud Detection
- Surveillance
- Prescription Analytics
- Network Security Information and Event Management (SIEM)
- Large-scale Intelligence gathering
8. Document data stores
Another class of NoSQL data stores, known as document data stores, is designed for storing, querying and managing documents (semi-structured data). Instead of relying on the usual key / value or table / column / row representations, the data records are stored as full-fledged documents which are organized into collections. This model is very natural and easy to deal with, particularly in modern web applications where JSON is the dominant data exchange and representation format.
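The document model can be sketched in a few lines of Python: a collection holds JSON-like documents that need not share the same fields, and a query matches documents by field values. This is a toy under stated assumptions, not the API of any document data store; the `Collection` class, its `insert` and `find` methods and the sample documents are all hypothetical, and only top-level equality matching is supported.

```python
import json


class Collection:
    """Hypothetical in-memory collection of schema-free documents."""

    def __init__(self):
        self._docs = {}
        self._next_id = 1

    def insert(self, document):
        doc_id = self._next_id
        self._next_id += 1
        # Round-trip through JSON to store an independent, JSON-safe copy.
        self._docs[doc_id] = json.loads(json.dumps(document))
        return doc_id

    def find(self, **criteria):
        """Return documents whose top-level fields equal all criteria."""
        return [
            doc
            for doc in self._docs.values()
            if all(doc.get(field) == value for field, value in criteria.items())
        ]


articles = Collection()
articles.insert({"title": "CAP basics", "author": "ann", "tags": ["theory"]})
articles.insert({"title": "Caching 101", "author": "bob", "views": 10})
articles.insert({"title": "Graphs", "author": "ann", "reviewed": True})

by_ann = [doc["title"] for doc in articles.find(author="ann")]
print(by_ann)  # ['CAP basics', 'Graphs']
```

Each document carries its own structure, so adding a new field to some documents requires no schema migration for the rest.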
8.1. Couchbase
Couchbase positions itself as a NoSQL document data store for interactive web applications. It supports a flexible data schema (as do most document data stores), is easily scalable, and provides consistently high performance and high availability. Couchbase is a good choice for web applications where low latency and high throughput are critical requirements. An outstanding capability provided by Couchbase is native mobile readiness, which makes it possible to build mobile applications with offline support via an embedded database and automatic synchronization. As per the CAP theorem trade-offs, Couchbase is an AP system.
Recommendations on when to use (for more details please refer to Couchbase: Use Cases):
- Content Management
- Fraud Detection
- Profile Management
- Digital Communication
- Personalization
- Product Data Management
- Real-Time Analytics
8.2. CouchDB
CouchDB is a document data store built for the web: it manages data as JSON documents, has a built-in HTTP API to access and query documents directly from a web browser, and uses JavaScript to index, combine and transform documents. CouchDB is a scalable, highly available, partition-tolerant, eventually consistent system with automatic conflict detection. Following the CAP theorem design constraints, CouchDB is an AP system.
It is worth mentioning that CouchDB and Couchbase, although different products, have a lot in common, which sometimes makes it harder to choose between them. The article Couchbase vs. Apache CouchDB: A comparison of two open source NoSQL database technologies takes a fair look at the history of these two solutions and the key differences between them.
Recommendations on when to use:
- Content Management
- Product Data Management
- Profile Management
- Personalization
8.3. MongoDB
MongoDB is a horizontally scalable and highly available document data store which manages and stores JSON-style documents, grouped into collections. Among many other features, it supports dynamic schemas, advanced monitoring, complex querying, secondary indexes (including full-text search support), replication, automatic failover and sharding. One of the distinguishing features of MongoDB is how easy it is to get started with and use, which promotes a faster development process. From the CAP theorem point of view, MongoDB is a CP system.
Recommendations on when to use (for more details please refer to MongoDB: Use Cases):
- Operational Intelligence
- Product Data Management
- Inventory Management
- Content Management
- Log Data
- Reporting
9. Time series data stores
Time series data stores (also known as TSDBs) are a special class of NoSQL solutions which are highly optimized for handling time series data: arrays of data points ordered by time. Although it is generally possible to use other classes of NoSQL / NewSQL data stores (for example Document data stores or Columnar data stores) to manage time series data, they are not an adequate replacement in most cases due to the nature of the problem being solved.
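To illustrate what "data points ordered by time" buys us, here is a minimal Python sketch of a single series kept sorted by timestamp, with a binary-search range query and a simple downsampling step that averages values per time bucket. The `Series` class and its method names are hypothetical; real time series data stores add compression, retention and distribution on top of this basic idea.

```python
import bisect
from collections import defaultdict


class Series:
    """Hypothetical single time series kept sorted by timestamp."""

    def __init__(self):
        self._times = []
        self._values = []

    def append(self, timestamp, value):
        # Late-arriving points are inserted at their sorted position.
        index = bisect.bisect_right(self._times, timestamp)
        self._times.insert(index, timestamp)
        self._values.insert(index, value)

    def range(self, start, stop):
        """Points with start <= timestamp < stop, found by binary search."""
        lo = bisect.bisect_left(self._times, start)
        hi = bisect.bisect_left(self._times, stop)
        return list(zip(self._times[lo:hi], self._values[lo:hi]))

    def mean_per_bucket(self, start, stop, bucket_seconds):
        """Downsample: average the values falling into each time bucket."""
        buckets = defaultdict(list)
        for timestamp, value in self.range(start, stop):
            buckets[timestamp - timestamp % bucket_seconds].append(value)
        return {t: sum(v) / len(v) for t, v in sorted(buckets.items())}


cpu = Series()
for timestamp, value in [(0, 10.0), (30, 20.0), (60, 40.0), (90, 60.0), (15, 30.0)]:
    cpu.append(timestamp, value)

window = cpu.range(0, 60)
per_minute = cpu.mean_per_bucket(0, 120, 60)

print(window)      # [(0, 10.0), (15, 30.0), (30, 20.0)]
print(per_minute)  # {0: 20.0, 60: 50.0}
```

Queries over a time window and aggregations over fixed intervals are the dominant access patterns for this kind of data, which is why specialized stores organize everything around the timestamp.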
9.1. InfluxDB
InfluxDB is a distributed time series, metrics, and analytics data store. Although at the time of writing the support for clustering, replication and high availability is considered to be in an alpha state, InfluxDB offers a lot of other great, stable features:
- Powerful SQL-like query language
- Native HTTP(S) API for data ingestion and queries
- Retention policies for data
- On the fly aggregation
Probably the most distinguishing feature of InfluxDB compared to other time series data stores is the absence of external dependencies: just download, unpack and run. Although InfluxDB has been designed to be a highly available and eventually consistent data store, from the CAP theorem perspective it is neither a CP nor an AP system.
Recommendations on when to use:
- Metrics / Events data
- Sensor Data
- Real-time Analytics
9.2. OpenTSDB
OpenTSDB is a scalable time series data store which is designed from the ground up to store and serve massive amounts of time series data without losing granularity. OpenTSDB is built on top of Apache HBase (please see the HBase section for more details) and as such inherits most of its characteristics: linear scalability, automatic replication, efficient scans and high write throughput.
Recommendations on when to use:
- Metrics data
- Sensor Data
- Real-time Analytics
9.3. Druid
Druid is a distributed, scalable, high-performance, column-oriented, real-time analytic data store built for interactive analytics. Among its most important features are high availability, low-latency data ingestion, flexible data exploration, and fast data aggregation. In many respects, Druid is an OLAP data store and its focus is more on storing aggregated data (however, joins are not supported at the moment). Druid’s native queries are expressed in JSON, and under the hood Druid uses different storage for metadata and for the actual data.
Recommendations on when to use:
- Real-time Analytics
- Metrics / Events data
To finish up with Druid, it is worth mentioning a few case studies:
- Building a Data Pipeline That Handles Billions of Events in Real-Time
- Announcing Suro: Backbone of Netflix’s Data Pipeline
9.4. Prometheus
Prometheus is a service monitoring system and time series data store designed for reliability. It works exceptionally well for collecting any purely numeric time series, and its support for multi-dimensional data collection and querying is a particular strength. Perhaps the most crucial Prometheus feature is that it is not a strictly distributed system: each server runs independently of the others and relies only on its local storage for core functionality.
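"Multi-dimensional" here means that a time series is identified by a metric name plus a set of key/value labels, so the same metric can be sliced along any label. The Python sketch below illustrates that idea with plain data structures; the sample metrics, the `select` and `total` helpers and all the numbers are made up for illustration, and this is neither the Prometheus query language nor its client API.

```python
# Each sample: (metric name, labels, latest value). Values are invented.
samples = [
    ("http_requests_total", {"method": "GET", "status": "200"}, 1027),
    ("http_requests_total", {"method": "GET", "status": "500"}, 3),
    ("http_requests_total", {"method": "POST", "status": "200"}, 412),
    ("queue_depth", {"queue": "emails"}, 17),
]


def select(all_samples, name, **labels):
    """Keep samples of one metric whose labels match every given pair."""
    return [
        (sample_labels, value)
        for sample_name, sample_labels, value in all_samples
        if sample_name == name
        and all(sample_labels.get(k) == v for k, v in labels.items())
    ]


def total(selected):
    return sum(value for _, value in selected)


get_requests = total(select(samples, "http_requests_total", method="GET"))
ok_requests = total(select(samples, "http_requests_total", status="200"))

print(get_requests)  # 1030
print(ok_requests)   # 1439
```

The same three series answer both questions, by method and by status, without being stored twice.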
Recommendations on when to use:
- Metrics data
10. Hybrid data stores
There is a particular set of solutions which belong to more than one class of NoSQL / NewSQL data stores we have seen so far. Those could be classified as hybrid data stores and typically they provide multiple ways to store and manage data (for example, the same NoSQL data store may be used as a key/value data store for caching and as a document data store for content management).
From an operational standpoint, a single hybrid data store may replace two or more specialized data stores, thereby reducing the complexity of deploying, monitoring and maintaining the distributed system. Similarly, from a use case perspective, a hybrid data store may be a great fit and serve just as well as the data store(s) being replaced.
Interestingly enough, from the CAP theorem perspective, in most cases hybrid data stores follow different trade-offs depending on the way they are being used by applications.
10.1. ArangoDB
ArangoDB is a distributed, high-performance NoSQL data store with a flexible data model to manage documents, graphs, and key/values. Essentially, it is designed to be a general-purpose data store by offering most of the features typically needed for modern web applications. Those include (but are not limited to) a schema-less data model, a general HTTP REST API, a SQL-like query language (AQL) which supports complex filter conditions and joins, and ACID multi-collection / single-document transactions. ArangoDB also has a dedicated JavaScript framework for data-centric microservices called Foxx, with a variety of microservices available (e.g. user or session services). From a scalability perspective, ArangoDB supports asynchronous master-slave replication and sharded clusters.
Recommendations on when to use: as the in-place replacement for specialized data stores when different data model projections (documents, graphs, key/values) are required.
10.2. OrientDB
OrientDB is a linearly scalable, high-performance operational NoSQL data store which brings together the power of graphs and the flexibility of documents. The strongest points of OrientDB are full support for ACID transactions, multi-master replication, sharding and the use of SQL (with some extensions) to manipulate trees and graphs. The presence of an HTTP REST API is also worth mentioning.
Recommendations on when to use: as the in-place replacement for specialized data stores when different data model projections (documents, graphs) are required. However, OrientDB can go beyond that (as OrientDB: Use Cases highlights) and could be used for:
- Time series
- Chat rooms
- Queue systems
10.3. HyperDex
HyperDex is a next-generation distributed key-value and document NoSQL data store, designed from the ground up with reliability, robustness and performance in mind. It values strong consistency, fault tolerance, scalability, a rich API and ease of use. Although HyperDex claims to have full ACID transaction support, this feature is not part of the standard HyperDex distribution and seems to be a commercial offering.
Recommendations on when to use: as the in-place replacement for specialized data stores when different data model projections (documents, key/value) are required.
11. Specialized data stores
In this last section of the article we are going to take a look at some NoSQL / NewSQL solutions which do not fall into a particular well-defined class or category. They have been designed to address specific use cases faced by modern web applications, driven by responsive and reactive design principles.
11.1. RethinkDB
RethinkDB is a scalable, distributed, highly available NoSQL data store built from the ground up for the real-time web. It inverts the traditional polling architecture, where applications poll the data store for data (and changes). Instead, it offers a push architecture, where updated query results are continuously pushed to applications in real time.
To distinguish itself from data stores which just push raw data changes, RethinkDB makes it possible to subscribe to changes using an advanced query language that supports joins, sub-queries, and massively parallelized distributed computation. From the CAP theorem point of view, RethinkDB is a CP system.
Recommendations on when to use (please see RethinkDB: Use Cases):
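The difference between polling and pushing can be sketched with a simple observer pattern. In the Python example below, an application registers a query (here just a predicate) together with a callback, and the table invokes the callback whenever a write affects a matching row. This is a single-process illustration of the idea only, not RethinkDB’s API; the `Table` class, its `subscribe` and `upsert` methods and the event shape are hypothetical.

```python
class Table:
    """Hypothetical table that pushes matching changes to subscribers."""

    def __init__(self):
        self._rows = {}
        self._subscribers = []  # (predicate, callback) pairs

    def subscribe(self, predicate, callback):
        self._subscribers.append((predicate, callback))

    def upsert(self, key, row):
        old = self._rows.get(key)
        self._rows[key] = row
        for predicate, callback in self._subscribers:
            # Notify when the old or the new version matches the query.
            if (old is not None and predicate(old)) or predicate(row):
                callback({"old": old, "new": row})


received = []  # the application simply collects the pushed events

scores = Table()
scores.subscribe(lambda row: row["score"] >= 100, received.append)

scores.upsert("ann", {"player": "ann", "score": 50})   # no match, no event
scores.upsert("bob", {"player": "bob", "score": 120})  # pushed
scores.upsert("ann", {"player": "ann", "score": 150})  # pushed

for event in received:
    print(event["old"], "->", event["new"])
```

With polling, the application would have to re-run the query on a timer and compute the differences itself; here the data store does that work at write time.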
- Collaborative web and mobile apps
- Streaming analytics apps
- Multiplayer games
- Real-time marketplaces
- Connected devices
11.2. Crate
Crate is a distributed NewSQL data store. Based on familiar SQL syntax, it comes with high availability, resiliency, and scalability and makes it possible to query large volumes of data in real time. It offers dynamic cluster resizing, auto-rebalancing / auto-sharding and replication, multi-index queries, distributed aggregations and sorting, full-text search, and simple cluster management. Although at the heart of Crate lies a distributed query analyzer, planner and execution engine, a large set of its features relies heavily on the built-in functionality of other excellent solutions, Apache Lucene and Elasticsearch.
Recommendations on when to use (please see Crate: Use Cases):
- Aggregation Across Large Data Sets
- Data Analytics
- Full text search
- Geospatial queries
12. Conclusions
The NoSQL / NewSQL revolution brought the world a lot of fresh ideas and a huge variety of excellent non-relational data stores to satisfy the needs of modern software systems in managing massive volumes of data. Although NoSQL / NewSQL data stores are maturing and gaining more and more popularity, relational database management systems are not going anywhere: they are here to stay, adapting to new challenges. Despite their constraints and the endless alternatives, these days MySQL or PostgreSQL are still often the primary choice, successfully managing volumes of data never seen before.
The goal of this article is not to identify the winners and losers but to introduce many more of the options available. Picking the right relational database management system or NoSQL / NewSQL data store may significantly speed up the development process, help reduce time to market, and address short-term or long-term scalability requirements. It is very common these days to have many heterogeneous data stores working side by side to serve different application workloads and data access patterns.
But be warned: to get the most out of your data store, it is worth investing time in understanding its design, constraints, trade-offs, limitations and, most importantly, its data access and durability guarantees. Following the hottest industry trends is not worth discovering corrupted data, or even complete data loss, in production because of bugs, edge cases or a plain misunderstanding of the key guarantees of the shiny new data store of your choice. Always make a conscious choice by learning and experimenting, paying particular attention to different failure conditions and heavy workloads.
Do you have any other NoSQL or NewSQL systems you would like to suggest? Let us know in the comments!