7 Steps to Understanding NoSQL Databases
Are you a newcomer to NoSQL, interested in gaining a real understanding of the technologies and architectures it includes? This post is for you.
The term NoSQL has come to be synonymous with schema-less, non-relational data storage schemes. NoSQL is an umbrella term that encompasses a number of different technologies. These technologies aren't necessarily related in any way beyond the single defining characteristic of NoSQL: they are not relational in nature. Rightly or wrongly, Structured Query Language (SQL) has become conflated with relational database management systems over the years, which is why "not relational" ended up labeled "NoSQL."
SQL vs NoSQL database architectures.
So, while I am not personally a fan of the term NoSQL, I can appreciate why others are, given that it quickly implies what it is we are talking about by explicitly stating what we are not talking about. As such, I grin and bear its usage.
On the implementation side, the most popular NoSQL Database Engines today can be found here. But if you're interested in learning more about the NoSQL world, keep reading below.
Step 1: Why NoSQL?
First, to get an idea of what it is we're talking about, and what will be explored in more detail in subsequent steps, read this introduction to NoSQL databases by Robert Rees of ThoughtWorks.
Step 2: NoSQL Basics
This article gives an overview of NoSQL, expands on what it is, discusses situations in which these technologies are relevant, and provides a very brief overview of the types of NoSQL architectures.
The following talk, by Martin Fowler of ThoughtWorks, surveys much of the material covered in more depth in later steps.
Step 3: Understanding Key-value Stores
Understanding key-value stores is not a giant leap if you already have a grasp of hash tables and hashing.
For a simple explanation, see this Stack Exchange thread:
Afterwards, see the first hour of this video from UC Berkeley:
Finally, the short third section of the following document, titled "Why implement a key-value store," reviews a few of the important practical points of when to select a key-value store.
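To make the link between hashing and key-value stores concrete, here is a minimal sketch of an in-memory store built on hash buckets. It is a toy for illustration only: the class name `TinyKeyValueStore`, its methods, and the bucket count are all assumptions of this example, and no real engine is implemented this simply.

```python
class TinyKeyValueStore:
    """Toy in-memory key-value store backed by a fixed number of hash buckets."""

    def __init__(self, bucket_count=8):
        # Each bucket is a list of (key, value) pairs that hashed to it.
        self._buckets = [[] for _ in range(bucket_count)]

    def _bucket_for(self, key):
        # hash() maps the key to an integer; the modulo selects a bucket.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket_for(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for existing_key, value in self._bucket_for(key):
            if existing_key == key:
                return value
        return default

    def delete(self, key):
        bucket = self._bucket_for(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                del bucket[i]
                return True
        return False


store = TinyKeyValueStore()
store.put("user:42", '{"name": "Ada"}')
store.put("user:42", '{"name": "Ada Lovelace"}')  # same key, new value
print(store.get("user:42"))
print(store.get("user:99", "missing"))
```

Note that the store treats each value as an opaque blob: it can only be fetched, replaced, or deleted by its key, which is the main trade-off to keep in mind when weighing a key-value store against richer models.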
Step 4: Understanding Document Stores
Just as hashing is a good fundamental to start with for understanding key-value stores, JSON is a good starting point for document stores. See Jennifer Widom's quick overview video on JSON to lay the groundwork.
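As a rough sketch of why JSON matters here, the snippet below treats a list of parsed JSON objects as a "collection" and queries it by field. The sample documents, the helper name `find_by_path`, and the dotted-path convention are assumptions made for this example; they are not the API of any particular document store.

```python
import json

# Two documents in the same collection with different shapes:
# no shared schema is required.
raw_documents = [
    '{"_id": 1, "name": "Ada", "languages": ["Python", "SQL"]}',
    '{"_id": 2, "name": "Grace", "address": {"city": "Arlington"}}',
]
collection = [json.loads(doc) for doc in raw_documents]

_MISSING = object()  # sentinel meaning "this document has no such field"


def find_by_path(collection, path, value):
    """Return documents whose field at a dotted path equals value."""
    results = []
    for doc in collection:
        current = doc
        for part in path.split("."):
            if isinstance(current, dict) and part in current:
                current = current[part]
            else:
                current = _MISSING
                break
        if current is not _MISSING and current == value:
            results.append(doc)
    return results


print(find_by_path(collection, "name", "Ada"))
print(find_by_path(collection, "address.city", "Arlington"))
```

Unlike a key-value store, the query can look inside the stored value, including nested fields, and documents that lack the field are simply skipped.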
MSDN Magazine's Julie Lerman wrote a nice overview of document store databases; despite its age (it was written in 2011), it is still relevant and gets the job done. MongoDB also provides a much more concise overview of document stores. Take your pick of either of the following, depending on the level of depth and explanation you feel comfortable with:
Finally, if you come from the relational world and want an explicit and to-the-point comparison of document stores to the relational model, check out the following:
Step 5: Understanding Column-oriented Databases
For an introduction to what columnar storage is, and how it differs from row-oriented storage, read the following from the AWS Redshift guide:
Next, watch this video from Sam Madden of MIT's CSAIL group, which explains column-oriented databases in greater detail:
Finally, choose one of the following two articles to read more about what sets column-oriented databases apart. The first is written by Alex Weber, and is a shorter read. Dennis Forbes writes the second, and it dives a bit deeper into the advantages that the architecture provides, and when it provides them. Choose the level of depth you're after.
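The difference between row-oriented and column-oriented layouts can be sketched in a few lines. The sample data and the helper `to_columns` are assumptions of this example, and it assumes every row has the same fields; real columnar engines add compression, encoding, and on-disk layout that this toy ignores.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "region": "EU", "amount": 120.0},
    {"id": 2, "region": "US", "amount": 80.0},
    {"id": 3, "region": "EU", "amount": 50.0},
]


def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


# Column-oriented layout: all values of one field are stored together.
columns = to_columns(rows)

# Summing one field from rows touches every whole record...
total_from_rows = sum(row["amount"] for row in rows)
# ...while the columnar layout only reads the single column it needs.
total_from_columns = sum(columns["amount"])

print(columns["amount"])
print(total_from_rows, total_from_columns)
```

Both totals are the same; what changes is how much data has to be read to compute them, which is why analytic queries over a few columns tend to favor the columnar layout.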
Step 6: Understanding Graph Databases
For our graph database explanations, we will rely on material from Neo4j, likely the most-used graph database implementation of them all. First, read the following overview article:
After reading the above article, have a look at the following short video (a Neo4j webinar) by William Lyon, which provides some additional insight into graph databases:
Finally, have a look at this article, which does a good job of comparing the graph model to the relational model:
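Alongside those resources, the core idea of the graph model, nodes connected by typed relationships that you traverse rather than join, can be sketched in plain Python. The sample edges, relationship names, and the helper `reachable_within` are assumptions of this example; this is not Neo4j's API or query language.

```python
from collections import deque

# Directed, labeled edges: (source node, relationship type, target node).
edges = [
    ("alice", "FRIEND", "bob"),
    ("bob", "FRIEND", "carol"),
    ("carol", "FRIEND", "dave"),
    ("alice", "WORKS_AT", "acme"),
]


def build_adjacency(edges):
    """Index edges by source node so traversal is a dictionary lookup."""
    adjacency = {}
    for source, relationship, target in edges:
        adjacency.setdefault(source, []).append((relationship, target))
    return adjacency


def reachable_within(adjacency, start, relationship, max_hops):
    """Breadth-first search along one relationship type, up to max_hops."""
    seen = {start}
    found = []
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for rel, neighbor in adjacency.get(node, []):
            if rel == relationship and neighbor not in seen:
                seen.add(neighbor)
                found.append(neighbor)
                queue.append((neighbor, hops + 1))
    return found


adjacency = build_adjacency(edges)
print(reachable_within(adjacency, "alice", "FRIEND", 2))
```

In a relational schema, a "friends of friends" question like this typically means joining a friendship table to itself once per hop; in the graph model it is a traversal whose depth is just a parameter.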
Step 7: Bringing it All Together
Sample NoSQL database engines.
Outlining practice projects or lining up a series of tutorials for the practical implementation of, and experimentation with, every NoSQL architecture is impractical, given the vast number of architectures, implementations, and programming language permutations. Instead, this step takes a different approach to priming the reader for practical work.
The first thing to do, after gaining an understanding of the NoSQL architectures, is to see the implementation offerings out there. What follows are a few resources listing NoSQL data engines, their architectures, and descriptions. Remember, NoSQL is a catch-all term; given what you have seen above, the following resources are not comparing apples to apples, but instead provide guidance as to what is available for further investigation, depending on requirements.
- DB-Engines Ranking - a respected resource ranking database engines of all types
- List of NoSQL Databases - a comprehensive, categorized list of NoSQL engines
- A deep dive into NoSQL - this is a complete list of NoSQL databases, though it is dated by a few years; a good cross-reference resource
Now, for those interested in pursuing some practical experience with widely-used NoSQL implementations, the following is a select list of tutorials for working with the database engines appearing in the previously referenced Top NoSQL Database Engines post, using Python.
- MongoDB with PyMongo
- Getting Started with Apache Cassandra and Python
- Make Python Redis super easy with Redis Labs
- HappyBase User Guide
- Getting started with Neo4j and Python
If you are able to make it through all of this material, including the tutorials at the end, with a firm grasp of what is going on, I would dare say you now understand NoSQL databases.