NoSQL for Beginners
NoSQL can offer an advantage to those entering data science and analytics, as well as to anyone whose applications have high-performance needs that traditional SQL databases don’t meet.
What is NoSQL?
NoSQL is essentially a response to the rigid structure of relational (SQL) databases. Non-relational databases have existed since the 1960s, but NoSQL didn’t really take off until the late 2000s, after Google and Amazon put a lot of research and development into it and published influential work on their in-house systems, Bigtable and Dynamo. Since then, it has become an integral part of the modern web, with many large websites around the world using some form of NoSQL.
So what is NoSQL exactly? Essentially, it is a philosophy for building databases that neither require a fixed schema nor store data in a relational model. There is a wide variety of NoSQL databases to pick from, each with its own specialization and use cases. As such, NoSQL is incredibly diverse when it comes to filling niches, and you can very likely find a NoSQL data model to fit your needs.
Differences Between SQL and NoSQL
SQL is a single, standardized query language shared by relational databases, whereas NoSQL is an umbrella term for many different databases with no common language. That doesn’t mean we can’t look at the general philosophies and differences between the two.
Scalability
When it comes to SQL databases, the traditional way to scale is vertically. That means buying higher-end, more expensive hardware than you already have if you want better performance. Relational databases can be spread across machines, but it usually takes considerably more work. NoSQL databases are generally designed to scale horizontally, so adding capacity is largely a matter of adding another shard (another server holding part of the data) to the cluster.
This makes NoSQL a good fit for applications that are likely to grow, where the limits of a single machine could otherwise become a substantial roadblock.
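As a rough illustration of horizontal scaling, the sketch below spreads keys across several in-memory "shards" by hashing each key. The names `shard_for` and `ShardedStore` are made up for this example, and plain dictionaries stand in for separate servers.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


class ShardedStore:
    """Toy store that spreads keys across several in-memory shards."""

    def __init__(self, shard_count: int) -> None:
        # Each dict stands in for one server in a real cluster.
        self.shards = [{} for _ in range(shard_count)]

    def put(self, key: str, value) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(shard_count=3)
for user_id in ("alice", "bob", "carol", "dave"):
    store.put(user_id, {"name": user_id.title()})

print(store.get("carol"))
print([len(shard) for shard in store.shards])  # how many keys each shard holds
```

Note that with simple modulo hashing like this, changing the number of shards moves most keys to a different shard. Production systems typically use techniques such as consistent hashing to keep that reshuffling small.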
Schema
Relational databases are built from the ground up to avoid data duplication. In practice, that means most SQL projects need an experienced designer to spend a good deal of time on the schema before anything is implemented. This step is important in the long run for maintaining data quality, but it can also get quite expensive.
On the other hand, since NoSQL doesn’t require a fixed schema, you avoid much of the expense and time of that initial design stage. Furthermore, the lack of a schema means that most NoSQL databases are very flexible, allowing you to change or even mix data types and models as you go. This can make changing and evolving the database much easier.
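The difference can be sketched in a few lines. This example uses Python's built-in `sqlite3` module for the relational side and a plain list of dictionaries as a stand-in for a schemaless collection; the `users` table and the `twitter` field are invented for illustration.

```python
import sqlite3

# Relational side: the table's columns are fixed when the table is created.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (id, name) VALUES (1, 'Ada')")

try:
    # The schema has no 'twitter' column, so this insert is rejected.
    conn.execute("INSERT INTO users (id, name, twitter) VALUES (2, 'Bob', '@bob')")
    schema_rejected = False
except sqlite3.OperationalError:
    schema_rejected = True

# Schemaless side: each record simply carries whatever fields it has.
users = []
users.append({"id": 1, "name": "Ada"})
users.append({"id": 2, "name": "Bob", "twitter": "@bob"})

print(schema_rejected)
print(users)
```

The flexibility comes with a trade-off: without a schema, checking that records have the fields you expect becomes the application's job.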
Performance
With SQL, querying data often requires joining across multiple tables. With NoSQL, related data is usually stored together in a single record or document, so many queries can be answered with one lookup. This has the side effect of making NoSQL well suited to high-throughput tasks. For example, Amazon DynamoDB is designed to handle millions of requests per second, which is pretty useful for a global store like Amazon.
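To make the contrast concrete, here is a small sketch that fetches the same information two ways: with a join across two tables in `sqlite3`, and with a single lookup in a nested dictionary standing in for a document. The table names and sample data are invented for the example, and it shows the shape of the two queries rather than measuring speed.

```python
import sqlite3

# Relational layout: customers and orders live in separate tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO customers VALUES (1, 'Ada');
    INSERT INTO orders VALUES (10, 1, 'keyboard'), (11, 1, 'monitor');
""")

# Answering "what did customer 1 order?" needs a join.
rows = conn.execute("""
    SELECT c.name, o.item
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    WHERE c.id = 1
    ORDER BY o.id
""").fetchall()

# Document-style layout: the orders are embedded in the customer record.
customers_by_id = {
    1: {
        "name": "Ada",
        "orders": [
            {"id": 10, "item": "keyboard"},
            {"id": 11, "item": "monitor"},
        ],
    },
}

# The same question is answered by a single lookup.
document = customers_by_id[1]
items = [order["item"] for order in document["orders"]]

print(rows)
print(items)
```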
Support
The biggest downside of NoSQL is that it isn’t as established as SQL. Keep in mind that SQL has a head start of several decades, and this maturity shows in how easy it is to find information. Similarly, finding an SQL expert is much easier than finding one for a given NoSQL database.
Finally, since NoSQL isn’t a single database or language but a collection of different data models and products, expertise is divided among them, which makes NoSQL a matter of specialization. As a result, you will mostly be relying on smaller communities for information and support.
Types of NoSQL Databases
While there are over half a dozen NoSQL data models to go with, we’ll cover the four main ones here that are essential to understanding NoSQL databases.
Document Stores
This type of data model stores each record as a self-contained document, typically in a format such as JSON or XML, and documents in the same collection don’t have to share the same fields. This is in contrast to SQL databases, which rely on tables with fixed columns and split related data across several tables that have to be tied back together at query time, which can make some queries less efficient. Since a document store doesn’t enforce a schema, there’s no need for relational data storage, and no need to tie those pieces together.
In fact, there are NoSQL databases that are XML-specific, if you want to go that route.
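A minimal sketch of the idea, assuming JSON as the document format: the `DocumentCollection` class below is a toy written for this example, not the API of any real document database, and the sample articles are invented.

```python
import json


class DocumentCollection:
    """Toy document store: each record is kept as a JSON document."""

    def __init__(self) -> None:
        self._docs = {}
        self._next_id = 1

    def insert(self, doc: dict) -> int:
        doc_id = self._next_id
        self._next_id += 1
        # Store the serialized document, as a real store would persist it.
        self._docs[doc_id] = json.dumps(doc)
        return doc_id

    def find(self, **criteria) -> list:
        """Return documents whose top-level fields match all criteria."""
        matches = []
        for raw in self._docs.values():
            doc = json.loads(raw)
            if all(doc.get(field) == value for field, value in criteria.items()):
                matches.append(doc)
        return matches


articles = DocumentCollection()
# The two documents have different fields, and one nests another object.
articles.insert({"title": "NoSQL for Beginners", "tags": ["nosql", "databases"]})
articles.insert({"title": "SQL Joins", "author": {"name": "Sam"}, "tags": ["sql"]})

found = articles.find(title="SQL Joins")
print(found)
```

This toy scans every document on each query. Real document databases add indexes so that lookups by field don't require a full scan.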
Graph
Graph, or network, data models are built around the idea that the relationships between pieces of data are just as important as the data itself. In this data model, information is stored as nodes and relationships, with the nodes holding the data and each relationship describing how two nodes are connected.
As the name might suggest, this is an excellent data model for data that naturally forms a network, such as social connections or dependencies. The ability to quickly traverse and visualize how disparate sets of data relate to each other can offer a massive amount of insight that doesn’t require poring over several hundred pages of data.
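The node-and-relationship idea can be sketched with plain Python structures. Everything here (the node names, the `FOLLOWS` and `LIKES` relationship types, and the helper functions) is invented for illustration; a real graph database would offer its own query language for this kind of traversal.

```python
from collections import deque

# Nodes hold the data.
nodes = {
    "alice": {"kind": "person"},
    "bob": {"kind": "person"},
    "carol": {"kind": "person"},
    "post1": {"kind": "post"},
}

# Relationships are (source, type, destination) triples.
edges = [
    ("alice", "FOLLOWS", "bob"),
    ("bob", "FOLLOWS", "carol"),
    ("alice", "LIKES", "post1"),
]


def neighbors(node: str, relation: str) -> list:
    """Nodes directly connected to `node` by the given relationship type."""
    return [dst for src, rel, dst in edges if src == node and rel == relation]


def reachable(start: str, relation: str) -> list:
    """Breadth-first traversal that follows one relationship type."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        current = queue.popleft()
        for nxt in neighbors(current, relation):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


print(neighbors("alice", "FOLLOWS"))
print(reachable("alice", "FOLLOWS"))
```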
Key-Value Store
As the name suggests, this data model stores information as pairs, with each unique key pointing to a specific value. Since the values can be almost any piece of data you like, key-value stores are quite versatile. The model is purpose-made for storing, retrieving, and managing associative arrays, and is well suited to high-volume applications.
In fact, Amazon DynamoDB is built on the key-value model, and Amazon’s own work on Dynamo did a lot to popularize this type of data store. Key-value is also a general category under which other data models can sit, with some types of graph data models essentially functioning like key-value stores.
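At its simplest, a key-value store behaves like a dictionary with put, get, and delete operations. The `KeyValueStore` class below is a toy wrapper written for this example, and the session keys and values are invented.

```python
class KeyValueStore:
    """Toy in-memory key-value store backed by a dictionary."""

    def __init__(self) -> None:
        self._data = {}

    def put(self, key, value) -> None:
        # Writing to an existing key overwrites the old value.
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key) -> bool:
        """Remove a key; return True if it existed."""
        if key in self._data:
            del self._data[key]
            return True
        return False


sessions = KeyValueStore()
# Values can be any kind of data: a dict, a string, a number, and so on.
sessions.put("session:42", {"user": "alice", "cart": ["keyboard"]})
sessions.put("visits:alice", 3)

print(sessions.get("session:42"))
print(sessions.get("session:99", default="not found"))
print(sessions.delete("visits:alice"))
```

The store never looks inside the values, which is part of why lookups by key are simple and fast, and also why querying by anything other than the key is not what this model is built for.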
Column-oriented
Whereas SQL databases traditionally store data in rows, column-oriented databases store it in columns. These are then grouped into families, each of which can hold a very large number of columns. Reading and writing are done by column as well, so the whole system is efficient for fast search and access, as well as for data aggregation.
That being said, it isn’t that great for complex querying.
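The row-versus-column difference can be sketched with plain Python structures. The sample sales data is invented, and a real column-oriented database would store each column separately on disk rather than in a dictionary.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "city": "London", "amount": 20.0},
    {"id": 2, "city": "Paris", "amount": 35.5},
    {"id": 3, "city": "London", "amount": 12.5},
]

# Column-oriented layout: each column keeps all of its values together.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Aggregating one column only touches that column's values.
total = sum(columns["amount"])

# Rebuilding a full record means visiting every column, which hints at
# why this layout is less convenient for queries spanning many fields.
second_record = {name: values[1] for name, values in columns.items()}

print(columns["amount"])
print(total)
print(second_record)
```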
Conclusion
One thing to remember is that NoSQL is not meant as a replacement for SQL so much as a supplement to it. NoSQL mostly uses specialized databases to fill the gaps that SQL leaves, and while you absolutely can go without SQL if you so choose, NoSQL does not preclude its use. Sometimes you may very well find yourself using both SQL and NoSQL.
Bio: Alex Williams is a seasoned full-stack developer and the owner of Hosting Data UK. After graduating from the University of London, majoring in IT, Alex worked as a developer leading various projects for clients from all over the world for almost 10 years. Recently, Alex switched to being an independent IT consultant and started his own blog. There, he explores web development, data management, digital marketing, and solutions for online business owners just starting out.