Topics: AI | Data Science | Data Visualization | Deep Learning | Machine Learning | NLP | Python | R | Statistics

KDnuggets Home » News » 2021 » Feb » Tutorials, Overviews » Column-Oriented Databases, Explained

Column-Oriented Databases, Explained


NoSQL Databases have four distinct types. Key-value stores, document-stores, graph databases, and column-oriented databases. In this article, we’ll explore column-oriented databases, also known simply as “NoSQL columns”.



NoSQL has become increasingly popular as a complementary tool to traditional SQL approaches to databases and database management. As we know, NoSQL doesn’t follow the same relational model that SQL does, which allows it to do quite a lot of powerful things. More importantly, it’s very flexible and scalable, which is excellent for newer projects that don’t have the time or budget to spend on designing an SQL database.

As such, we’re going to take a bit of a deeper look at how different data models work, with this article taking a look at column databases. If you’d like to take a more general look, you should check out our NoSQL for Beginners guide.

 

How Does a Column Database Work?

 
At a very surface level, column-store databases do exactly what is advertised on the tin: namely, that instead of organizing information into rows, it does so in columns. This essentially makes them function the same way that tables work in relational databases. Of course, since this is a NoSQL database, this data model makes them much more flexible.

Image

 

More specifically, column databases use the concept of keyspace, which is sort of like a schema in relational models. This keyspace contains all the column families, which then contain rows, which then contain columns. It’s a bit tricky to wrap your head around at first but it’s relatively straightforward.

Image

 

Image

 

By taking a quick look, we can see that a column family has several rows. Within each row, there can be several different columns, with different names, links, and even sizes (meaning they don’t need to adhere to a standard). Furthermore, these columns only exist within their own row and can contain a value pair, name, and a timestamp.

If we take a specific row as an example:

Image

 

The Row Key is exactly that: the specific identifier of that row and is always unique. The column contains the name, value, and timestamp, so that’s straightforward. The name/value pair is also straight forward, and the timestamp is the date and time the data was entered into the database.
Some examples of column-store databases include Casandra, CosmoDB, Bigtable, and HBase.

 

Benefits of Column Databases

 
There are several benefits that go along with columnar databases:

  • Column stores are excellent at compression and therefore are efficient in terms of storage. This means you can reduce disk resources while holding massive amounts of information in a single column
  • Since a majority of the information is stored in a column, aggregation queries are quite fast, which is important for projects that require large amounts of queries in a small amount of time.
  • Scalability is excellent with column-store databases. They can be expanded nearly infinitely, and are often spread across large clusters of machines, even numbering in thousands. That also means that they are great for Massive Parallel Processing
  • Load times are similarly excellent, as you can easily load a billion-row table in a few seconds. That means you can load and query nearly instantly.
  • Large amounts of flexibility as columns do not necessarily have to look like each other. That means you can add new and different columns without disrupting the whole database. That being said, entering completely new record queries requires a change to all tables.

Overall, column-store databases are great for analytics and reporting: fast querying speeds and abilities to hold large amounts of data without adding a lot of overhead make it ideal.

 

Disadvantages of Column Databases

 
As it usually is in life, nothing is perfect and there are a couple of disadvantages to using column-oriented databases as well:

  • Designing an indexing schema that’s effective is difficult and time consuming. Even then, the said schema would still not be as effective as simple relational database schemas.
  • While this may not be an issue for some users, incremental data loading is suboptimal and should be avoided if possible.
  • This goes for all NoSQL database types and not just columnar ones. Security vulnerabilities in web applications are ever present and the fact that NoSQL databases lack inbuilt security features doesn’t help. If security is your number one priority, you should either look into relational databases you could employ or employ a well-defined schema if possible.
  • Online Transaction Processing (OLTP) applications are also not compatible with columnar databases due to the way data is stored.

 

Are Column Databases Always NoSQL?

 
Before closing out, we do want to point out that column-store databases aren’t necessarily NoSQL only. Often the argument goes that column-store is so different from relational database models, that it falls squarely in the NoSQL camp.

This isn’t necessarily always the case and the NoSQL vs SQL argument, in general, is pretty complex.

In the case of column-store databases, they are nearly identical to SQL methodologies. For example, keyspaces act as schema and therefore there is still some form of schema management. Another example is that the metadata can sometimes look exactly like that of a typical relational DBMS. Ironically, column-store databases also tend to be SQL and ACID compliant.

Even more so though, NoSQL databases are usually either document-store or key-store, of which column-store is neither.

Therefore, it’s hard to argue that column-store is purely NoSQL.

 

Conclusion

 
While column-store databases are incredibly powerful, they do have their own sets of issues. For example, the way that it writes data means a certain lack of consistency since columns require multiple writes to the disk. This is compared to relational databases where row-data is written sequentially.

Nonetheless, column-store remains one of the most used data models out there.

 
Bio: Alex Williams is a seasoned full-stack developer and the owner of Hosting Data UK. After graduating from the University of London, majoring in IT, Alex worked as a developer leading various projects for clients from all over the world for almost 10 years. Recently, Alex switched to being an independent IT consultant and started his own blog. There, he explores web development, data management, digital marketing, and solutions for online business owners just starting out.

Related:


Sign Up

By subscribing you accept KDnuggets Privacy Policy