
KDnuggets, March 2021 (21:n10)

Understanding NoSQL Document Databases


Of all the NoSQL database types, document stores are often considered the most sophisticated. They store data in a JSON-like format rather than in the classic rows-and-columns structure.



By Alex Williams, Hosting Data UK

NoSQL databases form the backbone of much of our day-to-day internet usage. From Twitter's FlockDB to Amazon's DynamoDB, we run into NoSQL on a daily basis.

While there are quite a few NoSQL data models, each with dozens of databases, today we'll take a look at the document store.

As one of the most popular database models out there, the document store works very similarly to a key-value store, in that each document is stored under a specific key. The Windows Registry is sometimes cited as an example, though strictly speaking it is a hierarchical key-value store; either way, it shows how powerful key-based storage can be.
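To make the key-value comparison concrete, here is a minimal sketch that uses plain Python dictionaries as stand-ins for both kinds of store (no real database is involved, and `find_by_field` is a hypothetical helper). The key-value store only hands back an opaque blob, while the document store can look inside each value and match on its fields:

```python
import json

# Key-value style: the value is an opaque blob; the store only knows the key.
kv_store: dict[str, bytes] = {}
kv_store["user:42"] = json.dumps({"name": "Ada", "city": "London"}).encode()

# Document style: the value is structured data the store can inspect.
doc_store: dict[str, dict] = {}
doc_store["user:42"] = {"name": "Ada", "city": "London"}
doc_store["user:43"] = {"name": "Linus", "city": "Helsinki"}

def find_by_field(store: dict[str, dict], field: str, value) -> list[str]:
    """Return the keys of every document whose `field` equals `value`."""
    return [key for key, doc in store.items() if doc.get(field) == value]

# With the key-value store, the caller must fetch and decode the blob itself.
decoded = json.loads(kv_store["user:42"])
print(decoded["city"])                             # London
print(find_by_field(doc_store, "city", "London"))  # ['user:42']
```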


 

How Does a Document Database Work?

 
Ostensibly, the idea behind document databases is that you can store any sort of information in a document. That means you can mix and match whatever kinds of data you want without worrying about the database being unable to parse it. In practice, of course, most document databases still rely on some structure: a common file format such as JSON, plus a predetermined set of fields that applications expect to find.
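As a rough illustration of that "some form of schema", the sketch below keeps differently shaped records side by side in one collection and checks only a small set of required fields. `REQUIRED_FIELDS` and `missing_fields` are hypothetical names chosen for the example, not part of any database's API:

```python
import json

# Documents in one collection may have different shapes...
collection = [
    {"_id": 1, "type": "article", "title": "Intro to NoSQL", "tags": ["db", "nosql"]},
    {"_id": 2, "type": "video", "title": "Document stores", "duration_s": 540},
    {"_id": 3, "type": "article", "title": "Indexes", "author": {"name": "Ada"}},
]

# ...but applications usually agree on a few fields every document must carry.
REQUIRED_FIELDS = {"_id", "type", "title"}

def missing_fields(doc: dict) -> set[str]:
    """Return the required fields absent from `doc` (an empty set means it conforms)."""
    return REQUIRED_FIELDS - set(doc)

for doc in collection:
    # Each document serializes to JSON on its own, whatever its shape.
    print(json.dumps(doc), missing_fields(doc) or "ok")
```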

Compared to an SQL database, which is both tabular and relational, a document store doesn't have the same foibles and restrictions. Because a document keeps related information together, it's often easier to work with the data at hand, and simple lookups that would need joins in SQL can be easier to write. The core operations still carry over: you can add, query, and delete data in a document store just as you can in an SQL database.
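The sketch below runs the same add, delete, and query steps twice: once against an in-memory SQLite table (Python's standard `sqlite3` module) and once against a plain Python list standing in for a document collection. The dotted-path lookup imitates the style MongoDB uses for nested fields, but `find` and `get_path` are hypothetical helpers, not a real driver API:

```python
import sqlite3

# Relational version: rows in a table with a fixed set of columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Ada', 'London')")        # add
conn.execute("INSERT INTO users VALUES (2, 'Linus', 'Helsinki')")
conn.execute("DELETE FROM users WHERE id = 2")                       # delete
sql_rows = conn.execute(
    "SELECT name FROM users WHERE city = ?", ("London",)
).fetchall()                                                         # query
conn.close()

# Document version: a list of dicts standing in for a collection.
def get_path(doc, path: str):
    """Follow a dotted path such as 'address.city' into nested dicts (None if absent)."""
    for part in path.split("."):
        if not isinstance(doc, dict) or part not in doc:
            return None
        doc = doc[part]
    return doc

def find(collection: list[dict], query: dict) -> list[dict]:
    """Hypothetical helper: documents whose fields equal every value in `query`."""
    return [d for d in collection
            if all(get_path(d, path) == value for path, value in query.items())]

users = []
users.append({"_id": 1, "name": "Ada", "address": {"city": "London"}})      # add
users.append({"_id": 2, "name": "Linus", "address": {"city": "Helsinki"}})
users = [d for d in users if d["_id"] != 2]                                # delete
doc_rows = [d["name"] for d in find(users, {"address.city": "London"})]    # query

print(sql_rows)  # [('Ada',)]
print(doc_rows)  # ['Ada']
```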

As alluded to earlier, each document needs a key, which takes the form of a unique ID. When a process supplies that ID, the database reads and works with the whole document directly, rather than assembling it column by column.
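A toy in-memory store makes the role of the unique ID visible. In this sketch (the `DocumentStore` class is hypothetical, and `uuid4` is just one way to generate IDs), every operation goes through the document's key, and a single lookup returns the whole document:

```python
import uuid
from typing import Optional

class DocumentStore:
    """Toy in-memory document store keyed by generated unique IDs (illustrative only)."""

    def __init__(self) -> None:
        self._docs: dict[str, dict] = {}

    def insert(self, doc: dict) -> str:
        doc_id = str(uuid.uuid4())               # the unique ID becomes the document's key
        self._docs[doc_id] = {**doc, "_id": doc_id}
        return doc_id

    def get(self, doc_id: str) -> Optional[dict]:
        # A single key lookup returns the whole document at once.
        return self._docs.get(doc_id)

    def update(self, doc_id: str, changes: dict) -> bool:
        if doc_id not in self._docs:
            return False
        self._docs[doc_id].update(changes)       # merge new fields into the stored document
        return True

    def delete(self, doc_id: str) -> bool:
        return self._docs.pop(doc_id, None) is not None

store = DocumentStore()
ada_id = store.insert({"name": "Ada", "languages": ["Python", "SQL"]})
store.update(ada_id, {"city": "London"})
print(store.get(ada_id)["city"])  # London
store.delete(ada_id)
print(store.get(ada_id))          # None
```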

One thing to be aware of with document databases (and NoSQL databases as a whole) is that they are often described as less secure than SQL databases. That means you really need to take database security into account, and one way to do that is SAST. SAST, or Static Application Security Testing, analyzes source code directly to find vulnerabilities. You can also run DAST (Dynamic Application Security Testing), which probes the running application instead; both can help you catch NoSQL injection flaws.
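To show what a NoSQL injection looks like, here is a self-contained sketch with a tiny matcher that imitates MongoDB's `$ne` (not-equal) operator. The matcher, user list, and login functions are all hypothetical. The point is that if user input is dropped into a query unchecked, an attacker can send an operator object instead of a string, which is the kind of flaw SAST and DAST tools are meant to flag:

```python
# Toy matcher imitating MongoDB-style queries; only the "$ne" operator is supported.
def matches(doc: dict, query: dict) -> bool:
    for field, cond in query.items():
        value = doc.get(field)
        if isinstance(cond, dict) and "$ne" in cond:
            if value == cond["$ne"]:
                return False
        elif value != cond:
            return False
    return True

USERS = [{"username": "admin", "password": "s3cret"}]

def login_unsafe(username, password) -> bool:
    # Passes user input straight into the query, so an operator object gets through.
    return any(matches(u, {"username": username, "password": password}) for u in USERS)

def login_safe(username, password) -> bool:
    # Reject anything that is not a plain string before building the query.
    if not isinstance(username, str) or not isinstance(password, str):
        return False
    return any(matches(u, {"username": username, "password": password}) for u in USERS)

# A JSON request body like {"username": "admin", "password": {"$ne": ""}} decodes to:
payload = {"username": "admin", "password": {"$ne": ""}}
print(login_unsafe(**payload))  # True: logged in without knowing the password
print(login_safe(**payload))    # False: operator object rejected
```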

 

Benefits of Document Databases

 
Probably the biggest benefit of a document store is that all the information about an item can live in a single document, rather than being spread across several linked tables. As a result, reads can be faster than in an SQL database, as long as you don't rely on relational operations such as joins. Interlinking documents adds a lot of complexity and can become frustrating, and references between documents don't work especially well in a document store.
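The difference shows up clearly when you reassemble one record. In this sketch, plain dictionaries stand in for both designs (the order, customer, and item data are made up): the normalized version needs several lookups and a join-like scan, while the embedded version needs one key lookup:

```python
# Relational style: data normalized into separate "tables" linked by IDs.
customers = {7: {"name": "Ada"}}
orders = {100: {"customer_id": 7}}
order_items = [
    {"order_id": 100, "sku": "BOOK-1", "qty": 2},
    {"order_id": 100, "sku": "PEN-9", "qty": 1},
]

def load_order_relational(order_id: int) -> dict:
    # Several lookups plus a join-like scan to reassemble one order.
    order = orders[order_id]
    customer = customers[order["customer_id"]]
    items = [{"sku": i["sku"], "qty": i["qty"]}
             for i in order_items if i["order_id"] == order_id]
    return {"_id": order_id, "customer": customer, "items": items}

# Document style: the whole order, line items included, lives in one document.
order_docs = {
    100: {
        "_id": 100,
        "customer": {"name": "Ada"},
        "items": [{"sku": "BOOK-1", "qty": 2}, {"sku": "PEN-9", "qty": 1}],
    }
}

def load_order_document(order_id: int) -> dict:
    # A single key lookup returns everything needed to display the order.
    return order_docs[order_id]

print(load_order_relational(100) == load_order_document(100))  # True
```

The trade-off mirrors the caveat about interlinking: if the customer's name changes, every document that embeds a copy of it has to be updated too.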

Unlike conventional databases, where every row has a field for each piece of information even when it's empty, a document store is more flexible. Documents don't need to be consistent with one another: a field that doesn't apply to a document can simply be left out, which makes it practical to store large volumes of varied data.

Because of that flexibility, integrating new data isn't problematic. In a relational database, a new type of information means changing the table schema, which affects every row; in a document store, you only add the new field to the documents that need it.
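Here is a small side-by-side sketch of both points, using Python's built-in `sqlite3` module for the relational half and a list of dictionaries for the document half (the product data is made up). Adding a column touches the whole table, so rows that don't need it carry an empty `NULL`, whereas the document version simply leaves the field out:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO products (name) VALUES (?)", [("Laptop",), ("Mug",)])

# Relational: a new attribute is a schema change, and every row gets the column.
conn.execute("ALTER TABLE products ADD COLUMN battery_hours INTEGER")
conn.execute("UPDATE products SET battery_hours = 12 WHERE name = 'Laptop'")
rows = conn.execute("SELECT name, battery_hours FROM products ORDER BY id").fetchall()
conn.close()
print(rows)  # [('Laptop', 12), ('Mug', None)]  the mug carries an empty column

# Document store: only the document that needs the new field gets it.
products = [{"_id": 1, "name": "Laptop"}, {"_id": 2, "name": "Mug"}]
products[0]["battery_hours"] = 12

# Readers treat a missing field as "not applicable" rather than NULL.
print([p.get("battery_hours", "n/a") for p in products])  # [12, 'n/a']
```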

More specifically, because the schema can be changed without any downtime, and because you may not know what users will need in the future, document stores are a great fit for applications such as:

  • Large eCommerce platforms (like Amazon)
  • Blogging and microblogging sites (such as Twitter)
  • Content management systems (such as WordPress)
  • Analytical platforms
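One common way to change a schema without downtime is to tag each document with a version number and upgrade older documents lazily when they are read, so no bulk migration has to finish first. This is a general pattern, sometimes called schema versioning; the field names and the upgrade step below are hypothetical:

```python
CURRENT_VERSION = 2

def upgrade(doc: dict) -> dict:
    """Return a copy of `doc` brought up to CURRENT_VERSION (applied on read)."""
    doc = dict(doc)  # work on a copy so the stored document is untouched
    version = doc.get("schema_version", 1)  # documents without a version are v1
    if version < 2:
        # Hypothetical change: v1 stored one "name" string, v2 splits it in two.
        first, _, last = doc.pop("name", "").partition(" ")
        doc["first_name"], doc["last_name"] = first, last
        version = 2
    doc["schema_version"] = version
    return doc

stored = [
    {"_id": 1, "name": "Ada Lovelace"},  # written before the schema change
    {"_id": 2, "first_name": "Grace", "last_name": "Hopper", "schema_version": 2},
]
for doc in stored:
    print(upgrade(doc))
```

Old and new documents coexist in the same collection, and the application can optionally write the upgraded version back whenever it next saves a document.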


 

Disadvantages of Document Databases

 
While most document-store databases have been around for a while now, there still isn't much documentation outside of small niches and each database's own wiki or forums. This is compounded by the sheer number of document-store databases to pick from, so finding specific information can take some deep digging.

There is also a risk of data loss, either from an incorrect configuration caused by unfamiliarity or from running everything on a single node. Another issue is that document stores are not really built for running multiple complex operations together or for complex queries.

Finally, a double-edged sword is the fact that document databases (and NoSQL databases as a whole) are evolving rapidly. Compared to SQL, which is well established and unlikely to change much, NoSQL can be difficult to keep up with if you don't have the passion or interest.

 

Examples of Popular Document Databases

 
MongoDB: Easily one of the top NoSQL database engines, MongoDB is widespread, stores documents in BSON (a binary, JSON-like format), and has its own query language. We have a great guide covering MongoDB basics.

Elasticsearch: A search engine built on the document-store data model. It's used to index and search document data, and it's pretty straightforward to learn.
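Search engines such as Elasticsearch make documents searchable by building an inverted index, which maps each term to the documents that contain it. The sketch below shows only that core idea in plain Python; real engines add text analyzers, relevance scoring, and on-disk storage, none of which is modeled here, and all names are illustrative:

```python
import re
from collections import defaultdict

docs = {
    1: {"title": "Intro to document databases"},
    2: {"title": "Scaling key-value stores"},
    3: {"title": "Document search with inverted indexes"},
}

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

# Inverted index: each term maps to the set of document IDs containing it.
index: dict[str, set[int]] = defaultdict(set)
for doc_id, doc in docs.items():
    for term in tokenize(doc["title"]):
        index[term].add(doc_id)

def search(query: str) -> list[int]:
    """Return IDs of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)

print(search("document"))          # [1, 3]
print(search("document indexes"))  # [3]
```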

CouchDB: Used with both Ubuntu and Facebook, it uses JavaScript for querying (through its views) and is written in Erlang.

BaseX: A lightweight, open-source XML database management system written in Java.
 

Conclusion

 
There's a good reason document-store data models are so popular and widely used: their flexibility. As database applications grow more complex, being able to easily add new kinds of data or scale up means less overall hassle and an easier project to manage.

Document stores also help with analytics, since a business can easily store a wide variety of information for later reference. And as some document-store databases, such as MongoDB, add graph capabilities, it becomes easier to suss out information and patterns that might not have been obvious otherwise.
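MongoDB, for example, offers a `$graphLookup` aggregation stage for recursively following references between documents. The plain-Python sketch below imitates that kind of traversal over documents linked by a hypothetical `reports_to` field; it is a breadth-first search written for illustration, not MongoDB's implementation:

```python
from collections import deque

# Employee documents that reference each other through a "reports_to" field.
employees = {
    "ada": {"name": "Ada", "reports_to": None},
    "bob": {"name": "Bob", "reports_to": "ada"},
    "cy": {"name": "Cy", "reports_to": "bob"},
    "dee": {"name": "Dee", "reports_to": "bob"},
}

def reports_under(manager_id: str) -> list[str]:
    """Breadth-first walk over 'reports_to' links to find everyone below a manager."""
    # Build a reverse adjacency list: manager -> direct reports.
    children: dict = {}
    for emp_id, doc in employees.items():
        children.setdefault(doc["reports_to"], []).append(emp_id)

    found, seen, queue = [], {manager_id}, deque([manager_id])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:  # guard against cycles in the references
                seen.add(child)
                found.append(child)
                queue.append(child)
    return found

print(reports_under("ada"))  # ['bob', 'cy', 'dee']
```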

 
Bio: Alex Williams is a seasoned full-stack developer and the owner of Hosting Data UK. After graduating from the University of London, majoring in IT, Alex worked as a developer leading various projects for clients from all over the world for almost 10 years. Recently, Alex switched to being an independent IT consultant and started his own blog. There, he explores web development, data management, digital marketing, and solutions for online business owners just starting out.
