Database Key Terms, Explained
Interested in a survey of important database concepts and terminology? This post defines 16 essential database key terms concisely and accurately.
Metadata is data about data. It describes the relationships and characteristics of the data in a database. A collection of such metadata is often referred to as a data dictionary, a term more prevalent in the relational world, though by no means exclusive to it.
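As a minimal sketch of what metadata looks like in practice, the snippet below asks SQLite (through Python's standard `sqlite3` module) to describe a table rather than return its contents. The `customer` table and its columns are hypothetical, chosen only for illustration.

```python
import sqlite3

# In-memory database with one hypothetical table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = conn.execute("PRAGMA table_info(customer)").fetchall()
column_summary = [
    (name, col_type, bool(notnull))
    for _cid, name, col_type, notnull, _default, _pk in columns
]

# sqlite_master is SQLite's built-in catalog of schema objects.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

print(tables)
print(column_summary)
```

Nothing here reads a customer record; both queries return information about the structure of the data, which is the sense in which metadata is "data about the data".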
A database is consistent when all of its imposed integrity constraints are satisfied. Consistency can only be ensured if each database transaction, or data access request, begins in a known consistent state; otherwise, no guarantee of consistency can be made. A database whose data cannot be verified as consistent is problematic, particularly when the extent of its inconsistency is unknown.
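One way to picture an integrity constraint is a rule the database itself refuses to break. The sketch below, which assumes a hypothetical `account` table with a non-negative balance rule, uses SQLite's `CHECK` constraint to reject an update that would leave the data in an inconsistent state.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical integrity constraint: a balance may never be negative.
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account (id, balance) VALUES (1, 100)")
conn.commit()

try:
    # This update would violate the constraint, so SQLite rejects it.
    conn.execute("UPDATE account SET balance = balance - 500 WHERE id = 1")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# The stored value is unchanged because the offending statement was not applied.
balance = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]
print(rejected, balance)
```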
Data redundancy is a situation in which copies of a given piece of data are housed in two or more places. The copies may be held in multiple places in the same database, in multiple databases on the same computer, or in multiple databases across multiple computers, perhaps even under different database management software. Redundancy can be leveraged for both data access and permanence: reads can be served from any copy, and the data survives the loss of one of them.
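The idea can be sketched with a toy store that writes every value to several in-memory copies. The `ReplicatedStore` class and its method names are hypothetical and only illustrate the principle; real replication also has to deal with failures and ordering, which this sketch ignores.

```python
class ReplicatedStore:
    """Toy store that writes every value to several in-memory replicas."""

    def __init__(self, replica_count=2):
        self.replicas = [{} for _ in range(replica_count)]

    def put(self, key, value):
        # Redundancy: the same value is written to every replica.
        for replica in self.replicas:
            replica[key] = value

    def get(self, key):
        # Any replica that still holds the key can serve the read.
        for replica in self.replicas:
            if key in replica:
                return replica[key]
        raise KeyError(key)


store = ReplicatedStore(replica_count=2)
store.put("user:1", "Ada")
store.replicas[0].clear()  # simulate losing one copy
recovered = store.get("user:1")
print(recovered)
```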
ACID is an acronym for a set of database transaction properties: Atomicity, Consistency, Isolation, and Durability. A database transaction must be atomic, consistent, isolated, and durable in order to be valid. In other words, the steps that make up a transaction must either all be completed or all be rolled back (atomic), must leave the database in a consistent state (see the definition of consistency above), must be isolated from other concurrent transactions, and, once committed, must be permanent (durable).
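Atomicity is the easiest of the four properties to demonstrate in a few lines. The sketch below assumes a hypothetical `account` table and a hypothetical `transfer` helper. It relies on the `sqlite3` connection's context manager, which commits if the block succeeds and rolls back if it raises.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 30)       # both steps succeed
failed = transfer(conn, "alice", "bob", 500)  # debit violates the CHECK constraint
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, failed, balances)
```

In the second call the credit to `bob` is executed before the debit fails, yet the rollback undoes it, so no half-finished transfer is ever visible.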
11. CAP Theorem
The CAP Theorem states that a distributed computer system (including distributed database management software and the data it houses) cannot provide all of the following guarantees at the same time: consistency, meaning that every node sees the same data at the same time; availability, meaning that every database request receives a response indicating success or failure; and partition tolerance, meaning that the system continues operating even when communication failures leave some nodes unable to reach one another. At best, only two of these guarantees can be made concurrently.
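The trade-off can be illustrated with a deliberately simplified toy model, not a real distributed system. The `TwoNodeStore` class, its `mode` flag, and the `partitioned` switch are all hypothetical. During a simulated partition, the "CP" variant refuses writes to stay consistent, while the "AP" variant keeps accepting them and lets the nodes diverge.

```python
class Node:
    def __init__(self):
        self.data = {}


class TwoNodeStore:
    """Toy two-node store that must pick a behavior when the nodes cannot talk."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a, self.b = Node(), Node()
        self.partitioned = False

    def write(self, node, key, value):
        other = self.b if node is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                return False  # give up availability: refuse the write
            node.data[key] = value  # give up consistency: nodes now disagree
            return True
        # No partition: replicate to both nodes.
        node.data[key] = value
        other.data[key] = value
        return True


cp = TwoNodeStore("CP")
cp.partitioned = True
cp_accepted = cp.write(cp.a, "x", 1)

ap = TwoNodeStore("AP")
ap.partitioned = True
ap_accepted = ap.write(ap.a, "x", 1)
ap_diverged = ap.a.data != ap.b.data
print(cp_accepted, ap_accepted, ap_diverged)
```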
12. Sharding
Sharding is a technique for partitioning data. A shard is a horizontal (think rows, not columns) partition of the data in a database. Shards are spread across computer nodes in order to balance the load. A given piece of data may then reside in one or more of these shards.
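A common way to decide which shard a row belongs to is to hash its key. The sketch below is one possible scheme, not the only one: the `shard_for`, `put_row`, and `get_row` helpers are hypothetical, and each shard is just a Python dict standing in for a separate node.

```python
import hashlib

SHARD_COUNT = 4
# Each dict stands in for the rows held by one node.
shards = [dict() for _ in range(SHARD_COUNT)]


def shard_for(key, shard_count):
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def put_row(key, row):
    shards[shard_for(key, SHARD_COUNT)][key] = row


def get_row(key):
    # The same hash sends the lookup to the shard that stored the row.
    return shards[shard_for(key, SHARD_COUNT)].get(key)


for user_id in range(100):
    put_row(f"user:{user_id}", {"id": user_id})

print([len(shard) for shard in shards])
print(get_row("user:42"))
```

A stable hash such as SHA-256 is used instead of Python's built-in `hash()` because the latter is randomized per process for strings, which would send the same key to different shards across runs.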
13. Key-value Store
Key-value stores are one of the predominant NoSQL architectures. At a high level the paradigm is simple: values are assigned to keys, and are always stored and retrieved via those keys. A data value is added to the database with an identifying key; the same value is later accessed with the same key. If you understand hash maps (dictionaries in Python), you are a step ahead. Redis is an example of a key-value store.
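The hash map analogy can be made concrete with a toy in-memory store built on a Python dict. The `KeyValueStore` class and its method names are hypothetical and are not the API of Redis or any other product.

```python
class KeyValueStore:
    """Toy in-memory key-value store backed by a Python dict."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        # Report whether the key was present before removal.
        existed = key in self._data
        self._data.pop(key, None)
        return existed


store = KeyValueStore()
store.set("session:42", "alice")
value = store.get("session:42")    # the value is found via its key
removed = store.delete("session:42")
missing = store.get("session:42")  # the key is gone, so the default is returned
print(value, removed, missing)
```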
14. Document Store
A document store is another NoSQL database architecture. As is typical of NoSQL engines, a document store does not use a relational schema; instead, it uses JSON-like "documents" to store data. A document is akin to a record, housing fields and values. MongoDB is a widely used example.
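To give a feel for the model, here is a minimal sketch that treats a Python list of dicts as a collection of documents. The `find` helper and the sample documents are hypothetical and do not reflect MongoDB's actual query API. Note that the two documents carry different fields, which a fixed relational schema would not allow.

```python
import json

# Two documents in the same collection with different fields.
documents = [
    {"_id": 1, "name": "Ada", "languages": ["Python", "SQL"]},
    {"_id": 2, "name": "Grace", "address": {"city": "Arlington"}},
]


def find(collection, **criteria):
    """Return documents whose top-level fields equal all given criteria."""
    return [
        doc
        for doc in collection
        if all(doc.get(field) == wanted for field, wanted in criteria.items())
    ]


matches = find(documents, name="Grace")
as_json = json.dumps(matches[0])  # documents serialize naturally to JSON
print(as_json)
```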
15. Column-oriented Database
Another NoSQL architecture, the column-oriented database stores in its rows what we usually think of as vertical data, that is, what is traditionally held in relational columns (Rows contain columns? Huh?). The advantage of column-oriented design is that some types of data lookups can become very fast, given that the desired data can be stored consecutively in a single row (compare this with having to search and read multiple, nonconsecutive rows to obtain the same field's values in a row-oriented database). Cassandra is a popular example of a column-oriented database.
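The difference between the two layouts can be sketched with plain Python structures. The sample records are hypothetical, and the snippet shows only the logical layout, not how any particular engine such as Cassandra stores data on disk.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "name": "Ada", "age": 36},
    {"id": 2, "name": "Grace", "age": 45},
    {"id": 3, "name": "Edgar", "age": 52},
]

# Column-oriented layout: all values of one field are stored together.
columns = {field: [row[field] for row in rows] for field in rows[0]}

# Reading one field in the row layout touches every record...
ages_from_rows = [row["age"] for row in rows]
# ...while the column layout already holds those values consecutively.
ages_from_columns = columns["age"]

print(columns)
print(ages_from_rows == ages_from_columns)
```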
16. Graph Database
A graph database is premised on edges acting as relationships, directly relating data instances (nodes) to one another. Graph databases have advantages in some use cases, potentially including certain data mining and pattern recognition scenarios, given that associations between data instances are stated explicitly. Neo4j is a widely used graph database.
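Because relationships are stored explicitly, questions such as "who is within two hops of this person?" become a simple traversal. The sketch below uses a hypothetical edge list and a hypothetical `reachable_within` helper. It is plain Python, not Neo4j's query language.

```python
from collections import deque

# Each edge directly relates two data instances: (source, relationship, target).
edges = [
    ("alice", "FOLLOWS", "bob"),
    ("bob", "FOLLOWS", "carol"),
    ("carol", "FOLLOWS", "dave"),
    ("alice", "FOLLOWS", "erin"),
]

# Build an adjacency list so a node's relationships can be followed directly.
adjacency = {}
for source, _label, target in edges:
    adjacency.setdefault(source, []).append(target)


def reachable_within(start, max_hops):
    """Breadth-first traversal returning nodes within max_hops of start."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                found.append(neighbor)
                queue.append((neighbor, hops + 1))
    return found


print(reachable_within("alice", 2))
```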