This blog post is for a high-level overview of the two main camps of databases, and a few other database concepts to help you speak intelligently and impress your friends. And if your friends aren’t impressed by eloquent database design discussion, well, get new friends.
Relational vs Non-Relational Databases
A good place to start is to determine the differences between relational and non-relational databases.
We can start with which came first. For decades, relational databases were how big data was managed. These databases are easily imaginable as a Microsoft Excel spreadsheet – rows of data divided into columns with headers that described the data types. There’s a few popular types of RDBMS (relational database management systems): Oracle, Microsoft SQL Server, PostgreSQL and MySQL.
Relational databases are queried using SQL – “structured query language.” This is a string command that is parsed by the database to extract the data you need.
SELECT column_name,column_name FROM table_name;
This line would give you the two columns of data in the table specified. Relational databases require STRUCTURED data to have this system work. Column names must be specified before data can be handled in them, and each row must contain each of the columns. This how the database must be made – it acts agnostically of the app itself and how the app handles the data.
Relational database relationships
We can start off by saying “joins” are a key theme – the joins but the “relation” in “relational database.” Join tables are used to cross-reference data from tables to express multiple relationships between the same instances of a value.
In these kinds of databases, you should be aware of “one-to-many” and “many-to-many” relationships. Tables will often reference each other by one of these relationships. For example, your customers table and your orders table will probably have a “one-to-many” relationship – meaning one customer can have many orders (or sometimes its more obvious for me to think in the inverse: an order can ONLY have ONE customer). To create that relationship, the ‘primary key’ of the ‘one’ side of that relationship (the customer) will be used as a ‘foreign key’ in the ‘many’ database. The customer ID, which is a unique value on the customers table, will be used as a lookup reference in a column on the orders table, where it doesn’t need to be unique (repeating customer IDs just signifies that customer has many orders.
A ‘many-to-many’ relationship might look like a table of authors and books. An author can write many books, and a book can have many authors. Each author and book will have a unique ID that is used in the ‘join table’ – a table that then lists the many author-book combinations by their IDs.
Non-relational databases, or noSQL databases / object databases were developed in the 80’s to solve “object-relational impedance mismatch.” It’s a fancy way for saying relational databases weren’t meshing conceptually with how object-oriented languages need to interact with data. A couple popular non-relational database management systems are MongoDB and CouchDB.
Because it’s stored this way, each data piece, or “document,” doesn’t need to have the same size, number, or types of field categories. Notice how hard it would be to store a list of pets of indeterminate size in a relational database, but how easy it is to store in the example above. You can augment the data structure much easier than with a relational database. There is no query language, or any language parsed by the database, to interact with the data. You interact with the data similar to how you interact with objects/classes in programming – which is why object-oriented programmers tend to favor this structure of databases. You can query the database like this:
Non-relational tables do not have join tables – rather, they often repeat data and rely on expensive searches to satisfy queries across sets of data. They should be made with the app itself in mind, and how that app is going to need the data, in order to optimize data access.
Now that we know what relational and non-relational databases are, let’s compare how they operate.
Relational databases are designed to scale UP and non-relational databases are designed to scale OUT. This means to scale a relational database, you need a BIGGER (more powerful) box, and to scale a non-relational database, you just need MORE boxes. There’s many factors involved with scaling a database… but in general, relational databases are known to be easier to scale due to the uniformity of their data.
ORM, or object-relational mapping, is the attempt to make relational databases behave more ‘object-y.’ ORMs are often used on top of a relational database and provide an interface that allows database transactions that look like non-relational database transactions – meaning you ideally don’t need to write any SQL queries. Although initially this looks like you’re getting the best of both worlds, there’s high costs associated with not interacting with a relational database the way it’s intended. You can lose a lot of efficiency by using an ORM incorrectly.
ACID or BASE – database characteristics and the CAP Theorem
To engage in a conversation about which database you should use for a specific program, you’ll want to be familiar with core database expectations and characteristics.
Atomicity – All interactions – or transactions – with the database are all-or-nothing. They execute completely or the entire transaction fails. Transactions are indivisible (like atoms) – they have no meaning partially, only completely.
Consistency – All interactions will begin and leave with the database in a valid state. This means that an interaction with the database cannot leave the database with data or in a state that it can’t process.
Isolation – Chained transactions behave the same as if they were executed serially.
Durability – Once a change to the database is made, it’s written to memory and remain even if the database is turned off or errors.
How ACID is applied: A database with “high acidity” has high constraints on its data at any given moment. There’s many checks run on the system and its difficult or impossible to run multiple transactions at once.
Consistency – the degree to which the data and system behavior is consistent over time.
Availability – the degree to which the database is available to be interacted with.
Partition Tolerance – the degree to which the database can continue to run under failure.
How the CAP Theorem is applied: The rule of the CAP Theorem is to “pick two.” These three categories are of course ideally all present in a database, but real constraints mean that you usually need to pick two of the three when designing a management system.
Basically Available – each transaction will have a response… although the response might be a ‘failure’ or ‘error.’
Soft state – data is considered ‘soft’ because it is undergoing changes in state between receiving input and resolving consistency.
Eventual consistency – the database will resolve itself into consistency, but usually after the batch of incoming requests is processed. Checks are not performed between individual transactions.
How BASE is applied: BASE arose as a counterpart to ACID, inspired by the needs of more recent styles of data and data interactions. For example, Amazon requires that many people be able to interact and potentially change database items at the same time. Sometimes this results in someone getting a “This book was actually already bought while you were trying to buy it” error, but gives the database the necessary flexibility to be available to a larger audience.
As you build programs and applications, you’ll need to consider your data structure and database selection from the beginning in order to build on top of it successfully. This requires the knowledge to be able to select the right database management system for your needs – and there’s no one-size-fits-all solution.
It isn’t uncommon to use a few database management systems together to solve data storage in different areas of the program. It’s also possible to “migrate databases” – moving from one system to another (although this usually requires your system to be down for a period of time).
Hopefully this blog post has helped build a foundation of how to select the database that fits your needs!