SQL vs NoSQL
Introduction
When we talk about data centers, the first topics that come to mind are usually hardware and network specifications, and perhaps virtualization. Another relevant topic, one not usually associated with data centers but which we believe deserves a dedicated post, is how the data stored in them is structured. After all, as the name indicates, data centers are precisely centers whose functions include storing, querying, modifying and deleting data.
In the world of programming it is often said that a good computer scientist is not the one who writes the fewest lines of code or programs the fastest, but the one who can analyze a problem and find the data structures that allow it to be solved in the simplest and most efficient way possible, saving both time and space (although space is often sacrificed in favor of time, because disk space is very cheap compared to the value attributed to time).
Thus, in this post, relational databases will be compared with non-relational ones.
Relational databases (SQL)
When we think of databases, we usually picture data structured in related tables (that is, an SQL database), since these are very popular.
Relational databases organize information into small pieces (tables), which are related to each other through identifiers: a kind of pointer by which fields in one table refer to fields in another, so that information does not have to be repeated.
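To make the idea of tables linked by identifiers concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is used only because it needs no server; the table names, column names and sample rows are illustrative assumptions, not something taken from a real system.

```python
import sqlite3

# In-memory database; all table and column names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled

conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE posts ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " author_id INTEGER NOT NULL REFERENCES authors(id))"
)

# The author's name is stored once; each post only stores the identifier.
conn.execute("INSERT INTO authors (id, name) VALUES (1, 'Ada')")
conn.executemany(
    "INSERT INTO posts (title, author_id) VALUES (?, ?)",
    [("First post", 1), ("Second post", 1)],
)

# A JOIN follows the identifier to bring both tables back together.
rows = conn.execute(
    "SELECT posts.title, authors.name FROM posts"
    " JOIN authors ON authors.id = posts.author_id"
    " ORDER BY posts.id"
).fetchall()
print(rows)
```

Because the posts table only holds author_id, renaming the author means updating a single row in authors rather than every post.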
In computing there is a lot of talk about ACID (Atomicity, Consistency, Isolation and Durability). These are properties that relational databases bring to systems, making them more robust and less vulnerable to failure.
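As an illustration of one of these properties, atomicity, the following sketch uses Python's built-in sqlite3 module to show a transfer in which either both updates are applied or neither is. The accounts table, the transfer function and the balances are hypothetical and chosen only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint rejects any update that would leave a negative balance.
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )

def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates succeed or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

# The credit to bob runs first, then the debit from alice violates the CHECK,
# so the whole transaction (including the credit) is rolled back.
ok = transfer(conn, "alice", "bob", 500)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, balances)
```

The credit is deliberately executed before the failing debit, so the unchanged balance for bob shows that the partial work was undone.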
Two of the most widely used and well-known relational databases are MySQL and PostgreSQL, both taught in La Salle's database course.
Non-relational databases (NoSQL)
As their name indicates, non-relational databases are those that, unlike relational ones, do not rely on identifiers that relate one set of data to another; that is, the data sets have no relationships with each other. (Such relationships can in fact be expressed in many systems, such as MongoDB, although the point is to keep their use to a minimum or even eliminate them.) The information is normally organized in documents (JSON files, for example), which is very useful when we do not have an exact schema of what is going to be stored.
The most popular and widely used non-relational database at the moment is MongoDB, although there are alternatives oriented to distributed architectures, big data and IoT, such as Cassandra.
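The document model can be sketched without any database at all. The example below uses plain Python dictionaries and the standard json module to imitate a collection of documents; it is not MongoDB's API, and the posts collection, the find helper and the field names are assumptions made for illustration.

```python
import json

# A "collection" modeled as a plain list of dicts. A real document store
# would persist and index these documents, which this sketch does not do.
posts = [
    # The author is embedded in the document instead of referenced by id.
    {"title": "First post", "author": {"name": "Ada"}, "tags": ["sql", "nosql"]},
    # This document has no tags but adds an email: no fixed schema is required.
    {"title": "Second post", "author": {"name": "Ada", "email": "ada@example.com"}},
]

def find(collection, **criteria):
    """Return the documents whose top-level fields equal the given values."""
    return [
        doc for doc in collection
        if all(doc.get(key) == value for key, value in criteria.items())
    ]

# Fields that may be missing are read with a default instead of failing.
tagged = [doc["title"] for doc in posts if "nosql" in doc.get("tags", [])]
print(tagged)
print(json.dumps(find(posts, title="Second post")[0], indent=2))
```

Embedding the author inside each post avoids the relationship between tables, at the cost of repeating the author's data in every document.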
SQL vs NoSQL
Although both types of databases are perfectly valid, there are cases where one is a better fit than the other. There is no way to know which of the two will perform better without testing it (and even then it is difficult to measure, and depends on how the data is structured and what language is used), but some basic guidelines can help identify what the problem requires of the database:
- Volume of data: when the volume of data does not grow, or grows only gradually, that can be an indication to use a SQL database, while if the volume of data grows very quickly at specific times, that can be an indication to use a NoSQL database.
- Processing needs: when the processing needs can be handled by a single server, a SQL database can be used. When the processing needs cannot be anticipated, or when more processing capacity than a single server can offer is expected to be needed, a NoSQL database is recommended, because these usually make it easier to set up distributed systems.
- Usage peaks: when users cause no usage peaks in the system (apart from those already foreseen in the analysis of the problem), a SQL database can be used, while if users repeatedly cause usage peaks, NoSQL would be more appropriate.
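The three guidelines above can be written down as a small checklist. This is only a sketch of the rules of thumb in this post, not a substitute for testing; the function names and the boolean parameters are hypothetical.

```python
def nosql_signals(rapid_growth: bool,
                  beyond_single_server: bool,
                  unforeseen_peaks: bool) -> list[str]:
    """Return which of the three guidelines point toward NoSQL."""
    signals = []
    if rapid_growth:
        signals.append("volume of data")
    if beyond_single_server:
        signals.append("processing needs")
    if unforeseen_peaks:
        signals.append("usage peaks")
    return signals

def suggest(rapid_growth: bool,
            beyond_single_server: bool,
            unforeseen_peaks: bool) -> str:
    """Suggest a starting point: any NoSQL signal tips the suggestion."""
    signals = nosql_signals(rapid_growth, beyond_single_server, unforeseen_peaks)
    return "NoSQL" if signals else "SQL"

# A small internal tool with stable data on one server.
print(suggest(False, False, False))
# A service whose data grows in bursts and whose users cause unforeseen peaks.
print(suggest(True, False, True), nosql_signals(True, False, True))
```

Treating a single signal as enough to suggest NoSQL is a simplification made for this sketch; in practice the signals are indications to weigh, and the real answer comes from measuring.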
Conclusions
Although both SQL and NoSQL are valid and used in a great variety of contexts, comparing them shows that if you work in a controlled environment, with little growth and with static behavior and no surprises, SQL can be the best option. Cloud services, however, live in uncontrolled environments, with enormous and uneven growth as well as unpredictable behavior. All this, together with the existing tools for building databases with distributed architectures, means that every day more projects hosted in data centers use non-relational databases.
Authors
Joan Farràs Tasias
Ferran Montoliu Boneu