Inconsistency in Relational Databases


I generally get confused when the term consistency is used. NoSQL tutorials always refer to reads, whereas relational tutorials refer to a consistent state (one that satisfies constraints such as referential integrity).

When the data is distributed across multiple servers (e.g. a 1 master, n slaves configuration):

  1. Do relational databases ensure consistency in reads? I mean, is a committed write immediately available to be read by other transactions? I suspect not, given that a network is involved, which would make relational DBs eventually consistent.
  2. Do relational databases enforce referential integrity constraints well?

How to solve:


Method 1

To directly answer your questions:

  1. "Do relational databases ensure consistency in reads?"

A: In short, when the ACID principles, and specifically the Consistency part, were defined, this was long before distributed data was a concept for relational databases. So Consistency in that sense was automatic, because only a single server was at play. Once a transaction was committed, the server was immediately consistent with itself regarding that transaction and all constraints associated with it; there were no other servers to synchronize with and commit to. (Note that this only focuses on one part of the meaning of Consistency in the ACID principles, the part relevant to the OP’s question about distributed data.)

Nowadays there’s a multitude of ACID-compliant relational database systems designed to handle data distributed across multiple servers. But let’s take Microsoft SQL Server and its AlwaysOn Availability Groups feature as an example for a moment. This feature is meant for high availability and disaster recovery: it synchronizes data from your primary server to one or more secondary servers. It can also keep the system ACID compliant across servers when the secondaries are configured for synchronous-commit mode. In a very rudimentary explanation, the primary only reports a transaction as committed once that transaction’s log records have been hardened on every synchronous secondary. This guarantees that every acknowledged transaction already exists on those secondaries, which lets the system remain ACID compliant in a distributed server environment. Note that the secondaries still replay (redo) those log records asynchronously, so a query on a readable secondary can briefly lag behind the primary.

  2. "Do relational databases ensure referential integrity constraint well?"

A: There’s not really a way to answer that question quantitatively, but generally speaking, yes, they do. Referential integrity is one of the main reasons for using a relational database. When there’s a well-defined relational schema with proper constraints, such as foreign keys, a relational database is guaranteed to enforce the rules of those constraints on every transaction, which maintains referential integrity. This ties into the definition of Consistency and how a relational database is ACID compliant.
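As a minimal sketch of that enforcement, the Python snippet below uses the standard-library sqlite3 module as a stand-in for a server RDBMS (an assumption; SQLite is not part of the original answers) and borrows the one/two schema used later in this thread. SQLite only enforces foreign keys once PRAGMA foreign_keys = ON is set on the connection; the function name is hypothetical.

```python
import sqlite3


def demo_foreign_key_enforcement() -> list[str]:
    """Try to insert child rows pointing at an existing and a missing parent.

    SQLite stands in for a server RDBMS here (assumption)."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE one (a INTEGER PRIMARY KEY, b INTEGER NOT NULL)")
    conn.execute(
        "CREATE TABLE two (c INTEGER PRIMARY KEY, d INTEGER NOT NULL REFERENCES one(a))"
    )
    conn.execute("INSERT INTO one VALUES (1, 5)")
    conn.commit()

    outcomes = []
    for parent_id in (1, 42):  # 1 exists in `one`, 42 does not
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute("INSERT INTO two (d) VALUES (?)", (parent_id,))
            outcomes.append(f"d={parent_id}: accepted")
        except sqlite3.IntegrityError as exc:
            outcomes.append(f"d={parent_id}: rejected ({exc})")
    conn.close()
    return outcomes
```

The child row pointing at parent 1 is accepted, while the one pointing at the non-existent parent 42 is rejected with an IntegrityError and never becomes visible.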

Method 2

I’ll try to answer only question 1:

When the data is distributed across multiple servers (e.g. a 1 master, n slaves configuration)

  1. Do relational databases ensure consistency in reads? I mean, is a committed write immediately available to be read by other transactions? I suspect not, given that a network is involved, making relational DBs eventually consistent.

This depends on the DBMS and on how the DB topology is configured, but the answer can be yes: relational DBs can be configured to be strongly consistent rather than eventually consistent.

In Postgres, for example, there are several synchronous_commit settings that allow various levels of consistency. From the article "Should You Ever Use Synchronous Replication in PostgreSQL?":

synchronous_commit=remote_apply — Commits are sent by the primary to the application only after the standbys defined in synchronous_standby_names have confirmed that the transactions have been applied beyond the WAL, to the database itself.

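Basically, it says that the client receives its commit acknowledgement only after the synchronous standbys (the "slaves") have applied the transaction, so a read on one of those standbys issued after the commit returns will see the committed data.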
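The difference between these levels can be illustrated with a toy, in-process model. This is not PostgreSQL code: the classes Standby and Primary and the function read_on_standby_after_commit are hypothetical, only three real setting names (local, on, remote_apply) are borrowed, and background WAL streaming and replay are simply left out.

```python
from dataclasses import dataclass, field

# Toy model only (assumption): mode names borrowed from PostgreSQL's synchronous_commit.
MODES = ("local", "on", "remote_apply")


@dataclass
class Standby:
    """A replica that first receives WAL records and only later replays them."""
    received: list = field(default_factory=list)  # WAL flushed on the standby, not replayed
    data: dict = field(default_factory=dict)      # what queries on the standby can see

    def receive(self, record):
        self.received.append(record)

    def replay(self):
        for key, value in self.received:
            self.data[key] = value
        self.received.clear()

    def read(self, key):
        return self.data.get(key)


@dataclass
class Primary:
    standby: Standby
    mode: str
    data: dict = field(default_factory=dict)
    outbox: list = field(default_factory=list)  # WAL that would be streamed after the ack

    def commit(self, key, value):
        if self.mode not in MODES:
            raise ValueError(f"unknown mode {self.mode!r}")
        self.data[key] = value
        record = (key, value)
        if self.mode == "local":
            self.outbox.append(record)      # acknowledge without waiting for the standby
        else:
            self.standby.receive(record)    # "on": wait until the standby has flushed the WAL
            if self.mode == "remote_apply":
                self.standby.replay()       # also wait until it is replayed and readable
        return "COMMIT acknowledged"


def read_on_standby_after_commit(mode):
    """Commit x = 1 on the primary, then immediately read x on the standby."""
    standby = Standby()
    Primary(standby, mode).commit("x", 1)
    return standby.read("x")
```

In this model, read_on_standby_after_commit returns None for "local" and "on" but 1 for "remote_apply": only the last mode makes the commit readable on the standby by the time the client is told it succeeded.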

There are also "master-master" (multi-master) configurations for Postgres, usually provided by extensions or third-party tools, and for other major DBMSs; some of them allow similar behaviour with more than one write node ("master").

Method 3

Read consistency depends on the active transaction isolation level. Here, you trade off consistency requirements against performance and additional error states (transactions that fail and must be retried).

Most applications get away with rather "weak" isolation guarantees, because a JOIN would still not return inconsistent data. For example,

CREATE SEQUENCE a_seq;
CREATE SEQUENCE c_seq;
CREATE TABLE one (a INTEGER PRIMARY KEY, b INTEGER NOT NULL);
CREATE TABLE two (c INTEGER PRIMARY KEY, d INTEGER NOT NULL REFERENCES one(a));

You would normally insert values with surrogate keys like

INSERT INTO one VALUES(nextval('a_seq'), 5);
INSERT INTO two VALUES(nextval('c_seq'), currval('a_seq'));

Technically, when run in autocommit mode these are two separate transactions, so each becomes visible on its own even at the highest isolation level, but if your query asks for

SELECT * FROM one JOIN two ON a = d;

you will still only get complete sets. For most applications, that is sufficient, and avoiding the extra application logic required for stronger isolation levels (where transactions can be rejected seemingly randomly due to concurrent transactions) is preferable.
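A minimal sketch of this behaviour, using Python's sqlite3 module with two connections to the same file as a stand-in for two PostgreSQL clients (an assumption). SQLite has no sequences, so an INTEGER PRIMARY KEY and cursor.lastrowid play the role of nextval('a_seq') and currval('a_seq'); the function name is hypothetical. Each INSERT is autocommitted, so the reader can see the lone parent row, yet the join only ever returns the complete pair.

```python
import os
import sqlite3
import tempfile


def join_sees_only_complete_pairs():
    """Insert a parent and a child as two autocommitted transactions and watch
    the join from a second connection (SQLite stands in for PostgreSQL)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        # isolation_level=None: every statement is committed on its own (autocommit)
        writer = sqlite3.connect(path, isolation_level=None)
        writer.execute("PRAGMA journal_mode=WAL")  # readers never block the writer
        writer.execute("PRAGMA foreign_keys = ON")
        writer.execute("CREATE TABLE one (a INTEGER PRIMARY KEY, b INTEGER NOT NULL)")
        writer.execute(
            "CREATE TABLE two (c INTEGER PRIMARY KEY, d INTEGER NOT NULL REFERENCES one(a))"
        )
        reader = sqlite3.connect(path, isolation_level=None)
        join_sql = "SELECT * FROM one JOIN two ON a = d"

        # Transaction 1: parent row (INTEGER PRIMARY KEY replaces nextval('a_seq'))
        parent_id = writer.execute("INSERT INTO one (b) VALUES (5)").lastrowid
        parents_visible = len(reader.execute("SELECT * FROM one").fetchall())
        joined_before = reader.execute(join_sql).fetchall()

        # Transaction 2: child row (parent_id replaces currval('a_seq'))
        writer.execute("INSERT INTO two (d) VALUES (?)", (parent_id,))
        joined_after = reader.execute(join_sql).fetchall()

        writer.close()
        reader.close()
    return parents_visible, joined_before, joined_after
```

Here parents_visible is 1 and joined_before is empty, while joined_after holds the single row (1, 5, 1, 1).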

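The database will reject transactions that cause a constraint to be violated. By default, however, a foreign key is checked at the end of each statement rather than at the end of the transaction, so an illegal intermediate state inside a transaction is only allowed if the constraint is declared DEFERRABLE (for example DEFERRABLE INITIALLY DEFERRED). For example,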

DELETE FROM one WHERE a = 1;
DELETE FROM two WHERE d = 1;

would give an error if a row referencing a = 1 existed in two; the same statements in the reverse order are okay, and so is wrapping both statements into a single transaction with BEGIN and COMMIT, provided the foreign key is declared deferrable. If you like to live dangerously, you can also give an ON DELETE CASCADE rule that removes all dependent rows.
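The difference between immediate and deferred foreign key checks can be sketched with Python's sqlite3 module standing in for PostgreSQL (an assumption; both check a non-deferrable foreign key at the end of each statement). The hypothetical function deletes the parent row before the child row inside one BEGIN ... COMMIT block, once with a default foreign key and once with DEFERRABLE INITIALLY DEFERRED.

```python
import sqlite3


def delete_parent_then_child(deferred: bool) -> str:
    """Delete one(a=1) and then two(d=1) in a single transaction.

    SQLite stands in for PostgreSQL here (assumption)."""
    conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit BEGIN/COMMIT below
    conn.execute("PRAGMA foreign_keys = ON")
    timing = "DEFERRABLE INITIALLY DEFERRED" if deferred else ""
    conn.execute("CREATE TABLE one (a INTEGER PRIMARY KEY, b INTEGER NOT NULL)")
    conn.execute(
        f"CREATE TABLE two (c INTEGER PRIMARY KEY, d INTEGER NOT NULL REFERENCES one(a) {timing})"
    )
    conn.execute("INSERT INTO one VALUES (1, 5)")
    conn.execute("INSERT INTO two VALUES (1, 1)")
    try:
        conn.execute("BEGIN")
        # Parent first: for a moment the child row in `two` dangles.
        conn.execute("DELETE FROM one WHERE a = 1")
        conn.execute("DELETE FROM two WHERE d = 1")
        conn.execute("COMMIT")  # a deferred constraint is checked here
        return "committed"
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")  # an immediate constraint already failed the first DELETE
        return "rejected"
    finally:
        conn.close()
```

With the default (immediate) foreign key the first DELETE fails and the transaction is rolled back; with the deferred one the intermediate state is tolerated because it is fixed before COMMIT.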

Consistency is seen from the point of view of the database server: the query is sent to the server, processed there, and the result set returned. The network latency is irrelevant for consistency: two clients that are not otherwise communicating with each other have no way of finding out which of them has sent its query first, so the server chooses an arbitrary order.

If the two clients do talk to each other, and one notifies the other that it has just inserted data and received an acknowledgement from the server, then the second client will be able to see that data in its query.

Replicated databases can still keep this property, but with a much stiffer performance penalty at higher isolation levels, because acknowledging a transaction requires all nodes to confirm that no other active transaction conflicts with it, and reaching this consensus takes time.

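At higher isolation levels, a client that has started a transaction with BEGIN can run SELECT queries against a frozen snapshot of the database that does not reflect transactions committed by others after the snapshot was taken. To keep that view valid, some systems lock the data read until the end of the transaction, so concurrent writes to it block or fail; snapshot-based systems such as PostgreSQL instead let the writes proceed and abort one of the conflicting transactions with a serialization error.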
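A minimal sketch of the snapshot behaviour, using SQLite in WAL mode via Python's sqlite3 module (an assumption; SQLite's WAL readers work on a snapshot, but SQLite does not offer PostgreSQL's isolation-level settings). The reader's first SELECT inside BEGIN fixes its snapshot, so an UPDATE committed by a second connection only becomes visible once the reader ends its transaction. The function name and accounts table are hypothetical.

```python
import os
import sqlite3
import tempfile


def snapshot_read_demo():
    """Return the balances a reader sees before, during and after a concurrent update."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "snap.db")
        reader = sqlite3.connect(path, isolation_level=None)  # explicit BEGIN/COMMIT below
        reader.execute("PRAGMA journal_mode=WAL")  # WAL readers keep a stable snapshot
        reader.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
        reader.execute("INSERT INTO accounts VALUES (1, 10)")
        writer = sqlite3.connect(path, isolation_level=None)

        def balance(conn):
            return conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchall()[0][0]

        reader.execute("BEGIN")
        seen = [balance(reader)]  # first read inside the transaction fixes the snapshot
        writer.execute("UPDATE accounts SET balance = 20 WHERE id = 1")  # autocommitted
        seen.append(balance(reader))  # still reads from the old snapshot
        reader.execute("COMMIT")
        seen.append(balance(reader))  # a new read sees the committed update
        reader.close()
        writer.close()
    return seen
```

The reader sees 10, then 10 again despite the committed update, and 20 only after its own COMMIT.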

From an application programmer’s point of view, you would typically perform modifications inside the DBMS instead of retrieving data, modifying it in the application and writing it back, e.g.

UPDATE accounts SET balance = balance - 1 WHERE "user" = '3';

This nicely avoids consistency problems with other transactions even at lower isolation levels, and thus lets you get away without explicit locking in the application: two such UPDATE queries can be sent from different clients, and the DBMS resolves them internally (using short-lived row locks) without waiting on network round trips.
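The following sketch contrasts the two approaches, with two sqlite3 connections standing in for two clients (an assumption; the accounts table and its "user" column follow the statement above, everything else is hypothetical). In the read-modify-write variant both clients read the balance before either one writes, which produces a lost update; the in-place UPDATE does not.

```python
import os
import sqlite3
import tempfile

BALANCE_SQL = 'SELECT balance FROM accounts WHERE "user" = ?'


def final_balance(in_place: bool) -> int:
    """Two clients each withdraw 1 from a balance of 10; return what is left."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "bank.db")
        setup = sqlite3.connect(path, isolation_level=None)  # autocommit
        setup.execute("PRAGMA journal_mode=WAL")
        setup.execute('CREATE TABLE accounts ("user" TEXT PRIMARY KEY, balance INTEGER NOT NULL)')
        setup.execute("INSERT INTO accounts VALUES ('3', 10)")
        clients = [sqlite3.connect(path, isolation_level=None) for _ in range(2)]

        if in_place:
            # The DBMS computes balance - 1 itself, so neither decrement is lost.
            for client in clients:
                client.execute('UPDATE accounts SET balance = balance - 1 WHERE "user" = ?', ("3",))
        else:
            # Read-modify-write in the application, interleaved so both clients
            # read the old balance before either one writes (a "lost update").
            seen = [client.execute(BALANCE_SQL, ("3",)).fetchall()[0][0] for client in clients]
            for client, old in zip(clients, seen):
                client.execute('UPDATE accounts SET balance = ? WHERE "user" = ?', (old - 1, "3"))

        result = setup.execute(BALANCE_SQL, ("3",)).fetchall()[0][0]
        for conn in [setup, *clients]:
            conn.close()
    return result
```

With the in-place UPDATE the balance ends at 8; with the interleaved read-modify-write it ends at 9, because one withdrawal overwrote the other.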

Constraints need not be referential: a constraint that the balance must stay positive is verified for each transaction. If only one of two withdrawals can be executed, the second is returned with an error (for concurrent transactions, which one is "second" is arbitrary).
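A minimal sketch of such a check with Python's sqlite3 module (an assumption, as are the table layout and function name): a CHECK (balance > 0) constraint on a balance of 2, and two withdrawals of 1 issued one after the other. Because they run sequentially here, the second one is deterministically the one that fails.

```python
import sqlite3


def two_withdrawals():
    """Withdraw 1 twice from a balance of 2 under CHECK (balance > 0)."""
    conn = sqlite3.connect(":memory:", isolation_level=None)  # each UPDATE is its own transaction
    conn.execute(
        "CREATE TABLE accounts (id INTEGER PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance > 0))"
    )
    conn.execute("INSERT INTO accounts VALUES (1, 2)")
    results = []
    for _ in range(2):
        try:
            conn.execute("UPDATE accounts SET balance = balance - 1 WHERE id = 1")
            results.append("ok")
        except sqlite3.IntegrityError:  # raised when the CHECK constraint fails
            results.append("rejected")
    final = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchall()[0][0]
    conn.close()
    return results, final
```

The first withdrawal succeeds and leaves a balance of 1; the second would bring it to 0, so it is rejected and the balance stays at 1.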

These constraint checks are one of the main reasons why you’d want to use an RDBMS, but they are limited to data structures that can be expressed as relations between a finite number of tables with appropriate indexes. For example, it is difficult to express graphs or hierarchies in a way that allows efficient queries; this is where NoSQL comes in.


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
