Concurrency:
Concurrency is crucial in application design and development, as most modern applications need to support concurrent requests. This article explores how Postgres manages concurrency.
Database concurrency is the ability to support multiple users and processes working on the database at the same time.
Before diving in deep, let's first set our basics right and understand what we expect from a (relational) database.
ACID:
Atomicity (A): The "all or nothing" principle: operations must either complete successfully as a whole or be rolled back entirely. For instance, transferring money between accounts involves two steps which, if not handled properly, can lead to inconsistencies. By encapsulating the transfer in a single transaction (BEGIN and COMMIT/ROLLBACK) and executing it as one unit, the database ensures that either the entire transaction succeeds and the new state is recorded, or it fails and the state is left unchanged.
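A minimal sketch of such a transfer in SQL (the account names and the amount are illustrative; the demo later uses a similar accounts table with name and balance columns):

BEGIN;

-- Debit one account
UPDATE accounts SET balance = balance - 100 WHERE name = 'Bob';

-- Credit the other account
UPDATE accounts SET balance = balance + 100 WHERE name = 'Alice';

-- Either both updates become visible together...
COMMIT;
-- ...or, on any failure, ROLLBACK undoes both and the state is unchanged.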
Consistency (C):
The data must be in a consistent state both when a transaction starts and when it ends.
Isolation (I):
Concurrent transactions should not interfere with each other, even though they are running at the same time. This is easy to state in theory but harder to achieve in practice; I will explain isolation in more detail soon.
Durability (D):
A committed transaction must have its data written to persistent storage rather than kept only in volatile memory, so that it survives system failures.
Together, these four properties ensure that a transaction completes with the expected behavior and without anomalies. A database that provides them is said to be an ACID-compliant database.
Read Phenomena (Concurrency Anomalies):
In the concurrent world, multiple transactions run simultaneously and may interfere with each other, leading to various read phenomena. These phenomena can occur when a low isolation level is used, and they cover four distinct types of concurrency issues that databases can encounter:
Dirty Read
Non-Repeatable Read
Phantom Read
Lost Update
These problems arise from the concurrent execution of transactions and the inconsistencies that can result from reading and writing the same data at the same time.
Dirty Read:
A dirty read occurs when a transaction (T1) reads uncommitted data from another transaction (T2). This can cause various issues. For instance, consider a scenario where Bob's account balance is initially $1000. Transaction T2 updates his balance to $110, but T2 has not yet committed or rolled back. If T1 reads Bob's balance in this intermediate state and takes actions based on that value, it can lead to problematic outcomes.
Note: ❌ icon emphasizes the problem with this phenomenon
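A sketch of this interleaving in SQL, using the balances from the example above. Note that Postgres never actually allows dirty reads: even if you ask for READ UNCOMMITTED, it behaves like READ COMMITTED, so the reader below would still see the committed value of 1000.

-- Session T2
BEGIN;
UPDATE accounts SET balance = 110 WHERE name = 'Bob';  -- not committed yet

-- Session T1
SELECT balance FROM accounts WHERE name = 'Bob';
-- a database that permits dirty reads could return 110 here,
-- even though T2 may still roll back; Postgres returns 1000

-- Session T2
ROLLBACK;  -- the value 110 never officially existed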
Non-Repeatable Read:
Non-Repeatable Read is a phenomenon that can be challenging to grasp, possibly due to its name. It occurs when a transaction (T1) reads the same record twice and observes a different version of the record in the second read. This discrepancy arises because another concurrent transaction (T2) modified the record in between the two reads by making updates. As a result, T1 perceives a change in the record's values, leading to the non-repeatable read phenomenon.
Note: ❌ icon emphasizes the problem with this phenomenon
In the first read, T1 sees an interest rate of 5%. T2 then changes it to 5.5% and commits. This is a problem because T1 sees the new rate of 5.5% in its second read, even though it is still inside the same transaction.
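A sketch of this interleaving under the default READ COMMITTED level, assuming a hypothetical interest_rates table (not part of the demo data used later):

-- Session T1 (READ COMMITTED, the default)
BEGIN;
SELECT rate FROM interest_rates WHERE product = 'savings';  -- returns 5.0

-- Session T2
BEGIN;
UPDATE interest_rates SET rate = 5.5 WHERE product = 'savings';
COMMIT;

-- Session T1, same transaction, same query
SELECT rate FROM interest_rates WHERE product = 'savings';  -- now returns 5.5
COMMIT;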
Phantom Read:
Phantom Read is a phenomenon similar to Non-Repeatable Read, wherein a transaction T1 observes a different set of records during subsequent reads because rows were recently inserted, updated, or deleted by another committed transaction executing concurrently.
Note: ❌ icon emphasizes the problem with this phenomenon
T1 observes a different sum of quantities during the second read compared to the first because it can access the modifications made by transaction T2.
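A sketch under READ COMMITTED, assuming a hypothetical order_items table with a quantity column:

-- Session T1
BEGIN;
SELECT SUM(quantity) FROM order_items WHERE order_id = 1;  -- e.g. 10

-- Session T2
BEGIN;
INSERT INTO order_items (order_id, quantity) VALUES (1, 5);
COMMIT;

-- Session T1, same query repeated
SELECT SUM(quantity) FROM order_items WHERE order_id = 1;  -- now 15: a phantom row appeared
COMMIT;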
Lost Update:
Lost Update is a phenomenon that occurs when two transactions independently read the same object and then each attempt to update it. The issue arises when the transaction that commits last overwrites the update made by the other transaction. As a result, the changes made by the first transaction are lost, leading to the lost update phenomenon.
Note: ❌ icon emphasizes the problem with this phenomenon
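A sketch of the classic read-modify-write race under READ COMMITTED, using the accounts table from the demo (the amounts are illustrative):

-- Session T1
BEGIN;
SELECT balance FROM accounts WHERE name = 'Srinu';  -- reads 100
-- the application computes 100 - 10 = 90

-- Session T2
BEGIN;
SELECT balance FROM accounts WHERE name = 'Srinu';  -- also reads 100
-- the application computes 100 - 20 = 80

-- Session T1
UPDATE accounts SET balance = 90 WHERE name = 'Srinu';
COMMIT;

-- Session T2
UPDATE accounts SET balance = 80 WHERE name = 'Srinu';
COMMIT;  -- T1's update is silently overwritten; the final balance is 80, not 70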
A database is expected to eliminate the aforementioned anomalies while providing the ACID properties. Notably, only the SERIALIZABLE isolation level fully honors the isolation guarantee in ACID and resolves all of these anomalies. However, using SERIALIZABLE involves certain trade-offs, which will be explored further in the following sections.
Isolation levels to the rescue:
To fix all these read phenomena, the ANSI (American National Standards Institute) SQL standard defines four isolation levels:
READ UNCOMMITTED
READ COMMITTED
REPEATABLE READ
SERIALIZABLE
The provided illustration shows which read phenomena can still occur at each isolation level. The lowest level, READ UNCOMMITTED, does not prevent any of them, so it can effectively be ruled out; in fact, Postgres treats READ UNCOMMITTED as READ COMMITTED, which is also its default isolation level.
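You can check the current level and choose one per transaction in psql:

-- Default level of the session
SHOW default_transaction_isolation;  -- read committed

-- Pick a stricter level for a single transaction
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SHOW transaction_isolation;          -- repeatable read
COMMIT;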
Can we observe these levels of isolation in action?
Prerequisites:
1. Install Rancher Desktop or Docker Desktop so you can run containers. We will be running Postgres in a container. Refer to my earlier post to learn more about installing Rancher Desktop.
2. Run docker compose up and connect to the running Postgres container:
3. Create an accounts database using pgAdmin. Open http://localhost:5050 in a browser.
Get the IP address of the running Postgres container and use it to configure the new server in pgAdmin.
Command to get the IP of the running Docker container:
docker inspect -f \
'{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' \
<containerId>
4. Prepare some data for our interesting demo:
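The original step is shown as a screenshot; a minimal SQL equivalent, assuming an accounts table with name and balance columns (the exact schema and values in the screenshots may differ), looks like this:

CREATE TABLE accounts (
    id      SERIAL PRIMARY KEY,
    name    TEXT NOT NULL,
    balance NUMERIC NOT NULL
);

INSERT INTO accounts (name, balance) VALUES
    ('Srinu', 100),
    ('Bob',  1000);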
READ COMMITTED:
READ COMMITTED isolation level is specifically designed to prevent Dirty Reads. It guarantees that you will always read data that has been committed, regardless of any ongoing transactions. With READ COMMITTED isolation, you can trust that the data you read is in a consistent and stable state, providing a reliable view of the database.
Under the hood, the READ COMMITTED isolation level in Postgres relies on MVCC (Multi-Version Concurrency Control) and snapshots. MVCC keeps multiple versions of each row; every statement takes a snapshot (based on transaction IDs, or XIDs) and sees only row versions that were committed before that snapshot was taken.
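A quick sketch of READ COMMITTED in action using the accounts table:

-- Session T1
BEGIN;
UPDATE accounts SET balance = 110 WHERE name = 'Bob';  -- not committed yet

-- Session T2 (READ COMMITTED, the default)
SELECT balance FROM accounts WHERE name = 'Bob';  -- still sees the committed value 1000

-- Session T1
COMMIT;

-- Session T2
SELECT balance FROM accounts WHERE name = 'Bob';  -- now sees 110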
Benefits:
Suitable for scenarios with high contention, potentially offering better performance than the stricter isolation levels.
Challenges:
However, relying solely on READ COMMITTED is not adequate, as it can lead to other phenomena such as Non-Repeatable Reads, Lost Updates, and Phantom Reads.
To address these challenges, additional isolation levels or strategies may be required to ensure consistent and predictable behavior, depending on the specific requirements of the application.
Repeatable Read:
This isolation level completely avoids dirty reads, non-repeatable reads, and lost updates.
Avoiding Lost Updates:
Let's see an example of how this isolation level avoids lost updates. Postgres throws an error when it detects a lost-update anomaly so that the application can retry or fail fast, as in the sketch below.
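A sketch of the scenario, using the accounts table (the balances are illustrative):

-- Session T1
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT balance FROM accounts WHERE name = 'Srinu';   -- 100
UPDATE accounts SET balance = 90 WHERE name = 'Srinu';

-- Session T2
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT balance FROM accounts WHERE name = 'Srinu';   -- still sees 100
UPDATE accounts SET balance = 80 WHERE name = 'Srinu';
-- ...blocks, waiting for T1 to commit or roll back

-- Session T1
COMMIT;

-- Session T2 now fails with:
-- ERROR: could not serialize access due to concurrent update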
Explanation:
In the given scenario, both T1 and T2 read the balance of 'Srinu'. T1 then updates the balance to 90. When T2 subsequently attempts to update the same row, it has to wait until T1 either commits or rolls back its changes, to avoid the lost update phenomenon.
While T2 is waiting, the moment T1 commits, T2 fails with the error "could not serialize access due to concurrent update". This error is raised to prevent concurrent updates that could result in inconsistencies. The client application can handle it by retrying the transaction or by failing fast, depending on the specific requirements of the application.
By catching and handling the error, the client application can ensure proper synchronization and prevent lost updates, maintaining data integrity and consistency.
The lost update anomaly is serious because it can lead to subtle bugs if we don't avoid it. Under the hood, this isolation level uses MVCC to detect and prevent lost updates; we will discuss MVCC in more detail in future sections.
Avoiding Non-repeatable reads:
As explained in the previous sections, a non-repeatable read happens when a transaction reads the same row multiple times during its lifespan but sees different values between the reads. This inconsistency can lead to unexpected and inconsistent query results. To avoid it, we can use the REPEATABLE READ isolation level, as in the sketch below.
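A sketch using the accounts table (values are illustrative):

-- Session T1
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT balance FROM accounts WHERE name = 'Srinu';  -- 100

-- Session T2 (runs and commits on its own)
UPDATE accounts SET balance = 150 WHERE name = 'Srinu';

-- Session T1
SELECT balance FROM accounts WHERE name = 'Srinu';  -- still 100: same snapshot as before
UPDATE accounts SET balance = balance - 10 WHERE name = 'Srinu';
-- ERROR: could not serialize access due to concurrent update
ROLLBACK;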
Explanation:
In this scenario, T1 reads the balance of 'Srinu' as 100, unaware that T2 has made modifications to the balance. The REPEATABLE READ isolation level ensures that T1 maintains a consistent view of the data within its transaction scope, avoiding any potential issues.
When T1 attempts to modify the balance of 'Srinu', Postgres detects a concurrent update and throws an error: "could not serialize access due to concurrent update." This error is raised to prevent conflicting modifications that could result in data inconsistencies.
By enforcing a repeatable read isolation level, Postgres ensures that transactions see a consistent snapshot of the data throughout their execution. This helps maintain data integrity and prevents unexpected behavior caused by concurrent updates.
Serializable:
The serializable isolation level is the highest isolation level. It prevents all anomalies, including dirty reads, non-repeatable reads, phantom reads, and lost updates. Under the hood, Postgres uses MVCC together with predicate locking (Serializable Snapshot Isolation) to detect conflicting access to the data a transaction has already read; when a conflict is found, one of the transactions is aborted with a serialization error so that the application can retry it, as in the sketch below.
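A small sketch of a write-skew scenario that only SERIALIZABLE catches, using the accounts table (the sums and inserted rows are illustrative):

-- Session T1
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT SUM(balance) FROM accounts;  -- e.g. 1100
INSERT INTO accounts (name, balance) VALUES ('bonus-1', 11);  -- decision based on that sum

-- Session T2
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT SUM(balance) FROM accounts;  -- also 1100; it cannot see T1's insert
INSERT INTO accounts (name, balance) VALUES ('bonus-2', 11);

-- Session T1
COMMIT;  -- succeeds

-- Session T2
COMMIT;
-- ERROR: could not serialize access due to read/write dependencies among transactions
-- the application is expected to retry the failed transaction

Under REPEATABLE READ, both transactions would commit even though the combined result violates the rule each of them checked; SERIALIZABLE detects the dependency cycle and aborts one of them.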
Highs:
Avoids all read phenomena (dirty reads, non-repeatable reads, phantom reads, lost updates)
Can be used where it is important to prevent users from seeing stale or inaccurate data.
Lows:
Reduced concurrency; avoid this isolation level if your application requires high concurrency.
In the next part, I will cover some advanced concepts that underpin Postgres concurrency. Stay tuned for part 2:
MVCC/Snapshots
Locks (Shared Vs Exclusive locks)
Optimistic vs Pessimistic locks