Achieving Snapshot Isolation

In this lesson, we will learn the multi-version concurrency control mechanism, and discuss how it helps to achieve snapshot isolation.

Multi-version concurrency control (MVCC)

Multiversion Concurrency Control (MVCC) is a technique where multiple physical versions are maintained for a single logical data item. As a result, update operations do not overwrite existing records, but they write a new version of these records. Read operations can then select a specific version of a record, possibly an older one.

This is in contrast with the previous techniques, where updates are performed in place and there is a single record for each data item that can be accessed by read operations.

ReedD. P. Reed, “Naming and Synchronisation in a Decentralised Computer System,” Tech. Rep. MIT/LCS/TR-204, Dept. Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 1978. proposed the original protocol in a dissertation in 1978, but many different variations of the original idea have been proposed since then by SilberschatzA. Silberschatz, “A multi-version concurrency scheme with no rollbacks,” Proceedings of the first ACM SIGACT-SIGOPS symposium on Principles of distributed computing, 1982. and Stearns et al.R. E. Stearns and D. J. Rosenkratz, “Distributed database concurrency controls using before-values,” Proceedings of the 1981 ACM SIGMOD international conference on Management of data, 1981.

As the name implies, this technique focuses on the multi-version aspect of storage so that it can be used, theoretically, with both optimistic and pessimistic schemes.

However, most variations use an optimistic concurrency control method to leverage the multiple versions of an item from transactions that run concurrently.

Achieving snapshot isolation using MVCC

In practice, MVCC is commonly used to implement the snapshot isolation level.

Snapshot isolation (SI) is an isolation level that guarantees that all reads made in a transaction see a consistent snapshot of the database from the point it started, and the transaction commits successfully if no other transaction has updated the same data since that snapshot. As a result, it is practically easier to achieve snapshot isolation using a MVCC technique.

Steps

This works in the following way:

  • Each transaction is assigned a unique timestamp at the beginning.

  • Every entry for a data item contains a version that corresponds to the timestamp of the transaction that created this new version.

  • Every transaction records the following pieces of information during its beginning:

    • The transaction with the highest timestamp that has committed so far (say, TsT_s)
    • The number of active transactions that have started but haven’t been committed yet

Performing a read operation

When performing a read operation for an item, a transaction returns the entry with the latest version that is earlier than Ts and does not belong to one of the transactions that were active at the beginning of this transaction. This prevents dirty reads, as only committed values from other transactions can be returned.

There is an exception to this rule: if the transaction has already updated this item, then this value is returned instead.

Fuzzy reads are also prevented since all the reads return values from the same snapshot and ignore values from transactions that committed after this transaction started.

Performing a write operation

When performing a write operation for an item, a transaction checks whether there is an entry for the same item that satisfies one of the following criteria: its version is higher than this transaction’s timestamp, or its version is lower than this transaction’s timestamp, but this version belongs to one of the transactions that were active at the beginning of this transaction.

In any of these cases, the transaction is aborted and can be restarted from scratch with a larger timestamp.

In the first case, if the transaction committed correctly, we would have an entry with version TjT_j committed before an entry with version TiT_i, even though Ti<TjT_i < T_j, which is wrong.

In the second case, the transaction is aborted to prevent a lost update anomaly.

While this prevents a lot of the anomalies, it is still not serializable, and some anomalies would still be possible.

Not all anomalies can be prevented in MVCC

An example of an anomaly that would not be prevented is a write skew.

Write-skew example

The following illustration shows why we can’t prevent a write skew from happening.

In the schedule shown in the above illustration, none of the transactions sees the versions written by the other transaction. However, this would not be possible in a serial execution.

Create a free account to view this lesson.

Continue your learning journey with a 14-day free trial.

By signing up, you agree to Educative's Terms of Service and Privacy Policy