We know that there are some potential anomalies from concurrent transactions that are not properly isolated.

There are different isolation levels that prevent these anomalies. Stronger isolation levels prevent more anomalies at the cost of performance.

We also provided some examples to illustrate the real-life consequences of these anomalies.

As the previous lesson, Consistency and Isolation, explained, the system should be strictly serializable to be completely protected against any of these anomalies.

Strictly serializable is the strongest isolation level. Then comes serializability, which we will see in this lesson.

A system that provides serializability guarantees that the result of any allowed execution of transactions is the same as that produced by some serial execution of the same transactions(hence its name).

In the context of isolation, the execution of multiple transactions that correspond to an ordering of the associated operations is also called a schedule.

You might ask: what does the same mean in the previous sentence? Let’s explain.

Types of serializability

There are two major types of serializability that establish two different notions of similarity.

View serializability

A schedule is a view equivalent to a serial schedule with the same transactions when all the operations from transactions in the two schedules read and write the same data values (“view” the same data).

Conflict serializability

A schedule is a conflict equivalent to a serial schedule with the same transactions when every pair of conflicting operations between transactions is ordered in the same way in both schedules.

It turns out that calculating whether a schedule is view serializable is a very computationally hard problem. More specifically, it is an NP-complete problem. This means that the time required to solve the problem using any currently known algorithm increases rapidly as the size of the problem grows. So, we will not analyze view serializability further.

However, determining whether a schedule is conflict serializable is a much easier problem to solve, which is one of the reasons conflict serializability is widely used.

Determining whether a schedule is a conflict serializable

To understand conflict serializability, we first have to define what it means for two operations to be conflicting.

Two operations are conflicting (or conflict) if:

  • They belong to different transactions
  • They are on the same data item, and at least one of them is a write operation, where a write operation inserts, modifies or deletes an object

As a result, we can have three different forms of conflicts:

  1. A read-write conflict
  2. A write-read conflict
  3. A write-write conflict

Trivial way

A trivial way to check if a schedule is a conflict serializable is to calculate all possible serial schedules, identify conflicting operations in them, and check if their order is the same as in the schedule under examination.

This is computationally heavy, as it requires us to compute all the possible permutations of transactions.

A more practical way of determining whether a schedule is conflict serializable is through a precedence graph.

Precedence graph

A precedence graph is a directed graph, where the:

  • Nodes represent transactions in a schedule
  • Edges represent conflicts between operations

The edges in the graph denote the order in which transactions must execute in the corresponding serial schedule.

As a result, a schedule is a conflict serializable if and only if its precedence graph of committed transactions is acyclic.

Let’s look at an example to get a better idea about this rule.

Example

The following illustration contains a schedule of three transactions T1T_1, T2T_2, and T3T_3, where R(Ii)R(I_i) and W(Ii)W(I_i) represent a read and write operation, respectively, on item IiI_i.

As we can see in the above illustration, the conflicting operations in this schedule form a precedence graph with a cycle. This means that this schedule is not conflict serializable. The cycle between T1T_1 and T2T_2 means that there must be a serial schedule where T1T_1 executes before T2T_2 and vice-versa, which is impossible.

The following illustration contains a slightly different schedule of the same transactions that are now conflict serializable since there are no cycles in the corresponding precedence graph.

Looking at the precedence graph, we can see the edges only impose the constraints that T1T_1 must be before T2T_2 and T3T_3 in the corresponding serial schedule. This means that this schedule is conflict equivalent to both serial schedules T1T_1, T2T_2, T3T_3 and T1T_1, T3T_3, T2T_2.

The method described above is one way to determine whether a schedule is serializable. However, what we really need is a way to generate a schedule that is serializable ahead of time.

Generating a schedule that is serializable

The notion of a precedence graph is beneficial here.

All we need to do is ensure that no cycle is formed as we execute operations in the schedule. We can achieve this in two basic ways.

  1. Prevent transactions from making progress when there is a risk of introducing a conflict that can create a cycle.
  2. Let transactions execute all their operations and check if committing that transaction could introduce a cycle. In that case, the transaction can be aborted and restarted from scratch.

These two approaches lead to the two main mechanisms for concurrency control.

Mechanisms for concurrency control

There are two main mechanisms for concurrency control:

Pessimistic concurrency control

This approach blocks a transaction if it’s expected to cause violation of serializability, and resumes it when it is safe to do so.

This is usually achieved by having transactions acquire locks on the data they process to prevent other transactions from processing the same data concurrently.

The name pessimistic comes from the fact that this approach assumes that the majority of transactions are expected to conflict with each other, so appropriate measures are taken to prevent this from causing issues.

Optimistic concurrency control

This approach delays the checking of whether a transaction complies with the rules until the end of the transaction. The transaction is aborted if a commit would violate any serializability rules, and is then restarted and re-executed from the beginning.

The name “optimistic” comes from the fact that this approach assumes that the majority of transactions are expected not to have conflicts, so measures are taken only in the rare case that they do.

The main trade-off between pessimistic and optimistic concurrency control is between the extra overhead from locking mechanisms, and the wasted computation from aborted transactions.

Cases where optimistic methods perform well

In general, optimistic methods are expected to perform well in cases where there are not many conflicts between transactions.

This can be the case for workloads with many read-only transactions and only a few write transactions, or in cases where most of the transactions touch different data.

Cases where pessimistic methods perform well

Pessimistic methods incur some overhead from the use of locks. Still, they can perform better in workloads that contain a lot of transactions that conflict. This is because they reduce the number of aborts and restarts, thus reducing wasted effort.

Get hands-on with 1400+ tech skills courses.