Cassandra's Consistency Levels

Let's study the consistency levels available in Cassandra and the mechanisms that help it favor high availability and performance.

When clients communicate with Cassandra nodes, they can specify different consistency levels, which allows them to optimize for consistency, availability, or latency accordingly.

The client can define the desired read consistency level and the desired write consistency level, where each consistency level provides different guarantees.

Consistency levels

Some of the available consistency levels are as follows:

ALL

A write must be written to all replica nodes in the cluster for the associated partition. A read returns the record only after all replicas have responded; the operation fails if even a single replica does not respond.

Note: This option provides the highest consistency and the lowest availability.

QUORUM

A write must be written on a quorum of replica nodes across all datacenters.

A read returns the record after a quorum of replicas from all datacenters has replied.

Note: This option provides a balance between strong consistency and tolerance to a small number of failures.
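A quorum is a simple majority of the replicas for a partition. The sketch below shows how the quorum size follows from the replication factor; `quorum` is a hypothetical helper for illustration, not part of any Cassandra client library.

```python
def quorum(replication_factor: int) -> int:
    """Quorum size: a simple majority of the replicas."""
    return replication_factor // 2 + 1

# With a replication factor of 3, a quorum is 2 replicas; with 5, it is 3.
print(quorum(3))  # 2
print(quorum(5))  # 3
```

Note that a cluster with a replication factor of 3 can therefore tolerate one unavailable replica per partition while still serving QUORUM reads and writes.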

ONE

A write must be written to at least one replica node.

A read returns the record after the response of a single replica node.

Note: This option provides the highest availability but incurs a risk of reading stale data since the replica that replied might not have received the latest write.

Deciding the appropriate consistency level

The read and write consistency levels are not independent, so one should consider the interactions between them when deciding on the appropriate levels.

If we assume a keyspace with replication factor N and clients that read with read consistency R and write with write consistency W, then a read operation is guaranteed to reflect the latest successful write as long as R + W > N. For instance, this could be achieved by:

  • Performing both reads and writes at the QUORUM level.
  • Performing reads at the ONE level and writes at the ALL level, or vice versa.

In all of the above cases, at least one node from the read set will exist in the write set, thus having seen the latest write.
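The overlap rule above can be checked mechanically. The snippet below is a minimal sketch that evaluates R + W > N for the combinations just discussed; the function name is ours, chosen only for illustration.

```python
def reads_see_latest_write(n: int, r: int, w: int) -> bool:
    """True when every read set must overlap the latest write set (R + W > N)."""
    return r + w > n

N = 3                 # replication factor
QUORUM = N // 2 + 1   # 2

print(reads_see_latest_write(N, QUORUM, QUORUM))  # True: QUORUM reads + QUORUM writes
print(reads_see_latest_write(N, 1, N))            # True: ONE reads + ALL writes
print(reads_see_latest_write(N, N, 1))            # True: ALL reads + ONE writes
print(reads_see_latest_write(N, 1, 1))            # False: ONE reads + ONE writes may be stale
```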

Note: However, each combination provides different levels of availability, durability, latency, and consistency for read and write operations.

Mechanisms that help Cassandra to favor high availability and performance

As explained above, Cassandra favors high availability and performance over data consistency. As a result, it employs several mechanisms that ensure the cluster can keep processing operations even during node failures and network partitions, and that replicas converge again as quickly as possible after recovery.

Some of these mechanisms are described in the following section:

Hinted handoff

Hinted handoff happens during write operations.

If the coordinator cannot contact the necessary number of replicas, it can store the write locally as a hint and replay it to the failed node after that node has recovered.
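The mechanism can be sketched as a coordinator that keeps a per-node queue of pending writes. This is a simplified illustration under assumed names (`Coordinator`, `replica-2`, and so on), not Cassandra's actual implementation.

```python
from collections import defaultdict

class Coordinator:
    """Sketch of hinted handoff: buffer writes for down nodes, replay on recovery."""

    def __init__(self):
        self.hints = defaultdict(list)  # down node -> pending (key, value) writes
        self.replicas = {}              # node -> its local {key: value} store

    def write(self, node, key, value, node_is_up):
        if node_is_up:
            self.replicas.setdefault(node, {})[key] = value
        else:
            self.hints[node].append((key, value))  # store a hint locally

    def on_recovery(self, node):
        # Replay all stored hints to the recovered node.
        for key, value in self.hints.pop(node, []):
            self.replicas.setdefault(node, {})[key] = value

c = Coordinator()
c.write("replica-2", "user:1", "alice", node_is_up=False)  # node down: hint stored
c.on_recovery("replica-2")                                 # hint replayed
print(c.replicas["replica-2"]["user:1"])  # alice
```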

Read repair

Read repair happens during read operations.

Suppose the coordinator receives conflicting data from the contacted replicas. In that case, it resolves the conflict by selecting the latest record and forwards it synchronously to the stale replicas before responding to the read request.
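Conflict resolution here boils down to last-write-wins on the record timestamps. The sketch below assumes each replica returns a `(value, timestamp)` pair; the function and replica names are hypothetical, chosen only for illustration.

```python
def read_repair(responses):
    """responses: dict of replica -> (value, write_timestamp).

    Returns the winning value (latest timestamp) and the list of stale
    replicas that the coordinator must repair before responding.
    """
    winner_value, winner_ts = max(responses.values(), key=lambda vt: vt[1])
    stale = [r for r, (_, ts) in responses.items() if ts < winner_ts]
    return winner_value, stale

value, stale = read_repair({
    "replica-1": ("alice", 100),
    "replica-2": ("alice-updated", 250),  # latest write
    "replica-3": ("alice", 100),
})
print(value, stale)  # alice-updated ['replica-1', 'replica-3']
```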

Anti-entropy repair

Anti-entropy repair happens in the background.

Replica nodes exchange the data for a specific range, and if they find differences, they keep the latest data for each record, complying with the LWW strategy.

Note: However, this involves big datasets, so it’s important to minimize network bandwidth consumption. For this reason, the nodes encode the data for a range in a Merkle tree and gradually exchange parts of the tree to discover the conflicting data that need to be exchanged.
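The key property of a Merkle tree is that comparing two root hashes is enough to detect whether two replicas' data for a range differ at all; only on a mismatch do the nodes descend into subtrees. Below is a minimal sketch of building such a root hash over a list of range payloads; it is not Cassandra's tree implementation.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Hash each leaf, then pairwise-hash levels up to a single root."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last hash on odd counts
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

a = [b"range-0:v1", b"range-1:v1", b"range-2:v1", b"range-3:v1"]
b = list(a)
print(merkle_root(a) == merkle_root(b))  # True: one hash comparison, no data exchanged
b[2] = b"range-2:v2"                     # one record diverges
print(merkle_root(a) == merkle_root(b))  # False: descend only into the differing subtree
```

When the roots match, the replicas skip the range entirely, which is what keeps the bandwidth cost of anti-entropy repair low.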
