Cassandra's Cluster Internode Communication

Let's explore how the nodes in a Cassandra cluster communicate with each other.

Gossip protocol

The nodes of the cluster communicate with each other periodically via a gossip protocol. This allows the nodes to exchange state and topology information about themselves and other nodes they know about within the cluster. New information is gradually spread throughout the cluster via this process. In this way, nodes are able to keep track of which nodes are responsible for which token ranges, so that they can route requests accordingly.

Nodes can also determine which nodes are healthy (reachable) and which are not. The system will not send requests to nodes that are unreachable.
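The exchange-and-merge mechanic described above can be sketched as follows. This is an illustrative model, not Cassandra's actual internal API: each node keeps a map of endpoint to (heartbeat version, state), and on each gossip round two nodes merge their maps, with the higher heartbeat version winning per endpoint. A heartbeat that stops advancing is how peers eventually mark a node unreachable.

```python
class GossipNode:
    def __init__(self, name):
        self.name = name
        self.heartbeat = 0
        # endpoint -> (heartbeat version, state)
        self.endpoint_states = {name: (0, "NORMAL")}

    def tick(self):
        # Periodically bump our own heartbeat so peers can see we are alive.
        self.heartbeat += 1
        _, state = self.endpoint_states[self.name]
        self.endpoint_states[self.name] = (self.heartbeat, state)

    def gossip_with(self, peer):
        # Merge the two state maps; the newer heartbeat version wins.
        merged = {}
        for endpoint in set(self.endpoint_states) | set(peer.endpoint_states):
            mine = self.endpoint_states.get(endpoint, (-1, None))
            theirs = peer.endpoint_states.get(endpoint, (-1, None))
            merged[endpoint] = max(mine, theirs, key=lambda entry: entry[0])
        self.endpoint_states = dict(merged)
        peer.endpoint_states = dict(merged)

a, b, c = GossipNode("A"), GossipNode("B"), GossipNode("C")
a.tick(); b.tick(); c.tick()
a.gossip_with(b)  # A and B learn about each other
b.gossip_with(c)  # C learns about A transitively through B
print(sorted(c.endpoint_states))  # ['A', 'B', 'C']
```

Note how node C learns about node A without ever contacting it directly: this gradual, transitive spread is what makes gossip robust in large clusters.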

An operator can use administrative tools to instruct a node of the cluster to remove a permanently crashed node from the ring. Any partitions belonging to that node are then re-replicated to other nodes from the remaining replicas.
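The effect of removing a node can be sketched logically (this is an illustrative model, not Cassandra's actual implementation): the ranges the crashed node owned are taken over by the remaining nodes, and the new replica streams the data from the surviving replicas.

```python
RING = ["A", "B", "C", "D"]  # nodes ordered by token position
REPLICATION_FACTOR = 3

def replicas_for(owner_index, ring, rf):
    # Walk clockwise from the owning position to pick rf replicas.
    return [ring[(owner_index + i) % len(ring)] for i in range(rf)]

before = replicas_for(0, RING, REPLICATION_FACTOR)       # ['A', 'B', 'C']

# The operator removes crashed node 'B' from the ring:
ring_after = [n for n in RING if n != "B"]
after = replicas_for(0, ring_after, REPLICATION_FACTOR)  # ['A', 'C', 'D']

# 'D' becomes the new third replica; it would copy the partition's data
# from the surviving replicas 'A' and 'C'.
print(set(after) - set(before))  # {'D'}
```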

Bootstrapping

Bootstrapping is the process through which new nodes join the cluster. To make this possible, a set of nodes is designated as seed nodes, and their addresses are provided to all nodes of the cluster via a configuration file or a third-party system during startup, giving every joining node a known point of contact.
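As a rough illustration, in a standard Cassandra deployment the seed list is configured in each node's cassandra.yaml file, along these lines (the IP addresses are placeholders):

```yaml
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      # Comma-separated addresses of the seed nodes.
      - seeds: "192.168.1.101,192.168.1.102"
```

Seed nodes are not special at runtime; they simply serve as well-known entry points for gossip when a node starts up.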

Handling requests

Cassandra has no notion of a leader or primary node. All replica nodes are considered equivalent. Every incoming request can be routed to any node in the cluster. This node is called the coordinator node and is responsible for managing the execution of the request on behalf of the client. This node identifies the nodes that contain the data for the requested partition and dispatches the requests. After successfully collecting the responses, it replies to the client.
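The coordinator's role can be sketched as follows. This is a minimal illustrative model (the names, the ring, and the deterministic hash are made up for the example, and a dictionary lookup stands in for the network call to each replica): the coordinator maps the key onto the ring, forwards the read to the replicas, and waits for enough responses to satisfy the requested consistency level before answering the client.

```python
import zlib

RING = ["A", "B", "C", "D"]

def replicas_for_key(key, ring, rf=3):
    # Hash the key onto the ring and take rf consecutive nodes clockwise.
    start = zlib.crc32(key.encode()) % len(ring)
    return [ring[(start + i) % len(ring)] for i in range(rf)]

# Per-node key/value stores, standing in for each replica's local data.
STORAGE = {node: {"user:42": "alice"} for node in RING}

def coordinate_read(key, consistency_level):
    # The coordinator dispatches the read to the replicas and collects
    # responses until `consistency_level` of them have answered.
    replicas = replicas_for_key(key, RING)
    responses = []
    for node in replicas:
        responses.append(STORAGE[node].get(key))  # stand-in for a network call
        if len(responses) >= consistency_level:
            break
    return responses

print(coordinate_read("user:42", consistency_level=2))  # ['alice', 'alice']
```

Because any node can play this role, there is no single point of failure on the request path.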

Conflict occurrence

As there is no leader and all replica nodes are equivalent, they can handle writes concurrently. As a result, there is a need for a conflict resolution scheme.

Conflict resolution scheme

Cassandra uses a last-write-wins (LWW) scheme. Every row that is written comes with a timestamp (a wall-clock timestamp, i.e. based on physical clocks). When a read is performed, the coordinator collects all the responses from the replica nodes and returns the one with the latest timestamp.
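The resolution step can be sketched in a few lines (the timestamps and values here are made-up examples): each replica returns its stored (timestamp, value) pair, and the coordinator answers with the value carrying the latest write timestamp.

```python
# Responses from three replicas for the same row; one replica saw a
# more recent write than the other two.
replica_responses = [
    (1700000001.5, "alice@old.example"),  # stale replica
    (1700000007.2, "alice@new.example"),  # saw the latest write
    (1700000001.5, "alice@old.example"),
]

def resolve_lww(responses):
    # Pick the response with the highest (i.e. latest) timestamp.
    return max(responses, key=lambda r: r[0])[1]

print(resolve_lww(replica_responses))  # alice@new.example
```

Note that because the timestamps come from physical clocks, clock skew between writers can cause an "earlier" write to win; this is an inherent trade-off of LWW.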

Selecting a coordinator node

The client can specify policies for the selection of the coordinator node. Such a policy might select coordinator nodes in a round-robin fashion, select the node closest to the client, or select one of the partition's replica nodes to reduce subsequent network hops.

Similar to the concept of seed nodes, the client driver is provided with some configuration that contains a list of contact points, which are nodes of the cluster. The client initially connects to one of these nodes to acquire a view of the whole cluster's topology, which it then uses to route requests to the appropriate nodes.
