FaunaDB is a distributed datastore whose core architecture is inspired by the Calvin protocol (A. Thomson, T. Diamond, S.-C. Weng, K. Ren, P. Shao, and D. J. Abadi, “Calvin: Fast Distributed Transactions for Partitioned Database Systems,” Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, 2012).

Calvin protocol

Calvin is based on the following central idea:

By replicating inputs instead of effects to the various nodes of the system, it’s possible to have a more deterministic system where all the non-failing nodes go through the same states.

This determinism of the Calvin protocol can obviate the need for agreement protocols, such as two-phase commit, when performing distributed transactions since the nodes involved in the transaction can rely on each other, proceeding in the same way.
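The idea can be illustrated with a minimal sketch: if every replica applies the same ordered log of input commands to a deterministic state machine, all non-failing replicas converge to the same state. The `Replica` class and command format below are illustrative, not FaunaDB's actual API.

```python
class Replica:
    """A deterministic state machine over a key-value store."""

    def __init__(self):
        self.state = {}

    def apply(self, command):
        # Commands are deterministic functions of the current state,
        # so applying the same log in the same order yields the same state.
        op, key, value = command
        if op == "set":
            self.state[key] = value
        elif op == "incr":
            self.state[key] = self.state.get(key, 0) + value

# The globally ordered input log is replicated to every node...
log = [("set", "x", 1), ("incr", "x", 2), ("set", "y", 5)]

replica_a, replica_b = Replica(), Replica()
for command in log:  # ...and each node applies it in the same order.
    replica_a.apply(command)
    replica_b.apply(command)

assert replica_a.state == replica_b.state == {"x": 3, "y": 5}
```

Because the replicas agree on inputs rather than effects, they need no round of agreement after executing: the same log deterministically produces the same state everywhere.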

Abstract architecture of FaunaDB

The abstract architecture of FaunaDB is composed of three layers.

The sequencing layer

The sequencing layer receives inputs or commands and places them in a global order, which is achieved via a consensus protocol.

This is the sequence in which all the nodes will execute the operations.

The scheduling layer

The scheduling layer orchestrates the execution of transactions using a deterministic locking scheme to guarantee equivalence to the serial order specified by the sequencing layer. It also allows transactions to be executed concurrently.

The storage layer

The storage layer is responsible for the physical data layout.

Roles performed by FaunaDB’s nodes

Every node in FaunaDB performs three roles simultaneously:

Query coordinator

The query coordinator receives and processes requests.

Note: The request might be processed locally or routed to other nodes, depending on its type.

Data replica

The data replica stores data and serves them during read operations.

Log replica

The log replica reaches consensus on the order of inputs and adds them to the globally ordered log.

Conceptual architecture of FaunaDB

A cluster comprises three or more logical datacenters. Data is partitioned within each datacenter and replicated across datacenters for increased performance and availability.

Note: Similar to Spanner, multiple versions of each data item are preserved in FaunaDB.

FaunaDB uses a slightly customized version of Raft for consensus. FaunaDB’s version aggregates requests and replicates them in batches to improve throughput. These batches are called epochs and a typical window of batching is 10 milliseconds so that the impact on latency is not significant. The ordering of requests is achieved by combining the epoch number and the request index in the batch.
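Since a request's global position is determined by its epoch and its index within the batch, the ordering can be modeled as a simple composite key. The sketch below is illustrative; the 10-millisecond window comes from the text, while the function and constant names are assumptions.

```python
# Requests are batched into epochs; a request's global order is the
# pair (epoch number, index within the batch). The window below is
# the typical batching interval mentioned in the text.
EPOCH_WINDOW_MS = 10

def order_key(epoch_number, request_index):
    # Tuples compare lexicographically in Python, so the epoch number
    # takes precedence and the in-batch index breaks ties.
    return (epoch_number, request_index)

# A request at index 0 of epoch 3 precedes index 7 of the same epoch,
# and everything in epoch 3 precedes everything in epoch 4.
assert order_key(3, 0) < order_key(3, 7) < order_key(4, 0)
```

Batching this way trades a small, bounded amount of latency (at most one window) for significantly higher consensus throughput, since one Raft round orders many requests at once.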

The following illustration contains a conceptual view of the architecture. Each role is visualized separately to facilitate an understanding of how the various functions interoperate. However, a single node can perform all these roles, as explained previously.

When a request arrives at a query coordinator, the coordinator speculatively executes the transaction at the latest known log timestamp to discover the data accessed by the transaction, also referred to as its read and write intents.

The processing thereafter differs depending on the type of request.

A read-write transaction request

If the request is a read-write transaction, it is forwarded to a log replica that makes sure it’s recorded as part of the next batch, as agreed via consensus with the other replicas.

The request is then forwarded to each data replica that contains associated data.

Difference with other systems

An interesting difference with other systems is that in FaunaDB, data transfer at this stage is push-based, not pull-based.

For example, if replica A needs to perform a write based on data owned by replica B during a transaction, replica B is supposed to send the data to replica A instead of replica A requesting them. A significant advantage of this is fewer messages that lead to reduced latency. A valid question is what happens if the node that is supposed to send the data fails. In this case, the data replica can fall back to requesting the data from other replicas of this partition.

As a result, each data replica blocks until it has received all the data needed from other replicas. Then, it resolves the transaction, applies any local writes and acknowledges the success to the query coordinator.
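The push-based exchange can be sketched as follows: each replica proactively sends the data it owns to the replicas that need it, and a receiving replica blocks until everything it needs has arrived. The classes and timeout value here are illustrative assumptions; a real implementation would use the timeout to trigger the pull-based fallback described above.

```python
import queue

class DataReplica:
    def __init__(self, name):
        self.name = name
        self.inbox = queue.Queue()

    def push(self, other, key, value):
        # Push-based: the owner sends the data without being asked.
        other.inbox.put((key, value))

    def collect(self, needed_keys, timeout=1.0):
        # Block until all remote reads have arrived. The timeout would
        # let a replica fall back to pulling from another replica of
        # the partition if the sender has failed.
        received = {}
        while set(received) != set(needed_keys):
            key, value = self.inbox.get(timeout=timeout)
            received[key] = value
        return received

a, b = DataReplica("A"), DataReplica("B")
b.push(a, "balance", 42)          # B owns "balance" and pushes it to A
reads = a.collect(["balance"])    # A blocks until the push arrives
assert reads == {"balance": 42}
```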

Note: Data might have changed since the speculative execution of the query coordinator. If that’s the case, the transaction will be aborted and can potentially be retried, but this will be a unanimous decision since all the nodes will execute the operations in the same order. Consequently, there is no need for an agreement protocol, such as a two-phase commit.

A read-only transaction request

If the request is a read-only transaction, it is sent to the replica(s) that contain the associated data or served locally if the query coordinator contains all the data.

The transaction is time-stamped with the latest known log timestamp, and all read operations are performed at this timestamp.

The client library also maintains the timestamp of the highest log position seen so far, which is used to guarantee a monotonically advancing view of the transaction order. This guarantees causal consistency in cases where the client switches from node A to node B, where node B lags behind node A in transaction execution from the log.
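A minimal sketch of this client-side mechanism is shown below: the client remembers the highest log timestamp it has observed and rejects reads from any node that lags behind that view. The class names and the rejection strategy (raising an error rather than waiting or retrying) are illustrative assumptions.

```python
class Node:
    def __init__(self, log_ts):
        # The highest log position this node has applied.
        self.log_ts = log_ts

class Client:
    def __init__(self):
        self.max_seen_ts = 0  # highest log timestamp seen so far

    def read(self, node):
        # Refuse to read from a node that lags behind the client's
        # view; a real client might instead wait or retry elsewhere.
        if node.log_ts < self.max_seen_ts:
            raise RuntimeError("node lags behind the client's view")
        self.max_seen_ts = max(self.max_seen_ts, node.log_ts)
        return node.log_ts

client = Client()
client.read(Node(log_ts=10))     # advances the client's view to 10
try:
    client.read(Node(log_ts=7))  # a lagging node is rejected
    lagged_ok = True
except RuntimeError:
    lagged_ok = False
assert not lagged_ok and client.max_seen_ts == 10
```

This is what preserves causal consistency across the node switch described above: the client can never observe node B's state regress below what it already saw on node A.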

Guarantees provided by FaunaDB

In terms of guarantees,

  • Read-write transactions are strictly serializable
  • Read-only transactions are just serializable

However, read-only transactions can opt-in to be strictly serializable by using the so-called linearized endpoint. In that case, the read is combined with a no-op write, and it’s executed as a regular read-write transaction going through consensus. This increases latency.

Achieving guarantees

To achieve the above guarantees, read-write transactions make use of a pessimistic concurrency control scheme based on read/write locks. This protocol is deterministic, which means it guarantees that all nodes will acquire and release locks in the exact same order.

Note: An interesting benefit of this protocol is that deadlocks are prevented. There is literature (D. J. Abadi and J. M. Faleiro, “An Overview of Deterministic Database Systems,” Communications of the ACM, Volume 61, Issue 9, September 2018) that examines the benefits of determinism in database systems in more detail.

The order is defined by the order of the transactions in the log.

Note: This does not prevent transactions from running concurrently. It just requires that locks for a transaction t_i can be acquired only after locks have been acquired (and potentially released) for all previous transactions t_j (j < i).
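The determinism of the lock acquisition order can be illustrated with a small sketch: given the same log, every replica computes the same sequence of lock requests. The function name and the alphabetical tie-break within a transaction are illustrative assumptions.

```python
def lock_acquisition_order(log):
    # log: list of (txn_id, keys_accessed) pairs in global log order.
    # Locks for t_i are requested only after all t_j (j < i) have
    # requested theirs, so every replica that sees the same log
    # produces the identical sequence below.
    sequence = []
    for txn_id, keys in log:
        for key in sorted(keys):  # deterministic order within a txn
            sequence.append((txn_id, key))
    return sequence

log = [("t1", {"x", "y"}), ("t2", {"y", "z"})]
assert lock_acquisition_order(log) == [
    ("t1", "x"), ("t1", "y"),
    ("t2", "y"), ("t2", "z"),
]
```

Because every replica requests locks in this single global order, there is no circular wait, which is why deadlocks are structurally impossible under this scheme.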

This means that FaunaDB must know all the data accessed by read-write transactions in advance, which is also why FaunaDB cannot support interactive transactions. However, as described previously, knowing the read/write sets with perfect accuracy up front is not strictly required: the query coordinator performs an initial reconnaissance query and includes its results in the submitted transaction. All the replicas can then perform the reads again during the execution of the transaction and identify whether the read/write sets have changed, in which case the transaction is aborted and can be retried. This technique is called Optimistic Lock Location Prediction (OLLP).
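The OLLP flow can be sketched in a few lines: a reconnaissance pass predicts the read set, and execution re-reads and aborts if the prediction turned out stale. The function names and dictionary-based store are illustrative assumptions.

```python
def reconnaissance(store, txn):
    # Speculatively execute to discover which data the txn touches
    # and what values were read at that point.
    return {key: store.get(key) for key in txn["reads"]}

def execute(store, txn, predicted_reads):
    # Re-read at execution time. Every replica processes the same
    # log, so all of them reach the same commit/abort decision
    # without any agreement protocol.
    current = {key: store.get(key) for key in txn["reads"]}
    if current != predicted_reads:
        return "abort"  # prediction is stale; the txn can be retried
    for key, value in txn["writes"].items():
        store[key] = value
    return "commit"

store = {"x": 1}
txn = {"reads": ["x"], "writes": {"y": 2}}
predicted = reconnaissance(store, txn)
assert execute(store, txn, predicted) == "commit"

store["x"] = 99                 # a concurrent change invalidates the
predicted_stale = {"x": 1}      # earlier reconnaissance result
assert execute(store, txn, predicted_stale) == "abort"
```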

Interactive transactions

Interactive transactions are ones that a client can keep open and execute operations dynamically while potentially performing other operations not related to the database.

In contrast to interactive transactions, transactions in FaunaDB are declared at once and sent for processing.

FaunaDB can still simulate interactive transactions via a combination of snapshot reads and compare-and-swap operations.
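This simulation can be sketched as a retry loop around a snapshot read and a compare-and-swap: the client reads a value, does arbitrary work outside the database, and then writes back only if the value is still unchanged. The in-memory `Store` and its API are illustrative assumptions, not FaunaDB's actual interface.

```python
import threading

class Store:
    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def compare_and_swap(self, key, expected, new):
        # Atomically write `new` only if the current value is still
        # `expected`; otherwise report failure so the caller retries.
        with self._lock:
            if self._data.get(key) != expected:
                return False
            self._data[key] = new
            return True

def interactive_increment(store, key):
    while True:  # retry loop: snapshot read, client-side work, CAS
        snapshot = store.get(key)
        # ...arbitrary client-side work can happen here...
        if store.compare_and_swap(key, snapshot, snapshot + 1):
            return

store = Store()
store.compare_and_swap("counter", None, 0)  # initialize the counter
interactive_increment(store, "counter")
interactive_increment(store, "counter")
assert store.get("counter") == 2
```

If another client modifies the value between the snapshot read and the compare-and-swap, the swap fails and the loop retries with a fresh snapshot, which mirrors the abort-and-retry behavior of a real interactive transaction.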
