Spanner Operations
Let's study the operations supported by Spanner.
Spanner supports the following types of operations:
- Read-write transactions
- Read-only transactions
- Standalone (strong or stale) reads
Read-write transactions
A read-write transaction can contain both read and write operations, and it provides full ACID properties for the operations of the transaction. More specifically, read-write transactions are not just serializable; they are strictly serializable.
Note: The Spanner documentation also refers to strict serializability as "external consistency," but both terms describe essentially the same guarantee.
A read-write transaction executes a set of reads and write operations atomically at a single logical point in time.
Note: As explained earlier, Spanner achieves these properties using two-phase locking for isolation and two-phase commit for atomicity across multiple splits.
Workflow
The workflow of a read-write transaction follows this sequence:
- After opening a transaction, the client directs all read operations to the leader of the replica group that manages the split with the required rows. This leader acquires read locks for the rows and columns involved before serving the read request. Every read also returns the timestamp of any data read.
- Any write operations are buffered locally in the client until the transaction is committed. While the transaction is open, the client sends keepalive messages to prevent participant leaders from timing out the transaction.
- When the client has completed all reads and buffered all writes, it starts the two-phase commit protocol. It chooses one of the participant leaders as the coordinator leader and sends a prepare request to all the participant leaders along with the identity of the coordinator leader. The participant leaders involved in write operations also receive the buffered writes at this stage. (The two-phase commit is required only if the transaction accesses data from multiple replica groups. Otherwise, the leader of the single replica group can commit the transaction through Paxos alone.)
- Every participant leader acquires the necessary write locks, chooses a prepare timestamp s_i that is larger than the timestamps of all previous transactions, and logs a prepare record in its replica group through Paxos. The leader also replicates the lock acquisition to its replicas to ensure the locks are held even if the leader fails. It then responds to the coordinator leader with the prepare timestamp.
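The prepare phase of this workflow can be sketched in a few lines. This is a minimal illustrative model, not Spanner's actual implementation: the class, method, and parameter names are all assumptions, timestamps are plain integers, and Paxos replication is reduced to a comment.

```python
# Hypothetical sketch of the prepare phase of Spanner's two-phase commit.
# All names are illustrative; timestamps are plain integers.

class ParticipantLeader:
    def __init__(self, name, last_timestamp=0):
        self.name = name
        self.last_timestamp = last_timestamp  # highest timestamp used so far
        self.write_locks = set()

    def prepare(self, coordinator, rows, now):
        # Acquire write locks for the buffered writes received with "prepare".
        self.write_locks |= set(rows)
        # Choose a prepare timestamp larger than any previous transaction's.
        prepare_ts = max(self.last_timestamp + 1, now)
        self.last_timestamp = prepare_ts
        # A real participant would now log a "prepare" record (and the lock
        # acquisition) through Paxos before replying to the coordinator.
        return prepare_ts

def run_prepare_phase(participants, coordinator, writes, now):
    # The client picks one participant leader as coordinator and sends a
    # "prepare" request, plus the buffered writes, to every participant leader.
    return {p.name: p.prepare(coordinator, writes.get(p.name, []), now)
            for p in participants}

leaders = [ParticipantLeader("g1"), ParticipantLeader("g2")]
timestamps = run_prepare_phase(
    leaders, coordinator="g1",
    writes={"g1": ["row-a"], "g2": ["row-b"]}, now=100)
print(timestamps)  # each group reports its prepare timestamp
```

Each participant replies with its prepare timestamp; in the real protocol, the coordinator then picks a commit timestamp at least as large as all of them.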
The following illustration contains a visualization of this sequence:
Spanner mitigating availability problems
It is worth noting that the availability problems of the two-phase commit are partially mitigated in this scheme because both the participants and the coordinator are essentially Paxos groups. So, if one of the leader nodes crashes, another replica from that replica group will eventually detect this, take over, and help the protocol make progress.
Spanner handling deadlocks
The two-phase locking protocol can result in deadlocks. Spanner resolves these situations via a wound-wait scheme: when a transaction requests a lock held by a younger transaction (one with a larger timestamp), it aborts, or "wounds," the younger transaction; when the lock holder is older, the requester simply waits.
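The wound-wait rule can be expressed as a tiny decision function. This is an illustrative sketch, not Spanner's code; it assumes smaller timestamps denote older transactions.

```python
# Illustrative wound-wait decision rule (smaller timestamp = older transaction).

def wound_wait(requester_ts, holder_ts):
    """Return what the requester does when the lock is held by another txn."""
    if requester_ts < holder_ts:
        return "wound"   # older requester aborts the younger lock holder
    return "wait"        # younger requester waits for the older holder

print(wound_wait(10, 20))  # wound
print(wound_wait(20, 10))  # wait
```

Because a transaction only ever waits for strictly older transactions, the wait-for graph cannot contain a cycle, so deadlock is impossible.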
Checking whether a replica is up-to-date or not
Spanner needs a way to know whether a replica is up-to-date enough to satisfy a read operation. For this reason, each replica tracks a value called safe time (t_safe).
Safe time (t_safe)
The safe time t_safe is the maximum timestamp at which the replica is up-to-date. Thus, a replica can satisfy a read at a timestamp t if t <= t_safe.
This value is calculated as t_safe = min(t_Paxos_safe, t_TM_safe), where:
- t_Paxos_safe is the timestamp of the highest-applied Paxos write at a replica group and represents the highest watermark below which writes will no longer occur with respect to Paxos.
- t_TM_safe is calculated as min_i(s_i_prepare) - 1 over all transactions prepared (but not committed yet) at replica group g. If there are no such transactions, then t_TM_safe = +∞.
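The calculation above can be sketched directly. This is a hypothetical sketch with integer timestamps; the function names mirror the formulas but are otherwise assumptions.

```python
import math

def t_safe(t_paxos_safe, prepare_timestamps):
    """Safe time of a replica: min of the Paxos and transaction-manager bounds."""
    # t_TM_safe = min(s_i_prepare) - 1 over prepared-but-uncommitted
    # transactions; +infinity if there are none.
    t_tm_safe = min(prepare_timestamps) - 1 if prepare_timestamps else math.inf
    return min(t_paxos_safe, t_tm_safe)

def can_serve_read(t_read, t_paxos_safe, prepare_timestamps):
    # A replica can satisfy a read at t_read if t_read <= t_safe.
    return t_read <= t_safe(t_paxos_safe, prepare_timestamps)

print(t_safe(100, []))        # 100: no prepared transactions, Paxos bound wins
print(t_safe(100, [90, 95]))  # 89: a txn prepared at 90 may commit at >= 90
```

The `- 1` captures why prepared transactions hold the safe time back: a transaction prepared at timestamp 90 may still commit at 90 or later, so the replica cannot claim to be up-to-date at 90 yet.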
Read-only transactions
Read-only transactions allow a client to perform multiple reads at the same timestamp, and these operations are also guaranteed to be strictly serializable.
An interesting property
An interesting property of read-only transactions is that they do not need to hold any locks or block other transactions. The reason is that these transactions perform reads at a specific timestamp, which is selected so that any concurrent or future write operations are guaranteed to update data at a later timestamp.
The timestamp is selected at the beginning of the transaction as TT.now().latest, and it's used for all the read operations executed as part of the transaction.
In general, read operations at a timestamp t_read can be served by any replica that is up to date, which means t_safe >= t_read.
More specifically:
- In some cases, a replica can be certain via its internal state and TrueTime that it is up to date enough to serve the read and does so.
- In some other cases, a replica might not be sure if it has seen the latest data. It can then ask the leader of its group for the timestamp of the last transaction it needs to apply in order to serve the read.
- In case the replica is the leader itself, it can proceed directly since it is always up to date.
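The three cases above can be sketched as a small decision function. This is a hypothetical sketch; the names and return values are illustrative, not Spanner's actual API.

```python
# Illustrative decision logic for a replica serving a read at t_read.

def serve_read(t_read, replica_t_safe, is_leader, leader_catch_up_ts=None):
    if is_leader:
        return "serve"            # the leader is always up to date
    if t_read <= replica_t_safe:
        return "serve"            # replica knows it has seen all data <= t_read
    # Otherwise, ask the leader for the timestamp of the last transaction the
    # replica must apply before it can serve the read.
    return ("catch_up_to", leader_catch_up_ts)

print(serve_read(100, 120, is_leader=False))   # replica's safe time covers it
print(serve_read(100, 90, is_leader=False, leader_catch_up_ts=100))
```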
Standalone reads
Spanner also supports standalone reads outside the context of transactions.
Standalone reads do not differ a lot from the read operations performed as part of read-only transactions. For instance, their execution follows the same logic using a specific timestamp.
Standalone reads can be either strong or stale.
Strong read
A strong read is a read at a current timestamp and is guaranteed to see all the data committed up until the start of the read.
Stale read
A stale read is a read at a timestamp in the past. It can be provided by the application or calculated by Spanner based on a specified upper bound on staleness. A stale read is expected to have lower latency at the cost of stale data since it’s less likely the replica will need to wait before serving the request.
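The timestamp selection for a stale read with a staleness bound can be sketched as follows. This is a simplified, hypothetical sketch: timestamps are plain integers, and it assumes the system prefers the newest timestamp within the bound that the replica can already serve without waiting.

```python
# Illustrative timestamp selection for a bounded-staleness read.

def choose_read_timestamp(tt_now_latest, replica_t_safe, max_staleness):
    """Pick the newest timestamp within the staleness bound that the
    replica can serve without waiting (a simplification)."""
    oldest_allowed = tt_now_latest - max_staleness
    return max(oldest_allowed, min(tt_now_latest, replica_t_safe))

# A replica whose safe time lags "now" by 3 units, with a bound of 10:
print(choose_read_timestamp(100, 97, 10))  # 97: served immediately
# With a tight bound of 1, the read must use timestamp 99 and may wait:
print(choose_read_timestamp(100, 97, 1))   # 99
```

This illustrates the latency tradeoff in the text: a looser bound lets the replica answer from data it already has, while a tighter bound may force it to wait until its safe time catches up.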
Partitioned Data Manipulation Language
There is also another type of operation called partitioned Data Manipulation Language (DML).
Partitioned DML allows a client to specify an update or delete operation in a declarative form, which is then executed in parallel at each replica group. This parallelism and the associated data locality make these operations very efficient. However, this comes with some tradeoffs.
- These operations need to be fully partitionable. This means they must be expressible as the union of a set of statements, where each statement accesses a single row of the table, and each statement accesses no other tables. Doing so ensures each replica group will execute the operation locally without any coordination with other replica groups.
- Furthermore, these operations must be idempotent because Spanner might execute a statement multiple times against some groups due to network-level retries.
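To see why idempotency matters, here is a minimal simulation with hypothetical names: the "statement" is a single-row, single-table update (as the partitionability requirement demands), and one group re-executes it after a network-level retry.

```python
# Illustrative simulation of partitioned DML: each replica group runs the
# statement locally, and one group is retried after a network error.

groups = {
    "g1": [{"id": 1, "visits": 3, "archived": False}],
    "g2": [{"id": 2, "visits": 7, "archived": False}],
}

def archive_rows(rows):
    # Idempotent, single-row statement, conceptually like:
    #   UPDATE t SET archived = true WHERE visits > 5
    # Re-running it leaves already-updated rows unchanged.
    for row in rows:
        if row["visits"] > 5:
            row["archived"] = True

# g2's statement is executed twice due to a simulated network-level retry.
for g in ["g1", "g2", "g2"]:
    archive_rows(groups[g])

print(groups["g2"][0]["archived"])  # True, and the retry changed nothing
```

A non-idempotent statement (e.g., incrementing a counter) would produce a wrong result under the same retry, which is why Spanner imposes this requirement.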
Atomicity guarantee
Spanner does not provide atomicity guarantees for each statement across the entire table, but it does provide atomicity guarantees per group. This means a statement might run against only some rows of the table, e.g., if the user cancels the operation midway or the execution fails in some splits due to constraint violations.