Dotted Version Vectors

Let's learn what dotted version vectors are and how they provide scaling of version vectors.

Dotted version vectors is a technique that makes it possible to successfully identify concurrent versions, and allows the version vectors to scale with the number of servers.

The following characteristic of this technique allows it to achieve this.

Characteristic that makes it possible to scale version vectors

Each entry in the vector is not a single number anymore, but a pair of numbers. This can encode a sequence of numbers that are not fully sequential but contain one gap.

The pair (n1,n2)(n_1,n_2) represents all the numbers from 11 to n1n_1 plus the number n2n_2.

For example, the pair (4,7)(4,7) represents the sequence [1,2,3,4,7][1,2,3,4,7]. Note that the second number is optional and some entries can still be a single number. This can be leveraged to keep track of concurrency between multiple versions.

The order between the two versions is now defined in terms of the contains relationship on the corresponding sequences. So, for vectors v1v_1, v2v_2 the relationship v1v2v_1\leq v_2 holds if the sequence represented by v1v_1 is a subset of the sequence represented by v2v_2. More specifically:

  • (m)(m)(m) \leq (m^{\prime}) if mmm \leq m^{\prime}
  • (m)(m,n)(m)\leq(m^{\prime},n^{\prime}) if mmm=m+1=nm\leq m^{\prime} \vee m=m^{\prime}+1=n^{\prime}
  • (m,n)(m)(m,n)\leq(m^{\prime}) if nmn\leq m^{\prime}
  • (m,n)(m,n)(m,n)\leq(m^{\prime},n^{\prime}) if nm(mmn=n)n\leq m^{\prime} \vee (m\leq m^{\prime}\wedge n=n^{\prime})

The update rule executed by each replica node when it receives a write operation is also slightly different.

For all the indexes except the one belonging to the replica node, the node uses the value (m)(m) where mm is the maximum number amongst those available in the provided version vectors in the context.

For the index corresponding to the replica node, the node uses the pair (m,n+1)(m, n+1), where mm is the maximum number amongst those available in the provided version vectors in the context, and nn is the maximum number amongst the version vectors present in the replica node (essentially the value of its logical clock).

For a more elaborate analysis of the rules and formal proof that this technique is safe, refer to the original paperN. M. Perguica, C. Baquero, P. S. Almeida, V. Fonte, and G. Ri- cardo, “Dotted Version Vectors: Logical Clocks for Optimistic Replication,” arXiv:1011.5808, 2010..

The following illustration shows what a solution with dotted version vectors would look like in our previous examples.

As we can see, the write operations by client nodes CC and DD end up receiving version vectors {(B,0,1)}\lbrace(B,0,1)\rbrace and {(B,0,2)}\lbrace(B,0,2)\rbrace and they are successfully identified as concurrent, since {(B,0,1)}{(B,0,2)}\lbrace (B,0,1) \rbrace \nleq \lbrace (B,0,2) \rbrace and {(B,0,2)}{(B,0,1)}\lbrace (B,0,2) \rbrace \nleq \lbrace (B,0,1) \rbrace.

Get hands-on with 1400+ tech skills courses.