Assembling genomes: Paired De Bruijn Graphs

Let’s talk about paired de Bruijn graphs.

We'll cover the following

Constructing a path graph
Gluing identically labeled nodes
Paired composition graph
A pitfall of paired de Bruijn graphs
Charging Station: Generating All Eulerian Cycles
Charging Station: Reconstructing a String Spelled by a Path in the Paired de Bruijn Graph

Given a (k, d)-mer $(a_1 \ldots a_k \mid b_1 \ldots b_k)$, we define its prefix and suffix as the following (k − 1, d + 1)-mers. Each read loses one symbol, so the gap between the two reads grows from d to d + 1:

$\text{Prefix}((a_1 \ldots a_k \mid b_1 \ldots b_k)) = (a_1 \ldots a_{k-1} \mid b_1 \ldots b_{k-1})$

$\text{Suffix}((a_1 \ldots a_k \mid b_1 \ldots b_k)) = (a_2 \ldots a_k \mid b_2 \ldots b_k)$

For example, Prefix((GAC | TCA)) = (GA | TC) and Suffix((GAC | TCA)) = (AC | CA). Note that for consecutive (k, d)-mers appearing in Text, the suffix of the first (k, d)-mer is equal to the prefix of the second (k, d)-mer. For example, for the consecutive (3, 1)-mers (TAA | GCC) and (AAT | CCA) in TAATGCCATGGGATGTT:

Suffix((TAA | GCC)) = Prefix((AAT | CCA)) = (AA | CC)
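The prefix/suffix definitions and the property of consecutive (k, d)-mers can be checked with a short Python sketch (Python 3.9+). It assumes a (k, d)-mer is stored as a pair of strings `(a, b)`. The gap d is not stored because Prefix and Suffix do not need it. The helper names `prefix`, `suffix`, and `kd_mers_in_order` are hypothetical and chosen for illustration.

```python
# Hypothetical helpers; a (k, d)-mer is represented as a pair of strings (a, b).
Pair = tuple[str, str]


def prefix(pair: Pair) -> Pair:
    """(a1..ak | b1..bk) -> (a1..a(k-1) | b1..b(k-1)): drop the last symbol of each read."""
    a, b = pair
    return a[:-1], b[:-1]


def suffix(pair: Pair) -> Pair:
    """(a1..ak | b1..bk) -> (a2..ak | b2..bk): drop the first symbol of each read."""
    a, b = pair
    return a[1:], b[1:]


def kd_mers_in_order(text: str, k: int, d: int) -> list[Pair]:
    """All (k, d)-mers of text in the order they occur: |text| - (2k + d) + 1 of them."""
    count = len(text) - (2 * k + d) + 1
    return [(text[i:i + k], text[i + k + d:i + 2 * k + d]) for i in range(count)]


print(prefix(("GAC", "TCA")), suffix(("GAC", "TCA")))  # ('GA', 'TC') ('AC', 'CA')

pairs = kd_mers_in_order("TAATGCCATGGGATGTT", k=3, d=1)
print(pairs[0], pairs[1])                  # ('TAA', 'GCC') ('AAT', 'CCA')
print(suffix(pairs[0]), prefix(pairs[1]))  # ('AA', 'CC') ('AA', 'CC')

# The property holds for every consecutive pair of (k, d)-mers, not just the first two.
assert all(suffix(p) == prefix(q) for p, q in zip(pairs, pairs[1:]))
```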

Constructing a path graph

Given a string Text, we construct a graph $\text{PathGraph}_{k,d}(\text{Text})$ that represents a path formed by |Text| − (k + d + k) + 1 edges, one for each (k, d)-mer in Text. We label the edges in this path by (k, d)-mers and label the starting and ending nodes of an edge by its prefix and suffix, respectively (see the figure below).
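A minimal Python sketch of $\text{PathGraph}_{k,d}(\text{Text})$ (Python 3.9+) follows. It assumes the graph is stored as an ordered list of edges `(start_node, end_node, label)`, where every node and label is a pair of strings. The function names `path_graph` and `show` are hypothetical. Because the edges come from consecutive (k, d)-mers, the ending node of each edge is the starting node of the next one.

```python
# Hypothetical sketch: PathGraph_{k,d}(Text) as an ordered list of labeled edges.
Pair = tuple[str, str]
Edge = tuple[Pair, Pair, Pair]  # (start node = Prefix, end node = Suffix, label = (k, d)-mer)


def path_graph(text: str, k: int, d: int) -> list[Edge]:
    edges: list[Edge] = []
    # There are |Text| - (k + d + k) + 1 (k, d)-mers, one edge per (k, d)-mer.
    for i in range(len(text) - (2 * k + d) + 1):
        a, b = text[i:i + k], text[i + k + d:i + 2 * k + d]
        start = (a[:-1], b[:-1])  # Prefix: a (k - 1, d + 1)-mer
        end = (a[1:], b[1:])      # Suffix: a (k - 1, d + 1)-mer
        edges.append((start, end, (a, b)))
    return edges


def show(pair: Pair) -> str:
    """Format a pair as 'a|b' for printing."""
    return f"{pair[0]}|{pair[1]}"


edges = path_graph("TAATGCCATGGGATGTT", k=3, d=1)
print(len(edges))  # 11
for start, end, label in edges[:2]:
    print(f"{show(start)} -> {show(end)}  [{show(label)}]")
# TA|GC -> AA|CC  [TAA|GCC]
# AA|CC -> AT|CA  [AAT|CCA]

# Consecutive edges share a node, so the edges form a single path.
assert all(edges[i][1] == edges[i + 1][0] for i in range(len(edges) - 1))
```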
