Statement

Given a string, dna, that represents a DNA subsequence, and a number kk, return all the contiguous subsequences (substrings) of length kk that occur more than once in the string. The order of the returned subsequences does not matter. If no repeated substring is found, the function should return an empty set.

The DNA sequence is composed of a series of nucleotides abbreviated as AA, CC, GG, and TT. For example, ACGAATTCCGACGAATTCCG is a DNA sequence. When studying DNA, it is useful to identify repeated subsequences in it.

Constraints:

  • 11 \leq dna.length \leq 10310^3
  • dna[i] is either A, C, G, or T.
  • 1k101 \leq k \leq 10

Solution

So far, you’ve probably brainstormed some approaches on how to solve this problem. Let’s explore some of these approaches and figure out which one to follow while considering time complexity and any implementation constraints.

Naive approach

A naive approach would be to iterate through the input DNA sequence and add all the unique substrings of length kk to a set. If a substring is already present in a set, it is a repeated substring.

Here’s how the algorithm works:

  • We iterate the string using a pointer ii, ranging from 00 to (nk+1)(n - k + 1). This is the number of kk-length substrings present in the sequence.

  • At each iteration, we generate the current kk-length substring, i.e., dna[i]…dna[i + k - 1].

  • Next, we check if this substring is already present in the set.

    • If it is, the current substring is a repeated sequence, so we add it to our output.

    • Otherwise, the current substring has not yet been repeated, so we just add it to the set.

  • We repeat the above process for all kk-length substrings.

  • Once all kk-length substrings have been evaluated, we return the output.

The time complexity of this approach is O((nk)×k)O((n - k) \times k), where nn is the length of the input sequence and kk is the size of each contiguous subsequence we consider. This is because we iterate over (nk+1)(n - k + 1) substrings of length kk, and at each iteration, the time taken to generate a kk-length substring is O(k)O(k).

The space complexity of this approach is O((nk)×k)O((n - k) \times k), since in the worst case, our set can contain (nk+1)(n - k + 1) elements, and at each iteration of the traversal, we are allocating memory to generate a new kk-length substring.

Optimized approach using sliding window

The problem can be optimized using a sliding window approach. We use the Rabin-Karp algorithm that utilizes a sliding window with rolling hashRolling hash is used to prevent rehashing the whole string while calculating hash values of the substrings of a given string. for pattern matching.

Here’s the basic idea of the algorithm:

  • We traverse the string by using a sliding window of length kk, which slides one character forward on each iteration.

  • On each iteration, we compute the hash of the current kk-length substring in the window.

  • We check if the hash is already present in the set.

    • If it is, the substring is repeated, so we add it to the output.

    • Otherwise, the substring has not yet been repeated, so we add the computed hash to the set.

  • We repeat the above process for all kk-length substrings by sliding the window one character forward on each iteration.

  • After all kk-length substrings have been evaluated, we return the output.

There are multiple approaches for computing hash values, and the choice of the hash function can impact the algorithm’s time complexity. Let’s look at some approaches below.

Hashing and comparison in linear time

Let’s use a simple hashing method that sums the ASCII code of characters present in a window.

Consider the sequence ACTCTACTCT with k=2k = 2.

  1. Initially, the sequence in the window is ACAC and its hash value is:

    H(AC)=65+67=132H(AC) = 65 + 67 = 132

    Since the above hash value has not been repeated yet, we add this hash value to the set and slide the window one character forward.

  2. The sequence in the window is now CTCT. To compute the hash value of CTCT, the ASCII of AA will be removed from the previous hash value and the ASCII of TT will then be added:

    H(CT)=13265+84=151H(CT) = 132 - 65 + 84 = 151

    Since the above hash value has not been repeated yet, we add this hash value to the set and slide the window one character forward.

  3. The sequence in the window is now TCTC. To compute the hash value of TCTC, the ASCII of CC will be removed from the previous hash value and then again added:

    H(TC)=15167+67=151H(TC) = 151 - 67 + 67 = 151

    Here, we have the same hash value but different sequences—CTCT and TCTC. This means that if a hash value is already present in the set, we need to compare the corresponding sequences as well to confirm if they are identical. In this case, they are not, so we add this hash value to the set and slide the window one character forward.

  4. The sequence in the window is now CTCT. To compute the hash value of CTCT, the ASCII of TT will be removed from the previous hash value and then again added:

    H(CT)=15184+84=151H(CT) = 151 - 84 + 84 = 151

    Here we have the same hash value, so we compare the two sequences. Since they are identical, the sequence has been repeated and is therefore added to the output.

Computing the hash value and then comparing the strings if the hashes are equal will take linear time, O(k)O(k). In the worst case, the comparisons will occur after each slide, which will make the running time the same as that of the naive approach, which is O((nk+1)×k)O((n-k+1)\times k).

Hashing and comparison in constant time

We need a hash function that helps us achieve constant-time hashing. For this purpose, we use the polynomial rolling hash technique:

H=c1ak1+c2ak2+...+ciaki+...+ck1a1+cka0H = c_1 a^{k-1} + c_2 a^{k-2} + ... + c_i a^{k-i} + ...+ c_{k-1} a^{1} + c_ka^{0}

Here, aa is a constant, c1,,ckc_1, \ldots, c_k are the characters in a sequence, and kk is the substring length. Since we only have 44 possible nucleotides, our aa would be 44. We also assign numeric values to the nucleotides, as shown in the table below:

Note: We use a value of 44 for the constant aa since this ensures that each nucleotide is assigned a unique value in the polynomial hash function. This reduces the number of hash collisions and helps reduce the randomness in the behaviour of the hash function.

It is not necessary to set the constant aa equal to 44. In fact, this choice is somewhat arbitrary and can be any number that is greater than or equal to the number of nucleotides. The idea is to choose a value that is large enough to avoid hash collisions while still being relatively small, so as to minimize the risk of arithmetic overflow.

Example: The polynomial hash value for the sequence ATGATG will be

H(ATG)=(1×42)+(4×41)+(3×40)=35H(ATG) = (1 \times 4^2) + (4 \times 4^1) + (3 \times 4^0) = 35

Consider the same sequence ACTCTACTCT with k=2k = 2.

  1. Initially, the sequence in the window is ACAC, and its hash value is:

    H(AC)=H(A)+H(C)H(AC) = H(A) + H(C)

    =(1×41)+(2×40)= (1 \times 4^{1}) + (2\times 4^{0})

    =6= 6

    Since the above hash value has not been repeated yet, we add this hash value in the set and slide the window one character forward.

  2. The sequence in the window is now CTCT. To compute the hash value of CTCT, we first need to remove the contribution of AA from the previous hash value:

    H(C)=H(AC)H(A)=H(C) = H(AC) - H(A) =

    =6(1×41)= 6 - (1 \times 4^1)

    =2= 2

    and then add the contribution of TT:

    H(CT)=H(C)+H(T)H(CT) = H(C) + H(T)

    =2+(4×40)= 2 + (4 \times 4^0)

    =6= 6

    This can’t be right, since both ACAC and CTCT cannot yield the same hash value. Examining our work carefully, we notice that we are not correctly accounting for place values. So, we need to shift the remaining bases to the left by one position so that the hash corresponds to the new sliding window. We do this by multiplying the current hash value by the base value a=4a=4. This means that H(C)H(C) becomes 2×412 \times 4^1 and H(T)H(T) becomes 4×404 \times 4^0. After that, we add the contribution of the incoming character TT to get the new hash value for the current sliding window:

    H(CT)=[(H(AC)H(A))×4]+H(T)H(CT) = [(H(AC) - H(A)) \times 4] + H(T)

    H(CT)=[6(1×41)]×4+(4×40)H(CT) = [6 - (1 \times 4^1)] \times 4 + (4\times 4^0)

    =12= 12

    Since this hash value has not been seen yet, we add it to the set and slide the window one character forward.

  3. The sequence in the window is now TCTC. We compute the hash value of TCTC in the same way as explained above:

H(TC)=([H(CT)H(C)]×4)+H(C)H(TC) = ([H(CT) - H(C)] \times 4) + H(C)

=([H(CT)(2×4)]×4)+H(C)= ([H(CT) - (2 \times 4)] \times 4) + H(C)

=([128]×4)+2= ([12 - 8] \times 4) + 2

=(4×4)+2= (4 \times 4) + 2

=16+2=18= 16 + 2 = 18

Since this hash value has not been seen yet, we add it to the set and slide the window one character forward. As you can see, even though CTCT and TCTC contain the same characters, since their relative order is different, the respective hash values of the two strings are also different.

  1. The sequence in the window is now CTCT. We compute the hash value of CTCT in the same way as explained above:

    H(CT)=[(H(TC)H(T))×4]+H(T)H(CT) = [(H(TC) - H(T)) \times 4] + H(T)

=[(1816)×4]+4= [(18-16) \times 4] + 4

=[2×4]+4= [2 \times 4] + 4

=12= 12

This hash value is present in the set. Therefore, we add it to the output without checking if the sequences are identical.

From the above two approaches, it is clear that we will be using the polynomial rolling hash function in our solution to compute the hash values of the kk-length substrings since this method skips the need to compare two sequences if their hash values are the same and, therefore, helps us achieve constant-time hashing and comparison.

The slides below illustrate how the algorithm runs:

Press + to interact
canvasAnimation-image
1 of 15

Note: In the following section, we will gradually build the solution. Alternatively, you can skip straight to just the code.

Step-by-step solution construction

We will first define the following variables to set ourselves up to implement the polynomial rolling hash function:

  • mapping: This is a hash map defining the numeric mapping of nucleotides. It will be of the form: {A:1,C:2,G:3,T:4}\{A: 1, C: 2, G: 3, T: 4\}.
  • baseValue: This is the base value used in the polynomial hash function. We will use a base value of 44 since there are 44 nucleotides in the sequence.
  • numbers: This is an array storing the integer form of the string dna based on the mapping defined above. For example, for the sequence ACTCTACTCT, numbers will consist of [1,2,4,2,4][1, 2, 4, 2, 4]. This array will make it easier to access the numeric value of the nucleotides when calculating the hash value.
class RepeatedDNA {
public static void findRepeatedSequences(String dna, int k) {
int stringLength = dna.length();
// Mapping of characters
Map<Character, Integer> mapping = new HashMap<>();
mapping.put('A', 1);
mapping.put('C', 2);
mapping.put('G', 3);
mapping.put('T', 4);
// Base value
int baseValue = 4;
// Numeric representation of the sequence
List<Integer> numbers = new ArrayList<>(Arrays.asList(new Integer[stringLength]));
Arrays.fill(numbers.toArray(), 0);
for (int i = 0; i < stringLength; i++) {
numbers.set(i, mapping.get(dna.charAt(i)));
}
System.out.println("\tConverted sequence: " + numbers.toString());
}
// Driver code
public static void main(String[] args) {
List<String> inputsString = Arrays.asList("ACGT", "AGACCTAGAC", "AAAAACCCCCAAAAACCCCCC",
"GGGGGGGGGGGGGGGGGGGGGGGGG", "TTTTTCCCCCCCTTTTTTCCCCCCCTTTTTTT", "TTTTTGGGTTTTCCA",
"AAAAAACCCCCCCAAAAAAAACCCCCCCTG", "ATATATATATATATAT");
List<Integer> inputsK = Arrays.asList(3, 3, 8, 12, 10, 14, 10, 6);
for (int i = 0; i < inputsK.size(); i++) {
System.out.println((i + 1) + ".\tInput sequence: " + inputsString.get(i) +
"\n\tk: " + inputsK.get(i) + "\n");
findRepeatedSequences(inputsString.get(i), inputsK.get(i));
System.out.println(Print.repeat("-", 100));
}
}
}
Repeated DNA Sequences

We declare a variable, hashValue, to store the hash value of the current kk-length sequence in the window. It is initialized to 00.

Next, we slide the window along the string, dna, using a pointer, i, ranging from 00 to (stringLengthk+1)(stringLength - k + 1):

  • When i is 00, the window is at its starting position, i.e., the first kk-length substring. For this sequence, we calculate the hash value from scratch using the above-mentioned polynomial hash function.

    for (int j = 0; j < k; j++) {
        hashValue += numbers.get(j) * (int) Math.pow(baseValue, k - j - 1);
    }
    
  • Otherwise, the window is not at its starting position. So, we calculate the hash value of the current kk-length substring by utilizing the hash value of the previous kk-length substring:

    int previousHashValue = hashValue;
    hashValue = ((previousHashValue - numbers.get(i - 1) * (int) Math.pow(baseValue, k - 1)) * baseValue) + numbers.get(i + k - 1);
    
  • The above process is repeated by sliding the window one character forward.

class RepeatedDNA {
public static void findRepeatedSequences(String dna, int k) {
int stringLength = dna.length();
// Mapping of characters
Map<Character, Integer> mapping = new HashMap<>();
mapping.put('A', 1);
mapping.put('C', 2);
mapping.put('G', 3);
mapping.put('T', 4);
// Base value
int baseValue = 4;
// Numeric representation of the sequence
List<Integer> numbers = new ArrayList<>(Arrays.asList(new Integer[stringLength]));
Arrays.fill(numbers.toArray(), 0);
for (int i = 0; i < stringLength; i++) {
numbers.set(i, mapping.get(dna.charAt(i)));
}
// Current hash value
int hashValue = 0;
for (int i = 0; i < stringLength - k + 1; i++) {
// If the window is at its initial position, calculate the hash value from scratch
if (i == 0) {
for (int j = 0; j < k; j++) {
hashValue += numbers.get(j) * (int) Math.pow(baseValue, k - j - 1);
}
} else {
int previousHashValue = hashValue;
hashValue = ((previousHashValue - numbers.get(i - 1) * (int) Math.pow(baseValue, k - 1)) * baseValue) + numbers.get(i + k - 1);
}
System.out.println("\tHash value of " + dna.substring(i, i + k) + ": " + hashValue);
}
}
// Driver code
public static void main(String[] args) {
List<String> inputsString = Arrays.asList("ACGT", "AGACCTAGAC", "AAAAACCCCCAAAAACCCCCC",
"GGGGGGGGGGGGGGGGGGGGGGGGG", "TTTTTCCCCCCCTTTTTTCCCCCCCTTTTTTT", "TTTTTGGGTTTTCCA",
"AAAAAACCCCCCCAAAAAAAACCCCCCCTG", "ATATATATATATATAT");
List<Integer> inputsK = Arrays.asList(3, 3, 8, 12, 10, 14, 10, 6);
for (int i = 0; i < inputsK.size(); i++) {
System.out.println((i + 1) + ".\tInput sequence: " + inputsString.get(i) +
"\n\tk: " + inputsK.get(i) + "\n");
findRepeatedSequences(inputsString.get(i), inputsK.get(i));
System.out.println(Print.repeat("-", 100));
}
}
}
Repeated DNA Sequences

We declare the following variables to keep track of the hash values and store the repeated substrings:

  • hashSet: This is a set that stores all the unique hash values of the kk-length substrings. It is initialized to empty.

  • output: This is a set that stores the repeated substrings.

We check if the calculated hash value of the current kk-length substring is present in hashSet:

  • If it is, the substring is repeated, so it is added to output:

    if (hashSet.contains(hashValue)) {
       output.add(dna.substring(i, i + k));
    }
    
  • Otherwise, we will just add the hash value of the substring to hashSet:

    hashSet.add(hashValue)
    

When the hash values of all kk-length substrings have been evaluated, i.e., the sliding window can not move forward, we return output.

class RepeatedDNA {
public static Set<String> findRepeatedSequences(String dna, int k) {
int stringLength = dna.length();
// Mapping of characters
Map<Character, Integer> mapping = new HashMap<>();
mapping.put('A', 1);
mapping.put('C', 2);
mapping.put('G', 3);
mapping.put('T', 4);
// Base value
int baseValue = 4;
// Numeric representation of the sequence
List<Integer> numbers = new ArrayList<>(Arrays.asList(new Integer[stringLength]));
Arrays.fill(numbers.toArray(), 0);
for (int i = 0; i < stringLength; i++) {
numbers.set(i, mapping.get(dna.charAt(i)));
}
// Current hash value
int hashValue = 0;
// 1. Hash set to store the unique hash values
// 2. Output set to store the repeated substrings
Set<Integer> hashSet = new HashSet<>();
Set<String> output = new HashSet<>();
for (int i = 0; i < stringLength - k + 1; i++) {
// If the window is at its initial position, calculate the hash value from scratch
if (i == 0) {
for (int j = 0; j < k; j++) {
hashValue += numbers.get(j) * (int) Math.pow(baseValue, k - j - 1);
}
} else {
int previousHashValue = hashValue;
hashValue = ((previousHashValue - numbers.get(i - 1) * (int) Math.pow(baseValue, k - 1)) * baseValue) + numbers.get(i + k - 1);
}
// If the current hash value is present in the hash set, the current substring has been repeated, so we add it to the output
if (hashSet.contains(hashValue)) {
output.add(dna.substring(i, i + k));
}
// We add the evaluated hash value to the hash set
hashSet.add(hashValue);
System.out.println("\n\tHash value of " + dna.substring(i, i + k) + ": " + hashValue +
"\n\tHash Set: " + Print.printSetInt(hashSet) +
"\n\tOutput: " + Print.printSetString(output));
}
return output;
}
// Driver code
public static void main(String[] args) {
List<String> inputsString = Arrays.asList("ACGT", "AGACCTAGAC", "AAAAACCCCCAAAAACCCCCC",
"GGGGGGGGGGGGGGGGGGGGGGGGG", "TTTTTCCCCCCCTTTTTTCCCCCCCTTTTTTT", "TTTTTGGGTTTTCCA",
"AAAAAACCCCCCCAAAAAAAACCCCCCCTG", "ATATATATATATATAT");
List<Integer> inputsK = Arrays.asList(3, 3, 8, 12, 10, 14, 10, 6);
for (int i = 0; i < inputsK.size(); i++) {
System.out.println((i + 1) + ".\tInput sequence: " + inputsString.get(i) +
"\n\tk: " + inputsK.get(i));
findRepeatedSequences(inputsString.get(i), inputsK.get(i));
System.out.println(Print.repeat("-", 100));
}
}
}
Repeated DNA Sequences

The above solution works for most inputs. However, it will not work if the length of the string, dna, is less than the window size, k. So, we need to handle this case by returning an empty set.

class RepeatedDNA {
public static Set<String> findRepeatedSequences(String dna, int k) {
int stringLength = dna.length();
if (stringLength < k) {
return new HashSet<String>();
}
// Mapping of characters
Map<Character, Integer> mapping = new HashMap<>();
mapping.put('A', 1);
mapping.put('C', 2);
mapping.put('G', 3);
mapping.put('T', 4);
// Base value
int baseValue = 4;
// Numeric representation of the sequence
List<Integer> numbers = new ArrayList<>(Arrays.asList(new Integer[stringLength]));
Arrays.fill(numbers.toArray(), 0);
for (int i = 0; i < stringLength; i++) {
numbers.set(i, mapping.get(dna.charAt(i)));
}
// Current hash value
int hashValue = 0;
// 1. Hash set to store the unique hash values
// 2. Output set to store the repeated substrings
Set<Integer> hashSet = new HashSet<>();
Set<String> output = new HashSet<>();
for (int i = 0; i < stringLength - k + 1; i++) {
// If the window is at its initial position, calculate the hash value from scratch
if (i == 0) {
for (int j = 0; j < k; j++) {
hashValue += numbers.get(j) * (int) Math.pow(baseValue, k - j - 1);
}
} else {
int previousHashValue = hashValue;
hashValue = ((previousHashValue - numbers.get(i - 1) * (int) Math.pow(baseValue, k - 1)) * baseValue) + numbers.get(i + k - 1);
}
// If the current hash value is present in the hash set, the current substring has been repeated, so we add it to the output
if (hashSet.contains(hashValue)) {
output.add(dna.substring(i, i + k));
}
// We add the evaluated hash value to the hash set
hashSet.add(hashValue);
}
return output;
}
// Driver code
public static void main(String[] args) {
List<String> inputsString = Arrays.asList("ACGT", "AGACCTAGAC", "AAAAACCCCCAAAAACCCCCC",
"GGGGGGGGGGGGGGGGGGGGGGGGG", "TTTTTCCCCCCCTTTTTTCCCCCCCTTTTTTT", "TTTTTGGGTTTTCCA",
"AAAAAACCCCCCCAAAAAAAACCCCCCCTG", "ATATATATATATATAT");
List<Integer> inputsK = Arrays.asList(3, 3, 8, 12, 10, 14, 10, 6);
for (int i = 0; i < inputsK.size(); i++) {
System.out.println((i + 1) + ".\tInput sequence: " + inputsString.get(i) +
"\n\tk: " + inputsK.get(i) +
"\n\n\tRepeated sequences: " + Print.printSetString(findRepeatedSequences(inputsString.get(i), inputsK.get(i))));
System.out.println(Print.repeat("-", 100));
}
}
}
Repeated DNA Sequences

Just the code

Here’s the complete solution to this problem:

class RepeatedDNA {
public static Set<String> findRepeatedSequences(String dna, int k) {
int stringLength = dna.length();
if (stringLength < k) {
return new HashSet<String>();
}
Map<Character, Integer> mapping = new HashMap<>();
mapping.put('A', 1);
mapping.put('C', 2);
mapping.put('G', 3);
mapping.put('T', 4);
int baseValue = 4;
List<Integer> numbers = new ArrayList<>(Arrays.asList(new Integer[stringLength]));
Arrays.fill(numbers.toArray(), 0);
for (int i = 0; i < stringLength; i++) {
numbers.set(i, mapping.get(dna.charAt(i)));
}
int hashValue = 0;
Set<Integer> hashSet = new HashSet<>();
Set<String> output = new HashSet<>();
for (int i = 0; i < stringLength - k + 1; i++) {
if (i == 0) {
for (int j = 0; j < k; j++) {
hashValue += numbers.get(j) * (int) Math.pow(baseValue, k - j - 1);
}
} else {
int previousHashValue = hashValue;
hashValue = ((previousHashValue - numbers.get(i - 1) * (int) Math.pow(baseValue, k - 1)) * baseValue) + numbers.get(i + k - 1);
}
if (hashSet.contains(hashValue)) {
output.add(dna.substring(i, i + k));
}
hashSet.add(hashValue);
}
return output;
}
// Driver code
public static void main(String[] args) {
List<String> inputsString = Arrays.asList("ACGT", "AGACCTAGAC", "AAAAACCCCCAAAAACCCCCC",
"GGGGGGGGGGGGGGGGGGGGGGGGG", "TTTTTCCCCCCCTTTTTTCCCCCCCTTTTTTT", "TTTTTGGGTTTTCCA",
"AAAAAACCCCCCCAAAAAAAACCCCCCCTG", "ATATATATATATATAT");
List<Integer> inputsK = Arrays.asList(3, 3, 8, 12, 10, 14, 10, 6);
for (int i = 0; i < inputsK.size(); i++) {
System.out.println((i + 1) + ".\tInput sequence: " + inputsString.get(i) +
"\n\tk: " + inputsK.get(i) +
"\n\n\tRepeated sequences: " + Print.printSetString(findRepeatedSequences(inputsString.get(i), inputsK.get(i))));
System.out.println(Print.repeat("-", 100));
}
}
}
Repeated DNA Sequences

Solution summary

To recap, the solution to this problem can be divided into the following six main parts:

  1. Iterate over all kk-length substrings.

  2. Compute the hash value for the contents of the window.

  3. Add this hash value to the set that keeps track of the hashes of all substrings of the given length.

  4. Move the window one step forward and compute the hash of the new window using the rolling hash method.

  5. If the hash value of a window has already been seen, the sequence in this window is repeated, so we add it to the output set.

  6. Once all substrings have been evaluated, return the output set.

Time complexity

Average case

The average case time complexity of this solution is O(n)O(n), where nn is the length of the input string. It is calculated as follows:

  • Time taken to populate the numbers array: O(n)O(n).

  • Time taken to traverse all the kk-length substrings: O(nk+1)O(n - k + 1).

  • Time taken to calculate the hash value of a kk-length substring: O(1)O(1).

So, the dominating time complexity becomes O(n)O(n).

Worst case

To understand the worst case time complexity of this solution, consider the input string “AAAAAAAA” with k=2k = 2. This combination of inputs ensures that a repeated sequence “AA” is detected and added to the output each time the window slides forward. Therefore, we must generate a kk-length substring on each (nk+1)(n - k + 1) iteration of the loop. The time to generate a kk-length substring is O(k)O(k). Therefore, the overall time complexity becomes O((nk)×k)O((n - k) \times k).

Space complexity

The space complexity of this solution is O(n)O(n). It is calculated as follows:

  • Space occupied by the mapping hash map: O(1)O(1).

  • Space occupied by the numbers array: O(n)O(n).

  • Space occupied by the hashSet set: O(nk+1)O(n - k + 1).

So, the dominating space complexity becomes O(n)O(n).

Level up your interview prep. Join Educative to access 70+ hands-on prep courses.