Solution: Design Add and Search Words Data Structure
Let's solve the Design Add and Search Words Data Structure problem using the Trie pattern.
Statement
Design a data structure called WordDictionary that supports the following functionalities:
-
Constructor: This function will initialize the object.
-
Add Word(word): This function will store the provided word in the data structure.
-
Search Word(word): This function will return TRUE if any string in the WordDictionary object matches the query word. Otherwise, it will return FALSE. If the query word contains dots,
.
, each dot is free to match any letter of the alphabet.For example, the dot in the string “.ad” can have possible search results like “aad”, “bad”, “cad”, and so on.
-
Get Words(): This function will return all the words in the WordDictionary class.
Constraints:
-
word.length
-
Words passed to Add Word() consist of lowercase English letters.
-
Words passed to Search Word() consist of
.
or lowercase English letters. -
There will be, at most, three dots in a word passed to Search Word().
-
At most, calls will be made to Add Word() , Get Words() and Search Word().
Solution
So far, you have probably brainstormed some approaches and have an idea of how to solve this problem. Let’s explore some of these approaches and figure out which one to follow based on considerations such as time complexity and any implementation constraints.
Naive approach
The problem statement asks us to devise an algorithm to support the three mentioned functionalities. One possible approach is to utilize a hash map. We can start by creating an empty hash map such that the key is the length of the word and the value is the word itself. If we get multiple words of the same length, we create a list of words, instead of a single value, and associate this list with the key. When searching for a word in the hash map, we use the length of the word as the key and then go through the associated list of words looking for our word. In this search, we would have to compare each word, one character at a time, with our word.
Function | Time complexity | Space complexity |
Add Word() | O(1) | O(1) |
Search Word() | O(w), where w is the number of words that have the same length as the word being search | O(1) |
Get Words() | O(t), where t is the total number of words | O(1) |
However, this solution would be inefficient when searching for all strings with a common prefix.
Furthermore, once the size of the hash map increases, there will be a large number of words against each key, which would result in poor performance when searching for a word.
Optimized approach using trie
A more efficient approach is to use a trie data structure. Using a trie, we can search for words that exactly match a query string, as well as retrieve words that partially match query strings with wildcard characters. Each node in a trie is a specialized data structure, a trie node, featuring the following members:
-
Children: This is an array of length . Each index of this array corresponds to a letter of the English alphabet.
-
Complete: This variable takes a boolean value. The value will be TRUE if the node represents the last character of the word. Otherwise, it will be FALSE.
For better understanding, let’s look at the diagram below:
While adding a word, we’ll verify at each step if the child node that needs to be added is already present. If it’s already present, we’ll go down one step. Otherwise, we’ll add the child node to the trie and go down one step.
For searches, there are these two possibilities:
-
If the current character is not “.”, the search will be as simple as the add feature: we start from the root and go down the trie, looking for the current character of the query string among the children of the current trie node. If the character is not found, the search returns FALSE. If the character is found, we move to the matching child of the current node and search among its children for the next character of the query string. This process continues until a character is not found, or until all the characters in the query string are found in the trie.
-
When the current character is “.”, all the children of the current trie node are considered valid matches for this position in the query string. So, for each child, we need to check its children to see if one of them matches the character following “.”.
The slides below illustrate how the algorithm runs:
Let’s look at the code for this solution below:
class WordDictionary {TrieNode root;boolean canFind;// Initialize the root with TrieNode and set// the 'canFind' boolean to FALSEpublic WordDictionary() {root = new TrieNode();canFind = false;}// Function to get all words in the dictionarypublic List<String> getWords() {List<String> wordsList = new ArrayList < String > ();if (root == null)return new ArrayList < String > ();return DFS(root, "", wordsList);}private List<String> DFS(TrieNode node, String word, List<String> wordsList) {if (node == null) return wordsList;if (node.complete) {wordsList.add(word);}for (int j = (int)'a'; j <= (int)'z'; j++) {String prefix = word + (char) j;wordsList = DFS(node.children[j - 'a'], prefix.toString(), wordsList);}return wordsList;}// Function to add a new word to the dictionarypublic void addWord(String word) {int n = word.length();TrieNode curNode = root;for (int i = 0; i < n; i++) {int index = (int)(word.charAt(i) - 'a');if (curNode.children[index] == null) {curNode.children[index] = new TrieNode();}curNode = curNode.children[index];if (i == n - 1) {if (curNode.complete == true) {System.out.println("\tWord already present!");return;}curNode.complete = true;}}System.out.println("\tWord added successfully!");}// Function to search for a word in the dictionarypublic boolean search(String word) {canFind = false;searchHelper(root, word, 0);return canFind;}private void searchHelper(TrieNode node, String word, int i) {if (canFind) return;if (node == null) return;int n = word.length();if (n == i) {if (node.complete) {canFind = true;}return;}if (word.charAt(i) == '.') {for (int j = (int)'a'; j <= (int)'z'; j++) {searchHelper(node.children[(char) j - 'a'], word, i + 1);}} else {int index = word.charAt(i) - 'a';searchHelper(node.children[index], word, i + 1);}}}class Solution {public static void main(String[] args) {WordDictionary obj = new WordDictionary();String[] words = {"add", "sky", "hello", "multi", "addition", "sky", "multiply", "table"};int i = 1;for (String w: words) {System.out.println(i + ".\tAdding word: '" + w + "'");obj.addWord(w);System.out.println(PrintHyphens.repeat("-", 100));i += 1;}String[] wordSearch = {"helo", "multiple", "...le", "..llo", "..r"};for (String v: wordSearch) {System.out.println(i + ".\tSearching word: '" + v + "'");System.out.println("\t" + obj.search(v));System.out.println(PrintHyphens.repeat("-", 100));i += 1;}System.out.println(i + "\tGetting all words: " + String.join(", ", obj.getWords()));}}
Solution summary
To recap, the solution to this problem can be divided into the following parts:
-
Add words to a trie data structure by creating a new node for each character.
-
For a simple search, start from the root, go down the trie, matching each query string character with a child of the current node.
-
When the query string contains a “.”, consider that all the children of the current trie node are valid matches and check all of their children against the next character in the query string.
Time complexity
-
Add Word(word): At each step, we’ll either examine or create a node in the trie. In the worst-case scenario, the examination or creation of a node in the trie will take operations, where will be the length of the word. As , this means that the worst-case time complexity is , that is,
-
Search Word(word): Depending upon the search query, the time complexity for searching a word in a trie would vary:
-
Scenario 1: For a word without any “.”, the time complexity would be , where is the length of the word. As , this means that the worst-case time complexity is .
-
Scenario 2: For a word that is a combination of letters and “.”, the function would have to explore branches whenever a “.” is encountered. Let’s say the word to search is “…oo.”. Due to the first “.”, we would have to check all the children of the root node in the trie. Due to the second dot, we would have to check all the grandchildren of the root node. So, the cost of searching for this word depends on the number of wildcard characters, , and the total length of the word . So, the cost in this scenario will be .
Let’s consider the case where the term is maximized, that is, . In this case, the time complexity is , that is, , that is, . Alternatively, we may want to maximize . As , the highest value of the term is . In this case, the worst-case time complexity is , that is, , that is, .
-
Scenario 3: If the query word only contains wildcard characters, then searching for such a word will take , where is the number of wildcard characters. As , this means that the worst-case time complexity is , that is, .
-
-
Get Words(): If the length of the longest word is , and given that there are letters in the English alphabet, the time complexity of retrieving all the words will be . This case occurs where all the children of every node in the trie are in use. As , this means that the worst-case time complexity is .
Space complexity
-
Add Word(word): In the worst-case scenario, the newly inserted word doesn’t share a prefix with the words already inserted in the trie. For a word with characters, we would need additional space in memory. As , this means that the worst-case space complexity is , that is, .
-
Search Word(word): In the worst case, for a word of length , there would be recursive calls to the search function, so we would take up space on the stack. Again, as , this means that the worst-case space complexity is .
-
Get Words(): In the worst case, we would need to make recursive calls to retrieve the longest word of length , taking up space on the stack. Again, as , the worst-case space complexity is .
Level up your interview prep. Join Educative to access 70+ hands-on prep courses.