Implementing a Trie Data Structure in Java

Harnessing the Power of the Trie Data Structure

Trie (pronounced as "try") is a tree-like data structure used for efficiently storing and searching a dynamic set of strings. It's particularly useful for tasks such as autocomplete, spell checking, and prefix searching. In this article, we'll dive into the implementation details of a Trie data structure in Java, complete with code snippets and explanations.

Problem Statement:

This a problem revolving around the concept of a "trie" or "prefix tree," a specialized tree data structure used for efficient storage and retrieval of keys within a collection of strings. This versatile data structure finds application in tasks such as autocomplete and spell checking.

The task at hand involves the implementation of the Trie class, which encompasses several essential functionalities:

  1. Initializing the Trie object using the Trie() constructor.

  2. Inserting a given string, denoted as word, into the trie using the insert(String word) method.

  3. Determining the count of occurrences of a specific string, indicated as word, within the trie. This is achieved through the countWordsEqualTo(String word) method.

  4. Calculating the count of strings within the trie that commence with a particular string, designated as prefix. The countWordsStartingWith(String prefix) method accomplishes this.

  5. Erasing a designated string, represented by word, from the trie with the erase(String word) method.

By addressing these components, the implementation facilitates effective manipulation and retrieval of data within the trie structure.

TrieNode: The Building Block

At the core of the Trie data structure is the TrieNode class, representing a node in the Trie. Each TrieNode contains an array of child nodes (26 in this case, one for each lowercase English letter), along with counters to keep track of word instances and prefixes. Let's explore the key methods and functionalities within this class.

Checking for Child Nodes

The contains method is used to determine whether a specific child node exists for a given character. It performs a quick check on the children array to see if the node is present.

public boolean contains(char ch) {
    return (children[ch - 'a'] != null);
}

Getting Child Nodes

The get method retrieves the child node corresponding to a given character. It directly returns the child node associated with the character if it exists.

public TrieNode get(char ch) {
    return children[ch - 'a'];
}

Creating Child Nodes

The set method is responsible for creating and setting a child node for a given character. If the child node does not exist, a new node is created and added to the children array.

public void set(char ch) {
    children[ch - 'a'] = new TrieNode();
}

Counting Word Instances

The getWordCount method returns the number of instances of a specific word that end at the current node.

public int getWordCount() {
    return wordCount;
}

Counting Prefixes

The getPrefixCount method provides the count of strings that have the current node as a prefix. This counter is crucial for efficiently finding the number of words starting with a particular prefix.

public int getPrefixCount() {
    return prefixCount;
}

Trie: Building and Using the Data Structure

The Trie class encapsulates the TrieNode instances and provides methods for inserting, searching, and erasing words from the Trie.

Constructor and Initialization

The constructor initializes the Trie by creating the root TrieNode.

public Trie() {
    root = new TrieNode();
}

Inserting Words

The insert method adds a word to the Trie by traversing the TrieNodes for each character in the word. It increments the prefix count for each node on the traversal path and updates the word count at the final node.

public void insert(String word) {
    TrieNode curr = root;

    for (int i = 0; i < word.length(); i++) {
        curr.increasePrefixCount();

        char ch = word.charAt(i);

        if (!curr.contains(ch))
            curr.set(ch);

        curr = curr.get(ch);
    }

    curr.increasePrefixCount();
    curr.increaseWordCount();
}

Counting Words and Prefixes

The countWordsEqualTo and countWordsStartingWith methods utilize Trie traversal to efficiently count instances of specific words and prefixes, respectively.

public int countWordsEqualTo(String word) {
    // Implementation details explained above
}

public int countWordsStartingWith(String prefix) {
    // Implementation details explained above
}

Erasing Words

The erase method removes a word from the Trie while updating prefix counts along the traversal path. If the word's count is zero, no action is taken.

public void erase(String word) {
    if (countWordsEqualTo(word) == 0) {
        return;
    }

    TrieNode curr = root;

    for (int i = 0; i < word.length(); i++) {
        curr.decreasePrefixCount();
        char ch = word.charAt(i);
        curr = curr.get(ch);
    }

    curr.decreasePrefixCount();
    curr.decreaseWordCount();
}

Conclusion

The Trie data structure is a powerful tool for efficiently storing and searching strings, making it ideal for tasks involving autocomplete, spell checking, and prefix searching. By breaking down the implementation into TrieNode and Trie classes, we've created a functional and easy-to-use Trie data structure in Java. This article has provided insights into the underlying logic of each method, enabling you to leverage the Trie's capabilities in various applications.