Aho-Corasick Algorithm for Multiple String Matching

Understanding the Aho-Corasick Algorithm

The Aho-Corasick algorithm is a powerful string-searching algorithm that can find all occurrences of a finite set of patterns within a text in linear time. It combines the data structures of a trie and a finite automaton. First, it builds a trie from all the patterns. Then, it adds failure links to the trie, transforming it into a state machine. This allows the search to proceed without backtracking, even when a mismatch occurs, by following failure links to find the next best partial match.

Example: Aho-Corasick Implementation

Here is a Python implementation of the Aho-Corasick algorithm. It involves building a trie, computing failure links, and then searching the text using the constructed automaton.


from collections import deque

class TrieNode:
    def __init__(self):
        self.children = {}
        self.fail = None
        self.output = []

def build_aho_corasick(patterns):
    root = TrieNode()
    # 1. Build the trie
    for pattern in patterns:
        node = root
        for char in pattern:
            node = node.children.setdefault(char, TrieNode())
        node.output.append(pattern)

    # 2. Build failure links using BFS
    queue = deque()
    for node in root.children.values():
        node.fail = root
        queue.append(node)

    while queue:
        current_node = queue.popleft()
        for char, next_node in current_node.children.items():
            fail_node = current_node.fail
            while char not in fail_node.children and fail_node is not root:
                fail_node = fail_node.fail
            
            if char in fail_node.children:
                next_node.fail = fail_node.children[char]
            else:
                next_node.fail = root
            
            next_node.output.extend(next_node.fail.output)
            queue.append(next_node)
            
    return root

def search_aho_corasick(text, root):
    results = []
    node = root
    for i, char in enumerate(text):
        while char not in node.children and node is not root:
            node = node.fail
        
        if char in node.children:
            node = node.children[char]
        else:
            node = root

        if node.output:
            for pattern in node.output:
                results.append((i - len(pattern) + 1, pattern))
    return results

# Example Usage
patterns = ["he", "she", "his", "hers"]
text = "ushers"

root = build_aho_corasick(patterns)
matches = search_aho_corasick(text, root)

print(f"Patterns found in '{text}':")
for index, pattern in matches:
    print(f"- Pattern '{pattern}' found at index {index}")
# Expected Output:
# Patterns found in 'ushers':
# - Pattern 'he' found at index 2
# - Pattern 'she' found at index 1
# - Pattern 'hers' found at index 2

            

Related Algorithm Guides

Explore more algorithm interview guides powered by AI coaching

Mixed Reality Developer Interview Questions
AI-powered interview preparation guide
Nonprofit Management Interview Questions
AI-powered interview preparation guide
Real Time Interview Performance Feedback
AI-powered interview preparation guide
Public Speaking Interview Question Preparation
AI-powered interview preparation guide