Suffix Array and LCP Array for String Processing

Understanding Suffix and LCP Arrays

Suffix arrays and LCP (Longest Common Prefix) arrays are advanced data structures used for efficient string processing. A suffix array provides a sorted list of all suffixes of a string, which enables fast searching and comparison operations. The LCP array complements the suffix array by storing the length of the longest common prefix between adjacent suffixes in the sorted order. Together, they form a powerful toolkit for solving a variety of complex string problems, such as finding the longest repeated substring or the number of unique substrings.

Example: Building and Using Suffix/LCP Arrays

Here is a Python implementation demonstrating how to build a suffix array (using a simple O(n log^2 n) approach) and then construct the LCP array using Kasai's algorithm.


# O(n log^2 n) suffix array construction
def build_suffix_array(s):
    n = len(s)
    suffixes = [(s[i:], i) for i in range(n)]
    suffixes.sort(key=lambda x: x[0])
    return [suffix[1] for suffix in suffixes]

# Kasai's algorithm for LCP array construction in O(n)
def build_lcp_array(s, suffix_array):
    n = len(s)
    lcp = [0] * n
    # inv_suffix[i] stores the rank of suffix starting at i
    inv_suffix = [0] * n
    for i in range(n):
        inv_suffix[suffix_array[i]] = i

    k = 0  # Length of previous LCP
    for i in range(n):
        if inv_suffix[i] == n - 1:
            k = 0
            continue
        
        j = suffix_array[inv_suffix[i] + 1]
        while i + k < n and j + k < n and s[i + k] == s[j + k]:
            k += 1
        
        lcp[inv_suffix[i] + 1] = k
        
        if k > 0:
            k -= 1
            
    return lcp

# Example: Find the longest repeated substring
def longest_repeated_substring(s):
    if not s:
        return ""
    suffix_array = build_suffix_array(s)
    lcp_array = build_lcp_array(s, suffix_array)
    
    max_lcp = 0
    start_index = 0
    for i in range(1, len(s)):
        if lcp_array[i] > max_lcp:
            max_lcp = lcp_array[i]
            start_index = suffix_array[i]
            
    return s[start_index : start_index + max_lcp]

# Example Usage
text = "banana"
suffix_arr = build_suffix_array(text)
# Expected Suffix Array for 'banana': [5, 3, 1, 0, 4, 2] corresponding to
# 'a', 'ana', 'anana', 'banana', 'na', 'nana'
lcp_arr = build_lcp_array(text, suffix_arr)
# Expected LCP Array: [0, 1, 3, 0, 2, 0] (lcp('a', 'ana')=1, lcp('ana', 'anana')=3, etc.)

print(f"Suffix Array for '{text}': {suffix_arr}")
print(f"LCP Array for '{text}': {lcp_arr}")

lrs = longest_repeated_substring(text)
print(f"Longest repeated substring in '{text}': '{lrs}'") # 'ana'

text2 = "mississippi"
lrs2 = longest_repeated_substring(text2)
print(f"Longest repeated substring in '{text2}': '{lrs2}'") # 'issi'

            

Related Algorithm Guides

Explore more algorithm interview guides powered by AI coaching

Career Pivot Interview Question Answers
AI-powered interview preparation guide
Floyd Warshall Algorithm For All Pairs Shortest Paths
AI-powered interview preparation guide
Professional Conduct Interview Questions
AI-powered interview preparation guide
Ai Interview Confidence Booster
AI-powered interview preparation guide