Pattern Searching using Suffix Tree

Last Updated : 11 Mar, 2024

Given a text txt[0..n-1] and a pattern pat[0..m-1], write a function search(char pat[], char txt[]) that prints all occurrences of pat[] in txt[]. You may assume that n > m. 

Preprocess Pattern or Preprocess Text? 

We have discussed the following algorithms in the previous posts:
KMP Algorithm
Rabin Karp Algorithm
Finite Automata based Algorithm
Boyer Moore Algorithm

All of the above algorithms preprocess the pattern to make searching faster. The best worst-case search time we can get by preprocessing the pattern is O(n), where n is the length of the text. In this post, we discuss an approach that preprocesses the text instead: a suffix tree is built from the text. After this preprocessing, we can check whether any pattern occurs in the text in O(m) time, where m is the length of the pattern.

Imagine you have stored the complete works of William Shakespeare and preprocessed them. You can then search for any string in the complete works in time proportional to the length of the pattern alone. This is a great improvement because the pattern is generally much shorter than the text.

Preprocessing the text may become costly if the text changes frequently, so the approach is best suited to fixed or rarely changing text.

A suffix tree for a given text is a compressed trie of all suffixes of that text. We have already discussed the standard trie. Let us understand the compressed trie with the following set of words.

{bear, bell, bid, bull, buy, sell, stock, stop}

Following is the standard trie for the above set of words.

Following is the compressed trie. A compressed trie is obtained from the standard trie by joining chains of single nodes, so that an edge may carry several characters. The edge labels of a compressed trie can be stored compactly as index ranges into the input at the nodes.
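As a rough illustration of joining chains of single nodes, the Python sketch below builds a standard trie as nested dictionaries and then compresses it. The function names `build_trie` and `compress`, the nested-dictionary representation, and the empty-string end-of-word marker are assumptions made for this example, not part of the original article. Edge labels are kept as strings rather than index ranges to keep the sketch short.

```python
def build_trie(words):
    """Standard trie: one node (a dict) per character."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}  # assumed end-of-word marker (an empty edge label)
    return root


def compress(node):
    """Join every chain of single-child nodes into one multi-character edge."""
    compressed = {}
    for label, child in node.items():
        # Absorb the child into the edge while it has exactly one way onward.
        while len(child) == 1:
            (next_label, next_child), = child.items()
            label += next_label
            child = next_child
        compressed[label] = compress(child)
    return compressed


words = ["bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop"]
tree = compress(build_trie(words))
print(sorted(tree["b"]))  # ['e', 'id', 'u']
```

In the result, "bid" hangs off the "b" node as a single edge "id", while "bear" and "bell" share the edge "e" before splitting into "ar" and "ll".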
 

 

How to build a Suffix Tree for a given text? 

As discussed above, a suffix tree is a compressed trie of all suffixes, so the following are very abstract steps to build a suffix tree from a given text.

1) Generate all suffixes of the given text.
2) Consider all suffixes as individual words and build a compressed trie.

Let us consider the example text “banana\0”, where ‘\0’ is the string termination character. Following are all suffixes of “banana\0”:

banana\0
anana\0
nana\0
ana\0
na\0
a\0
\0
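The list above can be reproduced with a few lines of Python. This is only a sketch of step 1; the helper name `suffixes` is an assumption for illustration. It also counts the total number of characters across all suffixes, which is n(n+1)/2 for a text of length n and hints at why inserting every suffix character by character takes quadratic time.

```python
def suffixes(text):
    """Return every suffix of text, longest first."""
    return [text[i:] for i in range(len(text))]


text = "banana\0"
all_suffixes = suffixes(text)
total_chars = sum(len(s) for s in all_suffixes)
print(len(all_suffixes), total_chars)  # 7 28
```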

If we consider all of the above suffixes as individual words and build a trie, we get the following.
 

 

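If we join chains of single nodes, we get the following compressed trie, which is the suffix tree for the given text “banana\0”. Its root has four outgoing edges labelled “banana\0”, “a”, “na” and “\0”. Below “a” are the edges “na” and “\0”, and below each “na” node are the edges “na\0” and “\0”.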
 

 

Please note that the above steps only describe how to create a suffix tree by hand. The actual construction algorithm and its implementation are discussed in separate posts.
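The manual procedure can still be turned into a small program. The sketch below is a naive quadratic-time construction that inserts each suffix into a compressed trie and splits an edge wherever a suffix diverges from it. It is not Ukkonen's algorithm. The names `Node`, `build_suffix_tree` and `edge_labels` are hypothetical, and the code assumes the text ends with a terminator character that occurs nowhere else. Each edge label is stored as an index range into the text, as suggested earlier for compressed tries.

```python
class Node:
    def __init__(self, start=0, end=0):
        self.start = start   # label of the edge into this node is text[start:end]
        self.end = end
        self.children = {}   # first character of a child's edge label -> child


def build_suffix_tree(text):
    """Naive O(n^2) construction; assumes text ends with a unique terminator."""
    n = len(text)
    root = Node()
    for suffix_start in range(n):
        node, i = root, suffix_start
        while i < n:
            child = node.children.get(text[i])
            if child is None:
                node.children[text[i]] = Node(i, n)  # new leaf for the rest
                break
            # Match the suffix against the child's edge label.
            k = child.start
            while k < child.end and i < n and text[k] == text[i]:
                k += 1
                i += 1
            if k == child.end:
                node = child  # whole edge matched, continue below it
                continue
            if i == n:
                break  # cannot happen when the terminator is unique
            # Mismatch inside the edge: split it at position k.
            mid = Node(child.start, k)
            node.children[text[child.start]] = mid
            child.start = k
            mid.children[text[k]] = child
            mid.children[text[i]] = Node(i, n)
            break
    return root


def edge_labels(text, node, depth=0):
    """List (depth, label) pairs in sorted depth-first order."""
    result = []
    for ch in sorted(node.children):
        child = node.children[ch]
        result.append((depth, text[child.start:child.end]))
        result.extend(edge_labels(text, child, depth + 1))
    return result


text = "banana\0"
root = build_suffix_tree(text)
for depth, label in edge_labels(text, root):
    print("  " * depth + repr(label))
```

Storing index ranges keeps every node at constant size, so the tree needs space proportional to the number of nodes rather than to the total length of the edge labels.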

How to search a pattern in the built suffix tree? 

We have discussed above how to build a Suffix Tree which is needed as a preprocessing step in pattern searching. 

Following are abstract steps to search a pattern in the built Suffix Tree. 

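1) Starting from the first character of the pattern and the root of the suffix tree, do the following for every character.

a) For the current character of the pattern, if there is an edge from the current node of the suffix tree that continues with this character, follow the edge. An edge of a compressed trie may be labelled with several characters, so the following characters of the pattern are matched against the rest of the edge label before moving to the next node.
b) If there is no such edge, or a character of the edge label differs from the pattern, print “pattern doesn’t exist in text” and return.

2) If all characters of the pattern have been processed, i.e., there is a path from the root for the characters of the given pattern, then print “Pattern found”.

Let us consider the example pattern “nan” to see the searching process. In the suffix tree of “banana\0”, the search follows the edge “na” from the root and then matches the first character of the edge “na\0” below it. Searching for “nana” follows the same path one character further.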
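The search steps can be sketched in Python as follows. To keep the example self-contained, the suffix tree of “banana\0” is written out by hand as nested dictionaries (edge label mapped to child node) instead of being built by an algorithm. The `SUFFIX_TREE` constant and this dictionary layout are assumptions for illustration, and the function returns a boolean rather than printing a message.

```python
# Suffix tree of "banana\0", written out by hand.
# Each key is an edge label; each value is the child node ({} is a leaf).
SUFFIX_TREE = {
    "banana\0": {},
    "a": {
        "na": {"na\0": {}, "\0": {}},
        "\0": {},
    },
    "na": {"na\0": {}, "\0": {}},
    "\0": {},
}


def search(pat, root=SUFFIX_TREE):
    """Return True if pat can be traced from the root of the suffix tree."""
    node = root
    i = 0
    while i < len(pat):
        # At most one outgoing edge can start with the current character.
        edge = next((label for label in node if label[0] == pat[i]), None)
        if edge is None:
            return False  # step 1b: no edge to follow
        # Compare the pattern against the edge label; the pattern may end
        # in the middle of an edge, so only its remaining part is compared.
        chunk = pat[i:i + len(edge)]
        if not edge.startswith(chunk):
            return False
        i += len(chunk)
        node = node[edge]
    return True  # step 2: every character of the pattern was matched


for pattern in ["nan", "nana", "ban", "nab", "bananas"]:
    print(pattern, search(pattern))
```

Every character of the pattern is compared once. Here the outgoing edge is found by scanning the node's labels, which adds a factor bounded by the alphabet size. A node that indexes its children by first character avoids that scan and brings the search to O(m).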

How does this work? 

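Every pattern that is present in the text (in other words, every substring of the text) must be a prefix of one of the suffixes of the text. For example, “nan” occurs in “banana\0” at index 2 and is a prefix of the suffix “nana\0”. Since the suffix tree contains every suffix as a path from the root, a pattern occurs in the text exactly when it can be traced from the root.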
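The statement can be checked directly, without any tree, by testing the pattern against every suffix. The brute-force sketch below is only meant to make the claim concrete; the function name `occurrences` is an assumption, and this check takes O(n·m) time in the worst case, unlike the suffix tree search.

```python
def occurrences(pat, text):
    """Start indices i such that pat is a prefix of the suffix text[i:]."""
    return [i for i in range(len(text)) if text.startswith(pat, i)]


text = "banana\0"
print(occurrences("nan", text))  # [2]
print(occurrences("ana", text))  # [1, 3]
print(occurrences("nab", text))  # []
```

Each returned index is the start of a suffix that has the pattern as a prefix, so the list is also the list of positions where the pattern occurs in the text.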

Applications of Suffix Tree 

Suffix trees can be used for a wide range of problems. Following are some well-known problems for which suffix trees provide a solution with optimal time complexity.

1) Pattern Searching 
2) Finding the longest repeated substring 
3) Finding the longest common substring 
4) Finding the longest palindrome in a string 

There are many more applications. Ukkonen’s Suffix Tree Construction is discussed in the following articles:

Ukkonen’s Suffix Tree Construction – Part 1
Ukkonen’s Suffix Tree Construction – Part 2 
Ukkonen’s Suffix Tree Construction – Part 3 
Ukkonen’s Suffix Tree Construction – Part 4 
Ukkonen’s Suffix Tree Construction – Part 5 
Ukkonen’s Suffix Tree Construction – Part 6


