
CSES solution – Counting Patterns

Last Updated : 23 Apr, 2024

Given a string S and patterns[], count for each pattern the number of positions where it appears in the string.

Examples:

Input: S = “aybabtu”, patterns[] = {“bab”, “abc”, “a”}
Output:
1
0
2
Explanation:

  • “bab” occurs only 1 time in “aybabtu”, that is from S[2…4].
  • “abc” does not occur in “aybabtu”.
  • “a” occurs only 2 times in “aybabtu”, that is from S[0…0] and from S[3…3].

Input: S = “geeksforgeeks”, patterns[] = {“geeks”, “for”, “gfg”}
Output:
2
1
0
Explanation:

  • “geeks” occurs 2 times in “geeksforgeeks”, that is from S[0…4] and S[8…12].
  • “for” occurs 1 time in “geeksforgeeks”, that is from S[5…7].
  • “gfg” does not occur in “geeksforgeeks”.

Approach: To solve the problem, follow the below idea:

The idea is to use the Suffix Array data structure. A Suffix Array is a sorted array of all suffixes of a given string, stored as the starting indices of those suffixes.
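As a small illustration of the definition above, the sketch below builds a suffix array naively by sorting the suffix start indices by the suffix text. The helper name naive_suffix_array is an assumption made for this example and is not part of the implementations in this article; it can cost O(n² log n) in the worst case and is only meant for tiny inputs.

```python
from typing import List


def naive_suffix_array(s: str) -> List[int]:
    # Sort the start indices by the suffix that begins at each index.
    return sorted(range(len(s)), key=lambda i: s[i:])


s = "aybabtu"
sa = naive_suffix_array(s)
print(sa)  # [3, 0, 2, 4, 5, 6, 1]

# Print each suffix in sorted order next to its start index.
for start in sa:
    print(start, s[start:])
```

Every occurrence of a pattern is a prefix of some suffix, so all occurrences end up next to each other in this sorted order.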

Let’s walk through the intuition step by step:

Suffix Array Construction: The first part of the solution is building a suffix array for the given string. The buildSuffixArray() function constructs this array. It starts by initializing the suffix array with the indices 0 to n-1 and the position array with the character codes of S. The position array holds the rank (i.e., lexicographic order) of each suffix with respect to its first gap characters. In each round, the suffixes are sorted by the pair (rank of the first gap characters, rank of the following gap characters), the ranks are recomputed, and gap is doubled. The loop stops once all ranks are unique.
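The doubling idea described above can be sketched compactly in Python. This is a variation under stated assumptions, not the article's exact code: it sorts with tuple keys instead of a comparator, and the names build_suffix_array_doubling, rank and pair are chosen for this example only.

```python
from typing import List


def build_suffix_array_doubling(s: str) -> List[int]:
    n = len(s)
    sa = list(range(n))
    # Initial rank of each suffix is the code of its first character.
    rank = [ord(c) for c in s]
    gap = 1
    while True:
        # Key: (rank of first `gap` chars, rank of next `gap` chars).
        # A suffix that ends early gets -1, which sorts before any rank.
        def pair(i: int):
            return (rank[i], rank[i + gap] if i + gap < n else -1)

        sa.sort(key=pair)

        # Re-rank: equal keys share a rank, a larger key gets rank + 1.
        new_rank = [0] * n
        for j in range(1, n):
            bump = 1 if pair(sa[j - 1]) < pair(sa[j]) else 0
            new_rank[sa[j]] = new_rank[sa[j - 1]] + bump
        rank = new_rank

        # Stop once every suffix has a distinct rank.
        if n == 0 or rank[sa[-1]] == n - 1:
            break
        gap *= 2
    return sa


print(build_suffix_array_doubling("aybabtu"))  # [3, 0, 2, 4, 5, 6, 1]
```

For "aybabtu" all (character, next character) pairs are already distinct, so the loop finishes after the first round.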

Pattern Checking: The checkPattern() function compares the pattern with the suffix that starts at a given position of the suffix array, character by character. If the suffix is lexicographically smaller than the pattern (including the case where the suffix ends before the pattern does), it returns -1; if the suffix is larger, it returns 1; if every character of the pattern matches, so that the pattern is a prefix of the suffix, it returns 0.
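A minimal sketch of this three-way comparison, assuming Python string slicing in place of the character loop. The function name compare_at is hypothetical and used only for this illustration.

```python
def compare_at(s: str, start: int, pattern: str) -> int:
    # Take at most len(pattern) characters of the suffix s[start:].
    prefix = s[start:start + len(pattern)]
    if prefix == pattern:
        return 0  # the pattern is a prefix of this suffix
    # A suffix that ends early is a proper prefix of the pattern here,
    # so Python's string ordering also reports it as smaller.
    return -1 if prefix < pattern else 1


print(compare_at("aybabtu", 2, "bab"))  # 0
print(compare_at("aybabtu", 3, "abc"))  # 1
print(compare_at("aybabtu", 6, "ua"))   # -1
```

The sign convention matters for the binary search: -1 means the matching range, if any, lies further right in the suffix array, and 1 means it lies further left.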

Pattern Searching: The solve() function runs two binary searches over the suffix array using checkPattern(): one for the leftmost suffix that starts with the pattern and one for the rightmost. Because all suffixes that start with the pattern are adjacent in sorted order, the difference between the rightmost and leftmost positions plus one gives the number of occurrences of the pattern in the string.
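The counting step can also be sketched with two lower-bound style searches over a half-open range, so the count is simply the difference of the two boundaries. This is an assumed variant for illustration, not the article's solve(): the names count_occurrences and first_index are hypothetical, and the suffix array is built naively to keep the example short.

```python
from typing import Callable, List


def count_occurrences(s: str, sa: List[int], pattern: str) -> int:
    m = len(pattern)

    def first_index(pred: Callable[[str], bool]) -> int:
        # Smallest index in sa whose truncated suffix satisfies pred.
        # pred must be False for a (possibly empty) run, then True.
        lo, hi = 0, len(sa)
        while lo < hi:
            mid = (lo + hi) // 2
            if pred(s[sa[mid]:sa[mid] + m]):
                hi = mid
            else:
                lo = mid + 1
        return lo

    left = first_index(lambda p: p >= pattern)  # first suffix >= pattern
    right = first_index(lambda p: p > pattern)  # first suffix past the matches
    return right - left


s = "aybabtu"
sa = sorted(range(len(s)), key=lambda i: s[i:])  # naive suffix array
for pattern in ["bab", "abc", "a"]:
    print(count_occurrences(s, sa, pattern))
```

Truncating each suffix to the pattern length keeps the sorted order non-decreasing, which is what makes both predicates safe to binary search.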

Step-by-step algorithm:

  • Comparison Function (compareSuffixes):
    • Compares two suffixes based on their positions and characters beyond a specified gap.
    • If positions are different, returns true if the first position is smaller.
    • If positions are equal, compares additional characters beyond the gap.
  • Build Suffix Array Function (buildSuffixArray):
    • Initializes the suffix array and positions based on the characters of the input string.
    • Uses a loop to build the suffix array:
      • Sorts the suffix array based on the comparison function.
      • Updates the positions based on the sorted order.
      • Checks if all suffixes have distinct positions; if so, exits the loop, otherwise doubles the gap and repeats.
  • Pattern Check Function (checkPattern):
    • Compares the pattern with the suffix at a given position in the suffix array.
    • Returns -1 if the suffix is smaller than the pattern, 0 if the pattern is a prefix of the suffix, and 1 if the suffix is greater.
  • Pattern Search Function (solve):
    • Uses binary search to find the range where the pattern appears in the suffix array.
    • Initializes left and right boundaries.
    • Finds the leftmost occurrence using binary search and updates the left boundary.
    • Finds the rightmost occurrence using binary search and updates the right boundary.
    • Calculates and prints the count of occurrences.
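When checking an implementation of the steps above, a brute-force counter is a handy reference to compare against on small inputs. The sketch below is an assumption for testing purposes only (the name count_brute_force is hypothetical); it counts overlapping occurrences of a non-empty pattern and is too slow for large inputs.

```python
def count_brute_force(s: str, pattern: str) -> int:
    # Assumes pattern is non-empty.
    count = 0
    start = s.find(pattern)
    while start != -1:
        count += 1
        # Resume one position later so overlapping matches are counted.
        start = s.find(pattern, start + 1)
    return count


print(count_brute_force("aybabtu", "a"))        # 2
print(count_brute_force("geeksforgeeks", "gfg"))  # 0
print(count_brute_force("aaaa", "aa"))          # 3
```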

Below is the implementation of the algorithm:

C++
#include <bits/stdc++.h>
using namespace std;

#define int long long
#define endl '\n'

const int maxN = 1e5 + 5;
int suffixArray[maxN], position[maxN], temp[maxN];
int gap, n;
string s;

// Function to compare two suffixes
bool compareSuffixes(int x, int y)
{
    // Compare the positions of two suffixes
    if (position[x] != position[y])
        return position[x] < position[y];

    // Move to the next positions with a gap and
    // compare again
    x += gap;
    y += gap;
    return (x < n && y < n) ? position[x] < position[y]
                            : x > y;
}

// Function to build the suffix array
void buildSuffixArray()
{
    // Initialize the suffix array and positions based on
    // the characters of the string
    for (int i = 0; i < n; i++)
        suffixArray[i] = i, position[i] = s[i];

    // Build the suffix array using repeated sorting and
    // updating positions
    for (gap = 1;; gap <<= 1) {
        // Sort the suffix array based on the comparison
        // function
        sort(suffixArray, suffixArray + n, compareSuffixes);

        // Update the temporary array with cumulative
        // comparisons
        for (int i = 0; i < n - 1; i++)
            temp[i + 1]
                = temp[i]
                  + compareSuffixes(suffixArray[i],
                                    suffixArray[i + 1]);

        // Update the positions based on the sorted order
        for (int i = 0; i < n; i++)
            position[suffixArray[i]] = temp[i];

        // Check if all suffixes are in order; if so, exit
        // the loop
        if (temp[n - 1] == n - 1)
            break;
    }
}

// Function to check if a pattern is present at a given
// position in the suffix array
int checkPattern(int mid, string& pattern)
{
    int flag = -1, patternSize = pattern.size(),
        suffixStart = suffixArray[mid];

    // Check if the suffix can contain the entire pattern
    if (n - suffixStart >= patternSize)
        flag = 0;

    // Compare characters of the pattern and suffix
    for (int i = 0; i < min(n - suffixStart, patternSize);
         i++) {
        if (s[suffixStart + i] < pattern[i])
            return -1;
        if (s[suffixStart + i] > pattern[i])
            return 1;
    }
    return flag;
}

// Function to find and print the count of occurrences of a
// pattern in the string
void solve(string& pattern)
{
    int left = 0, right = n - 1;
    int answer = -1, l = left, r = right;

    // Binary search for the leftmost occurrence of the
    // pattern
    while (l <= r) {
        int mid = l + (r - l) / 2;
        int check = checkPattern(mid, pattern);
        if (check == 0) {
            answer = mid;
            r = mid - 1;
        }
        else if (check == 1)
            r = mid - 1;
        else
            l = mid + 1;
    }

    // If the pattern is not found, print 0 and return
    if (answer == -1) {
        cout << 0 << endl;
        return;
    }

    // Update the left boundary for the next binary search
    left = answer, l = left, r = right;

    // Binary search for the rightmost occurrence of the
    // pattern
    while (l <= r) {
        int mid = l + (r - l) / 2;
        int check = checkPattern(mid, pattern);
        if (check == 0) {
            answer = mid;
            l = mid + 1;
        }
        else if (check == -1)
            l = mid + 1;
        else
            r = mid - 1;
    }

    // Update the right boundary
    right = answer;

    // Print the count of occurrences
    cout << right - left + 1 << endl;
}

// Main function
signed main()
{
    // Set the input string and its size
    s = "aybabtu";
    n = s.size();

    // Build the suffix array
    buildSuffixArray();

    // Define patterns to search for
    vector<string> patterns = { "bab", "abc", "a" };

    // For each pattern, call the solve function to find and
    // print the count of occurrences
    for (auto pattern : patterns) {
        solve(pattern);
    }
}
Java
import java.util.*;

public class Main {
    // Function to build the suffix array
    static List<Integer> buildSuffixArray(String s) {
        int n = s.length();
        List<Integer> suffixArray = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            suffixArray.add(i);
        }
        int[] position = new int[n];
        for (int i = 0; i < n; i++) {
            position[i] = s.charAt(i);
        }
        int[] temp = new int[n];

        int[] gap = {1}; // Use an array to hold the value of gap

        while (true) {
            suffixArray.sort((a, b) -> {
                if (position[a] != position[b]) {
                    return position[a] - position[b];
                }
                int aNextPos = (a + gap[0] < n) ? position[a + gap[0]] : -1;
                int bNextPos = (b + gap[0] < n) ? position[b + gap[0]] : -1;
                return aNextPos - bNextPos;
            });

            for (int i = 0; i < n - 1; i++) {
                temp[i + 1] = temp[i] + (compareSuffixes(suffixArray.get(i), suffixArray.get(i + 1), position, gap[0], n) ? 1 : 0);
            }

            for (int i = 0; i < n; i++) {
                position[suffixArray.get(i)] = temp[i];
            }

            if (temp[n - 1] == n - 1) {
                break;
            }

            gap[0] <<= 1;
        }

        return suffixArray;
    }

    // Function to compare two suffixes
    static boolean compareSuffixes(int x, int y, int[] position, int gap, int n) {
        if (position[x] != position[y]) {
            return position[x] < position[y];
        }

        x += gap;
        y += gap;
        return (x < n && y < n) ? (position[x] < position[y]) : (x > y);
    }

    // Function to check if a pattern is present at a given position in the suffix array
    static int checkPattern(int mid, String pattern, String s, List<Integer> suffixArray) {
        int flag = -1;
        int patternSize = pattern.length();
        int suffixStart = suffixArray.get(mid);

        if (s.length() - suffixStart >= patternSize) {
            flag = 0;
        }

        for (int i = 0; i < Math.min(s.length() - suffixStart, patternSize); i++) {
            if (s.charAt(suffixStart + i) < pattern.charAt(i)) {
                return -1;
            }
            if (s.charAt(suffixStart + i) > pattern.charAt(i)) {
                return 1;
            }
        }

        return flag;
    }

    // Function to find and print the count of occurrences of a pattern in the string
    static void solve(String pattern, String s, List<Integer> suffixArray) {
        int left = 0;
        int right = s.length() - 1;
        int answer = -1;
        int l = left;
        int r = right;

        while (l <= r) {
            int mid = l + (r - l) / 2;
            int check = checkPattern(mid, pattern, s, suffixArray);
            if (check == 0) {
                answer = mid;
                r = mid - 1;
            } else if (check == 1) {
                r = mid - 1;
            } else {
                l = mid + 1;
            }
        }

        if (answer == -1) {
            System.out.println(0);
            return;
        }

        left = answer;
        l = left;
        r = right;

        while (l <= r) {
            int mid = l + (r - l) / 2;
            int check = checkPattern(mid, pattern, s, suffixArray);
            if (check == 0) {
                answer = mid;
                l = mid + 1;
            } else if (check == -1) {
                l = mid + 1;
            } else {
                r = mid - 1;
            }
        }

        right = answer;
        System.out.println(right - left + 1);
    }

    // Main function
    public static void main(String[] args) {
        String s = "aybabtu";
        List<Integer> suffixArray = buildSuffixArray(s);
        List<String> patterns = Arrays.asList("bab", "abc", "a");

        for (String pattern : patterns) {
            solve(pattern, s, suffixArray);
        }
    }
}
Python3
# Importing the required libraries
from typing import List

# Function to compare two suffixes
def compare_suffixes(x: int, y: int, position: List[int], gap: int, n: int) -> bool:
    # Compare the positions of two suffixes
    if position[x] != position[y]:
        return position[x] < position[y]

    # Move to the next positions with a gap and compare again
    x += gap
    y += gap
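    # Only index position[] when both shifted indices are in range;
    # otherwise the suffix that runs out first (larger index) is smaller
    return position[x] < position[y] if (x < n and y < n) else x > y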

# Function to build the suffix array
def build_suffix_array(s: str) -> List[int]:
    n = len(s)
    suffix_array = list(range(n))
    position = [ord(char) for char in s]
    temp = [0]*n

    # Build the suffix array using repeated sorting and updating positions
    gap = 1
    while True:
        suffix_array.sort(key=lambda x: (position[x], position[x + gap] if x + gap < n else -1))

        # Update the temporary array with cumulative comparisons
        for i in range(n - 1):
            temp[i + 1] = temp[i] + (compare_suffixes(suffix_array[i], suffix_array[i + 1], position, gap, n))

        # Update the positions based on the sorted order
        for i in range(n):
            position[suffix_array[i]] = temp[i]

        # Check if all suffixes are in order; if so, exit the loop
        if temp[n - 1] == n - 1:
            break

        gap <<= 1

    return suffix_array

# Function to check if a pattern is present at a given position in the suffix array
def check_pattern(mid: int, pattern: str, s: str, suffix_array: List[int]) -> int:
    flag = -1
    pattern_size = len(pattern)
    suffix_start = suffix_array[mid]

    # Check if the suffix can contain the entire pattern
    if len(s) - suffix_start >= pattern_size:
        flag = 0

    # Compare characters of the pattern and suffix
    for i in range(min(len(s) - suffix_start, pattern_size)):
        if s[suffix_start + i] < pattern[i]:
            return -1
        if s[suffix_start + i] > pattern[i]:
            return 1

    return flag

# Function to find and print the count of occurrences of a pattern in the string
def solve(pattern: str, s: str, suffix_array: List[int]) -> None:
    left = 0
    right = len(s) - 1
    answer = -1
    l = left
    r = right

    # Binary search for the leftmost occurrence of the pattern
    while l <= r:
        mid = l + (r - l) // 2
        check = check_pattern(mid, pattern, s, suffix_array)
        if check == 0:
            answer = mid
            r = mid - 1
        elif check == 1:
            r = mid - 1
        else:
            l = mid + 1

    # If the pattern is not found, print 0 and return
    if answer == -1:
        print(0)
        return

    # Update the left boundary for the next binary search
    left = answer
    l = left
    r = right

    # Binary search for the rightmost occurrence of the pattern
    while l <= r:
        mid = l + (r - l) // 2
        check = check_pattern(mid, pattern, s, suffix_array)
        if check == 0:
            answer = mid
            l = mid + 1
        elif check == -1:
            l = mid + 1
        else:
            r = mid - 1

    # Update the right boundary
    right = answer

    # Print the count of occurrences
    print(right - left + 1)

# Main function
def main():
    # Set the input string and its size
    s = "aybabtu"

    # Build the suffix array
    suffix_array = build_suffix_array(s)

    # Define patterns to search for
    patterns = ["bab", "abc", "a"]

    # For each pattern, call the solve function to find and print the count of occurrences
    for pattern in patterns:
        solve(pattern, s, suffix_array)

if __name__ == "__main__":
    main()
JavaScript
// Function to build the suffix array
function buildSuffixArray(s) {
    let n = s.length;
    let suffixArray = Array.from({ length: n }, (_, i) => i);
    let position = Array.from(s, char => char.charCodeAt(0));
    let temp = new Array(n).fill(0);

    let gap = 1;
    while (true) {
        suffixArray.sort((a, b) => {
            if (position[a] !== position[b]) {
                return position[a] - position[b];
            }
            let aNextPos = (a + gap < n) ? position[a + gap] : -1;
            let bNextPos = (b + gap < n) ? position[b + gap] : -1;
            return aNextPos - bNextPos;
        });

        for (let i = 0; i < n - 1; i++) {
            temp[i + 1] = temp[i] + (compareSuffixes(suffixArray[i], suffixArray[i + 1], position, gap, n) ? 1 : 0);
        }

        for (let i = 0; i < n; i++) {
            position[suffixArray[i]] = temp[i];
        }

        if (temp[n - 1] === n - 1) {
            break;
        }

        gap <<= 1;
    }

    return suffixArray;
}

// Function to compare two suffixes
function compareSuffixes(x, y, position, gap, n) {
    if (position[x] !== position[y]) {
        return position[x] < position[y];
    }

    x += gap;
    y += gap;
    return (x < n && y < n) ? (position[x] < position[y]) : (x > y);
}

// Function to check if a pattern is present at a given position in the suffix array
function checkPattern(mid, pattern, s, suffixArray) {
    let flag = -1;
    let patternSize = pattern.length;
    let suffixStart = suffixArray[mid];

    if (s.length - suffixStart >= patternSize) {
        flag = 0;
    }

    for (let i = 0; i < Math.min(s.length - suffixStart, patternSize); i++) {
        if (s[suffixStart + i] < pattern[i]) {
            return -1;
        }
        if (s[suffixStart + i] > pattern[i]) {
            return 1;
        }
    }

    return flag;
}

// Function to find and print the count of occurrences of a pattern in the string
function solve(pattern, s, suffixArray) {
    let left = 0;
    let right = s.length - 1;
    let answer = -1;
    let l = left;
    let r = right;

    while (l <= r) {
        let mid = l + Math.floor((r - l) / 2);
        let check = checkPattern(mid, pattern, s, suffixArray);
        if (check === 0) {
            answer = mid;
            r = mid - 1;
        } else if (check === 1) {
            r = mid - 1;
        } else {
            l = mid + 1;
        }
    }

    if (answer === -1) {
        console.log(0);
        return;
    }

    left = answer;
    l = left;
    r = right;

    while (l <= r) {
        let mid = l + Math.floor((r - l) / 2);
        let check = checkPattern(mid, pattern, s, suffixArray);
        if (check === 0) {
            answer = mid;
            l = mid + 1;
        } else if (check === -1) {
            l = mid + 1;
        } else {
            r = mid - 1;
        }
    }

    right = answer;
    console.log(right - left + 1);
}

// Main function
function main() {
    let s = "aybabtu";
    let suffixArray = buildSuffixArray(s);
    let patterns = ["bab", "abc", "a"];

    for (let pattern of patterns) {
        solve(pattern, s, suffixArray);
    }
}

main();

Output
1
0
2

Time Complexity:

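Building Suffix Array: O(n log² n), since each of the O(log n) doubling rounds performs an O(n log n) sort.
Checking each Pattern: O(k log n), where k is the length of the pattern, since each of the O(log n) binary-search steps compares up to k characters.
Overall Time Complexity: O(n log² n + m · k · log n), where m is the number of patterns, k is the maximum pattern length, and n is the length of the input string.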

Auxiliary Space Complexity: O(n) due to the arrays suffixArray, position, and temp. These arrays are used to store information about the suffix array and the intermediate steps in its construction.


