C Program for Anagram Substring Search (Or Search for all permutations)

Last Updated : 20 Oct, 2023

Given a text txt[0..n-1] and a pattern pat[0..m-1], write a C program that prints all occurrences of pat[] and its permutations (or anagrams) in txt[]. You may assume that n > m.

Note: The expected time complexity is O(n).

Examples:

Input: txt[] = “BACDGABCDA” pat[] = “ABCD”
Output: Found at Index 0
Found at Index 5
Found at Index 6

Input: txt[] = “AAABABAA” pat[] = “AABA”
Output: Found at Index 0
Found at Index 1
Found at Index 4


Approach:

The idea is to consider every substring of txt[] whose length equals the length of pat[] and check whether the sorted version of that substring equals the sorted version of pat[]. If they are equal, the substring is a permutation of pat[]; otherwise it is not.

Step-by-step approach:

Consider the input txt[] = “BACDGABCDA”, pat[] = “ABCD”.
Occurrences of pat[] and its permutations are found at indices 0, 5 and 6.
The permutations are BACD, ABCD and BCDA.
Let’s sort pat[] and each of these permutations found in txt[].
After sorting, pat[] becomes: ABCD
After sorting, the permutations of pat[] in txt[] become: ABCD, ABCD, ABCD.
So the sorted version of pat[] and the sorted version of each of its permutations are identical.

Below is the implementation of the above approach:

C

// C code for the above approach
#include <stdio.h>
#include <string.h>
 
// Function to search for a pattern in a given text
void search(char* pat, char* txt)
{
    // Get the length of the text
    int n = strlen(txt);
    // Get the length of the pattern
    int m = strlen(pat);
 
    // Create a sorted version of the pattern
    char sortedpat[m + 1];
    strcpy(sortedpat, pat);
    for (int i = 0; i < m; i++) {
        for (int j = i + 1; j < m; j++) {
            if (sortedpat[i] > sortedpat[j]) {
                char temp = sortedpat[i];
                sortedpat[i] = sortedpat[j];
                sortedpat[j] = temp;
            }
        }
    }
 
    // Iterate through the text to find matching patterns
    for (int i = 0; i <= n - m; i++) {
        char temp[m + 1];
        strncpy(temp, txt + i, m);
        temp[m] = '\0';
 
        // Create a sorted version of the current substring
        for (int j = 0; j < m; j++) {
            for (int k = j + 1; k < m; k++) {
                if (temp[j] > temp[k]) {
                    char temp_char = temp[j];
                    temp[j] = temp[k];
                    temp[k] = temp_char;
                }
            }
        }
 
        // Compare the sorted pattern with the sorted
        // substring
        if (strcmp(sortedpat, temp) == 0) {
            printf("Found at Index %d\n", i);
        }
    }
}
 
// Driver code
int main()
{
    // The input text
    char txt[] = "BACDGABCDA";
    // The pattern to search for
    char pat[] = "ABCD";
    // Call the search function
    search(pat, txt);
    return 0;
}

Output

Found at Index 0
Found at Index 5
Found at Index 6

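Time Complexity: O((n - m + 1) · m²), where n is the length of txt[] and m is the length of pat[], since each of the n - m + 1 windows is sorted with an O(m²) exchange sort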
Auxiliary Space: O(m)

C Program for Anagram Substring Search (Or Search for all permutations) using Rabin Karp Algorithm:

One idea is to modify the Rabin Karp algorithm. For example, we can keep the hash value as the sum of the ASCII values of all characters in the window, modulo a large prime number. For every new character of text, we add that character to the hash value and subtract the first character of the previous window. This looks good, but, like standard Rabin Karp, the worst-case time complexity of this solution is O(mn). The worst case occurs when the hash values match for every window and we have to compare all characters one by one each time.

We can achieve O(n) time complexity under the assumption that the alphabet size is fixed, which is typically true since a char can take at most 256 distinct values (extended ASCII). The idea is to use two count arrays:

The first count array stores the frequencies of the characters in the pattern.

The second count array stores the frequencies of the characters in the current window of text.

The important thing to note is that comparing two count arrays takes O(1) time, because the number of elements in them is fixed (independent of the pattern and text sizes).

Step-by-step approach:

Store the character frequencies of the pattern in the first count array countP[]. Also store the character frequencies of the first window of text in countTW[].
Now run a loop from i = M to N-1 and do the following in the loop:
- If the two count arrays are identical, an occurrence starts at index i - M.
- Increment the count of the current character of text, txt[i], in countTW[].
- Decrement the count of the first character of the previous window, txt[i - M], in countTW[].
The last window is not checked by the above loop, so check it explicitly after the loop ends.

Below is the implementation of the above approach:

C

// C program to search all anagrams of a pattern in a text
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX 256

// Returns true if arr1[] and arr2[] hold the same counts,
// otherwise false.
bool compare(const int arr1[], const int arr2[])
{
    for (int i = 0; i < MAX; i++)
        if (arr1[i] != arr2[i])
            return false;
    return true;
}

// Prints the starting index of every permutation of pat[]
// found in txt[]
void search(const char* pat, const char* txt)
{
    int M = strlen(pat), N = strlen(txt);

    // Nothing to do for an empty pattern or a pattern
    // longer than the text
    if (M == 0 || M > N)
        return;

    // countP[]: count of each character of the pattern
    // countTW[]: count of each character of the current window
    // int (not char) counters cannot overflow for long patterns
    int countP[MAX] = { 0 }, countTW[MAX] = { 0 };
    for (int i = 0; i < M; i++) {
        // Cast to unsigned char so bytes >= 128 give a valid index
        countP[(unsigned char)pat[i]]++;
        countTW[(unsigned char)txt[i]]++;
    }

    // Traverse the remaining characters of the text
    for (int i = M; i < N; i++) {
        // Compare counts of the current window of text with
        // the counts of pat[]
        if (compare(countP, countTW))
            printf("Found at Index %d\n", i - M);

        // Add the current character to the window
        countTW[(unsigned char)txt[i]]++;

        // Remove the first character of the previous window
        countTW[(unsigned char)txt[i - M]]--;
    }

    // Check the last window in the text
    if (compare(countP, countTW))
        printf("Found at Index %d\n", N - M);
}

/* Driver program to test above function */
int main()
{
    char txt[] = "BACDGABCDA";
    char pat[] = "ABCD";
    search(pat, txt);
    return 0;
}
 
// This code is contributed by Aditya Kumar (adityakumar129)

Output

Found at Index 0 
Found at Index 5 
Found at Index 6

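Time Complexity: O(256 * (n - m + 1) + m), which is O(n) because the alphabet size is fixed
Auxiliary Space: O(256) for the two count arrays, i.e. O(1) because the alphabet size does not depend on n or m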

Please refer to the complete article on Anagram Substring Search (Or Search for all permutations) for more details.
