Count distinct substrings of a string using Rabin Karp algorithm

  • Difficulty Level : Medium
  • Last Updated : 29 Jun, 2022

Given a string, count the number of distinct substrings using the Rabin-Karp algorithm.

Examples

Input  : str = “aba”
Output : 5
Explanation :
The 5 distinct substrings are "a", "ab", "aba", "b", "ba"

Input  : str = “abcd”
Output : 10
Explanation :
The 10 distinct substrings are "a", "ab", "abc", "abcd", "b", "bc", "bcd", "c", "cd", "d"
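
As a reference point for the examples above, a brute-force count can be sketched by collecting every substring in a set. This is not the Rabin-Karp approach, only a simple way to check expected answers; the function name count_distinct_bruteforce is made up for this illustration.

```python
def count_distinct_bruteforce(s: str) -> int:
    """Count distinct non-empty substrings by storing each one in a set."""
    seen = set()
    for i in range(len(s)):
        for j in range(i + 1, len(s) + 1):
            seen.add(s[i:j])  # substring s[i .. j-1]
    return len(seen)


print(count_distinct_bruteforce("aba"))   # 5
print(count_distinct_bruteforce("abcd"))  # 10
```

Storing whole substrings costs far more memory than storing one integer hash per substring, which is the motivation for the hashing approach described next.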

Approach:

Prerequisite: Rabin-Karp Algorithm for Pattern Searching

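For every starting index, extend the substring one character at a time, update its hash with the new character, and store the hash in a dictionary/map. A substring is counted as distinct only if its hash has not been stored before. Two different substrings can, in rare cases, share a hash value, so the count is exact only as long as no collision occurs.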

The rolling hash is computed as in the Rabin-Karp algorithm:

The hash function suggested by Rabin and Karp calculates an integer value: the numeric value of the string read as a number in base d. For example, if the possible characters are the digits 0 to 9, the numeric value of “122” is 122. In practice the number of possible characters is higher than 10 (256 in general) and the pattern length can be large, so the numeric value cannot be stored directly in an integer. Therefore, the numeric value is calculated using modular arithmetic, which guarantees that the hash value fits in an integer variable (a memory word). To rehash, we take off the most significant digit and add the new least significant digit to the hash value. Rehashing is done using the following formula.

hash( txt[s+1 .. s+m] ) = ( d * ( hash( txt[s .. s+m-1] ) - txt[s]*h ) + txt[s+m] ) mod q

hash( txt[s .. s+m-1] ) : Hash value at shift s.
hash( txt[s+1 .. s+m] ) : Hash value at next shift (or shift s+1)
d: Number of characters in the alphabet
q: A prime number
m: Length of the substring (window) being hashed
h: d^(m-1) mod q
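
The rehash formula above can be sketched in Python as follows. This is a minimal illustration, not the code used later in this article: the function name rolling_hashes and the small prime q = 101 are assumptions chosen for the example, and characters are mapped to integers with ord().

```python
def rolling_hashes(txt: str, m: int, d: int = 256, q: int = 101) -> list:
    """Return the hash of every length-m window of txt using the rehash formula."""
    n = len(txt)
    if m <= 0 or m > n:
        return []
    h = pow(d, m - 1, q)  # h = d^(m-1) mod q
    cur = 0
    for ch in txt[:m]:  # hash of the first window txt[0 .. m-1]
        cur = (cur * d + ord(ch)) % q
    hashes = [cur]
    for s in range(n - m):
        # remove txt[s] (most significant), shift by d, append txt[s + m]
        cur = (d * (cur - ord(txt[s]) * h) + ord(txt[s + m])) % q
        hashes.append(cur)
    return hashes


hashes = rolling_hashes("abcdab", 2)
print(hashes[0] == hashes[4])  # True: both windows are "ab"
```

Python's % operator already returns a non-negative result for a positive modulus. In C++ or JavaScript the subtraction can leave a negative remainder, so q would have to be added back before the value is used.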

The idea is the same as evaluating a number digit by digit. For example, in the string “1234”, once the value of the substring “12” is known to be 12, the value of the substring “123” is (12*10 + 3) = 123. The code below extends the hash of each substring in the same way, with base D in place of 10 and every result reduced modulo a prime.
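
The digit-by-digit evaluation can be sketched as a small helper. The name prefix_values is hypothetical, and the input is assumed to contain only decimal digits.

```python
def prefix_values(digits: str, base: int = 10) -> list:
    """Return the numeric value of every prefix of a digit string."""
    value = 0
    out = []
    for ch in digits:
        # appending a digit: shift the old value by one place, then add the digit
        value = value * base + int(ch)
        out.append(value)
    return out


print(prefix_values("1234"))  # [1, 12, 123, 1234]
```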
 

C++




#include <bits/stdc++.h>
using namespace std;
 
// Driver code
int main()
{
  int t = 1;
 
  // prime modulus: hash values stay below mod, so pre*D fits in long long
  long long mod = 9007199254740881;
 
  for(int i = 0; i < t; i++)
  {
 
    // string to check number of distinct substring
    string s = "abcd";
 
    // to store the [start, end] indices of each distinct substring
    vector<vector<long long>>l;
 
    // to store hash values already seen (Rabin Karp hashes)
    unordered_map<long long,int>d;
 
    // Number of characters in the input alphabet
    long long D = 256;
 
    for(int i=0;i<(int)s.length();i++){
      long long pre = 0;
 
      for(int j=i;j<(int)s.length();j++){
 
        // extend the hash of s[i..j-1] by the next character s[j]
        pre = (pre*D+(unsigned char)s[j]) % mod;
 
        // record the substring if its hash has not been seen before
        if(d.find(pre) == d.end())
          l.push_back({i, j});
        d[pre] = 1;
      }
    }
 
    // number of distinct substrings
    cout<<l.size()<<endl;
 
    // resulting distinct substrings
    for(int i = 0; i < l.size(); i++)
      cout << s.substr(l[i][0],l[i][1]+1-l[i][0]) << " ";
  }
}
 
// This code is contributed by shinjanpatra

Python3




t = 1
# prime modulus that keeps hash values bounded
mod = 9007199254740881
 
for ___ in range(t):
 
    # string to check number of distinct substring
    s = 'abcd'
 
    # to store the [start, end] indices of each distinct substring
    l = []
 
    # to store hash values already seen (Rabin Karp hashes)
    d = {}
 
    # Number of characters in the input alphabet
    D = 256
 
    for i in range(len(s)):
        pre = 0
 
        for j in range(i, len(s)):
 
            # extend the hash of s[i..j-1] by the next character s[j]
            pre = (pre*D+ord(s[j])) % mod
 
            # record the substring if its hash has not been seen before
            if pre not in d:
                l.append([i, j])
            d[pre] = 1
 
    # number of distinct substrings
    print(len(l))
 
    # resulting distinct substrings
    for i in range(len(l)):
        print(s[l[i][0]:l[i][1]+1], end=" ")

Javascript




<script>
 
let t = 1
 
// prime modulus as a BigInt: pre*D can exceed 2^53, the limit of exact Number arithmetic
let mod = 9007199254740881n
 
for(let i = 0; i < t; i++){
    // string to check number of distinct substring
    let s = 'abcd'
 
    // to store the [start, end] indices of each distinct substring
    let l = []
 
    // to store hash values already seen (Rabin Karp hashes)
    let d = new Map()
 
    // Number of characters in the input alphabet
    let D = 256n
 
    for(let i=0;i<s.length;i++){
        let pre = 0n
 
        for(let j=i;j<s.length;j++){
 
            // extend the hash of s[i..j-1] by the next character s[j]
            pre = (pre*D+BigInt(s.charCodeAt(j))) % mod
 
            // record the substring if its hash has not been seen before
            if(d.has(pre) == false)
                l.push([i, j])
            d.set(pre , 1)
        }
    }
 
    // number of distinct substrings
    document.write(l.length,"<br>")
 
    // resulting distinct substrings
    for(let i = 0; i < l.length; i++)
        document.write(s.substring(l[i][0],l[i][1]+1)," ")
}
 
// This code is contributed by shinjanpatra
 
</script>

Output

10
a ab abc abcd b bc bcd c cd d 

Time Complexity: O(N^2), where N is the length of the string, assuming O(1) average time for each map lookup and insertion

