Count Distinct Strings present in an array using Polynomial rolling hash function
Given an array of strings arr[], the task is to count the distinct strings present in the array using a polynomial rolling hash function.
Examples:
Input: arr[] = { “abcde”, “abcce”, “abcdf”, “abcde”, “abcdf” }
Output: 3
Explanation:
Distinct strings in the array are { “abcde”, “abcce”, “abcdf” }.
Therefore, the required output is 3.
Input: arr[] = { “ab”, “abc”, “abcd”, “abcde”, “a” }
Output: 5
Explanation:
Distinct strings in the array are { “abcde”, “abcd”, “abc”, “ab”, “a” }.
Therefore, the required output is 5.
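The expected outputs above can be cross-checked against an exact count that compares the strings themselves rather than their hashes. The following is a minimal sketch; the function name count_distinct_exact is a hypothetical helper and is not part of the article's implementations.

```python
def count_distinct_exact(arr):
    """Exact reference count: compares the strings themselves, no hashing."""
    return len(set(arr))


# The two example inputs from above
print(count_distinct_exact(["abcde", "abcce", "abcdf", "abcde", "abcdf"]))  # 3
print(count_distinct_exact(["ab", "abc", "abcd", "abcde", "a"]))  # 5
```

A hash-based count should agree with this reference whenever no two distinct strings collide.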
Approach: The problem can be solved using hashing. The idea is to compute the hash value of every string in the array with a polynomial rolling hash function, store the values in another array, say Hash[], and then count the distinct elements of Hash[]. For a string s of length m made of lowercase letters, the implementations below compute hash(s) = (s[0] + s[1] * p + s[2] * p^2 + ... + s[m - 1] * p^(m - 1)) mod MOD, where each character is mapped as 'a' → 1, ..., 'z' → 26, p = 31 and MOD = 10^9 + 7. Because two distinct strings can collide on the same hash value, the count can be too low in rare cases. Follow the steps below to solve the problem:
- Initialize an array, say Hash[], to store the hash value of each string in arr[].
- Initialize a variable, say cntElem, to 1 (assuming the array is non-empty) to store the count of distinct strings present in the array.
- Traverse the array arr[]. For every string encountered, calculate its hash value and store it in Hash[].
- Sort the array Hash[].
- Traverse the array Hash[] starting from index 1. For every element, check whether Hash[i] and Hash[i - 1] are equal. If they differ, increment cntElem by 1.
- Finally, print the value of cntElem.
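To make the hash computation concrete, here is a small worked sketch in Python. It assumes the same constants as the article's code (p = 31, MOD = 10^9 + 7) and lowercase input only; the name poly_hash is illustrative.

```python
P = 31            # base, as in the article's code
MOD = 10**9 + 7   # modulus, as in the article's code


def poly_hash(s):
    """Polynomial rolling hash of s; assumes only the letters 'a'..'z'."""
    h, mul = 0, 1  # mul holds P^i mod MOD for the current position i
    for ch in s:
        h = (h + (ord(ch) - ord('a') + 1) * mul) % MOD
        mul = (mul * P) % MOD
    return h


print(poly_hash("ab"))  # 1 * 31^0 + 2 * 31^1 = 63
print(poly_hash("ba"))  # 2 * 31^0 + 1 * 31^1 = 33
```

Mapping 'a' to 1 rather than 0 matters: with 'a' → 0, the strings "a" and "aa" would both hash to 0.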
Below is the implementation of the above approach:
C++
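// C++ program to count distinct strings using a polynomial rolling hash
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

// Returns the polynomial rolling hash of str (lowercase letters assumed)
long long compute_hash(const string& str)
{
    const int p = 31;
    const int MOD = 1e9 + 7;
    long long hash_val = 0;
    long long mul = 1; // p^i mod MOD for the current position i

    for (char ch : str) {
        hash_val = (hash_val + (ch - 'a' + 1) * mul) % MOD;
        mul = (mul * p) % MOD;
    }
    return hash_val;
}

// Returns the number of distinct hash values among the n strings of arr
int distinct_str(vector<string>& arr, int n)
{
    // An empty array contains no distinct strings
    if (n == 0)
        return 0;

    vector<long long> hash(n);
    for (int i = 0; i < n; i++) {
        hash[i] = compute_hash(arr[i]);
    }

    // After sorting, equal hash values are adjacent
    sort(hash.begin(), hash.end());

    // The first value is always new; count every change after it
    int cntElem = 1;
    for (int i = 1; i < n; i++) {
        if (hash[i] != hash[i - 1]) {
            cntElem++;
        }
    }
    return cntElem;
}

// Driver code
int main()
{
    vector<string> arr = { "abcde", "abcce", "abcdf", "abcde" };
    int N = arr.size();
    cout << distinct_str(arr, N) << endl;
    return 0;
}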
Java
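// Java program to count distinct strings using a polynomial rolling hash
import java.util.Arrays;

public class GFG {

    // Returns the polynomial rolling hash of str (lowercase letters assumed)
    static int compute_hash(String str)
    {
        int p = 31;
        int MOD = (int)1e9 + 7;

        // long is required: the products below can exceed the int range
        long hash_val = 0;
        long mul = 1;

        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            hash_val = (hash_val + (ch - 'a' + 1) * mul) % MOD;
            mul = (mul * p) % MOD;
        }

        // The result is below MOD, so it fits in an int
        return (int)hash_val;
    }

    // Returns the number of distinct hash values among the n strings of arr
    static int distinct_str(String arr[], int n)
    {
        // An empty array contains no distinct strings
        if (n == 0)
            return 0;

        int hash[] = new int[n];
        for (int i = 0; i < n; i++) {
            hash[i] = compute_hash(arr[i]);
        }

        // After sorting, equal hash values are adjacent
        Arrays.sort(hash);

        // The first value is always new; count every change after it
        int cntElem = 1;
        for (int i = 1; i < n; i++) {
            if (hash[i] != hash[i - 1]) {
                cntElem++;
            }
        }
        return cntElem;
    }

    // Driver code
    public static void main(String[] args)
    {
        String arr[] = { "abcde", "abcce", "abcdf", "abcde" };
        int N = arr.length;
        System.out.println(distinct_str(arr, N));
    }
}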
Python3
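# Python3 program to count distinct strings using a polynomial rolling hash


def compute_hash(s):
    """Return the polynomial rolling hash of s (lowercase letters assumed)."""
    p = 31
    MOD = 10**9 + 7
    hash_val = 0
    mul = 1  # p^i mod MOD for the current position i
    for ch in s:
        hash_val = (hash_val + (ord(ch) - ord('a') + 1) * mul) % MOD
        mul = (mul * p) % MOD
    return hash_val


def distinct_str(arr, n):
    """Return the number of distinct hash values among the n strings of arr."""
    # An empty array contains no distinct strings
    if n == 0:
        return 0

    # After sorting, equal hash values are adjacent
    hashes = sorted(compute_hash(arr[i]) for i in range(n))

    # The first value is always new; count every change after it
    cntElem = 1
    for i in range(1, n):
        if hashes[i] != hashes[i - 1]:
            cntElem += 1
    return cntElem


# Driver code
if __name__ == '__main__':
    arr = ["abcde", "abcce", "abcdf", "abcde"]
    N = len(arr)
    print(distinct_str(arr, N))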
C#
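// C# program to count distinct strings using a polynomial rolling hash
using System;

class GFG
{
    // Returns the polynomial rolling hash of str (lowercase letters assumed)
    static int compute_hash(string str)
    {
        int p = 31;
        int MOD = (int)1e9 + 7;

        // long is required: the products below can exceed the int range
        long hash_val = 0;
        long mul = 1;

        for (int i = 0; i < str.Length; i++)
        {
            char ch = str[i];
            hash_val = (hash_val + (ch - 'a' + 1) * mul) % MOD;
            mul = (mul * p) % MOD;
        }

        // The result is below MOD, so it fits in an int
        return (int)hash_val;
    }

    // Returns the number of distinct hash values among the n strings of arr
    static int distinct_str(string[] arr, int n)
    {
        // An empty array contains no distinct strings
        if (n == 0)
            return 0;

        int[] hash = new int[n];
        for (int i = 0; i < n; i++)
        {
            hash[i] = compute_hash(arr[i]);
        }

        // After sorting, equal hash values are adjacent
        Array.Sort(hash);

        // The first value is always new; count every change after it
        int cntElem = 1;
        for (int i = 1; i < n; i++)
        {
            if (hash[i] != hash[i - 1])
            {
                cntElem++;
            }
        }
        return cntElem;
    }

    // Driver code
    public static void Main(String[] args)
    {
        string[] arr = { "abcde", "abcce", "abcdf", "abcde" };
        int N = arr.Length;
        Console.WriteLine(distinct_str(arr, N));
    }
}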
Javascript
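<script>
// JavaScript program to count distinct strings using a polynomial rolling hash

// Returns the polynomial rolling hash of str (lowercase letters assumed)
function compute_hash(str)
{
    const p = 31;
    const MOD = 1e9 + 7;
    let hash_val = 0;
    let mul = 1; // p^i mod MOD for the current position i

    for (let i = 0; i < str.length; i++)
    {
        const code = str.charCodeAt(i) - 'a'.charCodeAt(0) + 1;
        hash_val = (hash_val + code * mul) % MOD;
        mul = (mul * p) % MOD;
    }
    return hash_val;
}

// Returns the number of distinct hash values among the n strings of arr
function distinct_str(arr, n)
{
    // An empty array contains no distinct strings
    if (n === 0)
    {
        return 0;
    }

    let hash = new Array(n);
    for (let i = 0; i < n; i++)
    {
        hash[i] = compute_hash(arr[i]);
    }

    // Numeric sort so that equal hash values are adjacent
    hash.sort(function (a, b) { return a - b; });

    // The first value is always new; count every change after it
    let cntElem = 1;
    for (let i = 1; i < n; i++)
    {
        if (hash[i] !== hash[i - 1])
        {
            cntElem++;
        }
    }
    return cntElem;
}

// Driver code
let arr = [ "abcde", "abcce", "abcdf", "abcde" ];
let N = arr.length;
document.write(distinct_str(arr, N));
</script>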
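Time Complexity: O(N * M + N * log N), where N is the number of strings and M is the maximum length of a string. Hashing all strings takes O(N * M), and sorting the N hash values takes O(N * log N).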
Auxiliary Space: O(N)
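Sorting the hash values costs O(N * log N). As a hedged alternative to the article's sort-and-compare step (not the method used in the implementations above), the hash values can be collected in a set, which gives O(N * M) expected time and also handles an empty array without a special case. The names poly_hash and count_distinct_by_hash are illustrative.

```python
P = 31            # base, as in the article's code
MOD = 10**9 + 7   # modulus, as in the article's code


def poly_hash(s):
    """Polynomial rolling hash of s; assumes only the letters 'a'..'z'."""
    h, mul = 0, 1
    for ch in s:
        h = (h + (ord(ch) - ord('a') + 1) * mul) % MOD
        mul = (mul * P) % MOD
    return h


def count_distinct_by_hash(arr):
    """Count distinct hash values with a set instead of sorting."""
    seen = set()
    for s in arr:
        seen.add(poly_hash(s))
    return len(seen)


print(count_distinct_by_hash(["abcde", "abcce", "abcdf", "abcde", "abcdf"]))  # 3
print(count_distinct_by_hash(["ab", "abc", "abcd", "abcde", "a"]))  # 5
```

As with the sorted version, two distinct strings that collide on the same hash value are counted once, so the result is exact only when no collision occurs.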
Last Updated: 19 Apr, 2023