Count Distinct Subsequences
Last Updated: 29 Dec, 2023
Given a string, count its distinct subsequences. The empty subsequence is included in the count.
Examples:
Input: str = “gfg”
Output: 7
Explanation: The seven distinct subsequences are “”, “g”, “f”, “gf”, “fg”, “gg” and “gfg”
Input: str = “ggg”
Output: 4
Explanation: The four distinct subsequences are “”, “g”, “gg” and “ggg”
The problem of counting distinct subsequences is easy if all characters of the input string are distinct. For a string of n distinct characters, the count is nC0 + nC1 + nC2 + … + nCn = 2^n.
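As a quick sanity check of that count, the sketch below enumerates subsequences by brute force. The helper name `distinct_subsequences` is an illustrative assumption, not something from the article, and the approach is only practical for short strings.

```python
from itertools import combinations


def distinct_subsequences(s):
    """Return the set of distinct subsequences of s, including the empty one."""
    found = set()
    for k in range(len(s) + 1):
        # combinations() yields index tuples in increasing order,
        # so every tuple selects a valid subsequence.
        for idx in combinations(range(len(s)), k):
            found.add("".join(s[i] for i in idx))
    return found


# All characters distinct: every choice of positions gives a different string.
print(len(distinct_subsequences("abcd")))  # 16, i.e. 2**4
# Repeated characters: several choices collapse into the same string.
print(len(distinct_subsequences("gfg")))   # 7
```

With repeated characters the set is smaller than 2^n, which is why the remaining methods need to account for duplicates.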
How to count distinct subsequences when there can be repetition in input string?
A Simple Solution to count distinct subsequences in a string with duplicates is to generate all subsequences. For every subsequence, store it in a hash table if it doesn’t exist already. The time complexity of this solution is exponential and it requires exponential extra space.
Method 1 (Naive Approach): Using a set (without Dynamic Programming)
Approach: Generate all possible subsequences of the given string. For each index i there are two choices:
- Include the ith character in the output array and recursively call the function for the rest of the input string. This yields the subsequences that contain the ith character.
- Exclude the ith character and recursively call the function for the rest of the input string. This yields the subsequences that do not contain the ith character.
In the base case of the function, once the whole input has been consumed, we insert the generated subsequence into an unordered set. An unordered set is a data structure that stores distinct elements in no particular order. Duplicates are discarded on insertion, so the size of the set at the end is the answer.
Implementation:
C++
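#include <bits/stdc++.h>
using namespace std;

// Stores every distinct subsequence generated so far.
unordered_set<string> sn;

// s: input string, op: subsequence under construction,
// i: current index in s, j: current length of op.
void subsequences(char s[], char op[], int i, int j)
{
    // Base case: the whole input is consumed, so record op.
    if (s[i] == '\0') {
        op[j] = '\0';
        sn.insert(op);
        return;
    }
    // Include s[i] in the subsequence.
    op[j] = s[i];
    subsequences(s, op, i + 1, j + 1);
    // Exclude s[i] from the subsequence.
    subsequences(s, op, i + 1, j);
}

int main()
{
    char str[] = "ggg";
    // sizeof counts the terminating '\0', so m is the length plus one.
    const int m = sizeof(str) / sizeof(char);
    char op[m];
    subsequences(str, op, 0, 0);
    cout << sn.size();
    return 0;
}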
Java
import java.util.*;

class GFG {
    // sn: distinct subsequences found so far, s: input characters,
    // op: subsequence under construction, i: current index in s,
    // j: current length of op, n: length of s.
    public static void subsequences(Set<String> sn,
                                    char[] s, char[] op,
                                    int i, int j, int n)
    {
        // Base case: the whole input is consumed, so record op[0..j-1].
        if (i == n) {
            sn.add(new String(op, 0, j));
            return;
        }
        // Include s[i] in the subsequence.
        op[j] = s[i];
        subsequences(sn, s, op, i + 1, j + 1, n);
        // Exclude s[i] from the subsequence.
        subsequences(sn, s, op, i + 1, j, n);
    }

    public static void main(String[] args)
    {
        char[] str = { 'g', 'g', 'g' };
        int m = str.length;
        Set<String> sn = new HashSet<String>();
        char[] op = new char[m];
        subsequences(sn, str, op, 0, 0, m);
        System.out.println(sn.size());
    }
}
Python3
# Distinct subsequences found so far.
sn = set()


def subsequences(s, op, i, j):
    # op[0..j-1] holds the characters chosen so far; i is the next index in s.
    if i == len(s):
        sn.add("".join(op[:j]))
        return
    # Include s[i] in the subsequence.
    op[j] = s[i]
    subsequences(s, op, i + 1, j + 1)
    # Exclude s[i] from the subsequence.
    subsequences(s, op, i + 1, j)


text = "ggg"
op = [None] * len(text)  # scratch buffer, one slot per character
subsequences(text, op, 0, 0)
print(len(sn))
C#
using System;
using System.Collections.Generic;

class GFG
{
    // sn: distinct subsequences found so far, s: input characters,
    // op: subsequence under construction, i: current index in s,
    // j: current length of op, n: length of s.
    public static void subsequences(HashSet<string> sn,
                                    char[] s, char[] op,
                                    int i, int j, int n)
    {
        // Base case: the whole input is consumed, so record op[0..j-1].
        if (i == n) {
            sn.Add(new string(op, 0, j));
            return;
        }
        // Include s[i] in the subsequence.
        op[j] = s[i];
        subsequences(sn, s, op, i + 1, j + 1, n);
        // Exclude s[i] from the subsequence.
        subsequences(sn, s, op, i + 1, j, n);
    }

    public static void Main(string[] args)
    {
        char[] str = { 'g', 'g', 'g' };
        int m = str.Length;
        HashSet<string> sn = new HashSet<string>();
        char[] op = new char[m];
        subsequences(sn, str, op, 0, 0, m);
        Console.WriteLine(sn.Count);
    }
}
Javascript
<script>
// Stores every distinct subsequence generated so far.
let sn = new Set();
let m = 0;

// op[0..j-1] holds the characters chosen so far; i is the next index in s.
function subsequences(s, op, i, j)
{
    if (i == m) {
        sn.add(op.slice(0, j).join(""));
        return;
    }
    // Include s[i] in the subsequence.
    op[j] = s[i];
    subsequences(s, op, i + 1, j + 1);
    // Exclude s[i] from the subsequence.
    subsequences(s, op, i + 1, j);
}

let str = "ggg";
m = str.length;
let op = new Array(m);
subsequences(str, op, 0, 0);
document.write(sn.size);
</script>
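Time Complexity: O(n * 2^n), since the recursion reaches 2^n leaves and each insertion hashes a string of up to n characters.
Auxiliary Space: O(n * 2^n) in the worst case, when all 2^n subsequences are distinct,
where n is the length of the string.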
Method 2 (Efficient Approach): Using Dynamic Programming
An efficient solution doesn't require generating the subsequences.
Let countSub(n) be the count of distinct subsequences of the first n characters of the input string, with countSub(0) = 1 for the empty subsequence. We can write it recursively as below.
countSub(n) = 2 * countSub(n-1) - Repetition
If the current character, i.e., str[n-1], has not appeared before, then
Repetition = 0
Else:
Repetition = countSub(m)
Here m is the 0-based index of the previous occurrence of the current character, so countSub(m) covers the characters before that occurrence. We basically remove the subsequences ending with the previous occurrence of the current character, which would otherwise be counted twice.
How does this work?
If there are no repetitions, the count doubles: every distinct subsequence of the first n-1 characters is kept, and appending the current character to each of them gives countSub(n-1) more.
If the current character has appeared before, some of those appended subsequences are not new, because appending the previous occurrence to each subsequence of the characters before it already produced them. There are countSub(m) of these, where m is the index of the previous occurrence, and that value has already been computed.
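To make the subtraction concrete, here is a small trace of the recurrence. It is a sketch only: the function name `count_sub_trace` and the use of a dictionary for previous occurrences are assumptions made for illustration.

```python
def count_sub_trace(s):
    """Return the list dp, where dp[i] counts distinct subsequences of s[:i]."""
    last = {}   # character -> 0-based index of its latest occurrence
    dp = [1]    # dp[0] = 1: only the empty subsequence
    for i, ch in enumerate(s, start=1):
        value = 2 * dp[i - 1]
        if ch in last:
            # Remove the subsequences the previous occurrence already produced.
            value -= dp[last[ch]]
        dp.append(value)
        last[ch] = i - 1
    return dp


print(count_sub_trace("gfg"))  # [1, 2, 4, 7]
print(count_sub_trace("ggg"))  # [1, 2, 3, 4]
```

For "gfg" the last step computes 2 * 4 - dp[0] = 7, because the only duplicate is the single-character subsequence "g". For "ggg" every step after the first subtracts, so the count grows by one each time.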
Since the above recurrence has overlapping subproblems, we can solve it using Dynamic Programming.
Below is the implementation of the above idea.
C++
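#include <bits/stdc++.h>
using namespace std;
const int MAX_CHAR = 256;

// Returns the number of distinct subsequences of str, including the empty one.
int countSub(string str)
{
    // last[c] is the index of the latest occurrence of c, or -1 if unseen.
    vector<int> last(MAX_CHAR, -1);
    int n = str.length();
    // dp[i] is the count for the first i characters.
    vector<int> dp(n + 1);
    dp[0] = 1; // the empty string has one subsequence
    for (int i = 1; i <= n; i++) {
        // Use unsigned char so characters above 127 never index negatively.
        unsigned char c = str[i - 1];
        dp[i] = 2 * dp[i - 1];
        // Remove the subsequences counted at the previous occurrence.
        if (last[c] != -1)
            dp[i] = dp[i] - dp[last[c]];
        last[c] = (i - 1);
    }
    return dp[n];
}

int main()
{
    cout << countSub("gfg");
    return 0;
}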
Java
import java.util.Arrays;

public class Count_Subsequences {
    // Assumes every character code in the input is below 256.
    static final int MAX_CHAR = 256;

    // Returns the number of distinct subsequences of str, including the empty one.
    static int countSub(String str)
    {
        // last[c] is the index of the latest occurrence of c, or -1 if unseen.
        int[] last = new int[MAX_CHAR];
        Arrays.fill(last, -1);
        int n = str.length();
        // dp[i] is the count for the first i characters.
        int[] dp = new int[n + 1];
        dp[0] = 1;
        for (int i = 1; i <= n; i++) {
            int c = (int)str.charAt(i - 1);
            dp[i] = 2 * dp[i - 1];
            // Remove the subsequences counted at the previous occurrence.
            if (last[c] != -1)
                dp[i] = dp[i] - dp[last[c]];
            last[c] = (i - 1);
        }
        return dp[n];
    }

    public static void main(String args[])
    {
        System.out.println(countSub("gfg"));
    }
}
Python3
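# Assumes every character code in the input is below 256.
MAX_CHAR = 256


def countSub(ss):
    # last[c] is the index of the latest occurrence of code c, or -1 if unseen.
    last = [-1] * MAX_CHAR
    n = len(ss)
    # dp[i] is the number of distinct subsequences of the first i characters.
    dp = [0] * (n + 1)
    dp[0] = 1  # the empty string has one subsequence
    for i in range(1, n + 1):
        c = ord(ss[i - 1])
        dp[i] = 2 * dp[i - 1]
        # Remove the subsequences counted at the previous occurrence.
        if last[c] != -1:
            dp[i] = dp[i] - dp[last[c]]
        last[c] = i - 1
    return dp[n]


print(countSub("gfg"))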
C#
using System;

public class Count_Subsequences {
    static readonly int MAX_CHAR = 256;

    static int countSub(String str)
    {
        int[] last = new int[MAX_CHAR];
        for (int i = 0; i < MAX_CHAR; i++)
            last[i] = -1;
        int n = str.Length;
        int[] dp = new int[n + 1];
        dp[0] = 1;
        for (int i = 1; i <= n; i++) {
            dp[i] = 2 * dp[i - 1];
            if (last[(int)str[i - 1]] != -1)
                dp[i] = dp[i] - dp[last[(int)str[i - 1]]];
            last[(int)str[i - 1]] = (i - 1);
        }
        return dp[n];
    }

    public static void Main(String[] args)
    {
        Console.WriteLine(countSub("gfg"));
    }
}
Javascript
<script>
let MAX_CHAR = 256;

function countSub(str)
{
    let last = new Array(MAX_CHAR);
    last.fill(-1);
    let n = str.length;
    let dp = new Array(n + 1);
    dp[0] = 1;
    for (let i = 1; i <= n; i++)
    {
        dp[i] = 2 * dp[i - 1];
        if (last[str[i - 1].charCodeAt()] != -1)
            dp[i] = dp[i] - dp[last[str[i - 1].charCodeAt()]];
        last[str[i - 1].charCodeAt()] = (i - 1);
    }
    return dp[n];
}

document.write(countSub("gfg"));
</script>
Time Complexity: O(n)
Auxiliary Space: O(n)
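One practical caveat: the count can grow as fast as 2^n, so a 32-bit `int` dp array, as used in the Java and C# versions, overflows once the true count passes 2^31 - 1 (a string of 31 distinct characters already has 2^31 subsequences). A common workaround, assumed here rather than taken from the article, is to report the count modulo a large prime. The modulus 10^9 + 7 and the name `count_sub_mod` are illustrative choices.

```python
MOD = 1_000_000_007  # assumed modulus; use whatever the problem statement asks for


def count_sub_mod(s, mod=MOD):
    """Distinct subsequences of s (empty one included), reduced modulo mod."""
    last = {}                   # character -> 0-based index of latest occurrence
    dp = [1] * (len(s) + 1)     # dp[0] = 1 for the empty subsequence
    for i in range(1, len(s) + 1):
        ch = s[i - 1]
        dp[i] = 2 * dp[i - 1]
        if ch in last:
            dp[i] -= dp[last[ch]]
        # Python's % never returns a negative value for a positive modulus.
        dp[i] %= mod
        last[ch] = i - 1
    return dp[len(s)]


print(count_sub_mod("gfg"))  # 7
```

In languages where `%` can return a negative result, such as C++, Java and C#, the subtraction step would need `mod` added back before reducing.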
Method 3: Using Map
Idea:
Keep two variables: `allCount`, the running total of distinct non-empty subsequences, and `levelCount`, the count of subsequences ending at index i. To find repetitions, store the most recent `levelCount` for each character in a map. `allCount` is then updated from `levelCount` as described in the steps.
Below are the steps to solve the problem:
- Declare a map mp from characters to counts.
- Start a loop to iterate through the characters of the input string s, with c the character at the current index i.
- Inside the loop, when i is 0, c is the first character in the string:
  - Set allCount and levelCount to 1, since the first character is always unique.
  - Store 1 in mp[c].
- For characters at positions other than 0:
  - Calculate the current levelCount as allCount + 1: c can be appended to each of the allCount subsequences found so far, or stand on its own.
  - If c is not present in the map, the character is new. In this case:
    - Increment allCount by levelCount.
  - If c is present in the map, the character has been seen before. In this case:
    - Increment allCount by levelCount - mp[c], because mp[c] of the subsequences ending in c were already counted at its previous occurrence.
  - Update mp[c] with the current levelCount, since this is now the latest occurrence of c.
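The steps above can be traced on a small input. The sketch below is an illustration under two assumptions: the name `count_sub_levels` is hypothetical, and an unseen character is treated as having a stored value of 0, which lets the first character go through the general case instead of a special one.

```python
def count_sub_levels(s):
    """Count distinct non-empty subsequences of s, recording each step."""
    level_of = {}    # character -> levelCount from its latest occurrence
    all_count = 0    # distinct non-empty subsequences found so far
    steps = []
    for ch in s:
        # Append ch to every earlier subsequence, or use ch on its own.
        level_count = all_count + 1
        # Skip the subsequences ending in ch that were already counted.
        all_count += level_count - level_of.get(ch, 0)
        level_of[ch] = level_count
        steps.append((ch, level_count, all_count))
    return all_count, steps


total, steps = count_sub_levels("abab")
for ch, level_count, all_count in steps:
    print(ch, level_count, all_count)
print(total)  # 11 non-empty subsequences, 12 with the empty one
```

For "abab" the recorded (character, levelCount, allCount) triples are (a, 1, 1), (b, 2, 3), (a, 4, 6) and (b, 7, 11). At the third step only 4 - 1 = 3 subsequences are new, because "a" itself was already counted at index 0.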
C++
#include <iostream>
#include <map>
#include <string>
using namespace std;

// Returns the number of distinct non-empty subsequences of s.
int countSub(string s) {
    // mp[c] is the levelCount stored at the latest occurrence of c.
    map<char, int> mp;
    int n = s.size();
    int allCount = 0, levelCount = 0;
    for (int i = 0; i < n; i++) {
        char c = s[i];
        if (i == 0) {
            allCount = levelCount = 1;
            mp[c] = 1;
            continue;
        }
        levelCount = allCount + 1;
        if (mp.find(c) == mp.end()) {
            // c is new: every subsequence ending in c is new as well.
            allCount += levelCount;
        } else {
            // c was seen before: skip the mp[c] subsequences already counted.
            allCount += levelCount - mp[c];
        }
        mp[c] = levelCount;
    }
    return allCount;
}

int main() {
    string list[] = { "abab", "gfg" };
    for (string s : list) {
        int cnt = countSub(s);
        int withEmptyString = cnt + 1;
        cout << "With empty string count for " << s << " is " << withEmptyString << endl;
        cout << "Without empty string count for " << s << " is " << cnt << endl;
    }
    return 0;
}
Java
import java.io.*;
import java.util.*;

class SubsequenceCount
{
    public static int countSub(String s)
    {
        HashMap<Character, Integer> map = new HashMap<Character, Integer>();
        for (int i = 0; i < s.length(); i++)
        {
            map.put(s.charAt(i), -1);
        }
        int allCount = 0;
        int levelCount = 0;
        for (int i = 0; i < s.length(); i++)
        {
            char c = s.charAt(i);
            if (i == 0)
            {
                allCount = 1;
                map.put(c, 1);
                levelCount = 1;
                continue;
            }
            levelCount = allCount + 1;
            if (map.get(c) < 0)
            {
                allCount = allCount + levelCount;
            }
            else
            {
                allCount = allCount + levelCount - map.get(c);
            }
            map.put(c, levelCount);
        }
        return allCount;
    }

    public static void main(String[] args)
    {
        List<String> list = Arrays.asList("abab", "gfg");
        for (String s : list)
        {
            int cnt = countSub(s);
            int withEmptyString = cnt + 1;
            System.out.println("With empty string count for " +
                               s + " is " + withEmptyString);
            System.out.println("Without empty string count for " +
                               s + " is " + cnt);
        }
    }
}
Python3
def count_sub(s):
    # mp[c] is the level_count stored at the latest occurrence of c.
    mp = {}
    n = len(s)
    all_count = level_count = 0
    for i in range(n):
        c = s[i]
        if i == 0:
            all_count = mp[c] = level_count = 1
            continue
        level_count = all_count + 1
        if c not in mp:
            # c is new: every subsequence ending in c is new as well.
            all_count += level_count
        else:
            # c was seen before: skip the mp[c] subsequences already counted.
            all_count += level_count - mp[c]
        mp[c] = level_count
    return all_count


if __name__ == "__main__":
    strings = ["abab", "gfg"]
    for s in strings:
        cnt = count_sub(s)
        with_empty_string = cnt + 1
        print(f"With empty string count for {s} is {with_empty_string}")
        print(f"Without empty string count for {s} is {cnt}")
C#
using System;
using System.Collections.Generic;

class GFG
{
    public static int countSub(string s)
    {
        // map[c] holds the levelCount stored at the latest occurrence of c,
        // or -1 if c has not been processed yet.
        Dictionary<char, int> map = new Dictionary<char, int>();
        for (int i = 0; i < s.Length; i++)
        {
            map[s[i]] = -1;
        }
        int allCount = 0;
        int levelCount = 0;
        for (int i = 0; i < s.Length; i++)
        {
            char c = s[i];
            if (i == 0)
            {
                allCount = 1;
                map[c] = 1;
                levelCount = 1;
                continue;
            }
            levelCount = allCount + 1;
            if (map[c] < 0)
            {
                allCount = (allCount + levelCount);
            }
            else
            {
                allCount = (allCount + levelCount - map[c]);
            }
            map[c] = levelCount;
        }
        return allCount;
    }

    static void Main()
    {
        List<string> list = new List<string>();
        list.Add("abab");
        list.Add("gfg");
        foreach (string s in list)
        {
            int cnt = countSub(s);
            int withEmptyString = cnt + 1;
            Console.WriteLine("With empty string count for " +
                              s + " is " + withEmptyString);
            Console.WriteLine("Without empty string count for " +
                              s + " is " + cnt);
        }
    }
}
Javascript
function countSub(s)
{
    let map = new Map();
    for (let i = 0; i < s.length; i++)
    {
        map.set(s[i], -1);
    }
    let allCount = 0;
    let levelCount = 0;
    for (let i = 0; i < s.length; i++)
    {
        let c = s[i];
        if (i == 0)
        {
            allCount = 1;
            map.set(c, 1);
            levelCount = 1;
            continue;
        }
        levelCount = allCount + 1;
        if (map.get(c) < 0)
        {
            allCount = allCount + levelCount;
        }
        else
        {
            allCount = allCount + levelCount - map.get(c);
        }
        map.set(c, levelCount);
    }
    return allCount;
}

let list = ["abab", "gfg"];
for (let i = 0; i < list.length; i++)
{
    let cnt = countSub(list[i]);
    let withEmptyString = cnt + 1;
    console.log("With empty string count for " +
                list[i] + " is " + withEmptyString);
    console.log("Without empty string count for " +
                list[i] + " is " + cnt);
}
Output
With empty string count for abab is 12
Without empty string count for abab is 11
With empty string count for gfg is 7
Without empty string count for gfg is 6
Time Complexity: O(n)
Space Complexity: O(1) for a fixed-size alphabet, since the map holds at most one entry per distinct character.