Count Distinct Subsequences

Given a string, find the count of distinct subsequences of it.

Examples:

Input  : str = "gfg"
Output : 7
The seven distinct subsequences are "", "g", "f",
"gf", "fg", "gg" and "gfg" 

Input  : str = "ggg"
Output : 4
The four distinct subsequences are "", "g", "gg"
and "ggg" 

The problem of counting distinct subsequences is easy if all characters of the input string are distinct. The count is then nC0 + nC1 + nC2 + … + nCn = 2^n.

How do we count distinct subsequences when the input string can contain repeated characters?
A simple solution is to generate all subsequences and store each one in a hash table if it isn't there already; the number of entries at the end is the answer. This solution takes exponential time and requires exponential extra space.



Method 1 (Naive Approach): Using a set (without Dynamic Programming)

Approach: Generate all possible subsequences of the given string. The subsequences of a string can be generated in the following manner:
a) Include a particular element (say the ith) in the output array and recursively call the function for the rest of the input string. This produces the subsequences that contain the ith character.
b) Exclude the ith element and recursively call the function for the rest of the input string. This produces all the subsequences that don't contain the ith character.

Once a subsequence has been generated, the base case of the function inserts it into an unordered set. An unordered set is a data structure that stores distinct elements in no particular order, so a duplicate subsequence is stored only once. After every subsequence has been generated and inserted, the set contains only distinct subsequences, and its size is the answer.


// C++ program to count the distinct
// subsequences of a given string
#include <bits/stdc++.h>
using namespace std;
  
// Set that stores each distinct subsequence once
unordered_set<string> sn;
  
// Generates all subsequences of s[i..], appending
// them to the partial subsequence op[0..j-1]
void subsequences(const char s[], char op[], int i, int j)
{
  
    // Base Case: end of the input string
    if (s[i] == '\0') {
        op[j] = '\0';
  
        // Insert each generated
        // subsequence into the set
        sn.insert(op);
        return;
    }
  
    // Recursive Case
    // When a particular character is taken
    op[j] = s[i];
    subsequences(s, op, i + 1, j + 1);
  
    // When a particular character isn't taken
    subsequences(s, op, i + 1, j);
}
  
// Driver Code
int main()
{
    char str[] = "ggg";
  
    // Output buffer for the subsequence being
    // built; it only needs room for the whole
    // string plus the terminating '\0'
    vector<char> op(strlen(str) + 1);
  
    // Function Call
    subsequences(str, op.data(), 0, 0);
  
    // Output will be the number
    // of elements in the set
    cout << sn.size();
    sn.clear();
    return 0;
}
  
// This code is contributed by Kishan Mishra



Output:

4

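Time Complexity : O(n * 2^n), since 2^n subsequences are generated and each one (up to n characters long) is copied and hashed into the set.
Auxiliary Space : O(n * 2^n) in the worst case for the stored subsequences, plus O(n) for the recursion stack and the output buffer,
where n is the length of the string.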

Method 2 (Efficient Approach): Using Dynamic Programming

An efficient solution doesn’t need to generate the subsequences at all.

Let countSub(n) be the count of distinct subsequences
(including the empty one) of the first n characters
of the input string, with countSub(0) = 1. We can
write it recursively as below.

countSub(n) = 2*countSub(n-1) - Repetition

If the current character, i.e., str[n-1], has
not appeared before, then
   Repetition = 0

Else:
   Repetition = countSub(m)
   Here m is the (0-based) index of the previous
   occurrence of the current character. Appending
   the current character to each of the countSub(m)
   subsequences of str[0..m-1] gives subsequences
   that were already counted at that previous
   occurrence, so they are subtracted.

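How does this work?
If there are no repetitions, the count doubles from countSub(n-1): every subsequence of the first n-1 characters is kept as is, and appending the current character to each of them gives countSub(n-1) more subsequences.
If there are repetitions, some of those new subsequences already exist among the subsequences of the first n-1 characters, namely the ones that end with the current character. Each of them is a subsequence of str[0..m-1] followed by the character at its previous occurrence m, so there are countSub(m) of them, and this count is obtained by recursively calling for the index of the previous occurrence. For example, for "gfg": countSub(3) = 2*countSub(2) - countSub(0) = 2*4 - 1 = 7.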

Since the above recurrence has overlapping subproblems, we can solve it using Dynamic Programming.

Below is the implementation of the above idea.

C++


// C++ program to count number of distinct
// subsequences of a given string.
#include <bits/stdc++.h>
using namespace std;
const int MAX_CHAR = 256;
  
// Returns count of distinct subsequences of str
// (including the empty subsequence).
int countSub(const string& str)
{
    // last[c] stores the index of the last
    // occurrence of character c, or -1
    vector<int> last(MAX_CHAR, -1);
  
    // Length of input string
    int n = str.length();
  
    // dp[i] stores the count of distinct
    // subsequences of the prefix str[0..i-1].
    vector<int> dp(n + 1);
  
    // Empty substring has only one subsequence
    dp[0] = 1;
  
    // Traverse through all prefix lengths from 1 to n.
    for (int i = 1; i <= n; i++) {
        // Cast to unsigned char so the index
        // into last is never negative
        unsigned char c = str[i - 1];
  
        // Number of subsequences with substring
        // str[0..i-1]
        dp[i] = 2 * dp[i - 1];
  
        // If current character has appeared before,
        // subtract the subsequences already counted
        // at its previous occurrence.
        if (last[c] != -1)
            dp[i] = dp[i] - dp[last[c]];
  
        // Mark occurrence of current character
        last[c] = (i - 1);
    }
  
    return dp[n];
}
  
// Driver code
int main()
{
    cout << countSub("gfg");
    return 0;
}



Java


// Java program to count number of distinct
// subsequences of a given string.
import java.util.Arrays;

public class Count_Subsequences {
  
    // Assumes every character of the input is below 256
    static final int MAX_CHAR = 256;
  
    // Returns count of distinct subsequences of str
    // (including the empty subsequence).
    static int countSub(String str)
    {
        // last[c] stores the index of the last
        // occurrence of character c, or -1
        int[] last = new int[MAX_CHAR];
        Arrays.fill(last, -1);
  
        // Length of input string
        int n = str.length();
  
        // dp[i] stores the count of distinct
        // subsequences of the prefix str[0..i-1].
        int[] dp = new int[n + 1];
  
        // Empty substring has only one subsequence
        dp[0] = 1;
  
        // Traverse through all prefix lengths from 1 to n.
        for (int i = 1; i <= n; i++) {
            char c = str.charAt(i - 1);

            // Number of subsequences with substring
            // str[0..i-1]
            dp[i] = 2 * dp[i - 1];
  
            // If current character has appeared before,
            // subtract the subsequences already counted
            // at its previous occurrence.
            if (last[c] != -1)
                dp[i] = dp[i] - dp[last[c]];
  
            // Mark occurrence of current character
            last[c] = (i - 1);
        }
  
        return dp[n];
    }
  
    // Driver code
    public static void main(String args[])
    {
        System.out.println(countSub("gfg"));
    }
}
// This code is contributed by Sumit Ghosh



Python3


# Python3 program to count number of 
# distinct subsequences of a given string
  
def countSub(ss):
  
    # last maps each character to the index
    # of its last occurrence
    last = {}
      
    # length of input string
    n = len(ss)
      
    # dp[i] stores the count of distinct
    # subsequences of the prefix ss[0..i-1]
    dp = [0] * (n + 1)
       
    # empty substring has only 
    # one subsequence
    dp[0] = 1
      
    # Traverse through all prefix lengths
    # from 1 to n 
    for i in range(1, n + 1):
        ch = ss[i - 1]
          
        # number of subsequences with 
        # substring ss[0...i-1]
        dp[i] = 2 * dp[i - 1]
  
        # if current character has appeared
        # before, subtract the subsequences
        # already counted at its previous occurrence
        if ch in last:
            dp[i] -= dp[last[ch]]

        # mark occurrence of current character
        last[ch] = i - 1
      
    return dp[n]
      
# Driver code
print(countSub("gfg"))
  
# This code is contributed 
# by mohit kumar 29



C#


// C# program to count number of distinct
// subsequences of a given string.
using System;
  
public class Count_Subsequences {
  
    // Assumes every character of the input is below 256
    static readonly int MAX_CHAR = 256;
  
    // Returns count of distinct subsequences of str
    // (including the empty subsequence).
    static int countSub(String str)
    {
        // last[c] stores the index of the last
        // occurrence of character c, or -1
        int[] last = new int[MAX_CHAR];
  
        for (int i = 0; i < MAX_CHAR; i++)
            last[i] = -1;
  
        // Length of input string
        int n = str.Length;
  
        // dp[i] stores the count of distinct
        // subsequences of the prefix str[0..i-1].
        int[] dp = new int[n + 1];
  
        // Empty substring has only one subsequence
        dp[0] = 1;
  
        // Traverse through all prefix lengths from 1 to n.
        for (int i = 1; i <= n; i++) {
            char c = str[i - 1];

            // Number of subsequences with substring
            // str[0..i-1]
            dp[i] = 2 * dp[i - 1];
  
            // If current character has appeared before,
            // subtract the subsequences already counted
            // at its previous occurrence.
            if (last[c] != -1)
                dp[i] = dp[i] - dp[last[c]];
  
            // Mark occurrence of current character
            last[c] = (i - 1);
        }
        return dp[n];
    }
  
    // Driver code
    public static void Main(String[] args)
    {
        Console.WriteLine(countSub("gfg"));
    }
}
  
// This code is contributed by 29AjayKumar




Output:

7

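Time Complexity : O(n)
Auxiliary Space : O(n), plus storage for the last occurrence of each character

Since the count can grow as fast as 2^n, the int results in the C++, Java and C# versions overflow once the count exceeds 2^31 - 1, which can happen for strings of 31 or more characters. Python integers have arbitrary precision, so the Python version does not overflow.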
