Given a string, find the count of distinct subsequences of it.
Input : str = "gfg"
Output : 7
The seven distinct subsequences are "", "g", "f", "gf", "fg", "gg" and "gfg"

Input : str = "ggg"
Output : 4
The four distinct subsequences are "", "g", "gg" and "ggg"

The problem of counting distinct subsequences is easy if all characters of the input string are distinct. The count is equal to nC0 + nC1 + nC2 + … + nCn = 2^n.
How to count distinct subsequences when there can be repetition in input string?
A Simple Solution to count distinct subsequences in a string with duplicates is to generate all subsequences. For every subsequence, store it in a hash table if it doesn’t exist already. Time complexity of this solution is exponential and it requires exponential extra space.
Method 1 (Naive Approach): Using a set (without Dynamic Programming)
Approach : Generate all the possible subsequences of a given string. The subsequences of a string can be generated in the following manner:
a) Include a particular character (say the i-th) in the output and recursively call the function for the rest of the input string. This yields the subsequences that contain the i-th character.
b) Exclude that character and recursively call the function for the rest of the input string. This yields all the subsequences that do not contain the i-th character.
Once a subsequence has been generated, the base case of the function inserts it into an unordered set. An unordered set is a data structure that stores distinct elements in no particular order, so duplicates are discarded automatically. After all subsequences have been inserted, the size of the set is the answer, because the set contains only distinct subsequences.
Time Complexity : O(2^n)
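Auxiliary Space : O(n) for the recursion stack, plus the space for the set, which can hold up to 2^n subsequences in the worst case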
where n is the length of the string.
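The include/exclude recursion described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `count_distinct_subsequences_naive` and the helper `generate` are names chosen here, and Python's built-in `set` stands in for the unordered set.

```python
def count_distinct_subsequences_naive(s: str) -> int:
    """Count distinct subsequences of s (including the empty one) by brute force."""
    seen = set()  # hash set: keeps one copy of each generated subsequence

    def generate(i: int, current: list) -> None:
        # Base case: a decision has been made for every character.
        if i == len(s):
            seen.add("".join(current))
            return
        # (a) Include s[i] and recurse on the rest of the string.
        current.append(s[i])
        generate(i + 1, current)
        current.pop()
        # (b) Exclude s[i] and recurse on the rest of the string.
        generate(i + 1, current)

    generate(0, [])
    return len(seen)


if __name__ == "__main__":
    print(count_distinct_subsequences_naive("gfg"))  # 7
    print(count_distinct_subsequences_naive("ggg"))  # 4
```

Because every one of the 2^n include/exclude paths is explored, this is only practical for short strings.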
Method 2 (Efficient Approach): Using Dynamic Programming
An Efficient Solution doesn’t require the generation of subsequences.
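Let countSub(n) be the count of distinct subsequences of the first n characters of the input string, with countSub(0) = 1 for the empty subsequence. We can write it recursively as below.

countSub(n) = 2*countSub(n-1) - Repetition

If the current character, i.e., str[n-1], has not appeared before, then Repetition = 0
Else Repetition = countSub(m), where m is the 0-based index of the previous occurrence of the current character.

countSub(m) is the number of distinct subsequences of the characters before the previous occurrence. Appending the current character to each of them produces the same strings that were already formed by appending the previous occurrence, so we subtract them to avoid counting them twice.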
How does this work?
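If there are no repetitions, the count doubles relative to countSub(n-1): every subsequence of the first n-1 characters is still available, and appending the current character to each of them gives countSub(n-1) more subsequences.
If there are repetitions, some of these new subsequences were already counted: exactly those formed by appending the previous occurrence of the current character to a subsequence of the characters before it. Their number is the count for the prefix that ends just before the previous occurrence, and it can be obtained by recursively calling for the index of the previous occurrence.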
Since above recurrence has overlapping subproblems, we can solve it using Dynamic Programming.
Below is the implementation of above idea.
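A minimal bottom-up sketch of the recurrence in Python, under a few assumptions made here: the names `count_distinct_subsequences`, `count` and `last` are illustrative, the previous occurrence of each character is tracked in a dictionary, and no modulus is applied because Python integers do not overflow.

```python
def count_distinct_subsequences(s: str) -> int:
    """Count distinct subsequences of s, including the empty subsequence."""
    n = len(s)
    # count[i] = number of distinct subsequences of the first i characters.
    count = [0] * (n + 1)
    count[0] = 1  # the empty prefix has only the empty subsequence
    last = {}     # last[ch] = 0-based index of the most recent occurrence of ch

    for i in range(1, n + 1):
        ch = s[i - 1]
        # Every earlier subsequence, with and without ch appended.
        count[i] = 2 * count[i - 1]
        # Remove the ones already produced by the previous occurrence of ch.
        if ch in last:
            count[i] -= count[last[ch]]
        last[ch] = i - 1

    return count[n]


if __name__ == "__main__":
    print(count_distinct_subsequences("gfg"))  # 7
    print(count_distinct_subsequences("ggg"))  # 4
```

For "gfg" the table fills as count = [1, 2, 4, 7]: the last step doubles 4 to 8 and subtracts count[0] = 1, because the earlier "g" sits at index 0. For "ggg" it fills as [1, 2, 3, 4].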
Time Complexity : O(n)
Auxiliary Space : O(n)