Open In App

Python – N sized substrings with K distinct characters

Last Updated : 26 Apr, 2023
Improve
Improve
Like Article
Like
Save
Share
Report

Given a String, the task is to write a Python program to extract strings of N size and have K distinct characters.

Examples:

Input : test_str = ‘geeksforgeeksforgeeks’, N = 3, K = 2 
Output : [‘gee’, ‘eek’, ‘gee’, ‘eek’, ‘gee’, ‘eek’] 
Explanation : 3 lengths have 2 unique characters are extracted.

Input : test_str = ‘geeksforgeeksforgeeks’, N = 4, K = 2 
Output : [] 
Explanation : No strings with 4 length and just have 2 elements. 

Method #1 : Using slicing + set() + loop

In this we perform task of getting N chunks using slicing, set() is used to check for unique elements. Loop is used to iterate for all the possible chunks.

Python3




# Python3 code to demonstrate working of
# N sized substrings with K distinct characters
# Using slicing + set() + loop
 
# initializing string
test_str = 'geeksforgeeksforgeeks'
 
# printing original string
print("The original string is : " + str(test_str))
 
# initializing N
N = 3
 
# initializing K
K = 2
 
res = []
for idx in range(0, len(test_str) - N + 1):
 
    # getting unique elements off sliced string
    if (len(set(test_str[idx: idx + N])) == K):
        res.append(test_str[idx: idx + N])
 
# printing result
print("Extracted Strings : " + str(res))


Output

The original string is : geeksforgeeksforgeeks
Extracted Strings : ['gee', 'eek', 'gee', 'eek', 'gee', 'eek']

Time Complexity: O(n*n)
Auxiliary Space: O(n)

Method #2 : Using list comprehension + len() + set() + slicing

Similar to above method, only difference being list comprehension is used instead of loop, just to provide shorthand to solve this task.

Python3




# Python3 code to demonstrate working of
# N sized substrings with K distinct characters
# Using list comprehension + len() + set() + slicing
 
# initializing string
test_str = 'geeksforgeeksforgeeks'
 
# printing original string
print("The original string is : " + str(test_str))
 
# initializing N
N = 3
 
# initializing K
K = 2
 
# list comprehension used to slice
res = [test_str[idx: idx + N]
       for idx in range(0, len(test_str) - N + 1)
       if len(set(test_str[idx: idx + N])) == K]
 
# printing result
print("Extracted Strings : " + str(res))


Output

The original string is : geeksforgeeksforgeeks
Extracted Strings : ['gee', 'eek', 'gee', 'eek', 'gee', 'eek']

The time and space complexity for all the methods are the same:

Time Complexity: O(n)

Space Complexity: O(n)

Approach 3 : Using Counter and Sliding Window

This approach uses the Counter method from the collection module, which allows us to count the frequency of elements in an iterable. We maintain a sliding window with the length of N and move it to the right, updating the count of characters in the window as we move. If the count of unique characters in the window becomes less than or equal to K, we append the window to the result list.

Python3




# Python3 code to demonstrate working of
# N sized substrings with K distinct characters
# Using Counter and Sliding Window
  
from collections import Counter
  
# initializing string
test_str = 'geeksforgeeksforgeeks'
  
# printing original string
print("The original string is : " + str(test_str))
  
# initializing N
N = 3
  
# initializing K
K = 2
  
res = []
  
# Sliding window of length N
for i in range(len(test_str) - N + 1):
    window = test_str[i:i+N]
    c = Counter(window)
  
    # if count of unique characters in the window <= K
    if len(c) <= K:
        res.append(window)
  
# printing result
print("Extracted Strings : " + str(res))


Output

The original string is : geeksforgeeksforgeeks
Extracted Strings : ['gee', 'eek', 'gee', 'eek', 'gee', 'eek']

Time Complexity: O(n)
Auxiliary Space: O(n)

Approach 4: Using sliding window and hash map

  1. Initialize a dictionary char_count to store the count of characters in the current window.
  2. Initialize two pointers start and end to represent the left and right ends of the window. Also, initialize a variable count to keep track of the number of distinct characters in the window.
  3. Traverse the string from left to right using the end pointer end and do the following:
    a. If the count of the character at the end index is 0 in the dictionary, increment the count variable.
    b. Increment the count of the character at the end index in the dictionary.
    c. If the length of the current window is greater than N, move the start pointer start to the right and do the following:
    i. Decrement the count of the character at the start index in the dictionary.
    ii. If the count of the character at the start index becomes 0 in the dictionary, decrement the count variable.
    d. If the length of the current window is equal to N and the count variable is equal to K, add the current substring to the result list.
  4. Return the result list.

Python3




def n_size_substrings_with_k_distinct_characters(s, N, K):
    res = []
    char_count = {}
    start = 0
    count = 0
    for end in range(len(s)):
        if s[end] not in char_count or char_count[s[end]] == 0:
            count += 1
        char_count[s[end]] = char_count.get(s[end], 0) + 1
        if end - start + 1 > N:
            char_count[s[start]] -= 1
            if char_count[s[start]] == 0:
                count -= 1
            start += 1
        if end - start + 1 == N and count == K:
            res.append(s[start:end+1])
    return res
 
# example usage
test_str = 'geeksforgeeksforgeeks'
print(n_size_substrings_with_k_distinct_characters(test_str, 3, 2))


Output

['gee', 'eek', 'gee', 'eek', 'gee', 'eek']

Time complexity: O(n), where n is the length of the string.
Auxiliary space: O(k), where k is the number of distinct 



Like Article
Suggest improvement
Previous
Next
Share your thoughts in the comments

Similar Reads