Python | Substring Frequency between Uppercases

Last Updated : 25 Apr, 2023

Sometimes while working with Strings, we can have a problem in which we have to find substrings that occur between the uppercases and find their frequencies. This is a very unique problem and has less of applications. Let us discuss certain ways in which this task can be performed.

Method #1 : Using groupby() + regex + loop

The combination of the above functions can be used to perform this task. In this, we use groupby() to group the regex-extracted substrings. And all is compiled using loop.

Python3

# Python3 code to demonstrate working of
# Substring Frequency between Uppercases
# Using groupby() + regex + loop
 
from itertools import groupby
import re
 
# initializing string
test_str = "GeEkSForGeEkS"
 
# printing original string
print("The original string is : " + test_str)
 
# Substring Frequency between Uppercases
# Using groupby() + regex + loop
res = {}
for i, j in groupby(re.findall(r'[A-Z][a-z]*\d*', test_str)):
    res[i] = res[i] + 1 if i in res.keys() else 1
 
# printing result
print("The grouped Substring Frequency : " + str(res))

Output

The original string is : GeEkSForGeEkS
The grouped Substring Frequency : {'Ge': 2, 'Ek': 2, 'S': 2, 'For': 1}

Method #2: Using dictionary comprehension + sorted() + groupby() + regex

The combination of the above functions can be used to perform this task. In this, the task of cumulation is performed using dictionary comprehension.

Python3

# Python3 code to demonstrate working of
# Substring Frequency between Uppercases
# Using dictionary comprehension + sorted() + groupby() + regex
 
from itertools import groupby
import re
 
# initializing string
test_str = "GeEkSForGeEkS"
 
# printing original string
print("The original string is : " + test_str)
 
# Substring Frequency between Uppercases
# Using dictionary comprehension + sorted() + groupby() + regex
res = {i: len(list(j)) for i, j in groupby(
    sorted(re.findall(r'[A-Z][a-z]*\d*', test_str)))}
 
# printing result
print("The grouped Substring Frequency : " + str(res))

Output

The original string is : GeEkSForGeEkS
The grouped Substring Frequency : {'Ek': 2, 'For': 1, 'Ge': 2, 'S': 2}

Time Complexity: O(n)
Auxiliary Space: O(n)

Method #3 : Using isupper(),split(),list(),set(),remove(),count() methods

Approach:

Replace uppercase character with “*”+uppercase character(using isupper())
Split on * that will result in a list(using split())
Create a new list with unique strings of above list(using list(),set())
Create a dictionary with unique strings as key and its count in original list. (using count() method)

Example:

Python3

# Python3 code to demonstrate working of
# Substring Frequency between Uppercases
 
# initializing string
test_str = "GeEkSForGeEkS"
 
# printing original string
print("The original string is : " + test_str)
 
# Substring Frequency between Uppercases
res = dict()
x = ""
for i in test_str:
    if i.isupper():
        x += "*"+i
    else:
        x += i
 
y = x.split("*")
y.remove('')
 
z = list(set(y))
 
for i in z:
    res[i] = y.count(i)
 
# printing result
print("The grouped Substring Frequency : " + str(res))

Output

The original string is : GeEkSForGeEkS
The grouped Substring Frequency : {'S': 2, 'Ge': 2, 'Ek': 2, 'For': 1}

Time Complexity : O(N)
Auxiliary Space : O(N)

Method #4: Using regular expression and Counter()

A regular expression pattern is defined to match all the substrings that start with an uppercase letter followed by zero or more lowercase letters.
The re.findall() method is used to find all the matches of the pattern in the given string. This method returns a list of all the matches.
The Counter() method is used to count the frequency of each match in the list.
The resulting dictionary of the Counter() method is the desired output.

Example:

Python3

import re
from collections import Counter
 
# initializing string
test_str = "GeEkSForGeEkS"
 
# printing original string
print("The original string is : " + test_str)
 
# Substring Frequency between Uppercases
pattern = r'[A-Z][a-z]*'
matches = re.findall(pattern, test_str)
res = dict(Counter(matches))
 
# printing result
print("The grouped Substring Frequency : " + str(res))

Output

The original string is : GeEkSForGeEkS
The grouped Substring Frequency : {'Ge': 2, 'Ek': 2, 'S': 2, 'For': 1}

Time complexity: O(n), where n is the length of the input string.
Auxiliary space: O(n), where n is the length of the input string.

Method #5: Using a for loop and string slicing

Initialize an empty dictionary res to store the substring frequencies.
Initialize a variable start to 0.
Loop through each character in the test_str starting from the second character.
If the current character is uppercase, slice the substring between start and the current index (exclusive) and add it to the res dictionary. Update the start variable to the current index.
After the loop, slice the substring between the last uppercase letter and the end of the string and add it to the res dictionary.
Print the res dictionary.

Example:

Python3

# initializing string
test_str = "GeEkSForGeEkS"
 
# printing original string
print("The original string is : " + test_str)
 
# Substring Frequency between Uppercases
res = {}
start = 0
for i in range(1, len(test_str)):
    if test_str[i].isupper():
        substr = test_str[start:i]
        if substr in res:
            res[substr] += 1
        else:
            res[substr] = 1
        start = i
 
substr = test_str[start:]
if substr in res:
    res[substr] += 1
else:
    res[substr] = 1
 
# printing result
print("The grouped Substring Frequency : " + str(res))

Output

The original string is : GeEkSForGeEkS
The grouped Substring Frequency : {'Ge': 2, 'Ek': 2, 'S': 2, 'For': 1}

Time complexity: O(n), where n is the length of test_str.
Auxiliary space: O(k), where k is the number of unique substrings between uppercase letters in test_str.

Method #6: Using a list comprehension and zip

Steps:

Create a list comprehension that uses zip to iterate over the string and create substrings between uppercase letters.
Use a for loop to iterate over the list of substrings.
Use the count method to count the occurrences of the current substring in the original string.
Add the substring and its count to a dictionary.
Print the resulting dictionary.

Example:

Python3

# initializing string
test_str = "GeEkSForGeEkS"
 
# printing original string
print("The original string is : " + test_str)
 
# create a list of substrings between uppercase letters using a list comprehension and zip
substr_list = [test_str[i:j] for i, j in zip([0] + [j
                                                    for j in range(1, len(test_str))
                                                    if test_str[j].isupper()],
                                             [j for j in range(1, len(test_str)) if test_str[j].isupper()] + [None])]
 
# create a dictionary to store the counts of each substring
res = {}
for substr in substr_list:
    res[substr] = test_str.count(substr)
 
# printing result
print("The grouped Substring Frequency : " + str(res))

Output

The original string is : GeEkSForGeEkS
The grouped Substring Frequency : {'Ge': 2, 'Ek': 2, 'S': 2, 'For': 1}

Time complexity: O(n^2) (due to the use of the count method)
Auxiliary space: O(k) (where k is the number of distinct substrings)

Suggest improvement

Python - Sort by Uppercase Frequency

Share your thoughts in the comments

Python | Substring Frequency between Uppercases

Python3

Python3

Python3

Python3

Python3

Python3

Please Login to comment...

Similar Reads

What kind of Experience do you want to share?