Python – Maximum occurring Substring from list

  • Last Updated : 14 Mar, 2023

Sometimes, while working with Python strings, we need to find which substring from a given list occurs most often in a string. This has applications in DNA sequencing in biology, among other areas. Let's discuss several ways in which this task can be performed.

Method 1 : Using re.findall() + groupby() + max() + lambda 
The combination of the above functions can be used to solve this problem. We first extract every occurrence of the listed substrings using re.findall(). The matches are then sorted and grouped with groupby() to count each one; sorting is required because groupby() only groups adjacent equal items. The last step is extracting the maximum, which is done using max() along with a lambda function.
 

Python3




# Python3 code to demonstrate working of
# Maximum occurring Substring from list
# Using re.findall() + groupby() + max() + lambda
import re
import itertools
 
# initializing string
test_str = "gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb"
test_list = ['gfg', 'is', 'best']
 
# printing original string and list
print("The original string is : " + test_str)
print("The original list is : " + str(test_list))
 
# Maximum occurring Substring from list
# Using re.findall() + groupby() + max() + lambda
# re.escape() keeps any regex metacharacters in the substrings literal
seqs = re.findall('|'.join(map(re.escape, test_list)), test_str)
# groupby() only merges adjacent equal items, so sort the matches first
grps = [(key, len(list(j))) for key, j in itertools.groupby(sorted(seqs))]
res = max(grps, key = lambda ele : ele[1])
         
# printing result
print("Maximum frequency substring : " + str(res[0]))

Output : 

The original string is : gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb
The original list is : ['gfg', 'is', 'best']
Maximum frequency substring : gfg

 

Time complexity: O(n + k log k), where n is the length of the input string and k is the number of matches found. The regex scan is roughly linear in n for a short list of substrings, sorting the matches costs O(k log k), and groupby() and max() are O(k).
Auxiliary space: O(k), for the list of matches and its sorted copy.
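The following is a minimal sketch of why the order of the input matters to groupby(). The sample list seqs is made up for illustration: 'a' is the most frequent item overall, but it never appears twice in a row, so grouping the unsorted list reports the longest run rather than the highest total.

```python
from itertools import groupby

# Made-up sample: 'a' occurs three times in total, but never twice in a row.
seqs = ['a', 'b', 'b', 'a', 'c', 'a']

# Without sorting, groupby() yields one group per run of adjacent equal items.
runs = [(key, len(list(group))) for key, group in groupby(seqs)]

# After sorting, every key forms exactly one group, so its length is the total count.
totals = [(key, len(list(group))) for key, group in groupby(sorted(seqs))]

longest_run = max(runs, key=lambda pair: pair[1])
most_frequent = max(totals, key=lambda pair: pair[1])

print(longest_run)    # ('b', 2)
print(most_frequent)  # ('a', 3)
```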

Method 2:  Using count() and max() methods

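The string method count() returns the number of non-overlapping occurrences of a substring in a string. We collect one count per substring in the list, use max() to find the largest count, and use its index to look up the corresponding substring.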

Python3




# Python3 code to demonstrate working of
# Maximum occurring Substring from list
 
# initializing string
test_str = "gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb"
test_list = ['gfg', 'is', 'best']
 
# printing original string and list
print("The original string is : " + test_str)
print("The original list is : " + str(test_list))
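# count the non-overlapping occurrences of each substring
res = []
for i in test_list:
    res.append(test_str.count(i))
# pick the substring with the largest count (the first one wins on ties)
x = max(res)
result = test_list[res.index(x)]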
# printing result
print("Maximum frequency substring : " + str(result))

Output

The original string is : gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb
The original list is : ['gfg', 'is', 'best']
Maximum frequency substring : gfg

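Time Complexity: O(n*m), where n is the length of the string and m is the number of substrings in the list, since count() scans the string once per substring.
Auxiliary Space: O(m), for the list of counts.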
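Because count() only counts non-overlapping occurrences, it can under-report repeats in inputs such as DNA-like strings. The sketch below shows one possible way to count overlapping occurrences with a zero-width lookahead; the helper name count_overlapping and the sample string are assumptions made for illustration, not part of the methods in this article.

```python
import re

def count_overlapping(text, sub):
    """Count occurrences of sub in text, allowing matches to overlap."""
    # A zero-width lookahead matches at every start position without consuming text.
    return len(re.findall('(?=' + re.escape(sub) + ')', text))

dna = "AAAAC"  # made-up sample

non_overlapping = dna.count("AA")            # matches at positions 0 and 2
overlapping = count_overlapping(dna, "AA")   # matches at positions 0, 1 and 2

print(non_overlapping, overlapping)  # 2 3
```

Which of the two counts is the right one depends on the application, so it is worth deciding this before comparing frequencies.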

Method 3: Using re.findall() + Counter

This is an alternative approach that uses re.findall() and the Counter class from the collections module. We extract the matches using re.findall(), count the occurrences of each match using Counter(), and take the top entry with most_common(1).

Python3




# Python3 code to demonstrate working of
# Maximum occurring Substring from list
# Using re.findall() + Counter
  
# importing modules
import collections
import re
# initializing string
test_str = "gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb"
test_list = ['gfg', 'is', 'best']
  
# printing original string and list
print("The original string is : " + test_str)
print("The original list is : " + str(test_list))
  
# Maximum occurring Substring from list
# Using re.findall() + Counter
seqs = re.findall(str.join('|', test_list), test_str)
res = collections.Counter(seqs).most_common(1)[0][0]
  
# printing result
print("Maximum frequency substring : " + str(res))
#This code is contributed by Edula Vinay Kumar Reddy

Output

The original string is : gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb
The original list is : ['gfg', 'is', 'best']
Maximum frequency substring : gfg

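Time Complexity: O(n), where n is the length of the string, for a short fixed list of substrings.
Auxiliary Space: O(n) in the worst case, for the list of matches and the Counter.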
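Two details are worth keeping in mind when building the pattern from an arbitrary list: substrings may contain regex metacharacters, and a regex alternation tries its alternatives from left to right. The sketch below is one hedged way to handle both; the function name most_common_substring and the sample inputs are hypothetical.

```python
import re
from collections import Counter

def most_common_substring(text, candidates):
    """Return (substring, count) for the most frequent candidate, or None."""
    if not candidates:
        return None
    # Longest first, so that e.g. 'gfg' is tried before its prefix 'gf'.
    ordered = sorted(candidates, key=len, reverse=True)
    # re.escape() makes characters such as '.' match literally.
    pattern = '|'.join(re.escape(c) for c in ordered)
    counts = Counter(re.findall(pattern, text))
    # most_common(1) would be an empty list when nothing matched.
    return counts.most_common(1)[0] if counts else None

# Without escaping, 'a.b' would also match 'axb'.
print(most_common_substring("a.b axb a.b xx", ["a.b", "xx"]))  # ('a.b', 2)
```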

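Method 4 : Using operator.countOf() and max() methods

operator.countOf(a, b) returns the number of items in a that are equal to b. Iterating over a string yields single characters, so countOf() cannot count multi-character substrings in the string directly. Instead, we first extract the matches with re.findall() and then apply countOf() to that list.

Python3




# Python3 code to demonstrate working of
# Maximum occurring Substring from list
# Using operator.countOf() + max()
import operator
import re
 
# initializing string
test_str = "gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb"
test_list = ['gfg', 'is', 'best']
 
# printing original string and list
print("The original string is : " + test_str)
print("The original list is : " + str(test_list))

# extract every listed substring in order of appearance
seqs = re.findall('|'.join(map(re.escape, test_list)), test_str)

# countOf() counts the items of seqs that equal each substring
res = []
for i in test_list:
    res.append(operator.countOf(seqs, i))
x = max(res)
result = test_list[res.index(x)]
# printing result
print("Maximum frequency substring : " + str(result))

Output

The original string is : gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb
The original list is : ['gfg', 'is', 'best']
Maximum frequency substring : gfg

Time Complexity : O(n + m*k), where n is the length of the string, m is the number of substrings in the list and k is the number of matches found.
Auxiliary Space : O(k + m), for the list of matches and the list of counts.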
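As a small illustration of how operator.countOf() treats its first argument, the sketch below compares calling it on a string with calling it on a list of substrings. The sample values are made up for the example.

```python
import operator

text = "gfgisgfg"  # made-up sample

# Iterating a string yields single characters, so a multi-character
# value never equals any element and the count is 0.
on_string = operator.countOf(text, "gfg")

# A single character is compared against each character of the string.
single_char = operator.countOf(text, "g")

# On a list of already extracted substrings, whole items are compared.
on_list = operator.countOf(["gfg", "is", "gfg"], "gfg")

print(on_string, single_char, on_list)  # 0 4 2
```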

Method 5: Using a dictionary to count occurrences

In this approach, we use a dictionary to store the number of occurrences of each substring in the list. We iterate over the list and, for each substring, count its occurrences in the string with count() and store the result in the dictionary under that substring. Finally, we find the key with the maximum count in the dictionary.

Step-by-step approach:

  • Initialize an empty dictionary to count the occurrences of substrings.
  • Iterate over the list of substrings using a for loop.
  • For each substring, find its number of occurrences in the string using the count() method and store the count in the dictionary.
  • Find the substring with the maximum count in the dictionary using max() with count_dict.get as the key.
  • Print the maximum frequency substring.

Python3




# Python3 code to demonstrate working of
# Maximum occurring Substring from list
# Using dictionary
 
# initializing string
test_str = "gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb"
test_list = ['gfg', 'is', 'best']
 
# printing original string and list
print("The original string is : " + test_str)
print("The original list is : " + str(test_list))
 
# Maximum occurring Substring from list
# Using dictionary
count_dict = {}
for sub in test_list:
    count_dict[sub] = test_str.count(sub)
res = max(count_dict, key=count_dict.get)
 
# printing result
print("Maximum frequency substring : " + str(res))

Output

The original string is : gfghsisbjknlmkesbestgfgsdcngfgcsdjnisdjnlbestdjsklgfgcdsbestbnjdsgfgdbhisbhsbestdkgfgb
The original list is : ['gfg', 'is', 'best']
Maximum frequency substring : gfg

Time complexity: O(n*m), where n is the length of the string and m is the total number of substrings in the list.
Auxiliary space: O(m), where m is the total number of substrings in the list.
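When two substrings share the highest count, max() returns only the first one it encounters. The following is a minimal sketch of a variant that reports every tied substring along with the count; the function name max_occurring and the sample inputs are assumptions made for illustration.

```python
def max_occurring(text, candidates):
    """Return (list of substrings with the highest count, that count)."""
    # One non-overlapping count per candidate, keyed by the candidate itself.
    counts = {sub: text.count(sub) for sub in candidates}
    if not counts:
        return [], 0
    best = max(counts.values())
    # Keep every candidate that reaches the best count, in list order.
    return [sub for sub, n in counts.items() if n == best], best

print(max_occurring("abab cdcd", ["ab", "cd", "zz"]))  # (['ab', 'cd'], 2)
```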

