Python | Duplicate substring removal from list

Last Updated : 05 Apr, 2023

Sometimes we need to work with a list of strings in which each string holds several substrings joined by a separator, and we need to remove the duplicate substrings within each string. Having a few shorthands for this task is useful. Let's discuss several ways in which this can be done.

Method #1: Using split() and for loops

Python3

# Python3 code to demonstrate
# removing duplicate substrings
 
# initializing list
test_list = [ 'aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
 
# printing original list
print("The original list : " + str(test_list))
 
 
# removing duplicate substrings
res = []
for i in test_list:
    x = i.split("-")
    a = []  # unique substrings of the current string, in first-seen order
    for j in x:
        if j not in a:
            a.append(j)
    res.append(a)
 
# print result
print("The list after duplicate removal : " + str(res))

Output

The original list : ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
The list after duplicate removal : [['aa', 'bb'], ['bb', 'cc'], ['gg', 'ff'], ['hh']]

Time Complexity: O(n*k*k) in the worst case, where n is the number of strings in the list and k is the maximum number of substrings in a string. For every substring, the check "j not in a" scans the list of substrings collected so far (each comparison is treated as constant time here).
Auxiliary Space: O(n*k), since res stores up to k substrings for each of the n input strings.
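
The membership test on a list is what makes the inner loop expensive. As a minimal sketch, not part of the original method, the already-seen substrings can be tracked in a set so that each lookup takes constant time on average while the first-seen order is still kept. The helper name dedupe_preserving_order is hypothetical.

```python
def dedupe_preserving_order(text, sep="-"):
    """Return the unique substrings of `text`, in first-seen order."""
    seen = set()   # substrings already emitted; average O(1) lookups
    unique = []    # result, in order of first appearance
    for part in text.split(sep):
        if part not in seen:
            seen.add(part)
            unique.append(part)
    return unique


test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
res = [dedupe_preserving_order(s) for s in test_list]
print("The list after duplicate removal : " + str(res))
```

For strings with only a handful of substrings the difference is negligible; the set only pays off when a single string contains many substrings.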

Method #2: Using set() + split()
split() breaks each string into its substrings, and set() then drops the duplicates. Because a set is unordered, the original order of the substrings is not guaranteed to be kept.

Python3

# Python3 code to demonstrate
# removing duplicate substrings
# using set() + split()
 
# initializing list
test_list = [ 'aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
 
# printing original list
print("The original list : " + str(test_list))
 
# using set() + split()
# removing duplicate substrings
res = [set(sub.split('-')) for sub in test_list]
 
# print result
print("The list after duplicate removal : " + str(res))

Output (the order inside each set may vary between runs)

The original list : ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
The list after duplicate removal : [{'aa', 'bb'}, {'cc', 'bb'}, {'gg', 'ff'}, {'hh'}]
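
Since a set has no defined order, the printed result above is not reproducible. A small sketch, assuming alphabetical order is acceptable for the task, passes each set through sorted() to get a deterministic list. The name res_sorted is introduced here only for illustration.

```python
test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']

# sorted() accepts any iterable, so the set can be passed directly;
# the result is a list in ascending order, independent of set iteration order
res_sorted = [sorted(set(sub.split('-'))) for sub in test_list]
print("Sorted unique substrings : " + str(res_sorted))
```

Note that this yields the sorted order, not the order in which the substrings first appeared ('gg-ff-gg' becomes ['ff', 'gg']).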

Method #3: Using {} + split() + list comprehension
When we instead need one flat list of the unique substrings across all the strings, we can use a set comprehension. The curly braces build a set from every substring of every string, and list() converts it back to a list. As with the method above, the order of the result is not guaranteed.

Python3

# Python3 code to demonstrate
# removing duplicate substrings
# using {} + split() + list comprehension
 
# initializing list
test_list = [ 'aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
 
# printing original list
print("The original list : " + str(test_list))
 
# using {} + split() + list comprehension
# removing duplicate substrings
res = list({i for sub in test_list for i in sub.split('-')})
 
# print result
print("The list after duplicate removal : " + str(res))

Output (the order of the list may vary between runs)

The original list : ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
The list after duplicate removal : ['cc', 'ff', 'aa', 'hh', 'gg', 'bb']
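
If the flattened result should keep the order in which substrings first appear, one possible sketch combines itertools.chain.from_iterable with dict.fromkeys(). This is a swapped-in technique rather than the set comprehension used above, and the names all_parts and res_flat are illustrative.

```python
from itertools import chain

test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']

# chain.from_iterable lazily yields every substring of every string in turn
all_parts = chain.from_iterable(sub.split('-') for sub in test_list)

# dict keys are unique and keep insertion order (Python 3.7+)
res_flat = list(dict.fromkeys(all_parts))
print("Unique substrings across the list : " + str(res_flat))
```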

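Method #4: Using Counter()
Counter() counts how often each substring occurs in a string. A substring is appended the first time it is met, and its count is then set to 0 so that later occurrences are skipped.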

Python3

# Python3 code to demonstrate
# removing duplicate substrings
from collections import Counter
# initializing list
test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
 
# printing original list
print("The original list : " + str(test_list))
 
 
# removing duplicate substrings
res = []
for i in test_list:
    x = i.split("-")
    freq = Counter(x)
    tempresult = []
    for j in x:
        if freq[j] > 0:
            tempresult.append(j)
            freq[j] = 0
    res.append(tempresult)
 
# print result
print("The list after duplicate removal : " + str(res))

Output

The original list : ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
The list after duplicate removal : [['aa', 'bb'], ['bb', 'cc'], ['gg', 'ff'], ['hh']]
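
Because Counter is a dict subclass, its keys are already unique and, from Python 3.7 on, ordered by first appearance. The following sketch relies on that to shorten the loop, and also uses the counts to report how many duplicates were dropped. The removed list is a hypothetical addition, not something the original method produces.

```python
from collections import Counter

test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']

res = []
removed = []  # hypothetical extra: duplicates dropped from each string
for text in test_list:
    freq = Counter(text.split("-"))
    # iterating a Counter yields its keys: the unique substrings,
    # in first-seen order (Python 3.7+)
    res.append(list(freq))
    # total occurrences minus distinct substrings = duplicates dropped
    removed.append(sum(freq.values()) - len(freq))

print("The list after duplicate removal : " + str(res))
print("Duplicates removed per string : " + str(removed))
```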

Method #5: Using recursion

Python3

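# Recursive function to remove duplicate substrings
def remove_duplicates(substrings):
    # Base case: an empty list has no duplicates
    if not substrings:
        return []
    # Keep the first substring, drop its repeats from the rest,
    # then recurse on what remains
    first = substrings[0]
    rest = [s for s in substrings[1:] if s != first]
    return [first] + remove_duplicates(rest)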
 
 
 
# Initialize the list of strings
test_list = [ 'aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
 
# printing original list
print("The original list : " + str(test_list))
 
 
# Split each string into substrings and remove duplicates
result = [remove_duplicates(string.split("-")) for string in test_list]
 
 
# print result
print("The list after duplicate removal : " + str(result))
#this code contributed by tvsk

Output

The original list : ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
The list after duplicate removal : [['aa', 'bb'], ['bb', 'cc'], ['gg', 'ff'], ['hh']]

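Time Complexity: O(n*k*k) in the worst case, where n is the number of strings and k is the maximum number of substrings in a string, because each substring is compared against the other substrings of its string.

Space Complexity: O(n*k) for the result list.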

Method #6: Using list comprehension and set()

Python3

# Python3 code to demonstrate
# removing duplicate substrings
# initializing list
test_list = [ 'aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
# printing original list
print("The original list : " + str(test_list))
# removing duplicate substrings
res = [list(set(i.split("-"))) for i in test_list]
# print result
print("The list after duplicate removal : " + str(res))
 
#This code is contributed by Jyothi Pinjala

Output (the order inside each sublist may vary between runs)

The original list : ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
The list after duplicate removal : [['aa', 'bb'], ['cc', 'bb'], ['gg', 'ff'], ['hh']]

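Time Complexity: O(n*m), where n is the number of strings and m is the maximum length of a string, since each string is split once and each of its substrings is hashed once.

Space Complexity: O(n*m) for the result list.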
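
Converting a set back to a list loses the original order. One possible way to restore it, shown here as a sketch, is to sort the unique substrings by the position of their first occurrence using list.index as the sort key.

```python
test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']

res = []
for text in test_list:
    parts = text.split("-")
    # list.index returns the position of the first occurrence, so sorting
    # the unique substrings by it restores their original order
    res.append(sorted(set(parts), key=parts.index))

print("The list after duplicate removal : " + str(res))
```

Each call to parts.index scans the list, so this variant costs more than the plain set conversion when a string has many substrings.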

Method #7: Using dict.fromkeys()

The given code removes duplicate substrings in each string of a list by splitting each string by the “-” character and using a dictionary to remove duplicates.

Here’s a step-by-step explanation of the algorithm:

Initialize a list of strings test_list.
Initialize an empty list res to store the lists of unique substrings.
Loop through each string s in test_list using a for loop.
Split the string s by the "-" character using the split() function, and create a list of the resulting substrings.
Pass the list to dict.fromkeys(), which builds a dictionary with the substrings as keys. Duplicates are dropped because dictionary keys are unique, and the first-seen order is kept (guaranteed since Python 3.7).
Convert the dictionary back to a list using the list() function to get the unique substrings.
Append the list of unique substrings to the res list.
After the loop, print res.

Python3

# initializing list
test_list = [ 'aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
 
# printing original list
print("The original list : " + str(test_list))
 
# removing duplicate substrings
res = []
for s in test_list:
    res.append(list(dict.fromkeys(s.split("-"))))
 
# print result
print("The list after duplicate removal : " + str(res))
#This code is contributed by Vinay pinjala.

Output

The original list : ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
The list after duplicate removal : [['aa', 'bb'], ['bb', 'cc'], ['gg', 'ff'], ['hh']]

The time complexity of this algorithm is O(n*m), where n is the number of strings in the list and m is the maximum length of each string. This is because we iterate through each string and split it into substrings, which takes O(m) time for each string.

The auxiliary space of this algorithm is also O(n*m), since we create a new list with one entry per input string, and we use a dictionary to store the unique substrings of each string. However, the actual space usage may be smaller than n*m, depending on how many duplicates are removed from each string.
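
If the goal is a list of strings rather than a list of lists, the unique substrings can be joined back with the same separator. A minimal sketch, assuming "-" is the separator throughout; the name res_joined is illustrative.

```python
test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']

# str.join iterates over the dict's keys, which are the unique
# substrings in first-seen order
res_joined = ["-".join(dict.fromkeys(s.split("-"))) for s in test_list]
print("The strings after duplicate removal : " + str(res_joined))
```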

Method #8: Using reduce()

Algorithm:

Import the reduce function from functools module.
Create a list test_list and initialize it with some string values.
Print the original list.
Use the reduce function to remove duplicate substrings. The reduce function takes three arguments: a lambda function, the list to iterate over, and an optional initial value.
The lambda function is used to merge the lists by concatenating them with the + operator. The lambda function takes two arguments: the accumulator x and the current element y.
Use the split function to split each string in test_list into a list of substrings based on the delimiter "-".
Convert the list of substrings into a set to remove duplicates.
Convert the set back to a list.
Append the list to the accumulator.
Print the final result.

Python3

from functools import reduce
 
# initializing list
test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
 
# printing original list
print("The original list : " + str(test_list))
 
# removing duplicate substrings using reduce() and set()
res = reduce(lambda x, y: x + [list(set(y.split('-')))], test_list, [])
 
# print result
print("The list after duplicate removal : " + str(res))
 
# This code is contributed by Rayudu.

Output (the order inside each sublist may vary between runs)

The original list : ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
The list after duplicate removal : [['aa', 'bb'], ['bb', 'cc'], ['ff', 'gg'], ['hh']]

Time Complexity: O(n*m + n*n), where n is the length of the input list and m is the maximum length of any string in it. Splitting and deduplicating the strings takes O(n*m), and the expression x + [...] builds a new accumulator list at every step, which adds up to O(n*n) element copies.
Space Complexity: O(n*m), where n is the length of the input list and m is the maximum length of any string in it. This is because the function creates a new list of substrings for each string in the input list, which together could be as long as the input strings themselves.
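
The repeated list concatenation can be avoided by letting the reducer append to the accumulator in place and return it. The sketch below does that with a named function instead of a lambda, and swaps set() for dict.fromkeys() so that the order of the substrings is kept. The function name append_unique is hypothetical.

```python
from functools import reduce


def append_unique(acc, text):
    """Append the unique substrings of `text` to `acc` and return `acc`."""
    acc.append(list(dict.fromkeys(text.split('-'))))
    return acc  # reduce() passes the returned value to the next step


test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']

# the initial value [] is a fresh list, so mutating it in place is safe here
res = reduce(append_unique, test_list, [])
print("The list after duplicate removal : " + str(res))
```

A plain list comprehension expresses the same thing more directly; reduce() is kept here only to stay close to the method above.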