Sometimes we need to handle a list of strings in which each string holds several substrings joined by a separator, and we want to remove the duplicate substrings within each string. Compact ways to solve this kind of problem are handy to have. Let's discuss several ways in which this can be done.
Method #1: Using split() and for loops
Python3
test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
print("The original list : " + str(test_list))
res = []
for i in test_list:
    # split the string into its substrings
    x = i.split("-")
    a = []
    for j in x:
        # keep only the first occurrence of each substring
        if j not in a:
            a.append(j)
    res.append(a)
print("The list after duplicate removal : " + str(res))
Output
The original list : ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
The list after duplicate removal : [['aa', 'bb'], ['bb', 'cc'], ['gg', 'ff'], ['hh']]
Time Complexity: O(n*k^2) in the worst case, where n is the number of strings in the input list and k is the maximum number of substrings in one string. Each membership test j not in a scans the list a, which can hold up to k items.
Auxiliary Space: O(n*k), since res stores up to k unique substrings for each of the n input strings.
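The membership test on a list is what makes the inner loop quadratic. As a minimal sketch (a variation, not the method shown above), tracking the substrings already seen in a set keeps the order of first occurrence while making each lookup constant time on average. The helper name dedupe_keep_order is hypothetical.

```python
def dedupe_keep_order(text, sep="-"):
    """Return the unique substrings of text in order of first occurrence."""
    seen = set()   # substrings already emitted (O(1) average lookup)
    unique = []    # result, in first-occurrence order
    for part in text.split(sep):
        if part not in seen:
            seen.add(part)
            unique.append(part)
    return unique


test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
res = [dedupe_keep_order(s) for s in test_list]
print("The list after duplicate removal : " + str(res))
```

This costs one extra set per string, which is usually a fair trade once strings contain many substrings.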
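Method #2: Using set() + split()
This problem can also be solved by using split() to get the substrings of each string and then passing them to set(), which removes the duplicates. Note that a set is unordered, so the original order of the substrings is not preserved and the printed order may vary between runs.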
Python3
test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
print("The original list : " + str(test_list))
# split each string and let set() drop the repeated substrings
res = [set(sub.split('-')) for sub in test_list]
print("The list after duplicate removal : " + str(res))
Output :
The original list : ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
The list after duplicate removal : [{'aa', 'bb'}, {'cc', 'bb'}, {'gg', 'ff'}, {'hh'}]
Method #3: Using {} + split() + set comprehension
When we instead want a single collection of the unique substrings across all the strings, we can use a set comprehension. The curly braces build a set, and the rest works as in the method above. As with any set, the order of the result is arbitrary.
Python3
test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
print("The original list : " + str(test_list))
# collect every substring of every string into one set, then convert to a list
res = list({i for sub in test_list for i in sub.split('-')})
print("The list after duplicate removal : " + str(res))
Output :
The original list : ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
The list after duplicate removal : ['cc', 'ff', 'aa', 'hh', 'gg', 'bb']
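The order of the flattened result above depends on set iteration order. If a stable order matters, one possible sketch (an alternative, not the original method) chains the substrings together and de-duplicates them with dict.fromkeys(), which keeps the order of first occurrence on Python 3.7 and later. The name all_parts is only illustrative.

```python
from itertools import chain

test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']

# Flatten every string into one stream of substrings
all_parts = chain.from_iterable(s.split('-') for s in test_list)

# dict keys are unique and keep insertion order (Python 3.7+)
res = list(dict.fromkeys(all_parts))
print("The list after duplicate removal : " + str(res))
```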
Method #4: Using Counter() function
Python3
from collections import Counter
test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
print("The original list : " + str(test_list))
res = []
for i in test_list:
    x = i.split("-")
    # count how many times each substring occurs
    freq = Counter(x)
    tempresult = []
    for j in x:
        # a positive count means this substring has not been emitted yet
        if freq[j] > 0:
            tempresult.append(j)
            freq[j] = 0
    res.append(tempresult)
print("The list after duplicate removal : " + str(res))
Output
The original list : ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
The list after duplicate removal : [['aa', 'bb'], ['bb', 'cc'], ['gg', 'ff'], ['hh']]
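Because Counter is a dict subclass, its keys are already unique and, on Python 3.7 and later, are kept in order of first insertion. As a hedged shortcut (an alternative to the explicit loop, not the code shown above), the keys alone give the de-duplicated substrings:

```python
from collections import Counter

test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']

# list(Counter(...)) yields the distinct keys in first-seen order
res = [list(Counter(s.split("-"))) for s in test_list]
print("The list after duplicate removal : " + str(res))
```

The counts themselves are discarded here, so this form is mainly useful when the frequencies are needed elsewhere as well.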
Method #5: Using a helper function
Python3
def remove_duplicates(substrings):
    # an empty input has nothing to de-duplicate
    if not substrings:
        return []
    result = []
    for substring in substrings:
        # keep only the first occurrence of each substring
        if substring not in result:
            result.append(substring)
    return result
test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
print("The original list : " + str(test_list))
result = [remove_duplicates(string.split("-")) for string in test_list]
print("The list after duplicate removal : " + str(result))
Output
The original list : ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
The list after duplicate removal : [['aa', 'bb'], ['bb', 'cc'], ['gg', 'ff'], ['hh']]
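Time Complexity: O(n*k^2) in the worst case, where n is the number of strings and k is the maximum number of substrings per string, because each substring not in result test scans a list.
Space Complexity: O(n*k)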
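The helper above walks the substrings with a loop. A genuinely recursive formulation might look like the following sketch; the function name remove_duplicates_recursive is hypothetical, and the approach assumes the number of distinct substrings stays well below Python's recursion limit.

```python
def remove_duplicates_recursive(substrings):
    """Recursively drop repeated items, keeping first occurrences."""
    # Base case: nothing left to process
    if not substrings:
        return []
    head, tail = substrings[0], substrings[1:]
    # Remove later copies of head, then recurse on what remains
    rest = [s for s in tail if s != head]
    return [head] + remove_duplicates_recursive(rest)


test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
result = [remove_duplicates_recursive(s.split("-")) for s in test_list]
print("The list after duplicate removal : " + str(result))
```

Each call filters the remaining items once, so the worst case is still quadratic in the number of substrings per string.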
Method #6: Using list comprehension and set()
Python3
test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
print("The original list : " + str(test_list))
# set() drops the duplicates; list() converts each set back to a list
res = [list(set(i.split("-"))) for i in test_list]
print("The list after duplicate removal : " + str(res))
Output
The original list : ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
The list after duplicate removal : [['aa', 'bb'], ['cc', 'bb'], ['gg', 'ff'], ['hh']]
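Time Complexity: O(n*k) on average, where n is the number of strings and k is the maximum number of substrings per string, since building a set takes linear time on average.
Space Complexity: O(n*k)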
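Since the inner lists come from sets, their element order is not guaranteed to match the input. One way to restore it, shown here as a sketch under the assumption that the strings are short, is to sort each set by the position at which the substring first appears:

```python
test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']

res = []
for s in test_list:
    parts = s.split("-")
    # set() removes duplicates; sorting by first index restores the order
    res.append(sorted(set(parts), key=parts.index))

print("The list after duplicate removal : " + str(res))
```

Note that parts.index is a linear scan, so this trades the linear average cost of the set for a quadratic worst case per string.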
Method #7: Using dict.fromkeys()
The given code removes duplicate substrings in each string of a list by splitting each string by the “-” character and using a dictionary to remove duplicates.
Here’s a step-by-step explanation of the algorithm:
- Initialize a list of strings test_list.
- Initialize an empty list res to store the modified strings.
- Loop through each string s in test_list using a for loop.
- Split the string s by the “-” character using the split() function, and create a list of the resulting substrings.
- Pass the list to dict.fromkeys(), which builds a dictionary whose keys are the substrings. Duplicates are dropped because dictionary keys are unique, and insertion order is kept (guaranteed from Python 3.7).
- Convert the dictionary back to a list using the list() function to get the unique substrings.
- Append the list of unique substrings to the res list.
- After the loop, print res.
Python3
test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
print("The original list : " + str(test_list))
res = []
for s in test_list:
    res.append(list(dict.fromkeys(s.split("-"))))
print("The list after duplicate removal : " + str(res))
Output
The original list : ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
The list after duplicate removal : [['aa', 'bb'], ['bb', 'cc'], ['gg', 'ff'], ['hh']]
The time complexity of this algorithm is O(n*m), where n is the number of strings in the list and m is the maximum length of each string. This is because we iterate through each string and split it into substrings, which takes O(m) time for each string.
The auxiliary space of this algorithm is also O(n*m), since we build a new list holding the unique substrings of every input string and use a temporary dictionary for each string. However, the actual space usage may be smaller than n*m, depending on how many duplicates are removed from each string.
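All the methods so far return lists (or sets) of substrings. If the goal is to get de-duplicated strings back in the original format, the unique substrings can be joined with the same separator. This is a small sketch building on dict.fromkeys(); the variable sep is an illustrative name.

```python
test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
sep = "-"

# split, de-duplicate in first-seen order, then join back into one string
res = [sep.join(dict.fromkeys(s.split(sep))) for s in test_list]
print("The strings after duplicate removal : " + str(res))
```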
Method #8: Using reduce()
Algorithm:
- Import the reduce function from functools module.
- Create a list test_list and initialize it with some string values.
- Print the original list.
- Use the reduce function to remove duplicate substrings. The reduce function takes three arguments: a lambda function, the list to iterate over, and an optional initial value.
- The lambda function is used to merge the lists by concatenating them with the + operator. The lambda function takes two arguments: the accumulator x and the current element y.
- Use the split function to split each string in test_list into a list of substrings based on the delimiter "-".
- Convert the list of substrings into a set to remove duplicates.
- Convert the set back to a list.
- Append the list to the accumulator.
- Print the final result.
Python3
from functools import reduce
test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
print("The original list : " + str(test_list))
# x is the accumulated result, y is the current string
res = reduce(lambda x, y: x + [list(set(y.split('-')))], test_list, [])
print("The list after duplicate removal : " + str(res))
Output
The original list : ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
The list after duplicate removal : [['aa', 'bb'], ['bb', 'cc'], ['ff', 'gg'], ['hh']]
Time Complexity: O(n*m) for splitting and de-duplicating, where n is the length of the input list and m is the maximum length of any string in it. Note that x + [...] copies the accumulator on every step, which adds O(n^2) list copying for large n.
Space Complexity: O(n*m), because a new list of unique substrings is created for each input string, and together these lists can be as large as the input itself.
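As a possible refinement (a sketch, not the code shown above), the reducer can append to the accumulator in place instead of building a new list with + on every step, and can use dict.fromkeys() so that the order of the substrings is kept. The function name add_unique is hypothetical.

```python
from functools import reduce


def add_unique(acc, text):
    """Append the order-preserving unique substrings of text to acc."""
    acc.append(list(dict.fromkeys(text.split('-'))))
    return acc  # reduce() needs the accumulator returned


test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
res = reduce(add_unique, test_list, [])
print("The list after duplicate removal : " + str(res))
```

Mutating the accumulator is safe here only because the initial list is created fresh for this one call.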
Last Updated: 05 Apr, 2023