# Python | Duplicate substring removal from list

Sometimes a list contains strings whose parts are joined by a separator, and we need to remove the duplicate parts within each string. It is handy to have a few short idioms for this kind of problem. Let’s discuss several ways in which it can be done.

Method #1: Using split() and for loops

## Python3

```python
# Python3 code to demonstrate
# removing duplicate substrings

# initializing list
test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']

# printing original list
print("The original list : " + str(test_list))

# removing duplicate substrings
res = []
for i in test_list:
    x = i.split("-")
    a = []
    for j in x:
        if j not in a:
            a.append(j)
    res.append(a)

# print result
print("The list after duplicate removal : " + str(res))
```

Output

```
The original list : ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
The list after duplicate removal : [['aa', 'bb'], ['bb', 'cc'], ['gg', 'ff'], ['hh']]
```

Time Complexity: O(n*k^2) in the worst case, where n is the number of strings in the list and k is the maximum number of substrings in any one string. Each string is split once, and every substring is then checked against the list a with a linear `not in` scan.
Auxiliary Space: O(n*k), since res holds up to k substrings for each of the n input strings.
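The `not in` check above scans the list `a` on every iteration. One common alternative, shown here as a minimal sketch, tracks the pieces already seen in a set so that each lookup takes constant time on average while the first-seen order is kept. The helper name `dedupe_keep_order` and its `sep` parameter are hypothetical, introduced only for illustration.

```python
def dedupe_keep_order(text, sep="-"):
    """Return the pieces of `text` split on `sep`, keeping first occurrences only."""
    seen = set()   # pieces already emitted; average O(1) membership test
    unique = []    # result list, in first-seen order
    for piece in text.split(sep):
        if piece not in seen:
            seen.add(piece)
            unique.append(piece)
    return unique


test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
res = [dedupe_keep_order(s) for s in test_list]
print("The list after duplicate removal : " + str(res))
```

For the short strings used in this article the difference is negligible. The set only starts to pay off when a single string contains many substrings.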

Method #2: Using set() + split()
split() breaks each string into its substrings, and set() then discards the duplicates. Note that a set is unordered, so the original order of the substrings is not preserved.

## Python3

```python
# Python3 code to demonstrate
# removing duplicate substrings
# using set() + split()

# initializing list
test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']

# printing original list
print("The original list : " + str(test_list))

# using set() + split()
# removing duplicate substrings
res = [set(sub.split('-')) for sub in test_list]

# print result
print("The list after duplicate removal : " + str(res))
```

Output :

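```
The original list : ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
The list after duplicate removal : [{'aa', 'bb'}, {'cc', 'bb'}, {'gg', 'ff'}, {'hh'}]
```

Because sets are unordered, the order of the elements inside each set may differ from run to run.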
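If a reproducible result matters more than the original order, one option is to sort each set. The sketch below assumes alphabetical order is acceptable. The variable name `res_sorted` is introduced here for illustration.

```python
test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']

# sorted() accepts any iterable and returns a new list in ascending order,
# so the result no longer depends on set iteration order
res_sorted = [sorted(set(sub.split('-'))) for sub in test_list]

print("Sorted unique substrings : " + str(res_sorted))
# Sorted unique substrings : [['aa', 'bb'], ['bb', 'cc'], ['ff', 'gg'], ['hh']]
```

Note that 'gg-ff-gg' becomes ['ff', 'gg'] here, which is sorted order rather than first-seen order.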

Method #3: Using {} + split() + list comprehension
When the goal is a single collection of unique substrings across the whole list, rather than one result per string, a set comprehension can be used. The curly braces build a set from every substring of every string, and list() converts that set to a list. As with the method above, the order of the result is not guaranteed.

## Python3

```python
# Python3 code to demonstrate
# removing duplicate substrings
# using {} + split() + list comprehension

# initializing list
test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']

# printing original list
print("The original list : " + str(test_list))

# using {} + split() + list comprehension
# removing duplicate substrings
res = list({i for sub in test_list for i in sub.split('-')})

# print result
print("The list after duplicate removal : " + str(res))
```

Output :

```
The original list : ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
The list after duplicate removal : ['cc', 'ff', 'aa', 'hh', 'gg', 'bb']
```

The order of this list comes from set iteration and may differ from run to run.
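To get the same flattened result in first-seen order, one possible variation combines `itertools.chain.from_iterable` with `dict.fromkeys`. This is a sketch of an alternative, not part of the method above, and it relies on dictionaries preserving insertion order (guaranteed from Python 3.7). The names `all_pieces` and `res_ordered` are illustrative.

```python
from itertools import chain

test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']

# chain.from_iterable lazily concatenates the pieces of every string
all_pieces = chain.from_iterable(sub.split('-') for sub in test_list)

# dict.fromkeys keeps one key per distinct piece, in insertion order
res_ordered = list(dict.fromkeys(all_pieces))

print("Unique substrings in first-seen order : " + str(res_ordered))
# Unique substrings in first-seen order : ['aa', 'bb', 'cc', 'gg', 'ff', 'hh']
```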

Method #4: Using Counter() function

## Python3

```python
# Python3 code to demonstrate
# removing duplicate substrings
from collections import Counter

# initializing list
test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']

# printing original list
print("The original list : " + str(test_list))

# removing duplicate substrings
res = []
for i in test_list:
    x = i.split("-")
    freq = Counter(x)
    tempresult = []
    for j in x:
        # a count above 0 means j has not been emitted yet
        if freq[j] > 0:
            tempresult.append(j)
            # zero the count so later occurrences of j are skipped
            freq[j] = 0
    res.append(tempresult)

# print result
print("The list after duplicate removal : " + str(res))
```

Output

```
The original list : ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
The list after duplicate removal : [['aa', 'bb'], ['bb', 'cc'], ['gg', 'ff'], ['hh']]
```

Method #5: Using a helper function
The helper below is iterative rather than recursive: it applies the same membership check as Method #1, wrapped in a reusable function.

## Python3

```python
# Helper function to remove duplicate substrings (a plain loop, no recursion)
def remove_duplicates(substrings):
    if not substrings:
        return []
    result = []
    for substring in substrings:
        if substring not in result:
            result.append(substring)
    return result


# Initialize the list of strings
test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']

# printing original list
print("The original list : " + str(test_list))

# Split each string into substrings and remove duplicates
result = [remove_duplicates(string.split("-")) for string in test_list]

# print result
print("The list after duplicate removal : " + str(result))
# this code contributed by tvsk
```

Output

```
The original list : ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
The list after duplicate removal : [['aa', 'bb'], ['bb', 'cc'], ['gg', 'ff'], ['hh']]
```

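Time Complexity: O(n*k^2) in the worst case, where n is the number of strings and k is the maximum number of substrings per string, because `substring not in result` scans a list.

Space Complexity: O(n*k)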

Method #6: Using list comprehension and set()

## Python3

```python
# Python3 code to demonstrate
# removing duplicate substrings

# initializing list
test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']

# printing original list
print("The original list : " + str(test_list))

# removing duplicate substrings
res = [list(set(i.split("-"))) for i in test_list]

# print result
print("The list after duplicate removal : " + str(res))

# This code is contributed by Jyothi Pinjala
```

Output

```
The original list : ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
The list after duplicate removal : [['aa', 'bb'], ['cc', 'bb'], ['gg', 'ff'], ['hh']]
```

Each inner list is built from a set, so the order of its elements may differ from run to run.

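Time Complexity: O(n*m), where n is the number of strings in the list and m is the maximum length of a string, since each string is split once and each of its substrings is hashed once.

Space Complexity: O(n*m)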

Method #7: Using dict.fromkeys()

The given code removes duplicate substrings in each string of a list by splitting each string by the “-” character and using a dictionary to remove duplicates.

Here’s a step-by-step explanation of the algorithm:

1. Initialize a list of strings test_list.
2. Initialize an empty list res to store the modified strings.
3. Loop through each string s in test_list using a for loop.
4. Split the string s by the “-” character using the split() function, and create a list of the resulting substrings.
5. Build a dictionary from the list using dict.fromkeys(), which removes duplicates because dictionary keys are unique and, from Python 3.7 onward, keeps the keys in insertion order.
6. Convert the dictionary back to a list using the list() function to get the unique substrings.
7. Append the list of unique substrings to the res list.
8. After the loop, print res.

## Python3

```python
# initializing list
test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']

# printing original list
print("The original list : " + str(test_list))

# removing duplicate substrings
res = []
for s in test_list:
    res.append(list(dict.fromkeys(s.split("-"))))

# print result
print("The list after duplicate removal : " + str(res))
# This code is contributed by Vinay pinjala.
```

Output

```
The original list : ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
The list after duplicate removal : [['aa', 'bb'], ['bb', 'cc'], ['gg', 'ff'], ['hh']]
```

The time complexity of this algorithm is O(n*m), where n is the number of strings in the list and m is the maximum length of each string. This is because we iterate through each string and split it into substrings, which takes O(m) time for each string.

The auxiliary space of this algorithm is also O(n*m), since res stores one list of unique substrings per input string and a temporary dictionary is built for each string. The actual space usage may be smaller than n*m, depending on how many duplicates are removed from each string.
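All of the methods so far return the unique substrings as lists or sets. If the de-duplicated strings themselves are wanted, the pieces can be joined back together with the same separator. The following is a minimal sketch of that extra step, not something the methods above do. The names `sep` and `res_joined` are illustrative.

```python
test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
sep = "-"

# dict.fromkeys drops repeated pieces while keeping first-seen order;
# str.join then iterates over the dictionary keys to rebuild each string
res_joined = [sep.join(dict.fromkeys(s.split(sep))) for s in test_list]

print("The strings after duplicate removal : " + str(res_joined))
# The strings after duplicate removal : ['aa-bb', 'bb-cc', 'gg-ff', 'hh']
```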

Method #8: Using reduce()

Algorithm:

1. Import the reduce function from functools module.
2. Create a list test_list and initialize it with some string values.
3. Print the original list.
4. Use the reduce function to remove duplicate substrings. The reduce function takes three arguments: a lambda function, the list to iterate over, and an optional initial value.
5. The lambda function is used to merge the lists by concatenating them with the + operator. The lambda function takes two arguments: the accumulator x and the current element y.
6. Use the split function to split each string in test_list into a list of substrings based on the delimiter “-”.
7. Convert the list of substrings into a set to remove duplicates.
8. Convert the set back to a list.
9. Append the list to the accumulator.
10. Print the final result.

## Python3

```python
from functools import reduce

# initializing list
test_list = ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']

# printing original list
print("The original list : " + str(test_list))

# removing duplicate substrings using reduce() and set()
res = reduce(lambda x, y: x + [list(set(y.split('-')))], test_list, [])

# print result
print("The list after duplicate removal : " + str(res))

# This code is contributed by Rayudu.
```

Output

```
The original list : ['aa-aa-bb', 'bb-cc', 'gg-ff-gg', 'hh-hh']
The list after duplicate removal : [['aa', 'bb'], ['bb', 'cc'], ['ff', 'gg'], ['hh']]
```

Each inner list is built from a set, so the order of its elements may differ from run to run.

Time Complexity: O(n*m + n^2), where n is the length of the input list and m is the maximum length of any string in it. Splitting and de-duplicating costs O(m) per string, and `x + [...]` copies the accumulator on every step, which adds up to O(n^2) element copies in total.
Space Complexity: O(n*m), where n is the length of the input list and m is the maximum length of any string in it. This is because the result holds one list of unique substrings per input string, and those substrings together can be as long as the input strings themselves.
