Python – Remove duplicate words from Strings in List

Last Updated : 20 Mar, 2023

When working with a list of strings in Python, we sometimes need to remove the duplicate words from each string in the list. This task comes up when cleaning or preprocessing data. Let’s discuss several ways in which it can be performed.

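Method #1 : Using set() + split() + loop

These functions can be combined to perform this task. We first split each string into its words with split() and then pass the resulting list to set() to remove the duplicates. Note that a set is unordered, so the original word order is not preserved and the printed order can differ between runs.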

Python3

# Python3 code to demonstrate
# Remove duplicate words from Strings in List
# using loop + set() + split()
 
# Initializing list
test_list = ['gfg, best, gfg', 'I, am, I', 'two, two, three']
 
# printing original list
print("The original list is : " + str(test_list))
 
# Remove duplicate words from Strings in List
# using loop + set() + split()
res = []
for strs in test_list:
    res.append(set(strs.split(", ")))
 
# printing result
print("The list after duplicate words removal is : " + str(res))

Output :

The original list is : ['gfg, best, gfg', 'I, am, I', 'two, two, three']
The list after duplicate words removal is : [{'best', 'gfg'}, {'I', 'am'}, {'three', 'two'}]

Time complexity: O(n * m), where n is the number of strings in test_list and m is the maximum length of a string. The loop runs n times, and each iteration splits one string and builds a set from its words, which takes O(m) time on average.

Auxiliary space complexity: O(n * m), where n and m are as above. This is because res holds one set per string, and each set stores at most the words of that string.
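
Sets are unordered, so the order in which the unique words are printed can vary. The following is a minimal sketch, assuming a repeatable result is wanted, that sorts each set into a list. The name unique_sorted is illustrative and not part of the method above.

```python
# Sketch: turn each set of unique words into a sorted list so that
# the output is the same on every run. unique_sorted is a hypothetical name.
test_list = ['gfg, best, gfg', 'I, am, I', 'two, two, three']

unique_sorted = [sorted(set(strs.split(", "))) for strs in test_list]

print("Sorted unique words : " + str(unique_sorted))
```

Sorting is alphabetical here (uppercase letters sort before lowercase ones), so it does not restore the original word order.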

Method #2 : Using list comprehension + set() + split()

This method is similar to the one above. The difference is that we use a list comprehension instead of an explicit loop to perform the iteration.

Python3

# Python3 code to demonstrate
# Remove duplicate words from Strings in List
# using list comprehension + set() + split()
 
# Initializing list
test_list = ['gfg, best, gfg', 'I, am, I', 'two, two, three']
 
# printing original list
print("The original list is : " + str(test_list))
 
# Remove duplicate words from Strings in List
# using list comprehension + set() + split()
res = [set(strs.split(", ")) for strs in test_list]
 
# printing result
print("The list after duplicate words removal is : " + str(res))

Output :

The original list is : ['gfg, best, gfg', 'I, am, I', 'two, two, three']
The list after duplicate words removal is : [{'best', 'gfg'}, {'I', 'am'}, {'three', 'two'}]

Time Complexity: O(n * m), where n is the number of strings in the list “test_list” and m is the maximum length of a string.
Auxiliary Space: O(n * m), for the n sets stored in the list “res”.
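
If the result should keep the original word order, or should stay a list of strings rather than a list of sets, dict.fromkeys() can stand in for set(). This is a swapped-in technique, not the method shown above. It relies on dictionaries preserving insertion order (Python 3.7 and later), and the name dedupe_words is hypothetical.

```python
def dedupe_words(text, sep=", "):
    """Return text with repeated words removed, keeping first occurrences."""
    # dict.fromkeys() keeps one key per distinct word, in insertion order
    return sep.join(dict.fromkeys(text.split(sep)))


test_list = ['gfg, best, gfg', 'I, am, I', 'two, two, three']
res = [dedupe_words(strs) for strs in test_list]

print("The list after duplicate words removal is : " + str(res))
```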

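Method #3 : Using sorted() + index() + split()

Here set() removes the duplicate words, and sorted() with key=words.index puts the remaining words back in the order of their first occurrence. The strings in this example are separated by spaces instead of commas.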

Python3

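# Initializing list (words separated by spaces)
test_list = ['gfg best gfg', 'I am I', 'two two three']

for i in test_list:
    words = i.split()
    # set() drops duplicates; key=words.index restores first-occurrence order
    print(" ".join(sorted(set(words), key=words.index)), end=" ")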

Output

gfg best I am two three

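Time Complexity: O(n * m^2) in the worst case, where n is the number of strings in test_list and m is the maximum number of words in a string. words.index() is a linear search that runs once for each unique word, which dominates the O(m log m) cost of sorting.
Auxiliary Space: O(m), for the word list and the set built for each string; nothing is stored across iterations.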
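
The loop above prints the words of every string on a single line, so the boundaries between the original strings are lost. A minimal sketch that collects one result per string instead, using the same sorted(set(words), key=words.index) expression. The names unique_in_order and res are illustrative.

```python
def unique_in_order(text):
    words = text.split()
    # set() removes duplicates; key=words.index sorts by first position
    return " ".join(sorted(set(words), key=words.index))


test_list = ['gfg best gfg', 'I am I', 'two two three']
res = [unique_in_order(strs) for strs in test_list]

print("The list after duplicate words removal is : " + str(res))
```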

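Method #4 : Using split() and set() functions

In this method, a helper function keeps only the first occurrence of each word by checking with not in before appending. The result is then converted to a set, as in the earlier methods.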

Python3

# Python3 code to demonstrate
# Remove duplicate words from Strings in List
def fun(x):
    # Keep only the first occurrence of each word, preserving order
    y = []
    for i in x:
        if i not in y:
            y.append(i)
    return y

# Initializing list
test_list = ['gfg,best,gfg', 'I,am,I', 'two,two,three']
 
# printing original list
print("The original list is : " + str(test_list))
res = []
for strs in test_list:
    x = strs.split(",")
    # fun() already removes the duplicates; set() matches the earlier output format
    res.append(set(fun(x)))
 
# printing result
print("The list after duplicate words removal is : " + str(res))

Output

The original list is : ['gfg,best,gfg', 'I,am,I', 'two,two,three']
The list after duplicate words removal is : [{'best', 'gfg'}, {'I', 'am'}, {'two', 'three'}]

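Time Complexity : O(N * M^2) in the worst case, where N is the number of strings and M is the maximum number of words in a string, because each not in check scans the list y.
Auxiliary Space : O(N * M), for the sets stored in res.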
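
The not in check on a list is a linear scan, so the helper slows down as the number of words in a string grows. A sketch of one common alternative, assuming the words are hashable (strings always are), tracks the words already seen in a set. The names fun_fast and seen are hypothetical.

```python
def fun_fast(x):
    seen = set()   # words encountered so far (average O(1) lookup)
    y = []         # unique words in order of first occurrence
    for i in x:
        if i not in seen:
            seen.add(i)
            y.append(i)
    return y


test_list = ['gfg,best,gfg', 'I,am,I', 'two,two,three']
res = [fun_fast(strs.split(",")) for strs in test_list]

print("The list after duplicate words removal is : " + str(res))
```

Unlike the code above, this sketch returns lists rather than sets, so the order of first occurrence is kept in the result.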

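Method #5 : Using operator.countOf() method

This method is the same as the previous one, except that the membership check uses operator.countOf(), which returns the number of times a value occurs in a sequence.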

Python3

# Python3 code to demonstrate
# Remove duplicate words from Strings in List
import operator as op
 
 
def fun(x):
    # Append a word only if it does not occur in y yet (count is 0)
    y = []
    for i in x:
        if op.countOf(y, i) == 0:
            y.append(i)
    return y
 
 
# Initializing list
test_list = ['gfg,best,gfg', 'I,am,I', 'two,two,three']
 
# printing original list
print("The original list is : " + str(test_list))
res = []
for strs in test_list:
    x = strs.split(",")
    res.append(set(fun(x)))
 
# printing result
print("The list after duplicate words removal is : " + str(res))

Output

The original list is : ['gfg,best,gfg', 'I,am,I', 'two,two,three']
The list after duplicate words removal is : [{'best', 'gfg'}, {'am', 'I'}, {'two', 'three'}]

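Time Complexity : O(N * M^2) in the worst case, where N is the number of strings and M is the maximum number of words in a string, because op.countOf() scans the whole list y on every call.
Auxiliary Space : O(N * M), for the sets stored in res.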
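
For reference, operator.countOf(a, b) returns the number of occurrences of b in a, so comparing the result with 0 is equivalent to a not in test. A short illustration with made-up data:

```python
import operator as op

words = ['gfg', 'best', 'gfg']

print(op.countOf(words, 'gfg'))    # 2
print(op.countOf(words, 'best'))   # 1
print(op.countOf(words, 'two'))    # 0

# The two membership tests agree for every candidate word
for w in ['gfg', 'best', 'two']:
    assert (op.countOf(words, w) == 0) == (w not in words)
```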

Method #6 : Using recursion

Algorithm:

1. If the input list is empty, return an empty list.
2. Split the first element of the list on ", " and convert the result to a set to remove duplicates.
3. Recursively call the remove_duplicates_recursive function on the rest of the list (i.e., all elements except the first).
4. Combine the set of unique words of the first element with the recursive result (i.e., the sets of unique words for the rest of the list) into a new list.
5. Return the new list.

Python3

def remove_duplicates_recursive(lst):
    if not lst:
        return []
    else:
        first = set(lst[0].split(", "))
        rest = remove_duplicates_recursive(lst[1:])
        return [first] + rest
# Initializing list
test_list = ['gfg, best, gfg', 'I, am, I', 'two, two, three']
 
# printing original list
print("The original list is : " + str(test_list))
res = remove_duplicates_recursive(test_list)
 
# printing result
print("The list after duplicate words removal is : " + str(res))
#this code contributed by tvsk

Output

The original list is : ['gfg, best, gfg', 'I, am, I', 'two, two, three']
The list after duplicate words removal is : [{'gfg', 'best'}, {'am', 'I'}, {'three', 'two'}]

Time complexity: O(n * m + n^2), where n is the length of the input list and m is the maximum length of any element in the list. Splitting an element takes O(m) time, and converting the resulting words to a set also takes O(m) time on average, because a set is hash-based and does not sort its elements. In addition, every call copies the list with lst[1:] and builds a new list with [first] + rest, which adds O(n) work per call and O(n^2) overall.

Auxiliary Space: O(n * m), where n is the length of the input list and m is the maximum length of any element in the list. This is because we create a new list of sets of unique words, which takes up O(n * m) space in the worst case. The recursive call stack adds O(n) frames, one per element, and the slices held by the pending calls add up to O(n^2) references. A list longer than Python's recursion limit (1000 by default in CPython) raises a RecursionError.
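
The repeated slicing with lst[1:] can be avoided by passing an index instead of a shorter copy of the list. A minimal sketch under that assumption follows. The function remove_duplicates_indexed and its parameters i and acc are hypothetical names, and the recursion depth still grows with the length of the list.

```python
def remove_duplicates_indexed(lst, i=0, acc=None):
    # acc collects the sets; it is created once on the first call
    if acc is None:
        acc = []
    # Base case: every element has been processed
    if i == len(lst):
        return acc
    acc.append(set(lst[i].split(", ")))
    # Recurse on the next index instead of slicing the list
    return remove_duplicates_indexed(lst, i + 1, acc)


test_list = ['gfg, best, gfg', 'I, am, I', 'two, two, three']
res = remove_duplicates_indexed(test_list)

print("The list after duplicate words removal is : " + str(res))
```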
