
Python – Remove duplicate words from Strings in List


Sometimes, when working with a Python list of strings, we need to remove the duplicated words from each string. This comes up often when cleaning data. Let's discuss several ways in which this task can be performed.

Method #1 : Using set() + split() + loop

These functions can be combined to perform this task. We first split each string into its words and then pass the words to set() to remove the duplicates. Note that each result is a set, so the original word order is not preserved.

Python3




# Python3 code to demonstrate
# Remove duplicate words from Strings in List
# using loop + set() + split()
 
# Initializing list
test_list = ['gfg, best, gfg', 'I, am, I', 'two, two, three']
 
# printing original list
print("The original list is : " + str(test_list))
 
# Remove duplicate words from Strings in List
# using loop + set() + split()
res = []
for strs in test_list:
    res.append(set(strs.split(", ")))
 
# printing result
print("The list after duplicate words removal is : " + str(res))


Output : 

The original list is : ['gfg, best, gfg', 'I, am, I', 'two, two, three']
The list after duplicate words removal is : [{'best', 'gfg'}, {'I', 'am'}, {'three', 'two'}]

Time complexity: O(n * m), where n is the number of strings in test_list and m is the maximum length of a string. The loop runs n times, and each iteration splits one string and builds a set from its words, which takes O(m) time on average.

Auxiliary space complexity: O(n * m), because res holds one set per string and each set can store every word of that string.
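Because set() does not preserve order and produces a set rather than a string, the following sketch shows one way to keep the words in first-seen order and rebuild each string. It relies on dict.fromkeys(), which keeps insertion order in Python 3.7 and later. The helper name dedupe_words and its sep parameter are assumptions made for illustration and are not part of the method above.

```python
def dedupe_words(text, sep=", "):
    # dict.fromkeys() keeps only the first occurrence of each word,
    # in the order the words were first seen
    return sep.join(dict.fromkeys(text.split(sep)))


# Initializing list
test_list = ['gfg, best, gfg', 'I, am, I', 'two, two, three']

res = []
for strs in test_list:
    res.append(dedupe_words(strs))

# printing result
print("The list after duplicate words removal is : " + str(res))
# The list after duplicate words removal is : ['gfg, best', 'I, am', 'two, three']
```

This variant is worth considering when the cleaned strings are needed in their original word order, for example before writing them back to a file.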

Method #2 : Using list comprehension + set() + split()

This method is similar to the one above. The difference is that a list comprehension replaces the explicit loop.

Python3




# Python3 code to demonstrate
# Remove duplicate words from Strings in List
# using list comprehension + set() + split()
 
# Initializing list
test_list = ['gfg, best, gfg', 'I, am, I', 'two, two, three']
 
# printing original list
print("The original list is : " + str(test_list))
 
# Remove duplicate words from Strings in List
# using list comprehension + set() + split()
res = [set(strs.split(", ")) for strs in test_list]
 
# printing result
print("The list after duplicate words removal is : " + str(res))


Output : 

The original list is : ['gfg, best, gfg', 'I, am, I', 'two, two, three']
The list after duplicate words removal is : [{'best', 'gfg'}, {'I', 'am'}, {'three', 'two'}]

Time Complexity: O(n * m), where n is the number of strings in the list “test_list” and m is the maximum length of a string.
Auxiliary Space: O(n * m), since the result holds one set of words per string.
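Splitting on the exact separator ", " only works when every comma is followed by exactly one space. The sketch below is one possible way to handle irregular spacing: it splits on the bare comma and strips each piece. The helper name unique_words and the sample list messy_list are hypothetical and used only for illustration.

```python
def unique_words(text):
    # Split on bare commas, strip surrounding whitespace from each piece,
    # and skip pieces that are empty after stripping
    return {word.strip() for word in text.split(",") if word.strip()}


# Hypothetical input with inconsistent spacing around the commas
messy_list = ['gfg,best , gfg', 'I, am,I', 'two ,two, three']

res = [unique_words(strs) for strs in messy_list]

# Sets are unordered, so the printed order of words may vary
print("The list after duplicate words removal is : " + str(res))
```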

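Method #3 : Using sorted() + index() + split()

This method keeps the words in their original order. set() removes the duplicates, and sorted() with key=words.index puts the remaining words back in the order of their first occurrence. Here the words are separated by spaces.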

Python3




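# Initializing list (words separated by spaces)
test_list = ['gfg best gfg', 'I am I', 'two two three']
for i in test_list:
    words = i.split()
    # set() drops duplicates; sorting by first index restores the original order
    print(" ".join(sorted(set(words), key=words.index)), end=" ")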


Output

gfg best I am two three 

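Time Complexity: O(n * m^2) in the worst case, where n is the number of strings in test_list and m is the maximum number of words in a string. This is because words.index() performs a linear scan for each key that sorted() computes.
Auxiliary Space: O(m), for the word list and the set built for each string.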
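The same order-preserving result can be collected into a list instead of being printed, and without calling index() for every word. The sketch below tracks the words already seen in a set, so each membership check takes constant time on average. The function name dedupe_keep_order is hypothetical.

```python
def dedupe_keep_order(text):
    seen = set()   # words encountered so far
    kept = []      # first occurrences, in original order
    for word in text.split():
        if word not in seen:
            seen.add(word)
            kept.append(word)
    return " ".join(kept)


test_list = ['gfg best gfg', 'I am I', 'two two three']
res = [dedupe_keep_order(strs) for strs in test_list]
print(res)
# ['gfg best', 'I am', 'two three']
```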

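Method #4 : Using split() and set() functions

Here a helper function removes the duplicates with a membership check. The words in this example are separated by commas without spaces.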

Python3




# Python3 code to demonstrate
# Remove duplicate words from Strings in List
def fun(x):
    # Collect each word the first time it is seen
    y = []
    for i in x:
        if i not in y:
            y.append(i)
    return y

# Initializing list
test_list = ['gfg,best,gfg', 'I,am,I', 'two,two,three']

# printing original list
print("The original list is : " + str(test_list))
res = []
for strs in test_list:
    x = strs.split(",")
    # fun() already removes duplicates; set() only changes the container type
    res.append(set(fun(x)))
 
# printing result
print("The list after duplicate words removal is : " + str(res))


Output

The original list is : ['gfg,best,gfg', 'I,am,I', 'two,two,three']
The list after duplicate words removal is : [{'best', 'gfg'}, {'I', 'am'}, {'two', 'three'}]

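Time Complexity : O(N * M^2) in the worst case, where N is the number of strings in the list and M is the maximum number of words in a string, because every "i not in y" check scans the list y.
Auxiliary Space : O(N * M), for the list of results.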
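The helper fun() already returns the words without duplicates and in their original order, so wrapping its result in set() only discards that order. As a small sketch, the lists returned by fun() can be stored directly when the order matters:

```python
def fun(x):
    # Collect each word the first time it is seen
    y = []
    for i in x:
        if i not in y:
            y.append(i)
    return y


test_list = ['gfg,best,gfg', 'I,am,I', 'two,two,three']

# Keep the ordered lists instead of converting them to sets
res = [fun(strs.split(",")) for strs in test_list]
print("The list after duplicate words removal is : " + str(res))
# The list after duplicate words removal is : [['gfg', 'best'], ['I', 'am'], ['two', 'three']]
```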

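Method #5 : Using operator.countOf() method

operator.countOf(y, i) returns the number of occurrences of i in y, so a count of 0 means the word has not been collected yet.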

Python3




# Python3 code to demonstrate
# Remove duplicate words from Strings in List
import operator as op
 
 
def fun(x):
    y = []
    for i in x:
        if op.countOf(y, i) == 0:
            y.append(i)
    return y
 
 
# Initializing list
test_list = ['gfg,best,gfg', 'I,am,I', 'two,two,three']
 
# printing original list
print("The original list is : " + str(test_list))
res = []
for strs in test_list:
    x = strs.split(",")
    res.append(set(fun(x)))
 
# printing result
print("The list after duplicate words removal is : " + str(res))


Output

The original list is : ['gfg,best,gfg', 'I,am,I', 'two,two,three']
The list after duplicate words removal is : [{'best', 'gfg'}, {'am', 'I'}, {'two', 'three'}]

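Time Complexity : O(N * M^2) in the worst case, where N is the number of strings in the list and M is the maximum number of words in a string, because every op.countOf() call scans the list y.
Auxiliary Space : O(N * M), for the list of results.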
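To make the role of operator.countOf() concrete, the short sketch below calls it on a single list of words. The variable names are chosen only for illustration.

```python
import operator as op

words = 'two,two,three'.split(",")

# countOf(a, b) returns how many times b occurs in a
print(op.countOf(words, 'two'))    # 2
print(op.countOf(words, 'three'))  # 1
print(op.countOf(words, 'four'))   # 0

# For a list, this matches the built-in list.count() method
same_as_count = op.countOf(words, 'two') == words.count('two')
print(same_as_count)               # True
```

A count of 0 therefore plays the same role as the "not in" check used in the previous method.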

Method #6 : Using recursion

Algorithm:

  1. If the input list is empty, return an empty list.
  2. Split the first element of the list on ", " and convert the result to a set to remove duplicates.
  3. Recursively call the remove_duplicates_recursive function on the rest of the list (i.e., all elements except the first).
  4. Combine the set of unique words of the first element and the recursive result (i.e., unique sets of words of the rest of the list) into a new list.
  5. Return the new list.

Python3




def remove_duplicates_recursive(lst):
    if not lst:
        return []
    else:
        first = set(lst[0].split(", "))
        rest = remove_duplicates_recursive(lst[1:])
        return [first] + rest
# Initializing list
test_list = ['gfg, best, gfg', 'I, am, I', 'two, two, three']
 
# printing original list
print("The original list is : " + str(test_list))
res = remove_duplicates_recursive(test_list)
 
# printing result
print("The list after duplicate words removal is : " + str(res))
#this code contributed by tvsk


Output

The original list is : ['gfg, best, gfg', 'I, am, I', 'two, two, three']
The list after duplicate words removal is : [{'gfg', 'best'}, {'am', 'I'}, {'three', 'two'}]

Time complexity: O(n * m + n^2), where n is the length of the input list and m is the maximum length of any element in the list. Splitting an element and building a set from its words takes O(m) time on average (sets are hash-based, so no sorting is involved), and this is done for each of the n elements. In addition, the slice lst[1:] and the concatenation [first] + rest each copy up to n items at every level of recursion, which adds O(n^2) work in total.

Auxiliary Space: O(n * m + n^2), where n is the length of the input list and m is the maximum length of any element in the list. The new list of sets of unique words takes up to O(n * m) space, and the slices held by the pending recursive calls add up to O(n^2) list references. The call stack itself grows to n frames, one per element, so a long input list can exceed Python's default recursion limit.
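One way to avoid the repeated slicing and concatenation is to pass an index and an accumulator list through the recursion. This is only a sketch of that idea: the function name remove_duplicates_indexed and its start and acc parameters are assumptions, not part of the method above, and the recursion depth is still one call per element.

```python
def remove_duplicates_indexed(lst, start=0, acc=None):
    # acc collects the sets built so far; create it on the first call
    if acc is None:
        acc = []
    # Base case: every element has been processed
    if start == len(lst):
        return acc
    # Process the current element, then recurse on the next index
    acc.append(set(lst[start].split(", ")))
    return remove_duplicates_indexed(lst, start + 1, acc)


test_list = ['gfg, best, gfg', 'I, am, I', 'two, two, three']
res = remove_duplicates_indexed(test_list)

# Sets are unordered, so the printed order of words may vary
print("The list after duplicate words removal is : " + str(res))
```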



Last Updated : 20 Mar, 2023