Python – Remove duplicate words from Strings in List
Sometimes, while working with a Python list of strings, we need to remove duplicated words from each string. This comes up often when cleaning text data. Let’s discuss several ways in which this task can be performed.
Method #1 : Using set() + split() + loop
In this method, we split each string into its words and then apply set() to remove the duplicates. Note that a set does not preserve the original word order.
Python3
test_list = ['gfg, best, gfg', 'I, am, I', 'two, two, three']
print("The original list is : " + str(test_list))

# Split each string into words and keep the unique ones as a set
res = []
for strs in test_list:
    res.append(set(strs.split(", ")))

print("The list after duplicate words removal is : " + str(res))
Output :
The original list is : ['gfg, best, gfg', 'I, am, I', 'two, two, three']
The list after duplicate words removal is : [{'best', 'gfg'}, {'I', 'am'}, {'three', 'two'}]
Time complexity: O(n * m), where n is the number of strings in test_list and m is the maximum length of a string. The loop runs n times, and each iteration splits a string and builds a set from its words, both of which take time proportional to the length of the string.
Auxiliary space complexity: O(n * m) in the worst case. The res list holds one set per string, and each set can hold as many words as its string contains.
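Because sets are unordered, the order of the words printed inside each set can differ between runs. As a minimal sketch, assuming a hypothetical helper name unique_word_sets, the loop can be wrapped in a function and the result checked with set equality instead of by its printed text:

```python
def unique_word_sets(strings, sep=", "):
    """Return one set of unique words per input string (hypothetical helper)."""
    res = []
    for s in strings:
        res.append(set(s.split(sep)))
    return res


test_list = ['gfg, best, gfg', 'I, am, I', 'two, two, three']
res = unique_word_sets(test_list)

# Set equality ignores element order, so this check is stable across runs.
print(res == [{'gfg', 'best'}, {'I', 'am'}, {'two', 'three'}])  # True
```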
Method #2 : Using list comprehension + set() + split()
This method is similar to the one above. The difference is that we use a list comprehension instead of an explicit loop to perform the iteration.
Python3
test_list = ['gfg, best, gfg', 'I, am, I', 'two, two, three']
print("The original list is : " + str(test_list))

res = [set(strs.split(", ")) for strs in test_list]

print("The list after duplicate words removal is : " + str(res))
Output :
The original list is : ['gfg, best, gfg', 'I, am, I', 'two, two, three']
The list after duplicate words removal is : [{'best', 'gfg'}, {'I', 'am'}, {'three', 'two'}]
Time Complexity: O(n * m), where n is the number of strings in the list “test_list” and m is the maximum length of a string.
Auxiliary Space: O(n * m) in the worst case, since the result holds one set of words per string.
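Both methods above return sets, so the original word order is lost and the results are no longer strings. If the goal is a list of strings with duplicates removed and order kept, one possible sketch uses dict.fromkeys(), which keeps the first occurrence of each key in insertion order (guaranteed since Python 3.7). This is a different technique from the set() approach above, and the function name dedupe_words is only illustrative:

```python
def dedupe_words(strings, sep=", "):
    """Remove repeated words from each string, keeping first occurrences in order."""
    # dict.fromkeys() drops repeated keys while preserving insertion order.
    return [sep.join(dict.fromkeys(s.split(sep))) for s in strings]


test_list = ['gfg, best, gfg', 'I, am, I', 'two, two, three']
res = dedupe_words(test_list)
print(res)  # ['gfg, best', 'I, am', 'two, three']
```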
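Method #3 : Using sorted() + index() + split()
In this method, the unique words of each string are sorted by the position of their first occurrence, so the original word order is preserved.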
Python3
test_list = ['gfg best gfg', 'I am I', 'two two three']
for i in test_list:
    words = i.split()
    # Sort the unique words by their first position in the original string
    print(" ".join(sorted(set(words), key=words.index)), end=" ")
Output
gfg best I am two three
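Time Complexity: O(n * k^2) in the worst case, where n is the number of strings in test_list and k is the maximum number of words in a string. For each unique word, words.index() performs a linear scan of the word list, and the sort itself adds O(k log k) per string.
Auxiliary Space: O(k) for the word list and the set built for each string. Nothing is accumulated across strings because each result is printed immediately.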
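The same idea can be written so that the results are collected in a list instead of printed, and so that the first position of each word is looked up in a dictionary rather than by calling words.index() repeatedly. This is a sketch under those assumptions; the names dedupe_by_first_index and first_index are hypothetical:

```python
def dedupe_by_first_index(strings):
    """Remove repeated words from each string, ordering by first occurrence."""
    res = []
    for s in strings:
        words = s.split()
        # Record the position at which each word first appears.
        first_index = {}
        for pos, word in enumerate(words):
            first_index.setdefault(word, pos)
        # Sort the unique words by that recorded position.
        res.append(" ".join(sorted(set(words), key=first_index.get)))
    return res


test_list = ['gfg best gfg', 'I am I', 'two two three']
res = dedupe_by_first_index(test_list)
print(res)  # ['gfg best', 'I am', 'two three']
```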
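Method #4 : Using a helper function with split() and set()
In this method, a helper function collects the words of each string in order, skipping any word it has already stored. The result is then wrapped in set(), which is redundant for duplicate removal and discards the order that the helper preserved.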
Python3
def fun(x):
    # Build a list of the unique items of x, keeping first occurrences
    y = []
    for i in x:
        if i not in y:
            y.append(i)
    return y


test_list = ['gfg,best,gfg', 'I,am,I', 'two,two,three']
print("The original list is : " + str(test_list))

res = []
for strs in test_list:
    x = strs.split(",")
    res.append(set(fun(x)))

print("The list after duplicate words removal is : " + str(res))
Output
The original list is : ['gfg,best,gfg', 'I,am,I', 'two,two,three']
The list after duplicate words removal is : [{'best', 'gfg'}, {'I', 'am'}, {'two', 'three'}]
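Time Complexity : O(N * k^2) in the worst case, where N is the number of strings and k is the maximum number of words in a string, because each "not in" check scans the list y.
Auxiliary Space : O(N * k) for the resulting list of sets.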
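Testing membership with "not in" on a list is a linear scan, which makes the helper quadratic in the number of words. A common alternative, sketched here with the hypothetical name unique_in_order, tracks the words already seen in a set so that each lookup takes constant time on average. The sketch also returns lists rather than sets, so the word order is kept:

```python
def unique_in_order(words):
    """Return the unique items of words, keeping first occurrences in order."""
    seen = set()      # words encountered so far, for fast membership tests
    result = []
    for word in words:
        if word not in seen:
            seen.add(word)
            result.append(word)
    return result


test_list = ['gfg,best,gfg', 'I,am,I', 'two,two,three']
res = [unique_in_order(s.split(",")) for s in test_list]
print(res)  # [['gfg', 'best'], ['I', 'am'], ['two', 'three']]
```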
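Method #5 : Using operator.countOf() method
This is the same as the previous method, except that operator.countOf() replaces the "not in" membership test: a word is added only if its count in the helper's list is 0.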
Python3
import operator as op


def fun(x):
    # Build a list of the unique items of x, keeping first occurrences
    y = []
    for i in x:
        if op.countOf(y, i) == 0:
            y.append(i)
    return y


test_list = ['gfg,best,gfg', 'I,am,I', 'two,two,three']
print("The original list is : " + str(test_list))

res = []
for strs in test_list:
    x = strs.split(",")
    res.append(set(fun(x)))

print("The list after duplicate words removal is : " + str(res))
Output
The original list is : ['gfg,best,gfg', 'I,am,I', 'two,two,three']
The list after duplicate words removal is : [{'best', 'gfg'}, {'am', 'I'}, {'two', 'three'}]
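Time Complexity : O(N * k^2) in the worst case, where N is the number of strings and k is the maximum number of words in a string, because each operator.countOf() call scans the whole list y.
Auxiliary Space : O(N * k) for the resulting list of sets.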
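To make the role of operator.countOf() concrete, the short sketch below shows what it returns for a single list of words, and then uses it in a list comprehension that keeps a word only if it does not occur earlier in the list. The variable names are illustrative:

```python
import operator as op

words = 'gfg,best,gfg'.split(",")

# countOf(a, b) returns the number of occurrences of b in a.
print(op.countOf(words, 'gfg'))      # 2
print(op.countOf(words, 'best'))     # 1
print(op.countOf(words, 'missing'))  # 0

# Keep a word only if it does not appear among the words before it.
unique = [w for i, w in enumerate(words) if op.countOf(words[:i], w) == 0]
print(unique)  # ['gfg', 'best']
```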
Method #6 : Using recursion
Algorithm:
- If the input list is empty, return an empty list.
- Split the first element of the list on ", " and convert the result to a set to remove duplicates.
- Recursively call the remove_duplicates_recursive function on the rest of the list (i.e., all elements except the first).
- Combine the set of unique words of the first element with the recursive result (i.e., the sets of unique words for the rest of the list) into a new list.
- Return the new list.
Python3
def remove_duplicates_recursive(lst):
    # Base case: nothing left to process
    if not lst:
        return []
    else:
        # Unique words of the first string, then recurse on the remainder
        first = set(lst[0].split(", "))
        rest = remove_duplicates_recursive(lst[1:])
        return [first] + rest


test_list = ['gfg, best, gfg', 'I, am, I', 'two, two, three']
print("The original list is : " + str(test_list))

res = remove_duplicates_recursive(test_list)

print("The list after duplicate words removal is : " + str(res))
Output
The original list is : ['gfg, best, gfg', 'I, am, I', 'two, two, three']
The list after duplicate words removal is : [{'gfg', 'best'}, {'am', 'I'}, {'three', 'two'}]
Time complexity: O(n * m + n^2), where n is the length of the input list and m is the maximum length of any element in the list. Splitting an element and building a set from its words each take O(m) time on average, because set insertion relies on hashing and involves no sorting. This is done once per element, giving O(n * m). In addition, each call copies the list with lst[1:] and concatenates with [first] + rest, both of which take O(n) time, adding O(n^2) over all calls.
Auxiliary Space: O(n * m + n^2) in the worst case. The resulting list of sets takes O(n * m) space. The recursion makes one call per element, so the call stack grows to depth n, and the slices held by the active calls together take O(n^2) space. Because CPython's default recursion limit is 1000, this approach raises RecursionError on lists with roughly a thousand elements or more.
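The list copies made by lst[1:] can be avoided by passing an index and an accumulator through the recursion instead of slicing. The following is a sketch of that variant, not part of the method above; the function name and the start and acc parameters are hypothetical. It still makes one recursive call per element, so it remains subject to Python's recursion limit:

```python
def remove_duplicates_by_index(lst, start=0, acc=None):
    """Recursive variant that walks lst by index instead of slicing it."""
    if acc is None:
        acc = []  # created per top-level call to avoid a shared mutable default
    # Base case: every element has been processed.
    if start == len(lst):
        return acc
    acc.append(set(lst[start].split(", ")))
    return remove_duplicates_by_index(lst, start + 1, acc)


test_list = ['gfg, best, gfg', 'I, am, I', 'two, two, three']
res = remove_duplicates_by_index(test_list)
print(res == [{'gfg', 'best'}, {'I', 'am'}, {'two', 'three'}])  # True
```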
Last Updated :
20 Mar, 2023