Python | Remove duplicates based on Kth element tuple list

Last Updated : 04 Apr, 2023

Sometimes, while working with records, we need to remove duplicates from a list of tuples based on the Kth element of each tuple, keeping the first tuple for each distinct Kth value. This problem has applications in domains that use records as input. Let's discuss certain ways in which this problem can be solved.

Method #1: Using loop

This is a straightforward way to perform this task. We read the Kth element of each tuple and keep a set of the values seen so far. If a tuple's Kth value is already in the set, the tuple is a duplicate and is skipped; otherwise the tuple is added to the result and its Kth value is recorded in the set.

Python3

# Python3 code to demonstrate working of
# Remove duplicates based on Kth element tuple list
# Using loop
 
# initialize list 
test_list = [(3, 1, 5), (1, 3, 6), (2, 1, 7),
                        (5, 2, 8), (6, 3, 0)]
 
# printing original list
print("The original list is : " + str(test_list))
 
# initialize K 
K = 1
 
# Remove duplicates based on Kth element tuple list
# Using loop
temp = set()   
res = []
for ele in test_list:
    if ele[K] not in temp:
        res.append(ele)
        temp.add(ele[K])
 
# printing result
print("The list after removal of K based duplicates : " + str(res))

Output :

The original list is : [(3, 1, 5), (1, 3, 6), (2, 1, 7), (5, 2, 8), (6, 3, 0)]
The list after removal of K based duplicates : [(3, 1, 5), (1, 3, 6), (5, 2, 8)]

Time complexity: O(n), where n is the length of the input list.
Auxiliary space: O(n)
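The same loop can be packaged as a reusable function. The sketch below is one way to do it; the name `remove_k_duplicates` is a hypothetical helper, not part of any library. Because the seen values are stored in a set, the Kth element must be hashable: numbers, strings and tuples work, lists do not. Negative K values count from the end of each tuple, as with normal indexing.

```python
def remove_k_duplicates(records, K):
    """Hypothetical helper: keep the first tuple for each distinct value at index K."""
    seen = set()    # Kth values encountered so far
    result = []     # kept tuples, in original order
    for rec in records:
        if rec[K] not in seen:
            seen.add(rec[K])
            result.append(rec)
    return result


test_list = [(3, 1, 5), (1, 3, 6), (2, 1, 7), (5, 2, 8), (6, 3, 0)]
print(remove_k_duplicates(test_list, 1))   # [(3, 1, 5), (1, 3, 6), (5, 2, 8)]
print(remove_k_duplicates(test_list, -1))  # every last element is distinct, so the full list is returned
```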

Method #2: Using reduce() + lambda + keys()

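In this method, reduce() walks through the list while carrying the filtered result built so far (sub), and the lambda decides what to do with each tuple. It builds a dictionary from the tuples kept so far and checks, via keys(), whether the current tuple's Kth element has already occurred. If it has, the tuple is discarded; otherwise it is appended.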

Python3

# Python3 code to demonstrate working of
# Remove duplicates based on Kth element tuple list
# Using reduce() + lambda + keys()
from functools import reduce
 
# initialize list 
test_list = [(3, 1), (1, 3), (3, 2), (5, 2), (5, 3)]
 
# printing original list
print("The original list is : " + str(test_list))
 
# initialize K 
K = 0
 
# Remove duplicates based on Kth element tuple list
# Using reduce() + lambda + keys()
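# The dict is keyed by x[K] explicitly; plain dict(sub) only works for
# 2-element tuples and always keys on index 0, so it breaks for K != 0
res = reduce(lambda sub, ele: sub if ele[K] in dict((x[K], x) for x in sub).keys()
             else sub + [ele], test_list, [])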
 
# printing result
print("The list after removal of K based duplicates : " + str(res))

Output :

The original list is : [(3, 1), (1, 3), (3, 2), (5, 2), (5, 3)]
The list after removal of K based duplicates : [(3, 1), (1, 3), (5, 2)]

Time complexity: O(n^2), where n is the length of the input list. At every step the lambda rebuilds a dictionary from the accumulated list and sub + [ele] copies that list, so each step costs O(n) even though a single dictionary lookup is O(1) on average.
Auxiliary Space: O(n), because the accumulated list and the dictionary built from it each hold at most n elements at a time.
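Because the lambda rebuilds a dictionary from the accumulated list at every step, this method is quadratic. If you want to keep the reduce() style without that rebuild, one option (an alternative sketch, not part of the original method) is to carry a set of seen keys inside the accumulator, so each step is O(1) on average and the whole pass is O(n). The helper name `step` is illustrative. Note that it updates the accumulator in place, which is less "functional" than the original but avoids copying.

```python
from functools import reduce

test_list = [(3, 1), (1, 3), (3, 2), (5, 2), (5, 3)]
K = 0


def step(acc, ele):
    # acc is a (seen, result) pair; both containers are updated in place
    seen, result = acc
    if ele[K] not in seen:
        seen.add(ele[K])
        result.append(ele)
    return acc


# Start from an empty set and an empty list; keep only the result at the end
_, res = reduce(step, test_list, (set(), []))
print(res)  # [(3, 1), (1, 3), (5, 2)]
```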

Method #3: Using List comprehension

This program removes duplicates from a list of tuples based on the value of the Kth element in each tuple using a list comprehension: a tuple is kept only if its Kth element does not appear among the tuples that come before it. The resulting list of non-duplicate tuples is printed.

Python3

# Python3 code to demonstrate working of
# Remove duplicates based on Kth element tuple list
# Using List comprehension
 
# initialize list
test_list = [(3, 1, 5), (1, 3, 6), (2, 1, 7), (5, 2, 8), (6, 3, 0)]
 
# printing original list
print("The original list is : " + str(test_list))
 
# initialize K
K = 1
 
# Remove duplicates based on Kth element tuple list
# Using List comprehension
res = [ele for i, ele in enumerate(test_list) if ele[K] not in [x[K] for x in test_list[:i]]]
 
# printing result
print("The list after removal of K based duplicates : " + str(res))

Output

The original list is : [(3, 1, 5), (1, 3, 6), (2, 1, 7), (5, 2, 8), (6, 3, 0)]
The list after removal of K based duplicates : [(3, 1, 5), (1, 3, 6), (5, 2, 8)]

Time complexity: O(n^2), where n is the length of the input list test_list.
Auxiliary space: O(n), where n is the length of the input list test_list.
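The quadratic cost comes from rebuilding [x[K] for x in test_list[:i]] for every element. A common workaround, shown here as an alternative sketch, keeps a set alongside the comprehension so each membership check is O(1) on average. It relies on a side effect inside the comprehension, which some style guides discourage; the explicit loop of Method #1 is usually easier to read.

```python
test_list = [(3, 1, 5), (1, 3, 6), (2, 1, 7), (5, 2, 8), (6, 3, 0)]
K = 1

seen = set()
# set.add() returns None (falsy), so "ele[K] in seen or seen.add(ele[K])" is
# True for a repeated value and None for a new one, which gets recorded as a side effect
res = [ele for ele in test_list if not (ele[K] in seen or seen.add(ele[K]))]
print(res)  # [(3, 1, 5), (1, 3, 6), (5, 2, 8)]
```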

Method #4: Using dictionary

We can use a dictionary to keep track of the Kth elements encountered so far. If a tuple's Kth element is already a key in the dictionary, a tuple with that Kth element has already been kept, so we skip it. Otherwise, we add the Kth element to the dictionary and append the tuple to the result list.

Python3

# Python3 code to demonstrate working of
# Remove duplicates based on Kth element tuple list
# Using dictionary
 
# initialize list
test_list = [(3, 1, 5), (1, 3, 6), (2, 1, 7),
                        (5, 2, 8), (6, 3, 0)]
 
# printing original list
print("The original list is : " + str(test_list))
 
# initialize K
K = 1
 
# Remove duplicates based on Kth element tuple list
# Using dictionary
temp = {}
res = []
for ele in test_list:
    if ele[K] not in temp:
        res.append(ele)
        temp[ele[K]] = True
 
# printing result
print("The list after removal of K based duplicates : " + str(res))

Output

The original list is : [(3, 1, 5), (1, 3, 6), (2, 1, 7), (5, 2, 8), (6, 3, 0)]
The list after removal of K based duplicates : [(3, 1, 5), (1, 3, 6), (5, 2, 8)]

Time complexity: O(n), where n is the length of the input list.
Auxiliary space: O(u), where u is the number of distinct values of the Kth element in the input list; both the dictionary and the result list hold u entries.
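Since the dictionary is already keyed by the Kth element, it can also hold the tuple itself, which removes the need for a separate result list. This variant is a sketch relying on `dict.setdefault()` and on dictionaries preserving insertion order, which the language guarantees from Python 3.7 onward.

```python
test_list = [(3, 1, 5), (1, 3, 6), (2, 1, 7), (5, 2, 8), (6, 3, 0)]
K = 1

first_seen = {}
for ele in test_list:
    # setdefault stores ele only if its Kth value is not yet a key,
    # so the first tuple for each Kth value wins
    first_seen.setdefault(ele[K], ele)

# Keys were inserted in first-occurrence order, so the values come out in that order
res = list(first_seen.values())
print(res)  # [(3, 1, 5), (1, 3, 6), (5, 2, 8)]
```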

Method #5: Using set

This method follows the same idea as Method #1: iterate over the list, extract the Kth element of each tuple and check it against a set of values seen so far. If the element is already in the set, the tuple is a duplicate and is ignored; otherwise the element is added to the set and the tuple is added to the result list. Here's an example implementation:

Python3

test_list = [(3, 1, 5), (1, 3, 6), (2, 1, 7), (5, 2, 8), (6, 3, 0)]
K = 1
 
seen = set()
res = []
for tup in test_list:
    kth_element = tup[K]
    if kth_element not in seen:
        seen.add(kth_element)
        res.append(tup)
 
print("The list after removal of K based duplicates : " + str(res))

Output

The list after removal of K based duplicates : [(3, 1, 5), (1, 3, 6), (5, 2, 8)]

Time complexity: O(n), where n is the length of the input list test_list.
Auxiliary space: O(n), since the worst case scenario is when all tuples have unique Kth elements and thus the size of the seen set and the res list will be equal to the size of the input list test_list.
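The same set-based check also works as a generator, which is handy when the records come from a file or another iterator and you do not want to build the whole result list up front. The name `unique_by_k` is illustrative; the Python `itertools` documentation lists a similar recipe, `unique_everseen`, which takes a key function instead of an index.

```python
def unique_by_k(records, K):
    """Lazily yield the first tuple for each distinct value at index K."""
    seen = set()
    for tup in records:
        if tup[K] not in seen:
            seen.add(tup[K])
            yield tup


test_list = [(3, 1, 5), (1, 3, 6), (2, 1, 7), (5, 2, 8), (6, 3, 0)]
print(list(unique_by_k(test_list, 1)))  # [(3, 1, 5), (1, 3, 6), (5, 2, 8)]
```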

Method #6: Using recursion

In this algorithm, we use a set temp to keep track of the Kth elements that have been seen so far, and a list res to store the non-duplicate elements.

In the base case, when the input list is empty, we simply return the result list res.

In the recursive case, we get the first element from the input list, and check if its Kth element is already in the set temp. If it is not, we append the first element to the result list res, and add its Kth element to the set temp.

We then recursively call the remove_duplicates function with the remaining elements in the input list (i.e., all elements except the first one), the same K value, the updated set temp, and the updated result list res.

Finally, we return the result list res.

Python3

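def remove_duplicates(test_list, K, temp=None, res=None):
    # Create fresh containers per call; mutable defaults (set(), [])
    # would be shared between calls and leak results from earlier runs
    if temp is None:
        temp = set()
    if res is None:
        res = []
    if not test_list:
        return res
 
    ele = test_list[0]
    if ele[K] not in temp:
        res.append(ele)
        temp.add(ele[K])
 
    return remove_duplicates(test_list[1:], K, temp, res)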
 
 
test_list = [(3, 1, 5), (1, 3, 6), (2, 1, 7), (5, 2, 8), (6, 3, 0)]
K = 1
 
result = remove_duplicates(test_list, K)
print("The list after removal of K based duplicates : " + str(result))

Output

The list after removal of K based duplicates : [(3, 1, 5), (1, 3, 6), (5, 2, 8)]

Time complexity: O(n^2), where n is the length of the input list. Each recursive call creates the slice test_list[1:], which copies the remaining elements, so the calls copy n - 1, n - 2, ..., 0 elements in total. The set lookups themselves are only O(1) on average.
Auxiliary Space: O(n^2), because all n recursive calls are active at the same time and each one holds its own slice of the list. The set temp and the list res add only O(n) on top of that.
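The recursive version above copies the remaining list with test_list[1:] on every call, which is what makes it quadratic. A variant that passes an index instead of a shrinking slice, sketched below (the name `remove_duplicates_idx` and its `i` parameter are illustrative), brings the time down to O(n) and the extra space to O(n) for the call stack, set and result. The recursion depth still equals the list length, so CPython's recursion limit (reported by `sys.getrecursionlimit()`, 1000 by default) still applies, and the loop-based methods remain preferable for large inputs.

```python
def remove_duplicates_idx(test_list, K, i=0, temp=None, res=None):
    # Fresh containers on each top-level call (avoid mutable default arguments)
    if temp is None:
        temp = set()
    if res is None:
        res = []
    # Base case: every index has been processed
    if i == len(test_list):
        return res
    ele = test_list[i]
    if ele[K] not in temp:
        res.append(ele)
        temp.add(ele[K])
    # Advance the index instead of slicing, so no list copies are made
    return remove_duplicates_idx(test_list, K, i + 1, temp, res)


test_list = [(3, 1, 5), (1, 3, 6), (2, 1, 7), (5, 2, 8), (6, 3, 0)]
print(remove_duplicates_idx(test_list, 1))  # [(3, 1, 5), (1, 3, 6), (5, 2, 8)]
```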
