Python – Remove Similar Rows from Tuple Matrix

Last Updated : 27 Apr, 2023

When working with a tuple matrix (a list of rows, where each row is a list of tuples), we often receive data in which several rows are similar, i.e. they contain the same tuples and differ only in the order of those tuples. It is often desirable to remove such rows and keep one representative of each. This kind of problem comes up in domains such as web development and day-to-day programming. Let's discuss several ways in which this task can be performed.

Input : test_list = [[(4, 5), (3, 2)], [(3, 2), (4, 5)], [(3, 2), (4, 5)]]
Output : {((4, 5), (3, 2))}

Input : test_list = [[(4, 6), (3, 2)], [(3, 2), (4, 5)], [(3, 2), (4, 5)]]
Output : {((3, 2), (4, 6)), ((4, 5), (3, 2))}
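
To make the notion of "similar" concrete, here is a minimal sketch, assuming two rows count as similar when they hold the same tuples in any order. The helper name rows_similar is hypothetical and is used only for illustration.

```python
def rows_similar(row_a, row_b):
    """Return True when both rows contain the same tuples, ignoring order."""
    # Sorting puts each row into a canonical order,
    # so rows with equal contents compare equal.
    return sorted(row_a) == sorted(row_b)


print(rows_similar([(4, 5), (3, 2)], [(3, 2), (4, 5)]))  # True
print(rows_similar([(4, 6), (3, 2)], [(3, 2), (4, 5)]))  # False
```

Every method in this article is, in effect, a way of grouping rows under a test like this one and keeping one row per group.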

Method #1: Using set() + tuple() + sorted() + list comprehension The combination of the above functions can be used to solve this problem. We first sort each row so that similar rows become identical, convert each sorted row to a tuple so that it is hashable, and then build a set, which automatically removes the duplicates.

Python3

# Python3 code to demonstrate working of 
# Remove Similar Rows from Tuple Matrix
# Using set() + tuple() + sorted() + list comprehension
 
# initializing lists
test_list = [[(4, 5), (3, 2)], [(2, 2), (4, 6)], [(3, 2), (4, 5)]]
 
# printing original list
print("The original list is : " + str(test_list))
 
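# Remove Similar Rows from Tuple Matrix
# Using set() + tuple() + sorted() + list comprehension
# sorted() gives every row a canonical order; tuple(set(sub)) would not,
# because the iteration order of a set is not guaranteed
res = set([tuple(sorted(sub)) for sub in test_list])
 
# printing result 
print("Tuple matrix after removal : " + str(res))

Output (the order of elements inside the printed set may vary):

The original list is : [[(4, 5), (3, 2)], [(2, 2), (4, 6)], [(3, 2), (4, 5)]]
Tuple matrix after removal : {((2, 2), (4, 6)), ((3, 2), (4, 5))}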

Time Complexity: O(n), where n is the length of the list test_list, assuming the row length is a small constant (sorting a row of m tuples costs O(m log m))
Auxiliary Space: O(n), since the set res can hold one tuple for every row of test_list
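
As a quick check, the sorting idea can be wrapped in a function and run against the two sample inputs from the top of the article. This is only a sketch; the function name unique_rows is an assumption made for this example. The checks look at sizes and membership rather than printed order, because a set has no guaranteed order.

```python
def unique_rows(matrix):
    # Sort each row into a canonical order, then let the set drop repeats.
    return {tuple(sorted(row)) for row in matrix}


first = unique_rows([[(4, 5), (3, 2)], [(3, 2), (4, 5)], [(3, 2), (4, 5)]])
second = unique_rows([[(4, 6), (3, 2)], [(3, 2), (4, 5)], [(3, 2), (4, 5)]])

print(len(first))                 # 1
print(len(second))                # 2
print(((3, 2), (4, 5)) in first)  # True
```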

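Method #2: Using set() + frozenset() + generator expression The combination of the above functions can be used to solve this problem. Here frozenset() replaces the sorting and tuple conversion: a frozenset is hashable and ignores order, so similar rows map to the same value. Note that a frozenset also ignores repeated tuples within a row.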

Python3

# Python3 code to demonstrate working of 
# Remove Similar Rows from Tuple Matrix
# Using set() + frozenset() + generator expression
 
# initializing lists
test_list = [[(4, 5), (3, 2)], [(2, 2), (4, 6)], [(3, 2), (4, 5)]]
 
# printing original list
print("The original list is : " + str(test_list))
 
# Remove Similar Rows from Tuple Matrix
# Using set() + frozenset() + generator expression
res = set(frozenset(sub) for sub in test_list)
 
# printing result 
print("Tuple matrix after removal : " + str(res))

Output :

The original list is : [[(4, 5), (3, 2)], [(2, 2), (4, 6)], [(3, 2), (4, 5)]]
Tuple matrix after removal : {frozenset({(4, 5), (3, 2)}), frozenset({(4, 6), (2, 2)})}
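
One caveat is worth illustrating: a frozenset forgets how many times a tuple occurs in a row. If repeated tuples matter in your data, one possible workaround (a sketch, not part of the method above) is to build the key from collections.Counter instead. The helper name multiset_key is hypothetical.

```python
from collections import Counter

row_a = [(1, 2), (1, 2), (3, 4)]
row_b = [(1, 2), (3, 4), (3, 4)]

# frozenset ignores repetition, so these two different rows look equal.
print(frozenset(row_a) == frozenset(row_b))  # True


def multiset_key(row):
    # Pair each tuple with its count so that repeats still distinguish rows.
    return frozenset(Counter(row).items())


print(multiset_key(row_a) == multiset_key(row_b))                     # False
print(multiset_key(row_a) == multiset_key([(3, 4), (1, 2), (1, 2)]))  # True
```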

Method #3 : Using dict + tuple() + sorted() + list

Convert each row (sublist) in the matrix to a tuple.
Create an empty dictionary to store the unique rows.
Loop through each row tuple and build a key by sorting its elements, so that similar rows share the same key.
If the key doesn't exist in the dictionary, add it with the original row tuple as its value.
Return the dictionary values as a list of unique rows, in order of first appearance.

Python3

def remove_similar_rows(matrix):
    # convert each sublist in the matrix to a tuple
    tuples = [tuple(row) for row in matrix]
 
    # create an empty dictionary
    seen = {}
 
    # iterate over the tuples
    for tup in tuples:
        # use the sorted tuple as a key to avoid issues with tuple order
        key = tuple(sorted(tup))
        # add the tuple to the dictionary if it hasn't been seen before
        if key not in seen:
            seen[key] = tup
 
    # return the values (i.e. unique tuples) of the dictionary as a list
    return list(seen.values())
 
# test the function with the same input matrices as before
test_list = [[(4, 5), (3, 2)], [(3, 2), (4, 5)], [(3, 2), (4, 5)]]
print(remove_similar_rows(test_list))  # output: [((4, 5), (3, 2))]
 
test_list = [[(4, 6), (3, 2)], [(3, 2), (4, 5)], [(3, 2), (4, 5)]]
print(remove_similar_rows(test_list))  # output: [((4, 6), (3, 2)), ((3, 2), (4, 5))]

Output

[((4, 5), (3, 2))]
[((4, 6), (3, 2)), ((3, 2), (4, 5))]

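Time complexity: O(n), where n is the number of rows, treating the row length as a constant (each row is sorted once and dictionary lookups take O(1) on average)
Auxiliary Space: O(n), for the dictionary of unique rows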
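
The check-then-insert step can also be written with dict.setdefault(), which stores a value only when the key is not already present. The following is a minimal sketch of that variant; the function name remove_similar_rows_setdefault is made up for this example.

```python
def remove_similar_rows_setdefault(matrix):
    seen = {}
    for row in matrix:
        # setdefault keeps the first row stored under each sorted key
        # and leaves the dictionary unchanged for later similar rows.
        seen.setdefault(tuple(sorted(row)), tuple(row))
    return list(seen.values())


rows = [[(4, 6), (3, 2)], [(3, 2), (4, 5)], [(3, 2), (4, 5)]]
print(remove_similar_rows_setdefault(rows))
# [((4, 6), (3, 2)), ((3, 2), (4, 5))]
```

Because dictionaries preserve insertion order in Python 3.7 and later, the rows come back in order of first appearance.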

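Method #4: Using the numpy library to remove duplicates

Algorithm:

1. Initialize the input list test_list.
2. Convert the list to a numpy array using np.array().
3. Use numpy's unique() function with axis=0 to remove duplicate rows.
4. Convert the numpy array back to a list using the tolist() function.
5. Print the unique list.

Note that unique() with axis=0 removes only rows that are exactly equal, so rows that contain the same tuples in a different order are kept as separate rows. The tuples also come back as lists after tolist().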

Python3

import numpy as np
 
# initializing lists
test_list = [[(4, 6), (3, 2)], [(3, 2), (4, 5)], [(3, 2), (4, 5)]]
 
# convert the list to a numpy array
arr = np.array(test_list)
 
# use numpy's unique() function to remove duplicates
unique_arr = np.unique(arr, axis=0)
 
# convert the numpy array back to a list of lists
unique_list = unique_arr.tolist()
 
# print the unique list
print(unique_list)
#This code is contributed by Jyothi pinjala

Output:

[[[3, 2], [4, 5]], [[4, 6], [3, 2]]]

Time complexity:
The time complexity of this algorithm is O(n log n), where n is the number of rows in the input list, because np.unique() sorts the rows in order to find the unique ones.

Space complexity:
The space complexity of this algorithm is O(n), because a numpy array and a list of up to the same size as the input are created to store the unique rows.
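
Since np.unique() with axis=0 merges only rows that are exactly equal, rows that differ in tuple order need to be normalized first. One way to do that, sketched below, is to sort the tuples inside each row with plain Python before building the array. This assumes every row has the same number of tuples and every tuple the same length, so that the data forms a regular array.

```python
import numpy as np

test_list = [[(4, 5), (3, 2)], [(3, 2), (4, 5)], [(4, 6), (3, 2)]]

# Sort the tuples inside each row first, so that rows differing
# only in tuple order become identical before they reach numpy.
normalized = np.array([sorted(row) for row in test_list])

unique_rows = np.unique(normalized, axis=0).tolist()
print(unique_rows)  # [[[3, 2], [4, 5]], [[3, 2], [4, 6]]]
```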

Method #5: Using a loop and a separate seen list

Step-by-step approach:

Initialize an empty list unique_list to store the unique rows.
Initialize an empty list seen to keep track of the rows that have been seen already, in sorted form.
Loop through each row in the original list test_list.
If the sorted form of the row is not in the seen list, add the row to unique_list and its sorted form to seen.
Print the unique_list.

Below is the implementation of the above approach:

Python3

# initializing lists
test_list = [[(4, 6), (3, 2)], [(3, 2), (4, 5)], [(3, 2), (4, 5)]]
 
# initialize an empty list to store unique rows
unique_list = []
 
# initialize an empty list to keep track of seen rows (in sorted form)
seen = []
 
# loop through each row in the original list
for row in test_list:
    # sort the row so that rows differing only in tuple order compare equal
    key = sorted(row)
    # if the row has not been seen yet, record it
    if key not in seen:
        unique_list.append(row)
        seen.append(key)
 
# print the unique list
print(unique_list)

Output

[[(4, 6), (3, 2)], [(3, 2), (4, 5)]]

Time complexity: O(n^2), where n is the length of the original list.
Auxiliary space: O(n), as we need to store the seen list.
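
The membership test on a list is what makes this approach quadratic. A possible refinement, shown here as a sketch with a hypothetical function name, keeps the same loop but stores hashable sorted keys in a set, where lookups take O(1) time on average.

```python
def remove_similar_rows_fast(matrix):
    unique_rows = []
    seen = set()  # set membership checks are O(1) on average
    for row in matrix:
        key = tuple(sorted(row))  # hashable, order-insensitive form of the row
        if key not in seen:
            seen.add(key)
            unique_rows.append(row)
    return unique_rows


rows = [[(4, 6), (3, 2)], [(3, 2), (4, 5)], [(4, 5), (3, 2)]]
print(remove_similar_rows_fast(rows))  # [[(4, 6), (3, 2)], [(3, 2), (4, 5)]]
```

The rows are returned unchanged and in order of first appearance; only the bookkeeping structure differs from the list-based loop.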

Method #6: Using recursion

Algorithm:

1. If the list is empty, return the unique list built so far (base case).
2. Otherwise, check whether the first row of the list has been seen already.
3. If it has not been seen, add it to the unique list and the seen list.
4. Call the function recursively on the rest of the list.

Python3

def remove_duplicates_recursive(lst, unique_lst=None, seen=None):
    # create fresh accumulators on the first call; mutable default
    # arguments would be shared between separate calls
    if unique_lst is None:
        unique_lst = []
    if seen is None:
        seen = []
    # base case: the list is empty
    if not lst:
        return unique_lst
    # sort the first row so that tuple order does not matter
    key = sorted(lst[0])
    # if this row has not been seen yet, record it
    if key not in seen:
        unique_lst.append(lst[0])
        seen.append(key)
    # recursively call the function with the rest of the list
    return remove_duplicates_recursive(lst[1:], unique_lst, seen)
 
test_list = [[(4, 6), (3, 2)], [(3, 2), (4, 5)], [(3, 2), (4, 5)]]
unique_list = remove_duplicates_recursive(test_list)
print(unique_list)

Output

[[(4, 6), (3, 2)], [(3, 2), (4, 5)]]

Time complexity:
– The function is called once per row, so there are n calls, where n is the length of the list.
– Each call performs a membership test on the seen list and slices the list with lst[1:], both of which take O(n) time.
– Therefore, the overall time complexity of the algorithm is O(n^2).

Space complexity:
– The unique and seen lists take O(n) space. The recursion is n levels deep and each level holds its own slice of the list, so the slices can add up to O(n^2) in total, and very long inputs can exceed Python's recursion limit.
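
A general point to keep in mind with accumulator parameters in recursive helpers: a mutable default argument such as [] is created once, when the function is defined, and is then shared by every call that omits it. The sketch below uses two hypothetical functions, collect_bad and collect_good, to show the difference.

```python
def collect_bad(item, bucket=[]):
    # The default list is created once and shared by every call.
    bucket.append(item)
    return bucket


def collect_good(item, bucket=None):
    # A fresh list is created on each call that omits the argument.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


print(collect_bad(1))   # [1]
print(collect_bad(2))   # [1, 2]
print(collect_good(1))  # [1]
print(collect_good(2))  # [2]
```

Using None as the default and creating the list inside the function keeps separate top-level calls independent of each other.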

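Method #7: Using set() + map()

Algorithm:

Initialize the input list
Convert each row to a tuple with map() so that it is hashable
Build a set from these tuples to remove duplicates
Convert each tuple in the set back to a list
Print the unique list

Note that the rows are compared exactly, so only rows with the same tuples in the same order are removed, and the order of the result is not guaranteed.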

Python3

# initializing lists
test_list = [[(4, 6), (3, 2)], [(3, 2), (4, 5)], [(3, 2), (4, 5)]]
# printing original list
print("The original list is : " + str(test_list))
# create a set of tuples from the original list
tuple_set = set(map(tuple, test_list))
 
# convert the set back to a list of lists
unique_list = [list(t) for t in tuple_set]
 
# print the unique list
print(unique_list)
#This code is contributed by Rayudu.

Output

The original list is : [[(4, 6), (3, 2)], [(3, 2), (4, 5)], [(3, 2), (4, 5)]]
[[(3, 2), (4, 5)], [(4, 6), (3, 2)]]

Time complexity:
The time complexity of this algorithm is O(n) on average, where n is the number of rows in the input list, because each row is converted to a tuple and hashed once (treating the row length as a constant).

Auxiliary Space:
The space complexity of this algorithm is O(n), where n is the number of rows in the input list, because the set of tuples and the resulting list of lists can each hold up to n rows.
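
Because a set does not preserve order, the rows can come back in any order. If first-seen order matters, one option is dict.fromkeys(), since dictionary keys are unique and keep insertion order in Python 3.7 and later. This is a sketch of an alternative rather than part of the method above, and like the set version it removes only exact duplicates.

```python
test_list = [[(4, 6), (3, 2)], [(3, 2), (4, 5)], [(3, 2), (4, 5)]]

# dict keys are unique and keep insertion order (Python 3.7+),
# so the first occurrence of each row determines its position.
unique_list = [list(t) for t in dict.fromkeys(map(tuple, test_list))]
print(unique_list)  # [[(4, 6), (3, 2)], [(3, 2), (4, 5)]]
```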
