
Python – Remove Similar Rows from Tuple Matrix

Sometimes, while working with a tuple matrix, we can have a problem in which we get rows that contain the same elements and differ only in the ordering of the tuples; it is often desired to get such duplicates removed. This kind of problem can have applications in domains such as web development and day-to-day programming. Let's discuss certain ways in which this task can be performed.

Input : test_list = [[(4, 5), (3, 2)], [(3, 2), (4, 5)], [(3, 2), (4, 5)]] 
Output : {((4, 5), (3, 2))} 



Input : test_list = [[(4, 6), (3, 2)], [(3, 2), (4, 5)], [(3, 2), (4, 5)]] 
Output : {((3, 2), (4, 6)), ((4, 5), (3, 2))}
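The core idea behind all the methods below is that two rows should compare equal regardless of tuple order. As a quick illustrative sketch, comparing sorted copies of the rows makes this explicit:

```python
# Two rows containing the same tuples in a different order
row_a = [(4, 5), (3, 2)]
row_b = [(3, 2), (4, 5)]

# Direct comparison sees them as different...
print(row_a == row_b)          # False

# ...but comparing sorted copies treats them as equal
print(sorted(row_a) == sorted(row_b))  # True
```

Each method that follows builds on this canonicalization idea in a different way.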

Method #1: Using set() + tuple() + sorted() + list comprehension The combination of the above functions can be used to solve this problem. In this, we first sort each row so that differently ordered rows become identical, convert each row to a tuple, and then collect the tuples into a set, which automatically removes the duplicates. 






# Python3 code to demonstrate working of
# Remove Similar Rows from Tuple Matrix
# Using set() + tuple() + sorted() + list comprehension
 
# initializing lists
test_list = [[(4, 5), (3, 2)], [(2, 2), (4, 6)], [(3, 2), (4, 5)]]
 
# printing original list
print("The original list is : " + str(test_list))
 
# Remove Similar Rows from Tuple Matrix
# Using set() + tuple() + sorted() + list comprehension
res = set(tuple(sorted(sub)) for sub in test_list)
 
# printing result
print("Tuple matrix after removal : " + str(res))

Output : 
The original list is : [[(4, 5), (3, 2)], [(2, 2), (4, 6)], [(3, 2), (4, 5)]]
Tuple matrix after removal : {((2, 2), (4, 6)), ((3, 2), (4, 5))}

Time Complexity: O(n * k log k), where n is the number of rows in test_list and k is the number of tuples per row, since each row is sorted.
Auxiliary Space: O(n), since a set holding up to n rows is created for the result res.

Method #2: Using set() + frozenset() + generator expression The combination of the above functions can be used to solve this problem. In this, frozenset() makes each row both order-insensitive and hashable, so no explicit sorting step is needed. 




# Python3 code to demonstrate working of
# Remove Similar Rows from Tuple Matrix
# Using set() + frozenset() + generator expression
 
# initializing lists
test_list = [[(4, 5), (3, 2)], [(2, 2), (4, 6)], [(3, 2), (4, 5)]]
 
# printing original list
print("The original list is : " + str(test_list))
 
# Remove Similar Rows from Tuple Matrix
# Using set() + frozenset() + generator expression
res = set(frozenset(sub) for sub in test_list)
 
# printing result
print("Tuple matrix after removal : " + str(res))

Output : 
The original list is : [[(4, 5), (3, 2)], [(2, 2), (4, 6)], [(3, 2), (4, 5)]]
Tuple matrix after removal : {frozenset({(4, 5), (3, 2)}), frozenset({(4, 6), (2, 2)})}
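If plain tuples are preferred over frozensets in the result, each frozenset can be converted back, for example to a sorted tuple. This is a small follow-up sketch, not part of the original method:

```python
test_list = [[(4, 5), (3, 2)], [(2, 2), (4, 6)], [(3, 2), (4, 5)]]

# deduplicate via frozenset, then normalize each row back to a sorted tuple
res = {frozenset(sub) for sub in test_list}
as_tuples = sorted(tuple(sorted(fs)) for fs in res)

print(as_tuples)  # [((2, 2), (4, 6)), ((3, 2), (4, 5))]
```

Sorting at both levels also makes the printed result deterministic, which set iteration alone does not guarantee.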

Method #3 : Using dict + tuple() + list 




def remove_similar_rows(matrix):
    # convert each sublist in the matrix to a tuple
    tuples = [tuple(row) for row in matrix]
 
    # create an empty dictionary
    seen = {}
 
    # iterate over the tuples
    for tup in tuples:
        # use the sorted tuple as a key to avoid issues with tuple order
        key = tuple(sorted(tup))
        # add the tuple to the dictionary if it hasn't been seen before
        if key not in seen:
            seen[key] = tup
 
    # return the values (i.e. unique tuples) of the dictionary as a list
    return list(seen.values())
 
# test the function with the same input matrices as before
test_list = [[(4, 5), (3, 2)], [(3, 2), (4, 5)], [(3, 2), (4, 5)]]
print(remove_similar_rows(test_list))  # output: [((4, 5), (3, 2))]
 
test_list = [[(4, 6), (3, 2)], [(3, 2), (4, 5)], [(3, 2), (4, 5)]]
print(remove_similar_rows(test_list))  # output: [((4, 6), (3, 2)), ((3, 2), (4, 5))]

Output
[((4, 5), (3, 2))]
[((4, 6), (3, 2)), ((3, 2), (4, 5))]

Time complexity: O(n * k log k), where n is the number of rows and k is the number of tuples per row, since each key is built by sorting a row.
Auxiliary Space: O(n)
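As a related idea, the dictionary bookkeeping above can be condensed with `dict.fromkeys()`, which preserves insertion order in Python 3.7+. This is a compact variant, not the article's original code, and it returns the sorted canonical form of each row rather than the first-seen original:

```python
test_list = [[(4, 5), (3, 2)], [(3, 2), (4, 5)], [(3, 2), (4, 5)]]

# normalize each row to a sorted tuple, then let dict.fromkeys drop repeats
res = list(dict.fromkeys(tuple(sorted(row)) for row in test_list))

print(res)  # [((3, 2), (4, 5))]
```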

Method #4:  Using the numpy library to remove duplicates:

Algorithm:

1. Initialize the input list test_list.
2. Convert the list to a numpy array using np.array().
3. Use numpy's unique() function to remove duplicates by specifying the axis as 0.
4. Convert the numpy array back to a list using the tolist() function.
5. Print the unique list.




import numpy as np
 
# initializing lists
test_list = [[(4, 6), (3, 2)], [(3, 2), (4, 5)], [(3, 2), (4, 5)]]
 
# printing original list
print("The original list is : " + str(test_list))
 
# convert the list to a numpy array
arr = np.array(test_list)
 
# use numpy's unique() function to remove duplicates
unique_arr = np.unique(arr, axis=0)
 
# convert the numpy array back to a list of lists
unique_list = unique_arr.tolist()
 
# print the unique list
print(unique_list)
#This code is contributed by Jyothi pinjala

Output:

The original list is : [[(4, 6), (3, 2)], [(3, 2), (4, 5)], [(3, 2), (4, 5)]]
[[[3, 2], [4, 5]], [[4, 6], [3, 2]]]

Time complexity:
The time complexity of this algorithm is O(n log n), where n is the number of elements in the input list. The reason for this is that the np.unique() function uses sorting algorithms to find the unique elements, which takes O(n log n) time in the worst-case scenario.

Space complexity:
The space complexity of this algorithm is O(n), where n is the number of elements in the input list. This is because we are creating a numpy array and a list of the same size as the input list to store the unique elements.

Method 5: Using a loop to iterate over each tuple and checking if it has already been added to the unique list using a separate list.

Step-by-step approach:

1. Initialize an empty list to store unique rows.
2. Initialize an empty list to keep track of rows already seen.
3. Loop through each row in the original list; if it has not been seen, append it to both lists.
4. Print the unique list.

Below is the implementation of the above approach:




# initializing lists
test_list = [[(4, 6), (3, 2)], [(3, 2), (4, 5)], [(3, 2), (4, 5)]]
 
# initialize an empty list to store unique tuples
unique_list = []
 
# initialize an empty list to keep track of seen tuples
seen = []
 
# loop through each tuple in the original list
for tpl in test_list:
    # if the tuple has not been seen yet, add it to the unique list and seen list
    if tpl not in seen:
        unique_list.append(tpl)
        seen.append(tpl)
 
# print the unique list
print(unique_list)

Output
[[(4, 6), (3, 2)], [(3, 2), (4, 5)]]

Time complexity: O(n^2), where n is the length of the original list.
Auxiliary space: O(n), as we need to store the seen list.
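As with the NumPy approach, the loop above compares rows exactly as given, so [(4, 6), (3, 2)] and [(3, 2), (4, 6)] would both be kept. A small variant that tracks sorted copies in `seen` makes the check order-insensitive (an illustrative tweak, not the original code):

```python
test_list = [[(4, 6), (3, 2)], [(3, 2), (4, 6)], [(3, 2), (4, 5)]]

unique_list = []
seen = []

for tpl in test_list:
    key = sorted(tpl)          # canonical form of the row
    if key not in seen:
        unique_list.append(tpl)
        seen.append(key)

print(unique_list)  # [[(4, 6), (3, 2)], [(3, 2), (4, 5)]]
```

The time and space characteristics are the same as the original loop, plus the O(k log k) cost of sorting each row of k tuples.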

Method #6: Using a recursive method

Algorithm:

1. If the list is empty, return the accumulated unique list (base case).
2. If the first tuple in the list has not been seen yet, add it to the unique list and the seen list.
3. Recurse on the rest of the list, passing the accumulators along.




def remove_duplicates_recursive(lst, unique_lst=None, seen=None):
    # avoid the mutable-default-argument pitfall: create fresh lists per call
    if unique_lst is None:
        unique_lst = []
    if seen is None:
        seen = []
    # base case: the list is empty
    if not lst:
        return unique_lst
    # check if the first tuple in the list has already been seen
    if lst[0] not in seen:
        # if not, add it to the unique list and seen list
        unique_lst.append(lst[0])
        seen.append(lst[0])
    # recursively call the function with the rest of the list
    return remove_duplicates_recursive(lst[1:], unique_lst, seen)

test_list = [[(4, 6), (3, 2)], [(3, 2), (4, 5)], [(3, 2), (4, 5)]]
unique_list = remove_duplicates_recursive(test_list)
print(unique_list)

Output
[[(4, 6), (3, 2)], [(3, 2), (4, 5)]]

Time complexity:
– The function makes one recursive call per row, and each call performs a membership check and a list slice, both up to O(n); the overall time complexity is therefore O(n^2) in the worst case, where n is the number of rows.

Space complexity:
– Two lists store the unique and seen tuples, and the recursion depth reaches n, so the space complexity is O(n).

Method #7: Using set() + map()

Algorithm:

  1. Initialize the input list.
  2. Convert each row to a tuple and collect the tuples in a set, which drops exact duplicates.
  3. Convert the set back to a list of lists.
  4. Print the unique list.




 
# initializing lists
test_list = [[(4, 6), (3, 2)], [(3, 2), (4, 5)], [(3, 2), (4, 5)]]
# printing original list
print("The original list is : " + str(test_list))
# create a set of tuples from the original list
tuple_set = set(map(tuple, test_list))
 
# convert the set back to a list of lists
unique_list = [list(t) for t in tuple_set]
 
# print the unique list
print(unique_list)
#This code is contributed by Rayudu.

Output
The original list is : [[(4, 6), (3, 2)], [(3, 2), (4, 5)], [(3, 2), (4, 5)]]
[[(3, 2), (4, 5)], [(4, 6), (3, 2)]]

Time complexity:
The time complexity of this approach is O(n * k) on average, where n is the number of rows and k is the number of tuples per row, since each row is converted to a tuple and hashed once while building the set.

Auxiliary Space:
The space complexity is O(n), because the set of tuples and the resulting list of lists each hold up to n rows.

