PostgreSQL – Deleting Duplicate Rows using Subquery

Last Updated : 28 Aug, 2020

PostgreSQL offers several techniques for deleting duplicate rows. This article shows how to do it with a subquery.

For demonstration, let’s set up a sample table (say, basket) that stores fruits:

CREATE TABLE basket(
    id SERIAL PRIMARY KEY,
    fruit VARCHAR(50) NOT NULL
);

Now let’s add some data to the newly created basket table.

INSERT INTO basket(fruit) VALUES('apple');
INSERT INTO basket(fruit) VALUES('apple');

INSERT INTO basket(fruit) VALUES('orange');
INSERT INTO basket(fruit) VALUES('orange');
INSERT INTO basket(fruit) VALUES('orange');

INSERT INTO basket(fruit) VALUES('banana');

Now let’s verify the basket table using the below statement:

SELECT * FROM basket;

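On a freshly created table, this should return six rows with ids 1 through 6: two apple rows, three orange rows, and one banana row.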

Now that we have set up the sample table, we will query for the duplicates using the following:

SELECT
    fruit,
    COUNT( fruit )
FROM
    basket
GROUP BY
    fruit
HAVING
    COUNT( fruit )> 1
ORDER BY
    fruit;

This should return two rows: apple with a count of 2 and orange with a count of 3.

The following statement uses a subquery to delete duplicate rows and keep the row with the lowest id.

DELETE FROM basket
WHERE id IN (
    SELECT id
    FROM (
        SELECT
            id,
            ROW_NUMBER() OVER (PARTITION BY fruit ORDER BY id) AS row_num
        FROM basket
    ) t
    WHERE t.row_num > 1
);

In this example, the inner query numbers the rows within each group of identical fruit values in ascending order of id, and the subquery returns every row except the first one in each group. The outer DELETE statement then removes the rows returned by the subquery.
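Before running the DELETE, it can be useful to preview which rows the subquery selects. The following is a minimal sketch that reuses the same window function against the sample basket table; it only reads data and deletes nothing.

```sql
-- Preview the rows that the DELETE statement would remove.
SELECT id, fruit, row_num
FROM (
    SELECT
        id,
        fruit,
        -- Number the rows within each fruit group, lowest id first.
        ROW_NUMBER() OVER (PARTITION BY fruit ORDER BY id) AS row_num
    FROM basket
) t
WHERE t.row_num > 1
ORDER BY id;
```

Run against the six sample rows as originally inserted (that is, before the DELETE), this should list ids 2, 4, and 5: the second apple row and the second and third orange rows.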

If you want to keep the duplicate row with the highest id instead, sort the rows in descending order in the subquery:

DELETE FROM basket
WHERE id IN (
    SELECT id
    FROM (
        SELECT
            id,
            ROW_NUMBER() OVER (PARTITION BY fruit ORDER BY id DESC) AS row_num
        FROM basket
    ) t
    WHERE t.row_num > 1
);
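Since a DELETE is permanent once committed, one cautious option is to try the statement inside a transaction and inspect what it removes before deciding. The sketch below assumes the basket table from this article and uses the keep-highest-id variant; RETURNING is a PostgreSQL extension that reports the deleted rows, and ROLLBACK undoes the deletion.

```sql
BEGIN;

-- Keep the highest id per fruit and report which rows were removed.
DELETE FROM basket
WHERE id IN (
    SELECT id
    FROM (
        SELECT
            id,
            ROW_NUMBER() OVER (PARTITION BY fruit ORDER BY id DESC) AS row_num
        FROM basket
    ) t
    WHERE t.row_num > 1
)
RETURNING id, fruit;

-- Undo the deletion; use COMMIT instead once the result looks right.
ROLLBACK;
```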

If you want to delete duplicates based on the values of multiple columns, here is the query template:

DELETE FROM table_name
WHERE id IN (
    SELECT id
    FROM (
        SELECT
            id,
            ROW_NUMBER() OVER (
                PARTITION BY column_1, column_2
                ORDER BY id
            ) AS row_num
        FROM table_name
    ) t
    WHERE t.row_num > 1
);
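As a concrete instance of the template, here is a small sketch. The basket_colors table and its color column are hypothetical and exist only for illustration; a row counts as a duplicate only when both fruit and color match an earlier row.

```sql
-- Hypothetical table used only to illustrate the multi-column template.
CREATE TABLE basket_colors(
    id SERIAL PRIMARY KEY,
    fruit VARCHAR(50) NOT NULL,
    color VARCHAR(50) NOT NULL
);

INSERT INTO basket_colors(fruit, color) VALUES
    ('apple', 'red'),
    ('apple', 'red'),
    ('apple', 'green');

-- Remove rows whose (fruit, color) pair repeats, keeping the lowest id.
DELETE FROM basket_colors
WHERE id IN (
    SELECT id
    FROM (
        SELECT
            id,
            ROW_NUMBER() OVER (
                PARTITION BY fruit, color
                ORDER BY id
            ) AS row_num
        FROM basket_colors
    ) t
    WHERE t.row_num > 1
);
```

Here only the second ('apple', 'red') row is removed. The ('apple', 'green') row stays, because it differs from the others in the color column.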

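With the template above, the statement deletes every row whose combination of column_1 and column_2 values already appears in a row with a lower id, so exactly one row per combination remains. To verify that the basket table no longer contains duplicates, rerun the duplicate check: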

SELECT
    fruit,
    COUNT( fruit )
FROM
    basket
GROUP BY
    fruit
HAVING
    COUNT( fruit )> 1
ORDER BY
    fruit;

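After the duplicates have been deleted, this query should return no rows, since each fruit now appears only once.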
