PostgreSQL – DISTINCT ON expression
PostgreSQL also provides on an expression as DISTINCT ON that is used with the SELECT statement to remove duplicates from a query set result just like the DISTINCT clause.In addition to that it also keeps the “first row” of each row of duplicates in the query set result.
SELECT DISTINCT ON (column_1) column_alias, column_2 FROM table_name ORDER BY column_1, column_2;
As the order of rows returned from the SELECT statement is unpredictable which means the “first row” of each group of the duplicate is also unpredictable. It is good practice to use the ORDER BY clause with the DISTINCT ON(expression) to make the result set in the desired order.
Note: The DISTINCT ON expression must always match the leftmost expression in the ORDER BY clause.
Now, let’s look into a few examples for better understanding. For the sake of example, we will create a sample database as explained below:
Create a database(say, Favourite_colours) using the commands shown below:
CREATE DATABASE Favourite_colours;
Now add a table(say, my_table) with columns(say, id, colour_1 and colour_2) to the database using the command below:
CREATE TABLE my_table( id serial NOT NULL PRIMARY KEY, colour_1 VARCHAR, colour_2 VARCHAR );
Now insert some data in the table that we just added to our database using the command below:
INSERT INTO my_table(colour_1, colour_2) VALUES ('red', 'red'), ('red', 'red'), ('red', NULL), (NULL, 'red'), ('red', 'green'), ('red', 'blue'), ('green', 'red'), ('green', 'blue'), ('green', 'green'), ('blue', 'red'), ('blue', 'green'), ('blue', 'blue');
Now check if everything is as intended by making a query as below:
SELECT id, colour_1, colour_2 FROM my_table;
If everything is as intended, the output will be like as shown below:
Since, our database is good to go, we move onto the implementation of the SELECT DISTINCT ON expression.
SELECT DISTINCT ON (colour_1) colour_1, colour_2 FROM my_table ORDER BY colour_1, colour_2;
Notice the change in the query set result order.
SELECT DISTINCT ON (colour_2) colour_2, colour_1 FROM my_table ORDER BY colour_2, colour_1;