How to Restrict Results to Top N Rows per Group in PostgreSQL?

Last Updated : 26 Mar, 2024

In data analysis, understanding how to extract meaningful insights from grouped data is crucial. PostgreSQL, a powerful open–source relational database management system, provides robust features to handle such scenarios effectively.

One common analytical task is limiting results to the top N rows within each group, which can be invaluable for various analyses, such as tracking student performance, analyzing sales data, or evaluating product popularity.

In this article, we’ll explore three effective methods in PostgreSQL to restrict results to the top N rows per group, allowing you to make the most of your data analysis efforts.

How to Get Top N Rows in PostgreSQL?

When working with databases, restricting results to the top N rows within specific groups can be challenging but is often necessary for insightful analysis. PostgreSQL offers several method to achieve this goal which helps provide flexibility and efficiency in data analysis tasks.

Using ROW_NUMBER() with a Common Table Expression (CTE)
Using a Subquery
Using a Window Function with Filtering in the WHERE clause

Let’s set up an Environment

To understand How to Restrict results to top N rows per group in PostgreSQL we need a table on which we will perform various operations and queries. Here we will consider a table called sales which contains id, product_id, and amount as Columns.

-- Step 1: Create a table
CREATE TABLE sales (
    id SERIAL PRIMARY KEY,  -- Unique identifier for each sale
    product_id INT,         -- ID of the product sold
    amount NUMERIC          -- Amount of the sale
);

-- Step 2: Insert sample data
INSERT INTO sales (product_id, amount) VALUES 
    (1, 100), (1, 150), (1, 200),  -- Product 1 sales
    (2, 50), (2, 75), (2, 100),    -- Product 2 sales
    (3, 200), (3, 300), (3, 400);  -- Product 3 sales

SELECT * from sales;

After Inserting Some data, Our Table Looks:

sales table

1. Using ROW_NUMBER() with a Common Table Expression (CTE)

Let’s see How to retrieve the top 2 rows per product_id from a sales table, ranked by amount in descending order.

WITH ranked_sales AS (
    SELECT 
        id, 
        product_id, 
        amount,
        ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY amount DESC) AS row_num
    FROM 
        sales
)
-- Select top 2 rows per product_id
SELECT 
    id, 
    product_id, 
    amount
FROM 
    ranked_sales
WHERE 
    row_num <= 2; -- or N according to the user need

Output:

Using-ROW_NUMBER()-with-a-Common-Table-Expression-(CTE)

Output

Explanation:

In this method, a temporary result set called “ranked_sales” is created using a Common Table Expression (CTE).
To assign a unique row number to each row within a group defined by the “product_id” column in the CTE, we use the ROW_NUMBER() window function, ordering by the “amount” column in descending order.
Lastly, we retrieve the top two rows per “product_id” by choosing rows from the “ranked_sales” CTE where the row number is less than or equal to 2 or N.

2. Using a Subquery

Let’s see How to retrieve the top 2 rows per product_id based on the amount in descending order from a table called “sales”

-- Method 2: Using a Subquery
SELECT 
    id, 
    product_id, 
    amount
FROM (
    SELECT 
        id, 
        product_id, 
        amount,
        ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY amount DESC) AS row_num
    FROM 
        sales
) AS ranked_sales
WHERE 
    row_num <= 2;

Output:

Output

Explanation:

This approach is comparable to Method 1 but substitutes a subquery for the CTE.
We build a subquery that applies Method 1’s logic: arranging the rows in each group according to decreasing order based on the “amount” column.
The top two rows per “product_id” are then retrieved by the outer query by choosing rows from the subquery where the row number is less than or equal to 2.

3. Using a Window Function with Filtering in the WHERE Clause

Let’s see How can we retrieve the top N rows per group from a PostgreSQL table, “sales,” based on the “amount” column in descending order,

SELECT 
    id, 
    product_id, 
    amount
FROM (
    SELECT 
        id, 
        product_id, 
        amount,
        DENSE_RANK() OVER (PARTITION BY product_id ORDER BY amount DESC) AS rank
    FROM 
        sales
) AS ranked_sales
WHERE 
    rank <= 2; -- or N

Output:

Output

Explanation:

In this method, a subquery contains the DENSE_RANK() window function.
We order the results of the subquery by the “amount” column in descending order, giving each row within each group defined by the “product_id” column a dense rank.
Afterwards, the outer query retrieves the top two rows for each “product_id” by choosing rows from the subquery where the rank is less than or equal to 2 or N.

Conclusion

Overall, PostgreSQL provides robust features for handling grouped data effectively, making it a valuable tool for data analysis tasks. By restricting results to the top N rows per group, analysts can gain valuable insights into various aspects of their data, such as student performance, sales analysis, and product popularity.

The methods discussed in this article, including using ROW_NUMBER() with a Common Table Expression, a Subquery, or a Window Function with Filtering, offer flexibility and efficiency in extracting meaningful information from PostgreSQL databases. heir requirements.

Gaining knowledge of these methods makes it possible to analyze grouped data effectively and extract insightful information from PostgreSQL databases.

Suggest improvement

How to Restrict Results to top N Rows per Group in PL/SQL?

Share your thoughts in the comments

How to Restrict Results to Top N Rows per Group in PostgreSQL?

How to Get Top N Rows in PostgreSQL?

1. Using ROW_NUMBER() with a Common Table Expression (CTE)

2. Using a Subquery

3. Using a Window Function with Filtering in the WHERE Clause

Conclusion

Please Login to comment...

Similar Reads

What kind of Experience do you want to share?