Open In App

How to SELECT Top N Rows For Each Group in SQL Server

Last Updated : 20 Feb, 2024
Improve
Improve
Like Article
Like
Save
Share
Report

SQL Server’s ROW_Number() window function is often used to get the top number of rows in a group. In this article, we’ll look at how to get the top N Row Counts in SQL Server using this function. We’ll provide step-by-step instructions, syntax examples, and examples to help you get the results you need.

ROW_Number() is used in conjunction with PARTITION BY to rank and filter within a group. By using this function, you can get the top n Row Counts based on a specific column, improving the analytical performance of your SQL query.

SELECT Top N Rows For Each Group in SQL Server

The ROW_NUMBER() window function in SQL Server is used to assign unique row numbers to the rows within a partition. Combined with the PARTITION BY clause, it becomes an effective way to rank and filter results within each group.

Steps:

  1. Apply the ROW_NUMBER() function to your result set, and specify the PARTITION BY clause to define the grouping based on a specific column.
  2. Create a Common Table Expression (CTE) to organize the query and simplify the final selection of rows.
  3. Filter the results from the CTE based on the row number and select only the top N rows within each group.

Syntax:

WITH CTE AS (
SELECT
<group_column>,
<value_column>,
ROW_NUMBER() OVER (PARTITION BY <group_column> ORDER BY <value_column> DESC) AS row_num
FROM
<Table>
)
SELECT * FROM CTE WHERE row_num <= N;
  • <group_column>: This represents the column by which you want to group your data. In the context of the example, it could be the Region column as we are grouping by region.
  • <value_column>: This represents the column based on which you want to order the data within each group. In the example, it’s Revenue, as we want to rank salespeople within each region by their revenue.
  • <Table>: This represents the table from which you are selecting data. In the example, it’s the Sales table.

Example of SELECT Top N Rows For Each Group in SQL Server

Example 1: Retrieving Top 2 Salespersons per Region

In this example, we have a Sales table with columns Region, Salesperson, and Revenue and we will retrieve the top 2 salespersons based on revenue for each region.

-- Create Sales table
CREATE TABLE Sales (
Region VARCHAR(50),
Salesperson VARCHAR(50),
Revenue DECIMAL(10, 2)
);

-- Insert sample data
INSERT INTO Sales (Region, Salesperson, Revenue)
VALUES
('North', 'John', 5000),
('North', 'Alice', 7000),
('North', 'Bob', 6000),
('South', 'Emma', 8000),
('South', 'Chris', 7500),
('South', 'David', 9000);

-- Retrieve top 2 salespersons per region
WITH CTE AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Region ORDER BY Revenue DESC) AS row_num
FROM
Sales
)
SELECT * FROM CTE WHERE row_num <= 2;

Output:

Output-Table-with-top-2-rows-per-group

Output Table with top 2 rows per group

Explanation: The Sales table is created with 6 records and then the ROW_NUMBER() approach is used to retrieve the top 2 salespersons per region. The North and south are the top regions whose top two matched records are printed.

Example 2: Retrieving Latest User Activity

Here you will create a table UserActivity with columns UserID, ActivityType, and Timestamp to retrieve the latest activity for each user.

-- Create UserActivity table
CREATE TABLE UserActivity (
UserID INT,
ActivityType VARCHAR(50),
Timestamp DATETIME
);

-- Insert sample data
INSERT INTO UserActivity (UserID, ActivityType, Timestamp)
VALUES
(1, 'Login', '2024-02-14 08:00:00'),
(1, 'UpdateProfile', '2024-02-14 09:30:00'),
(2, 'Login', '2024-02-14 10:00:00'),
(2, 'Logout', '2024-02-14 11:45:00'),
(3, 'Login', '2024-02-14 12:30:00');

-- Retrieve the latest activity for each user
WITH CTE AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY Timestamp DESC) AS row_num
FROM
UserActivity
)
SELECT * FROM CTE WHERE row_num = 1;

Output:

Table-with-top-1-rows-per-group

Table with top 1 rows per group

Explanation: The UserActivity table is created and then the ROW_NUMBER() approach is used to retrieve the most recent 1 activity for each user. So, every partition is ordered by timestamp in descending order to get the users latest activity when one row is selected from CTE.

Conclusion

The PARTITION BY clause allows you to restrict the results to the first N rows of each group in SQL Server, making your queries more efficient. This way, you can filter and extract only the most important information. Whether you’re looking at sales data or user activity, using this method allows you to make accurate and data-driven decisions quickly and easily.


Like Article
Suggest improvement
Share your thoughts in the comments

Similar Reads