Open In App

PARTITION BY vs GROUP BY in SQL

In SQL both PARTITION BY and GROUP BY are important clauses used for data aggregation and analysis. Sometimes they work as same but they serve different purposes and are applied in different situations. In this article, we’ll understand both of them along with the syntax, multiple examples for both clauses and also the differences between them.

What is the PARTITION BY Clause?

The PARTITION BY clause is a type of clause that is used with window functions to divide the result set into partitions on which the function is applied. It allows us to perform calculations and aggregations within each partition independently.



Syntax:

SELECT column1, column2, ..., function(column) OVER (PARTITION BY column_name1, column_name2, ...)
FROM table_name;

Explanation: Here the function(column) is the window function that we want to apply and column_name1, column_name2 are the columns which are used to partition the result set.



What is GROUP BY Clause?

The GROUP BY clause is used to aggregate data based on one or more columns and group rows with identical values into summary rows. It is widely used via aggregate functions like SUM, COUNT, and AVG to perform calculations on each group.

Syntax:

SELECT column1, column2, aggregate_function(column3)
FROM table_name
WHERE condition
GROUP BY column1, column2;

Explanation:

Examples of PARTITION BY and GROUP BY

To understand PARTITION BY vs GROUP BY in SQL we need a table on which we will perform various operations and queries. Here we will consider a table called sales which contains product_id, category and sales_amount as Columns.

Query:

CREATE TABLE sales (
product_id INT,
category VARCHAR(50),
sales_amount DECIMAL(10, 2)
);

INSERT INTO sales (product_id, category, sales_amount) VALUES (1, 'Electronics' , 500);
INSERT INTO sales (product_id, category, sales_amount) VALUES (2, 'Clothing' , 300);
INSERT INTO sales (product_id, category, sales_amount) VALUES (3, 'Electronics' , 700);
INSERT INTO sales (product_id, category, sales_amount) VALUES (4, 'Clothing' , 400);
INSERT INTO sales (product_id, category, sales_amount) VALUES (5, 'Electronics' , 600);

Our sales table looks like:

Examples of GROUP BY Clause

Example 1: Total Sales Amount by Category

Query:

SELECT category, SUM(sales_amount) AS total_sales
FROM sales
GROUP BY category;

Output

Explanation: In the above query, We uses GROUP BY to group rows by category and SUM() to calculate the total sales amount for each category in the sales table.

Example 2: Average Sales Amount by Category

Query:

SELECT category, AVG(sales_amount) AS avg_sales
FROM sales
GROUP BY category;

Output:

Explanation: In the above query, We uses GROUP BY to group rows by category and AVG() to calculate the average sales amount for each category in the sales table.

Examples of PARTITION BY Clause

Example 1: Rank of Products within Each Category

Query:

SELECT product_id, category, sales_amount,
RANK() OVER (PARTITION BY category ORDER BY sales_amount DESC) AS rank_within_category
FROM sales;

Output:

Explanation: In the above query, We calculates the rank of each product within its category based on the sales amount, using the RANK() window function with PARTITION BY to group products by category.

Example 2: Cumulative Sales Amount within Each Category

SELECT product_id, category, sales_amount,
SUM(sales_amount) OVER (PARTITION BY category ORDER BY product_id) AS cumulative_sales
FROM sales;

Output:

Explanation: In the above query, We calculates the cumulative sales amount for each product within its category, using the SUM() window function with PARTITION BY to group products by category and ordering by product_id.

Difference Between PARTITION BY and GROUP BY

Let’s compare PARTITION BY and GROUP BY in a tabular format.

Feature

PARTITION BY

GROUP BY

Purpose

It is Used with window functions for partitioning data

It is Used for aggregating data based on one or more columns

Aggregation

It Performs calculations within each partition

It Aggregates data across the entire result set

Output

It Retains all rows in the output

It Aggregates rows and summarizes data into groups

Result

It gives Set Size Result set size may remain unchanged

It gives Result set size may decrease due to aggregation

Conclusion

Overall, Understanding the differences between PARTITION BY and GROUP BY is important for effective data analysis and aggregation in SQL. While GROUP BY is used for summarizing data into groups, PARTITION BY allows for more advanced calculations within each partition. We have saw various examples of PARTITION BY and GROUP BY to get the better understanding of them.


Article Tags :