GROUP BY vs PARTITION BY in MySQL

Last Updated : 31 Jan, 2024

MySQL, like many relational database management systems, provides powerful tools for manipulating and analyzing data through the use of SQL (Structured Query Language). Two commonly used clauses, GROUP BY and PARTITION BY, play essential roles in aggregate queries and window functions, respectively.

In this article, we are going to learn about the differences between the GROUP BY and PARTITION BY clauses in MySQL.

Database and Table Creation

To make the comparison concrete, let’s create a sample database and table with the following commands:

1. To create the database

CREATE DATABASE School;

2. To list all databases on the server

SHOW DATABASES;

3. To select the database you want to work in

USE School;

4. To create the table in the database

CREATE TABLE student (
    Rollno int NOT NULL PRIMARY KEY,
    Name varchar(50) NOT NULL,
    Phone_num varchar(10) NOT NULL,
    city varchar(30) NOT NULL
);

5. To insert data into the table

INSERT INTO student VALUES (1, 'Vishal', '9373533572', 'Pune');

Repeat the INSERT statement for each remaining student.

6. To display the contents of the table

SELECT * FROM student;

The table should then look like this:

[Image: student Table]
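To try the statements without a MySQL server, the sketch below runs the corrected CREATE TABLE and INSERT statements against an in-memory SQLite database through Python's built-in sqlite3 module. SQLite is an assumption standing in for MySQL here (it accepts this table definition but has no CREATE DATABASE or USE), and every row except Vishal's is made-up sample data, not the article's actual table.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL "School" database.
conn = sqlite3.connect(":memory:")

# Corrected definition: no trailing comma, closed with ")" rather than "}".
conn.execute(
    """
    CREATE TABLE student (
        Rollno    INT         NOT NULL PRIMARY KEY,
        Name      VARCHAR(50) NOT NULL,
        Phone_num VARCHAR(10) NOT NULL,
        city      VARCHAR(30) NOT NULL
    )
    """
)

# Row 1 comes from the article; rows 2-5 are made-up sample data.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [
        (1, "Vishal", "9373533572", "Pune"),
        (2, "Asha", "9000000002", "Mumbai"),
        (3, "Rohan", "9000000003", "Pune"),
        (4, "Meera", "9000000004", "Delhi"),
        (5, "Kiran", "9000000005", "Mumbai"),
    ],
)

# Equivalent of SELECT * FROM student, ordered for a stable display.
students = conn.execute("SELECT * FROM student ORDER BY Rollno").fetchall()
for row in students:
    print(row)
```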

GROUP BY

The GROUP BY clause is a fundamental component of SQL queries when working with aggregated data. Its primary purpose is to group rows based on common values in specified columns, allowing the application of aggregate functions to each group independently.

Syntax:

SELECT column1, column2, aggregate_function(column3)
FROM table
[WHERE conditions]
GROUP BY column1, column2;

where,

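column1, column2: Represents the columns by which you want to group the result set. Under MySQL's default ONLY_FULL_GROUP_BY SQL mode, every non-aggregated column in the SELECT list must appear here (or be functionally dependent on these columns).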
aggregate_function(column3): This is an optional part that specifies an aggregate function (e.g., SUM, AVG, COUNT, MIN, MAX) applied to a particular column.
WHERE conditions: It specifies the conditions that must be fulfilled for the records to be selected. It is optional.

Example of the GROUP BY Clause

Run the following query on the student table to count the students in each city:

SELECT city, count(*)
FROM student
GROUP BY city;

Output:

[Image: output]

Explanation:

Each result row shows one city and the number of students in that city. Because the query gives the count no alias, MySQL labels the column with the expression text itself, count(*); writing COUNT(*) AS student_count would give it a readable name instead.
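The same GROUP BY query can be checked on a small sample, again using SQLite through Python's sqlite3 module as a stand-in for MySQL. Only Vishal's row comes from the article; the others are made up. The sketch adds the alias student_count and an ORDER BY city, because MySQL 8.0 no longer sorts GROUP BY output implicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (Rollno INT NOT NULL PRIMARY KEY, Name VARCHAR(50) NOT NULL, "
    "Phone_num VARCHAR(10) NOT NULL, city VARCHAR(30) NOT NULL)"
)
# Row 1 is from the article; rows 2-5 are made-up sample data.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [
        (1, "Vishal", "9373533572", "Pune"),
        (2, "Asha", "9000000002", "Mumbai"),
        (3, "Rohan", "9000000003", "Pune"),
        (4, "Meera", "9000000004", "Delhi"),
        (5, "Kiran", "9000000005", "Mumbai"),
    ],
)

# GROUP BY collapses the five students into one row per city.
cur = conn.execute(
    "SELECT city, COUNT(*) AS student_count FROM student GROUP BY city ORDER BY city"
)
labels = [col[0] for col in cur.description]  # column names of the result set
per_city = cur.fetchall()

print(labels)    # ['city', 'student_count']
print(per_city)  # [('Delhi', 1), ('Mumbai', 2), ('Pune', 2)]
```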

PARTITION BY

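In MySQL, the PARTITION BY clause discussed here belongs to window functions, also known as analytic functions, which MySQL supports from version 8.0. A window function computes a value for each row from a set of rows related to the current row (its window), without collapsing those rows into one, which allows more fine-grained calculations than plain aggregation. (The PARTITION BY used in CREATE TABLE for table partitioning is a separate feature.)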

Syntax:

SELECT column1, column2, …,
window_function() OVER (PARTITION BY partition_column1, partition_column2, … [ORDER BY order_column])
FROM table_name;

Where,

column1, column2: Represents the columns you want to include in your result set.
window_function(): The window function you want to use, such as ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE(), SUM(), AVG(), MIN(), MAX(), etc.
OVER: Indicates that you’re defining a window for the function.
PARTITION BY: Divides the result set into partitions based on the specified columns; the function is evaluated separately within each partition.
partition_column1, partition_column2: Represents the columns by which you want to partition the result set.
ORDER BY (optional): Orders the rows within each partition; ranking functions such as ROW_NUMBER() and RANK() number rows in this order.
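As a sketch of this syntax, the query below uses MIN() as a window function to attach, to every student, the smallest roll number in that student's city. It runs on SQLite through Python's sqlite3 module as a stand-in for MySQL 8.0 (SQLite supports window functions from version 3.25), and the rows other than Vishal's are made up. All five rows come back, each with a per-city value added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (Rollno INT NOT NULL PRIMARY KEY, Name VARCHAR(50) NOT NULL, "
    "Phone_num VARCHAR(10) NOT NULL, city VARCHAR(30) NOT NULL)"
)
# Row 1 is from the article; rows 2-5 are made-up sample data.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [
        (1, "Vishal", "9373533572", "Pune"),
        (2, "Asha", "9000000002", "Mumbai"),
        (3, "Rohan", "9000000003", "Pune"),
        (4, "Meera", "9000000004", "Delhi"),
        (5, "Kiran", "9000000005", "Mumbai"),
    ],
)

# MIN() over each city's partition; no rows are collapsed.
windowed = conn.execute(
    """
    SELECT Rollno, city,
           MIN(Rollno) OVER (PARTITION BY city) AS first_in_city
    FROM student
    ORDER BY Rollno
    """
).fetchall()

for row in windowed:
    print(row)
```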

Example of the PARTITION BY Clause

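The following query numbers the students within each city, ordered by Rollno. In MySQL 8.0 and later, ROW_NUMBER() with PARTITION BY does this directly:

SELECT
Rollno,
Phone_num,
city,
ROW_NUMBER() OVER (PARTITION BY city ORDER BY Rollno) AS row_num
FROM
student
ORDER BY
city, Rollno;

On MySQL versions before 8.0, which lack window functions, the same numbering can be emulated with user-defined variables. The alias is row_num because ROW_NUMBER is a reserved word in MySQL 8.0:

SELECT
Rollno,
Phone_num,
city,
(@row_number := CASE WHEN @current_city = city THEN @row_number + 1 ELSE 1 END) AS row_num,
@current_city := city AS current_city
FROM
student,
(SELECT @row_number := 0, @current_city := ' ') AS t
ORDER BY
city, Rollno;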

Output:

[Image: output]

Explanation:

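The user-defined variables @row_number and @current_city keep track of the row number and the city of the previous row, respectively.

For each row, the CASE expression checks whether its city is the same as the previous row’s city. If it is, the row number is incremented; otherwise, it is reset to 1. The row’s city is then stored in @current_city for the next comparison.

The ORDER BY clause sorts the rows by city and then by Rollno, so the query assigns a row number to each record within its city, in Rollno order. Note that MySQL does not guarantee the evaluation order of expressions involving user variables, and assigning variables inside expressions is deprecated as of MySQL 8.0, so the ROW_NUMBER() window function is the reliable choice wherever it is available.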
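The variable-based numbering follows a simple rule: walk the rows sorted by city and Rollno, increment a counter while the city repeats, and reset it to 1 when the city changes. The sketch below implements that rule in Python and compares it with ROW_NUMBER() OVER (PARTITION BY city ORDER BY Rollno). It assumes SQLite through Python's sqlite3 as a stand-in for MySQL 8.0, the rows other than Vishal's are made up, and number_within_city is a hypothetical helper written for this comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (Rollno INT NOT NULL PRIMARY KEY, Name VARCHAR(50) NOT NULL, "
    "Phone_num VARCHAR(10) NOT NULL, city VARCHAR(30) NOT NULL)"
)
# Row 1 is from the article; rows 2-5 are made-up sample data.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [
        (1, "Vishal", "9373533572", "Pune"),
        (2, "Asha", "9000000002", "Mumbai"),
        (3, "Rohan", "9000000003", "Pune"),
        (4, "Meera", "9000000004", "Delhi"),
        (5, "Kiran", "9000000005", "Mumbai"),
    ],
)

# Window-function version: number rows within each city by Rollno.
window_rows = conn.execute(
    """
    SELECT Rollno, city,
           ROW_NUMBER() OVER (PARTITION BY city ORDER BY Rollno) AS row_num
    FROM student
    ORDER BY city, Rollno
    """
).fetchall()


def number_within_city(rows):
    """Hypothetical helper mirroring the @row_number / @current_city logic."""
    numbered = []
    row_number, current_city = 0, None
    for rollno, city in sorted(rows, key=lambda r: (r[1], r[0])):
        # Increment while the city repeats, otherwise restart at 1.
        row_number = row_number + 1 if city == current_city else 1
        current_city = city
        numbered.append((rollno, city, row_number))
    return numbered


emulated = number_within_city(conn.execute("SELECT Rollno, city FROM student").fetchall())
print(window_rows)
print(emulated == window_rows)  # True
```

Doing the numbering in ordinary code makes the evaluation order explicit, which is exactly what the user-variable query cannot guarantee inside MySQL.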

Key Differences Between GROUP BY and PARTITION BY

Use Case:
- Use GROUP BY for aggregating data and creating summary rows.
- Use PARTITION BY in the context of window functions to perform calculations within specific partitions of the result set.
Aggregation vs. Window Functions:
- GROUP BY is used with aggregate functions like SUM, AVG, etc., to summarize data.
- PARTITION BY is used with window functions for analytical processing on a specified window of rows.
Result Set:
- GROUP BY typically results in a reduced dataset with one row per group.
- PARTITION BY does not reduce the number of rows in the result set; instead, it augments each row with calculated values based on the specified window.
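The result-set difference can be checked directly. In the sketch below (SQLite through Python's sqlite3 standing in for MySQL 8.0; rows other than Vishal's are made up), the GROUP BY query returns one row per city, while the window version returns one row per student and repeats each city's count on every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (Rollno INT NOT NULL PRIMARY KEY, Name VARCHAR(50) NOT NULL, "
    "Phone_num VARCHAR(10) NOT NULL, city VARCHAR(30) NOT NULL)"
)
# Row 1 is from the article; rows 2-5 are made-up sample data.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [
        (1, "Vishal", "9373533572", "Pune"),
        (2, "Asha", "9000000002", "Mumbai"),
        (3, "Rohan", "9000000003", "Pune"),
        (4, "Meera", "9000000004", "Delhi"),
        (5, "Kiran", "9000000005", "Mumbai"),
    ],
)

# Aggregation: one summary row per city.
grouped = conn.execute("SELECT city, COUNT(*) FROM student GROUP BY city").fetchall()

# Window function: every student row kept, with its city's count attached.
windowed = conn.execute(
    "SELECT Rollno, city, COUNT(*) OVER (PARTITION BY city) FROM student"
).fetchall()

print(len(grouped), len(windowed))  # 3 5

# Each row's window count equals its city's GROUP BY count.
counts_by_city = dict(grouped)
agree = all(count == counts_by_city[city] for _, city, count in windowed)
print(agree)  # True
```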

Conclusion

While both GROUP BY and PARTITION BY are essential tools in SQL, they serve distinct purposes. GROUP BY is geared towards aggregating data and creating summary statistics, making it indispensable for reporting and analytics. PARTITION BY, on the other hand, enhances the capabilities of window functions, providing a way to perform complex calculations within specific partitions of the result set. Understanding the distinctions between these clauses allows SQL developers to leverage their full potential in crafting efficient and insightful queries.
