SELECT DISTINCT vs GROUP BY in MySQL

In MySQL, the two most common ways of managing and retrieving unique values are with SELECT and GROUP BY. However, they are used for different reasons. With SELECT, we can get different values from the same column, so we don’t have to worry about duplicates. With GROUP BY, we can aggregate data and group results based on specific columns. However, there are some differences between the two operators. Both can be used to generate the same output. But we need to know the difference for better utilization of resources and time.

SELECT DISTINCT

The SELECT DISTINCT statement is used to retrieve unique values from a single column or a combination of columns in a table. It is often used when you want to eliminate duplicate rows from the result set.

Syntax:

SELECT DISTINCT column1, column2 

FROM table_name

Parameters Used:

column1, column2: Names of the fields of the table.
table_name: Table from where we want to fetch the records.

GROUP BY

The GROUP BY clause is used to group rows that have the same values in specified columns into summary rows, like calculating aggregates (e.g., COUNT, SUM, AVG) for each group. It is typically used with aggregate functions.

Syntax:

SELECT column1, column2, ..., aggregate_function(column_name)
FROM table_name
WHERE condition
GROUP BY column1, column2, ...;

Parameters Used:

column1, column2, etc.: Columns by which you want to group the rows.
aggregate_function(column_name): Aggregate functions like COUNT, SUM, AVG, etc., applied to the grouped rows.
table_name: The name of the table you are querying.
condition: Optional. Conditions to filter the rows before they are grouped.
GROUP BY column1, column2, …: Specifies the columns used for grouping. Rows with the same values in these columns will be grouped together.

Let’s understand the difference between SELECT DISTINCT and GROUP BY through some examples:

Example of SELECT DISTINCT vs GROUP BY in MySQL

Let’s assume we have a customers table and an orders table. We are going to use these tables to show how DISTINCT and GROUP BY can be used for different use cases:

Query:

CREATE TABLE customers (
  customer_id INT PRIMARY KEY,
  name VARCHAR(255) NOT NULL,
  city VARCHAR(255) NOT NULL
);

INSERT INTO customers (customer_id, name, city) VALUES
  (1, 'John Doe', 'New York'),
  (2, 'Jane Smith', 'London'),
  (3, 'Mike Brown', 'Paris'),
  (2, 'Jane Smith', 'London');

Output:

customers table

Orders Table

Query:

CREATE TABLE orders (
  order_id INT PRIMARY KEY,
  customer_id INT NOT NULL,
  product VARCHAR(255) NOT NULL,
  price DECIMAL(10,2) NOT NULL,
  FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

INSERT INTO orders (order_id, customer_id, product, price) VALUES
  (1, 1, 'Phone', 100.00),
  (2, 2, 'Laptop', 500.00),
  (3, 1, 'Tablet', 200.00),
  (4, 2, 'Watch', 150.00);

Output:

orders table

Example 1: Find distinct customer cities

Here, we want to get names of distinct cities only, so DISTINCT will be the obvious choice here.

Use SELECT DISTINCT to retrieve unique cities where customers reside:

SELECT DISTINCT city
FROM customers;

Output:

Distinct clause output

Explanation: The SQL query retrieves unique values from the “city” column in the “customers” table. The output provides a list of distinct cities where customers are located, eliminating duplicate entries and showcasing the unique cities in the dataset.

Example 2: Count Orders Per Customer City

Here, we also want to count the number of orders per city. Since an aggregate function(COUNT) is involved here, we will be using GROUP BY.

Use GROUP BY to group customers by city and count their orders:

SELECT city, COUNT(*) AS order_count
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
GROUP BY city;

Output:

order by

Explanation: The SQL query joins the “customers” and “orders” tables on the customer_id, grouping the results by city. It counts the number of orders for each city, providing an output that displays the distinct cities along with the corresponding count of orders placed by customers in each city.

Key Differences Observed in This Example

SELECT DISTINCT: Returns only unique city values, ignoring duplicates.
GROUP BY: Groups customers by city and counts their orders, showing duplicate city entries with individual counts.
Aggregate function: COUNT(*) is used with GROUP BY to summarize data within each group.

Purposes and Functionalities

SELECT DISTINCT:

Removes duplicate rows: It eliminates rows where all the selected columns have identical values.
No aggregation: Cannot be used with aggregate functions like SUM, AVG, etc.
Simpler syntax: Easier to write and understand for basic non-duplication.
Performance: Can be faster than GROUP BY if no indexes are used, as it avoids sorting.

GROUP BY:

Grouping data: Creates groups of rows based on shared values in one or more columns.
Aggregation: Used with aggregate functions to summarize data within each group (e.g., COUNT, MAX, AVG).
More complex: Requires specifying group columns and aggregate functions.
Sorting: Generally involves sorting data, which can impact performance.

Choosing between them

Use SELECT DISTINCT when you simply want to eliminate duplicate rows without summarizing data.
Use GROUP BY when you need to group data and perform aggregations within each group.

Feature	SELECT DISTINCT	GROUP BY
Purpose	Remove duplicate rows	Group data and aggregate
Aggregate functions	No	Yes
Sorting	No (optional)	Yes (default)
Performance	Can be faster (no index)	Slower (sorting)
Syntax	Simpler	More Complex

Conclusion

In MySQL, SELECT DISTINCT is used to retrieve unique values from one or more columns in a result set, eliminating duplicates without aggregation. Conversely, GROUP BY is employed to group rows sharing common values in specified columns, enabling aggregation functions to summarize data within each group. While SELECT DISTINCT is suitable for obtaining unique values, GROUP BY is essential for performing complex aggregations and generating summary results based on grouping criteria, each serving distinct purposes in MySQL query operations.

Article Tags :

Databases

Geeks Premier League

MySQL

Geeks Premier League 2023

MySQL Query