How to Group and Aggregate Data Using SQL?
There can be many columns in a database table, so sometimes it can become difficult and time taking to find the same type of data in these columns. The GROUP BY statement groups the identical rows present in the columns of a table. GROUP BY statement in conjunction with SQL aggregate functions (COUNT(), MAX(), MIN(), SUM(), AVG() etc.) help us to analyze the data efficiently.
Now, let’s understand this with an example.
Attention reader! Don’t stop learning now. Learn SQL for interviews using SQL Course by GeeksforGeeks.
Before starting, we have to keep in mind a few important points-
- All the data that are not listed as parameters to GROUP BY needs an aggregation function applied to it.
- If you run this SQL command then you will see an error because the database does not know what to do about the state. There is only one city per group but there are many states. They all cannot be output as a value without any aggregation function.
- We cannot use the WHERE clause after the GROUP BY clause. In that case, we use the HAVING clause.
- To show records in a specific order (ascending or descending) we can use the ORDER BY clause. But using it is not necessary.
SELECT column1, column2..., Aggregate(column3, column4,..) FROM Table_name WHERE Condition... //OPTION-1 (depending on the condition) GROUP BY column1,column2,.. HAVING Condition... //OPTION-2 (depending on the condition) ORDER BY column1,column2,.. DESC(if required);
Here we use Microsoft SQL Server for executing queries.
Step 1: Creating the Database
For this use the below command to create a database named Sales.
CREATE DATABASE Sales;
Step 2: Using the Database
Use the below SQL statement to switch the database context to Sales:
Step 3: Table Definition
We are going to use the following Sales_Order table in our Sales database.
Create Table Sales_Order (ORDERNO VARCHAR(20) Primary Key, ClientNo VARCHAR(20), Orderdate DATE, SALESMANNO VARCHAR(20), Oredrstatus VARCHAR(30)0;
You can use the below statement to query the description of the created table:
EXEC SP_COLUMNS Sales_Order;
Step 4: Adding data to the table
Use the below statement to add data to the Sales_Order table:
INSERT INTO Sales_Order VALUES ('O19001', 'C00001', '2007-10-03', 'S00001', 'In Process'); INSERT INTO Sales_Order VALUES('O19002', 'C00002', '2007-11-01', 'S00002', 'Cancelled'); INSERT INTO Sales_Order VALUES('O19003', 'C00003', '2007-9-05', 'S00003', 'Fulfilled'); INSERT INTO Sales_Order VALUES('O19004', 'C00004', '2007-6-06', 'S00004', 'Fulfilled'); INSERT INTO Sales_Order VALUES('O19005', 'C00005', '2007-8-02', 'S00005', 'Cancelled'); INSERT INTO Sales_Order VALUES('O19006', 'C00006', '2007-8-01', 'S00006', 'In Process');
Step 5: Viewing the inserted data
SELECT * FROM Sales_Order;
Step 6: Suppose, we want to find out the Client-no of the clients whose orders are In Process.
To do this we’ll use the COUNT() function:
SELECT CLIENTNO, COUNT(ORDERSTATUS) AS Order_In_Process from Sales_Order WHERE ORDERSTATUS='In Process' GROUP BY CLIENTNO;
Run this command on your machine and see the output.
So, from this output, we can easily tell that C00001 and C00006 have 1 order still In_Process.
Now, create another table named ORDERED on your own following the same steps as before. You can give the database name whatever you want. The description of the table would be like this:
Create Table Ordered place VARCHAR(30), product VARCHAR(30), price DECIMAL);
Now insert the data in the following manner:
This time we want to know the total amount of money earned from both places.
To do this we’ll use the SUM() function:
SELECT place, SUM(price) FROM ORDERED GROUP BY place;
Now, just by looking at the output, we can tell from which place we are earning more money instead of going through each row.
You can try using AVG(), MIN(), MAX() aggregate functions with GROUP BY and make different kinds of queries like these.