What are the Best Ways to Write a SQL Query?

Last Updated : 03 Jul, 2022

An SQL Query is used to retrieve the required data from the database. However, there may be multiple SQL queries that yield the same results but with different levels of efficiency. An inefficient query can drain the database resources, reduce the database speed or result in a loss of service for other users. So it is very important to optimize the query to obtain the best database performance.

Let us consider some sample tables to understand these different methods to optimize a query.

Customers : Table Customers contains the details of the prospective customers for a shop.

CustomerID	LastName	FirstName	Address	Age
73001	Smith	John	45 Jump Street	21
73002	Parker	Anna	83 Wild Avenue	45
73003	James	Josie	99 Chestnut Avenue	25
73004	White	Anna	55 Paper Street	72
73005	Sparks	Harry	11 Wisteria Lane	23
73006	Parker	Jane	12 Quentin Road	50

Products : Table Products contains the details of the products available in the shop.

ProductID	ProductName	ProductPrice
1001	Shampoo	100
1002	Tooth paste	20
1003	Soap	15
1004	Hand Sanitizer	50
1005	Deodorant	100

Orders : Table Orders contains the details of the products ordered by the customers from the shop.

CustomerID	ProductID	ProductQuantity
73001	1003	5
73001	1001	1
73003	1002	1
73004	1003	2
73004	1005	1

Now that we have analyzed the tables Customers, Products and Orders, the different ways to optimize a query are given below with query examples from these tables:

1. Provide Correct Formatting for the Query

It is very important to provide the correct formatting while writing a query. This enhances the readability of the query and also makes reviewing and troubleshooting it easier. Some of the rules for formatting a query are given below:

Put each clause of the query on a new line.
Put SQL keywords in the query in uppercase.
Use CamelCase capitalization for identifiers and avoid underscores (write ProductName, not Product_Name).

Example: This is a query that displays the CustomerID and LastName of the customers that have currently ordered products and are younger than 50 years.

Select distinct Customers.CustomerID, Customers.LastName from Customers INNER join Orders on Customers.CustomerID = Orders.CustomerID where Customers.Age < 50;

The above query is hard to read because all the clauses are on one line and the keywords are written in inconsistent case. A more readable version, formatted using the rules specified earlier, is given below.

SELECT DISTINCT Customers.CustomerID, Customers.LastName
FROM Customers INNER JOIN Orders
ON Customers.CustomerID = Orders.CustomerID
WHERE Customers.Age < 50;

2. Specify the SELECT fields instead of using SELECT *

SELECT * is used to obtain all the columns from a table. It should not be used unless all of the columns are actually required, as it reads and transfers more data than necessary and can prevent the database from answering the query from an index alone. It is much better to use SELECT along with the specific fields required.

Example: This is a query that displays all the data in the table Customers when only the CustomerID and LastName are required.

SELECT * 
FROM Customers;

It is better to use the SELECT statement with only the fields CustomerID and LastName to obtain the desired result.

SELECT CustomerID, LastName 
FROM Customers;
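
The difference can be checked with a small sketch. It assumes Python's built-in sqlite3 module and an in-memory database, neither of which is part of the original example; only the table and its data come from above. It compares the columns returned by the two queries.

```python
import sqlite3

# In-memory SQLite database holding the article's Customers table (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (
    CustomerID INTEGER PRIMARY KEY, LastName TEXT, FirstName TEXT,
    Address TEXT, Age INTEGER);
INSERT INTO Customers VALUES
    (73001, 'Smith', 'John', '45 Jump Street', 21),
    (73002, 'Parker', 'Anna', '83 Wild Avenue', 45),
    (73003, 'James', 'Josie', '99 Chestnut Avenue', 25),
    (73004, 'White', 'Anna', '55 Paper Street', 72),
    (73005, 'Sparks', 'Harry', '11 Wisteria Lane', 23),
    (73006, 'Parker', 'Jane', '12 Quentin Road', 50);
""")

all_cursor = conn.execute("SELECT * FROM Customers")
two_cursor = conn.execute("SELECT CustomerID, LastName FROM Customers")

# cursor.description holds one entry per returned column; item 0 is its name.
all_column_names = [column[0] for column in all_cursor.description]
two_column_names = [column[0] for column in two_cursor.description]

print(all_column_names)  # every column is fetched
print(two_column_names)  # only the two required columns are fetched
```

With six rows the saving is negligible, but the same ratio applies to every row of a large table.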

3. Remove Correlated Subqueries if not required

A correlated subquery is a nested query that depends on the outer query for its values. In the worst case it is evaluated once for every row of the outer query, so if there are millions of customers in the database it may need to run millions of times. In that case, an inner join can be more efficient, although many query optimizers execute both forms in the same way.

Example: This is a query that displays the CustomerID of the customers that have currently ordered products using a correlated subquery.

SELECT CustomerID
FROM Customers
WHERE EXISTS (SELECT * FROM Orders
              WHERE Customers.CustomerID = Orders.CustomerID);

It is better to use the inner join in this case to obtain the same result.

SELECT DISTINCT Customers.CustomerID
FROM Customers INNER JOIN Orders
ON Customers.CustomerID = Orders.CustomerID;
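
As a quick check that the two forms return the same customers, the sketch below runs both against the sample data. Python's sqlite3 module and the in-memory database are assumptions made for illustration, and the Customers table is reduced to the columns the queries need.

```python
import sqlite3

# In-memory SQLite database with the sample Customers and Orders rows (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, LastName TEXT);
CREATE TABLE Orders (CustomerID INTEGER, ProductID INTEGER, ProductQuantity INTEGER);
INSERT INTO Customers VALUES
    (73001, 'Smith'), (73002, 'Parker'), (73003, 'James'),
    (73004, 'White'), (73005, 'Sparks'), (73006, 'Parker');
INSERT INTO Orders VALUES
    (73001, 1003, 5), (73001, 1001, 1), (73003, 1002, 1),
    (73004, 1003, 2), (73004, 1005, 1);
""")

exists_sql = """
SELECT CustomerID
FROM Customers
WHERE EXISTS (SELECT * FROM Orders
              WHERE Customers.CustomerID = Orders.CustomerID)
"""
join_sql = """
SELECT DISTINCT Customers.CustomerID
FROM Customers INNER JOIN Orders
ON Customers.CustomerID = Orders.CustomerID
"""

# Sort in Python because neither query promises an output order.
exists_ids = sorted(row[0] for row in conn.execute(exists_sql))
join_ids = sorted(row[0] for row in conn.execute(join_sql))

print(exists_ids)
print(join_ids)
```

Whether one form is actually faster depends on the database and its optimizer, so the execution plan is worth inspecting on real data.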

Note: It is best to avoid a correlated subquery if almost all of the rows are needed from the database. However, in some cases, they are inevitable and have to be used.

4. Limit the results obtained by the query

In case only limited results are required, it is better to use the LIMIT clause. This clause limits the records and only returns the number of records specified. For example, if there is a large table of a million records and only the first ten are required, LIMIT ensures that only those records are obtained without overtaxing the system. Without an ORDER BY clause, which records are returned is not guaranteed. LIMIT is supported by databases such as MySQL, PostgreSQL and SQLite, while others use a different syntax, such as TOP in SQL Server and FETCH FIRST in Oracle.

Example: This is a query that displays the customer details with limit 3:

SELECT *
FROM Customers 
LIMIT 3;
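
The sketch below shows LIMIT combined with ORDER BY so that the three rows returned are predictable. It assumes Python's built-in sqlite3 module and an in-memory database, and the ORDER BY clause is an addition to the query above.

```python
import sqlite3

# In-memory SQLite database with a reduced Customers table (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, LastName TEXT);
INSERT INTO Customers VALUES
    (73006, 'Parker'), (73004, 'White'), (73002, 'Parker'),
    (73005, 'Sparks'), (73003, 'James'), (73001, 'Smith');
""")

# ORDER BY makes "the first three" well defined; LIMIT then cuts the result.
first_three = conn.execute("""
SELECT CustomerID, LastName
FROM Customers
ORDER BY CustomerID
LIMIT 3
""").fetchall()

print(first_three)
```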

5. Remove The DISTINCT Clause if not required

The DISTINCT clause is used to obtain distinct results from a query by eliminating the duplicates. However, this increases the execution time of the query, as the rows usually have to be sorted or hashed so that the duplicates can be found. So it is better to avoid the DISTINCT clause when the results cannot contain duplicates anyway. The GROUP BY clause can also be used to obtain distinct results, although most databases execute it in much the same way, so it is not necessarily faster.

Example: This is a query that displays the distinct LastName of all the customers using the DISTINCT clause.

SELECT DISTINCT LastName
FROM Customers;

The distinct LastName of the customers can also be obtained using the GROUP BY clause which is demonstrated by the next example:

SELECT LastName
FROM Customers
GROUP BY LastName;
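
The two queries can be compared with a short sketch. It assumes Python's built-in sqlite3 module and an in-memory database, and keeps only the LastName data from the Customers table.

```python
import sqlite3

# In-memory SQLite database with the customers' last names (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, LastName TEXT);
INSERT INTO Customers VALUES
    (73001, 'Smith'), (73002, 'Parker'), (73003, 'James'),
    (73004, 'White'), (73005, 'Sparks'), (73006, 'Parker');
""")

distinct_names = sorted(
    row[0] for row in conn.execute("SELECT DISTINCT LastName FROM Customers")
)
grouped_names = sorted(
    row[0] for row in conn.execute("SELECT LastName FROM Customers GROUP BY LastName")
)

# 'Parker' appears twice in the table but only once in either result.
print(distinct_names)
print(grouped_names)
```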

6. Avoid Functions in Predicates

Functions in SQL are used to perform specific actions. However, applying a function to a column in a predicate usually prevents the database from using an index on that column, which in turn slows the execution of the query. So it is better to avoid functions in predicates as much as possible.

Example: This is a query that displays the details of the products whose name starts with 'Sha'.

SELECT *
FROM Products
WHERE SUBSTR(ProductName, 1, 3) = 'Sha';

It is better to avoid the function and use the LIKE clause instead to obtain the same result.

SELECT *
FROM Products
WHERE ProductName LIKE 'Sha%';
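
The effect on index usage can be observed with the sketch below. It assumes Python's built-in sqlite3 module, an index named IdxProductName, and a ProductName column declared with COLLATE NOCASE (SQLite needs this for its default case-insensitive LIKE to use an index); none of these appear in the original example. Other databases report plans differently.

```python
import sqlite3

# In-memory SQLite database with the Products table and a hypothetical index.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (
    ProductID INTEGER PRIMARY KEY,
    ProductName TEXT COLLATE NOCASE,
    ProductPrice INTEGER);
CREATE INDEX IdxProductName ON Products (ProductName);
INSERT INTO Products VALUES
    (1001, 'Shampoo', 100), (1002, 'Tooth paste', 20), (1003, 'Soap', 15),
    (1004, 'Hand Sanitizer', 50), (1005, 'Deodorant', 100);
""")

substr_sql = "SELECT * FROM Products WHERE SUBSTR(ProductName, 1, 3) = 'Sha'"
like_sql = "SELECT * FROM Products WHERE ProductName LIKE 'Sha%'"


def query_plan(sql):
    # The fourth column of each EXPLAIN QUERY PLAN row is the readable detail text.
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


substr_plan = query_plan(substr_sql)
like_plan = query_plan(like_sql)
substr_rows = conn.execute(substr_sql).fetchall()
like_rows = conn.execute(like_sql).fetchall()

print(substr_plan)  # a full scan of Products
print(like_plan)    # a search that uses IdxProductName
```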

7. Avoid OR, AND, NOT operators if possible

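Indexes may not be used effectively when OR, AND, NOT operators are combined in a predicate, particularly with OR and NOT. In the case of large databases, it can help to replace them with equivalent operators such as IN and BETWEEN, which are also shorter and easier to read. Many query optimizers treat the two forms identically, so it is worth comparing the execution plans before and after the change.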

Examples of this for OR and AND operators are given below:

Example 1: This is a query that displays the details of the customers with CustomerID 73001, 73004 or 73005 using the OR operator.

SELECT * 
FROM Customers
WHERE CustomerID = 73001
OR CustomerID = 73004
OR CustomerID = 73005;

It is better to use the IN operator in this case to obtain the same result.

SELECT * 
FROM Customers
WHERE CustomerID IN (73001, 73004, 73005);

Example 2: This is a query that displays the details of the customers with age between 25 and 50 using the AND operator.

SELECT * 
FROM Customers
WHERE Age >= 25 AND Age <= 50;

It is better to use the BETWEEN operator in this case to obtain the same result.

SELECT * 
FROM Customers
WHERE Age BETWEEN 25 AND 50;
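
The sketch below checks that each rewritten query returns the same customers as the original one. Python's built-in sqlite3 module and the in-memory database are assumptions, and the Customers table is reduced to the columns used here. Note that BETWEEN includes both end values.

```python
import sqlite3

# In-memory SQLite database with a reduced Customers table (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, LastName TEXT, Age INTEGER);
INSERT INTO Customers VALUES
    (73001, 'Smith', 21), (73002, 'Parker', 45), (73003, 'James', 25),
    (73004, 'White', 72), (73005, 'Sparks', 23), (73006, 'Parker', 50);
""")


def customer_ids(where_clause):
    # Return the matching CustomerIDs in ascending order.
    sql = "SELECT CustomerID FROM Customers WHERE " + where_clause + " ORDER BY CustomerID"
    return [row[0] for row in conn.execute(sql)]


or_ids = customer_ids("CustomerID = 73001 OR CustomerID = 73004 OR CustomerID = 73005")
in_ids = customer_ids("CustomerID IN (73001, 73004, 73005)")

and_ids = customer_ids("Age >= 25 AND Age <= 50")
between_ids = customer_ids("Age BETWEEN 25 AND 50")

print(or_ids, in_ids)
print(and_ids, between_ids)
```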

8. Use WHERE clause instead of HAVING clause whenever possible

The HAVING clause is used with the GROUP BY clause to enforce conditions on groups, as the WHERE clause cannot be used with aggregate functions. However, HAVING is applied after the rows have been grouped, so a condition placed there cannot reduce the number of rows that are read and grouped, and it may not benefit from indexes. So it is better to use the WHERE clause instead of the HAVING clause whenever the condition does not involve an aggregate.

Example: This is a query that displays each Age above 25 along with the count of customers of that age. This is done using the HAVING clause.

SELECT Age, COUNT(*)
FROM Customers
GROUP BY Age
HAVING Age > 25;

It is better to use the WHERE clause in this case as it applies the condition to individual rows before they are grouped, rather than the HAVING clause that applies the condition to the result from the GROUP BY clause.

SELECT Age, COUNT(*)
FROM Customers
WHERE Age > 25
GROUP BY Age;
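
The division of work between the two clauses can be sketched as follows, assuming Python's built-in sqlite3 module and an in-memory database. The first query filters rows with WHERE before grouping by FirstName; the second, added here for contrast, uses HAVING for a condition on an aggregate, which WHERE cannot express.

```python
import sqlite3

# In-memory SQLite database with a reduced Customers table (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, FirstName TEXT, Age INTEGER);
INSERT INTO Customers VALUES
    (73001, 'John', 21), (73002, 'Anna', 45), (73003, 'Josie', 25),
    (73004, 'Anna', 72), (73005, 'Harry', 23), (73006, 'Jane', 50);
""")

# Row-level condition: customers aged 25 or less are discarded before grouping.
where_rows = conn.execute("""
SELECT FirstName, COUNT(*)
FROM Customers
WHERE Age > 25
GROUP BY FirstName
ORDER BY FirstName
""").fetchall()

# Group-level condition: only an aggregate test such as this one needs HAVING.
having_rows = conn.execute("""
SELECT FirstName, COUNT(*)
FROM Customers
GROUP BY FirstName
HAVING COUNT(*) > 1
ORDER BY FirstName
""").fetchall()

print(where_rows)
print(having_rows)
```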

9. Use INNER JOIN instead of WHERE clause for creating joins

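Using the WHERE clause for creating joins describes the result as a Cartesian Product, where the number of rows is the product of the number of rows of the two tables, which is then filtered. Most modern optimizers execute this form in the same way as an explicit join, but if the join condition is left out or mistyped, the full Cartesian Product is returned, which is obviously problematic for large databases as more database resources are required. So it is much better to use INNER JOIN, as that keeps the join condition next to the tables it relates and only combines the rows from both tables which satisfy it.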

Example: This is a query that displays the CustomerID of the customers that have currently ordered products using the WHERE clause.

SELECT DISTINCT Customers.CustomerID
FROM Customers, Orders
WHERE Customers.CustomerID = Orders.CustomerID;

It is better to use the INNER JOIN in this case to obtain the same result.

SELECT DISTINCT Customers.CustomerID
FROM Customers INNER JOIN Orders
ON Customers.CustomerID = Orders.CustomerID;
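
The sketch below compares the two join styles and shows what happens when the join condition is forgotten. It assumes Python's built-in sqlite3 module and an in-memory database, with the Customers table reduced to the columns needed.

```python
import sqlite3

# In-memory SQLite database with the sample Customers and Orders rows (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, LastName TEXT);
CREATE TABLE Orders (CustomerID INTEGER, ProductID INTEGER, ProductQuantity INTEGER);
INSERT INTO Customers VALUES
    (73001, 'Smith'), (73002, 'Parker'), (73003, 'James'),
    (73004, 'White'), (73005, 'Sparks'), (73006, 'Parker');
INSERT INTO Orders VALUES
    (73001, 1003, 5), (73001, 1001, 1), (73003, 1002, 1),
    (73004, 1003, 2), (73004, 1005, 1);
""")

comma_ids = sorted(row[0] for row in conn.execute("""
SELECT DISTINCT Customers.CustomerID
FROM Customers, Orders
WHERE Customers.CustomerID = Orders.CustomerID
"""))

join_ids = sorted(row[0] for row in conn.execute("""
SELECT DISTINCT Customers.CustomerID
FROM Customers INNER JOIN Orders
ON Customers.CustomerID = Orders.CustomerID
"""))

# Forgetting the WHERE condition silently yields the full Cartesian Product.
cartesian_count = conn.execute("SELECT COUNT(*) FROM Customers, Orders").fetchone()[0]

print(comma_ids, join_ids)
print(cartesian_count)  # 6 customers x 5 orders
```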

10. Avoid Wildcard Characters at the beginning of a LIKE clause pattern

Wildcard characters such as % and _ are used to build the pattern of a LIKE clause. However, they should not be used at the beginning of the pattern, as this prevents the database from using an index on the column. In that case, a full table scan is required to match the pattern, which consumes more database resources. So it is better to avoid the wildcard characters at the beginning of the pattern and only use them at the end if possible.

Example:

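SELECT *
FROM Customers
WHERE FirstName LIKE '%A%';

The above query is inefficient as it uses the wildcard character % at the beginning of the pattern. If only the names that start with 'A' are required, a much more efficient version of the query that avoids this is given below. Note that the two queries are not equivalent, as the first one matches 'A' anywhere in the name.

SELECT *
FROM Customers
WHERE FirstName LIKE 'A%';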
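
The difference between the two patterns can be observed with the sketch below. It assumes Python's built-in sqlite3 module, an index named IdxFirstName, and a FirstName column declared with COLLATE NOCASE (SQLite needs this for its default case-insensitive LIKE to use an index); none of these appear in the original example.

```python
import sqlite3

# In-memory SQLite database with a reduced Customers table and a hypothetical index.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (
    CustomerID INTEGER PRIMARY KEY,
    LastName TEXT,
    FirstName TEXT COLLATE NOCASE);
CREATE INDEX IdxFirstName ON Customers (FirstName);
INSERT INTO Customers VALUES
    (73001, 'Smith', 'John'), (73002, 'Parker', 'Anna'), (73003, 'James', 'Josie'),
    (73004, 'White', 'Anna'), (73005, 'Sparks', 'Harry'), (73006, 'Parker', 'Jane');
""")

contains_sql = "SELECT * FROM Customers WHERE FirstName LIKE '%A%'"
prefix_sql = "SELECT * FROM Customers WHERE FirstName LIKE 'A%'"


def query_plan(sql):
    # The fourth column of each EXPLAIN QUERY PLAN row is the readable detail text.
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


contains_plan = query_plan(contains_sql)
prefix_plan = query_plan(prefix_sql)
contains_count = len(conn.execute(contains_sql).fetchall())
prefix_count = len(conn.execute(prefix_sql).fetchall())

print(contains_plan)  # a full scan of Customers
print(prefix_plan)    # a search that uses IdxFirstName
print(contains_count, prefix_count)
```

The row counts differ because SQLite's LIKE ignores case for ASCII letters, so '%A%' also matches names such as Harry and Jane.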
