SQL Query Optimization Techniques

Learn how to optimize SQL queries so that they execute faster and use memory more efficiently.

By Aryan Garg, KDnuggets on March 1, 2023 in SQL

As beginners, we focus only on writing and running SQL queries. We do not worry about how long a query takes to execute or whether it can handle millions of records. At the intermediate level, however, people expect your queries to be optimized and to execute in as little time as possible.

Writing optimized queries is imperative in large applications with millions of records, such as e-commerce platforms or banking systems. Suppose you own an e-commerce company with more than a million products, and a customer wants to search for a product. What if the query you wrote in the backend takes more than a minute to fetch that product from the database? Do you think customers would buy products from your website?

This is why it is important to understand SQL query optimization. In this tutorial, I will show you some tips and tricks to optimize your SQL queries and make them execute faster. The only prerequisite is a basic knowledge of SQL.

1. Use EXISTS instead of COUNT() to find a Specific Element in the Table

To check whether a specific element is present in the table, use the EXISTS operator instead of COUNT(). The query will usually run more efficiently.

With COUNT(), the query needs to count all the occurrences of that particular element, which may be inefficient when the table is large. EXISTS, on the other hand, can stop as soon as it finds the first occurrence. This can save a lot of time.

Also, you are only interested in whether a particular element is present, not in the number of occurrences. That is another reason EXISTS is the better fit.

SELECT 
  EXISTS(
    SELECT 
      * 
    FROM 
      myTable 
    WHERE 
      myColumn = 'val'
  );

The above query returns 1 (or TRUE, depending on the database) if at least one row of the table has a value equal to val in the column named myColumn. Otherwise, it returns 0 (or FALSE).
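To try the two approaches side by side, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The sample rows and the helper names value_exists and count_matches are made up for illustration; only the table and column names follow the example above.

```python
import sqlite3

# Hypothetical in-memory table; sample rows are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER PRIMARY KEY, myColumn TEXT)")
conn.executemany(
    "INSERT INTO myTable (myColumn) VALUES (?)",
    [("val",), ("other",), ("val",)],
)


def value_exists(connection, value):
    """Return True if at least one row has myColumn equal to value."""
    row = connection.execute(
        "SELECT EXISTS(SELECT 1 FROM myTable WHERE myColumn = ?)", (value,)
    ).fetchone()
    return bool(row[0])  # SQLite returns 1 or 0 here


def count_matches(connection, value):
    """Count every matching row, which is more work than a presence check needs."""
    row = connection.execute(
        "SELECT COUNT(*) FROM myTable WHERE myColumn = ?", (value,)
    ).fetchone()
    return row[0]


found = value_exists(conn, "val")
missing = value_exists(conn, "absent")
total = count_matches(conn, "val")
print(found, missing, total)
```

Both helpers answer the "is it there?" question, but only the EXISTS version lets the engine stop at the first match.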

2. Use VARCHAR instead of CHAR

Both the CHAR and VARCHAR data types store character strings in a table, but VARCHAR is usually more storage efficient than CHAR for values whose lengths vary.

The CHAR data type stores character strings of a fixed, declared length. If a string is shorter than the declared length, it is padded with blank spaces up to that length, and the padding wastes storage. For example, CHAR(100) takes 100 bytes even if only a single character is stored (assuming a single-byte character set).

On the other hand, the VARCHAR data type stores character strings of variable length, up to the maximum length specified. It does not pad the value with blank spaces and takes only the storage needed for the string's actual length, plus a small length prefix (typically 1 or 2 bytes). For example, VARCHAR(100) needs only about 1 byte of character data, plus that prefix, when storing a single character.

CREATE TABLE myTable (
  id INT PRIMARY KEY, 
  charCol CHAR(10), 
  varcharCol VARCHAR(10)
);

In the above example, the table myTable is created with two columns, charCol and varcharCol, which have the CHAR and VARCHAR data types respectively. charCol always takes 10 bytes of storage (in a single-byte character set). In contrast, varcharCol takes storage proportional to the actual size of the string stored in it.
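The storage arithmetic can be sketched in a few lines of Python. This is only a rough model, not how any particular database lays out rows: it assumes a single-byte (ASCII) character set and a 1-byte length prefix for VARCHAR, which some engines use for short columns. The function names are hypothetical.

```python
def char_storage_bytes(value: str, declared_length: int) -> int:
    """Bytes used by CHAR(declared_length): the value is padded to full width.

    Assumption: single-byte (ASCII) characters.
    """
    if len(value) > declared_length:
        raise ValueError("value is longer than the declared column length")
    padded = value.ljust(declared_length)  # pad with trailing spaces
    return len(padded.encode("ascii"))


def varchar_storage_bytes(value: str, declared_length: int, prefix_bytes: int = 1) -> int:
    """Bytes used by VARCHAR(declared_length): actual length plus a length prefix.

    Assumption: single-byte characters and a 1-byte prefix by default.
    """
    if len(value) > declared_length:
        raise ValueError("value is longer than the declared column length")
    return len(value.encode("ascii")) + prefix_bytes


char_cost = char_storage_bytes("a", 100)
varchar_cost = varchar_storage_bytes("a", 100)
print(char_cost, varchar_cost)
```

Under these assumptions, the gap between the two grows with the declared length and shrinks as the stored values approach that length.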

3. Avoid Subqueries in WHERE Clause

To optimize an SQL query, try to avoid subqueries inside the WHERE clause, because subqueries can be expensive to execute when they return a large number of rows.

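Instead of using the subquery, you can often get the same result with a join operation or with a correlated subquery. A correlated subquery is a subquery in which the inner query depends on the outer query, as in an EXISTS check. Whether it is faster than a non-correlated subquery depends on the database: a correlated subquery may be re-evaluated for every outer row, while many optimizers rewrite both forms into a join internally, so compare the execution plans.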

Below is an example to understand the difference between the two.

-- Using a subquery
SELECT 
  * 
FROM 
  orders 
WHERE 
  customer_id IN (
    SELECT 
      id 
    FROM 
      customers 
    WHERE 
      country = 'INDIA'
  );

-- Using a join operation
SELECT 
  orders.* 
FROM 
  orders 
  JOIN customers ON orders.customer_id = customers.id 
WHERE 
  customers.country = 'INDIA';

In the first example, the subquery first collects all the customer ids that belong to INDIA, and then the outer query fetches all the orders of the selected customer ids. In the second example, we achieve the same result by joining the customers and orders tables and then selecting only the orders whose customers are from INDIA.

In this way, we can optimize queries by avoiding subqueries inside the WHERE clause, and often make them easier to read and understand as well.
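A quick way to check that the two forms really are interchangeable, and to see how the engine treats them, is to run both against a small test database. The sketch below uses Python's sqlite3 module; the amount column and the sample rows are assumptions added for illustration, and the plan text printed by EXPLAIN QUERY PLAN is specific to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers (id, country) VALUES (1, 'INDIA'), (2, 'USA'), (3, 'INDIA');
    INSERT INTO orders (id, customer_id, amount) VALUES
        (10, 1, 25.0), (11, 2, 40.0), (12, 3, 15.5), (13, 1, 9.99);
""")

SUBQUERY_SQL = """
    SELECT * FROM orders
    WHERE customer_id IN (SELECT id FROM customers WHERE country = 'INDIA')
    ORDER BY id
"""

JOIN_SQL = """
    SELECT orders.* FROM orders
    JOIN customers ON orders.customer_id = customers.id
    WHERE customers.country = 'INDIA'
    ORDER BY orders.id
"""

subquery_rows = conn.execute(SUBQUERY_SQL).fetchall()
join_rows = conn.execute(JOIN_SQL).fetchall()

# The last column of each EXPLAIN QUERY PLAN row is a human-readable step.
subquery_plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + SUBQUERY_SQL)]
join_plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + JOIN_SQL)]

print(subquery_rows == join_rows)
print(subquery_plan)
print(join_plan)
```

The join returns the same rows here because customers.id is unique. If the joined column could repeat, the join would duplicate orders and the two queries would no longer be equivalent.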

4. Ordering JOINs from a Larger Table to a Smaller Table

Joining from a larger table to a smaller table is a commonly cited SQL optimization technique, on the grounds that it makes the query execute faster. If we apply a JOIN from a smaller table to a larger table, the SQL engine has to search the larger table for matching rows, which is more resource-intensive and time-consuming. If the JOIN is applied from a larger table to a smaller table, the engine only has to search the smaller table for matching rows. Note that many modern query optimizers choose the join order for inner joins themselves, based on table statistics and indexes, so the written order matters mainly when the optimizer does not reorder joins.

Here is an example to illustrate.

-- The Orders table is larger than the Customer table

-- Join from a larger table to a smaller table
SELECT 
  * 
FROM 
  Orders 
  JOIN Customer ON Customer.id = Orders.customer_id;


-- Join from a smaller table to a larger table
SELECT 
  * 
FROM 
  Customer 
  JOIN Orders ON Customer.id = Orders.customer_id;
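Since the effect of join order depends heavily on the engine, it is worth checking on your own system. The following sketch builds a small Customer table and a larger order table in SQLite and compares both join orders. The order table is named Orders here because ORDER is a reserved word, and the customer_id column, the index, and the row counts are all assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE INDEX idx_orders_customer ON Orders (customer_id);
""")
# 10 customers and 1000 orders, so Orders is the larger table.
conn.executemany(
    "INSERT INTO Customer (id, name) VALUES (?, ?)",
    [(i, f"customer-{i}") for i in range(1, 11)],
)
conn.executemany(
    "INSERT INTO Orders (id, customer_id) VALUES (?, ?)",
    [(i, (i % 10) + 1) for i in range(1, 1001)],
)
conn.execute("ANALYZE")  # collect statistics for the query planner

LARGE_TO_SMALL = """
    SELECT Orders.id, Customer.id FROM Orders
    JOIN Customer ON Customer.id = Orders.customer_id
"""

SMALL_TO_LARGE = """
    SELECT Orders.id, Customer.id FROM Customer
    JOIN Orders ON Customer.id = Orders.customer_id
"""


def query_plan(connection, sql):
    """Return the human-readable steps SQLite reports for a query."""
    return [row[-1] for row in connection.execute("EXPLAIN QUERY PLAN " + sql)]


rows_large_first = sorted(conn.execute(LARGE_TO_SMALL).fetchall())
rows_small_first = sorted(conn.execute(SMALL_TO_LARGE).fetchall())
plan_large_first = query_plan(conn, LARGE_TO_SMALL)
plan_small_first = query_plan(conn, SMALL_TO_LARGE)

print(rows_large_first == rows_small_first)
print(plan_large_first)
print(plan_small_first)
```

Both orders return the same rows. Comparing the two printed plans shows whether your engine honours the written order or picks its own, which tells you whether rewriting the join is worth the effort.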

5. Use `regexp_like` instead of `LIKE` Clause

Like the LIKE clause, regexp_like is used for pattern searching. The LIKE clause is a basic pattern-matching operator that supports only the wildcards _ and %, which match a single character and any number of characters respectively. A LIKE pattern that begins with a wildcard generally forces a scan of the whole table, which is slow for large tables.

On the other hand, regexp_like is a more powerful pattern-searching technique. It uses regular expressions to find specific patterns in a character string. Regular expressions are more expressive than simple wildcard matching because they let you describe exactly the pattern you are looking for, often in a single condition instead of several LIKE conditions combined with OR. Whether the query also executes faster depends on the database and its indexes, so measure both forms on your own data.

Please note that regexp_like is not available in all database management systems, and its syntax and functionality may vary between systems.

Here is an example to illustrate.

-- Query using the LIKE clause
SELECT 
  * 
FROM 
  mytable 
WHERE 
  (
    name LIKE 'A%' 
    OR name LIKE 'B%'
  );
  
-- Query using regexp_like clause
SELECT 
  * 
FROM 
  mytable 
WHERE 
  regexp_like(name, '^[AB].*');

The above queries find the rows whose name starts with A or B. In the first example, LIKE is used to search for all the names that start with A or B. A% means that the first character is A, followed by any number of characters. In the second example, regexp_like is used. In the pattern ^[AB].*, ^ anchors the match at the beginning of the string, [AB] means that the first character can be A or B, and .* matches all the characters after that.

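With regexp_like, a single condition filters out the rows that don’t match the pattern. On some systems this can improve performance, though a prefix pattern such as LIKE 'A%' can often use an index while a regular expression cannot, so check the execution plan.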
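To confirm that the two conditions select the same rows, you can run them against sample data. SQLite has no built-in regexp_like, so the sketch below registers a hypothetical stand-in implemented with Python's re module. It mimics the idea, not the exact behaviour of any specific database, and the sample names are invented.

```python
import re
import sqlite3


def regexp_like(value, pattern):
    """Hypothetical stand-in for regexp_like: 1 if pattern matches within value."""
    if value is None:
        return None
    return 1 if re.search(pattern, value) else 0


conn = sqlite3.connect(":memory:")
# Register the Python function so SQL can call regexp_like(name, pattern).
conn.create_function("regexp_like", 2, regexp_like)
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT)")
# Note: SQLite's LIKE ignores ASCII case by default, so the sample data
# avoids names that start with a lowercase 'a' or 'b'.
conn.executemany(
    "INSERT INTO mytable (name) VALUES (?)",
    [("Alice",), ("Bob",), ("Carol",), ("Amit",), ("Dev",)],
)

like_rows = conn.execute(
    "SELECT name FROM mytable "
    "WHERE (name LIKE 'A%' OR name LIKE 'B%') ORDER BY name"
).fetchall()

regexp_rows = conn.execute(
    "SELECT name FROM mytable "
    "WHERE regexp_like(name, '^[AB].*') ORDER BY name"
).fetchall()

print(like_rows)
print(regexp_rows)
```

Once the results agree, timing both queries on a realistically sized table is the most reliable way to decide which form to keep.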

Conclusion

In this article, we have discussed several methods and tips for optimizing SQL queries. It should give you a clear understanding of how to write efficient SQL queries and why optimizing them matters. There are many more ways of optimizing queries, such as preferring integer values over character values, or using UNION ALL instead of UNION when the combined results contain no duplicates.

Aryan Garg is a B.Tech. Electrical Engineering student, currently in the final year of his undergrad. His interest lies in the field of Web Development and Machine Learning. He has pursued this interest and is eager to work more in these directions.