How To Write Better SQL Queries: The Definitive Guide – Part 1

Most forget that SQL isn’t just about writing queries, which is just the first step down the road. Ensuring that queries are performant or that they fit the context that you’re working in is a whole other thing. This SQL tutorial will provide you with a small peek at some steps that you can go through to evaluate your query.



3. Don’t Make Queries More Complex Than They Need To Be

 
The data type conversions bring you to a next point: you should not over-engineer your queries. Try to keep them simple and efficient. This might seem too simple or stupid to even be a tip, especially because queries can get complex.

However, you’ll see in the examples that are mentioned in the next sections that you can easily start making simple queries more complex than they need to be.

The OR Operator

When you use the OR operator in your query, it’s likely that you’re not using an index.

Remember that an index is a data structure that improves the speed of the data retrieval in your database table, but it comes at a cost: there will be additional writes and additional storage space is needed to maintain the index data structure. Indexes are used to quickly locate or look up data without having to search every row in a database every time the database table is accessed. Indexes can be created by using one or more columns in a database table.

If you don’t make use of the indexes that the database includes, your query will inevitably take longer to run. That’s why it’s best to look for alternatives to using the OR operator in your query;

Consider the following query:

SELECT driverslicensenr, name 
FROM Drivers 
WHERE driverslicensenr = 123456 
OR driverslicensenr = 678910 
OR driverslicensenr = 345678;


You can replace the operator by:

  • A condition with IN; or
SELECT driverslicensenr, name 
FROM Drivers 
WHERE driverslicensenr IN (123456, 678910, 345678);


  • Two SELECT statements with a UNION.

Tip: here, you need to be careful not to unnecessarily use the UNIONoperation because you go through the same table multiple times. At the same time, you have to realize that when you use a UNION in your query, the execution time will increase. Alternatives to the UNION operation are: reformulating the query in such a way that all conditions are placed in one SELECT instruction, or using an OUTER JOIN instead of UNION.

Tip: keep in mind also here that, even though OR -and also the other operators that will be mentioned in the following sections- likely isn’t using an index, index lookups aren’t always preferred!

The NOT Operator

When your query contains the NOT operator, it’s likely that the index is not used, just like with the OR operator. This will inevitably slow down your query. If you don’t know what is meant here, consider the following query:

SELECT driverslicensenr, name 
FROM Drivers 
WHERE NOT (year > 1980);


This query will definitely run slower than you would maybe expect, mainly because it’s formulated a lot more complex than it could be: in cases like this one, it’s best to look for an alternative. Consider replacing NOT by comparison operators, such as ><> or !>; The example above might indeed be rewritten and become something like this:

SELECT driverslicensenr, name FROM Drivers WHERE year <= 1980;


That already looks way neater, doesn’t it?

The AND Operator

The AND operator is another operator that doesn’t make use of the index and that can slow your query down if used in an overly complex and inefficient way, like in the example below:

SELECT driverslicensenr, name 
FROM Drivers 
WHERE year >= 1960 AND year <= 1980;


It’s better to rewrite this query and use BETWEEN operator:

SELECT driverslicensenr, name 
FROM Drivers 
WHERE year BETWEEN 1960 AND 1980;


The ANY and ALL Operators

Also the ALL and ALL operators are some that you should be careful with because, by including these into your queries, the index won’t be used. Alternatives that will come in handy here are aggregation functions like MINor MAX.

Tip: in cases where you make use of the proposed alternatives, you should be aware of the fact that all aggregation functions like SUMAVGMINMAXover many rows can result in a long-running query. In such cases, you can try to either minimize the amount of rows to handle or pre-calculate these values. You see once again that it’s important to be aware of your environment, your query goal, … when you make decisions on which query to use!

Isolate Columns in Conditions

Also in cases where a column is used in a calculation or in a scalar function, the index isn’t used. A possible solution would be to simply isolate the specific column so that it no longer is a part of the calculation or the function. Consider the following example:

SELECT driverslicensenr, name 
FROM Drivers 
WHERE year + 10 = 1980;


This is looks funky, huh? Instead, try to reconsider the calculation and rewrite the query to something like this:

SELECT driverslicensenr, name 
FROM Drivers 
WHERE year = 1970;


 

4. No Brute Force

 
This last tip actually means that you shouldn’t try to restrict the query too much because it can affect its performance. This is especially true for joins and for the HAVING clause.

Joins

  • The Order Of Tables

When you join two tables, it can be important to consider the order of the tables in your join. If you notice that one table is considerably larger than the other one, you might want to rewrite your query so that the biggest table is placed last in the join.

  • Redundant Conditions on Joins

When you add too many conditions to your joins, you basically obligate SQL to choose a certain path. It could be, though, that this path isn’t always the more performant one.

The HAVING Clause

The HAVING clause was originally added to SQL because the WHERE keyword could not be used with aggregate functions. HAVING is typically used with the GROUP BY clause to restrict the groups of returned rows to only those that meet certain conditions. However, if you use this clause in your query, the index is not used, which -as you already know- can result in a query that doesn’t really perform all that well.

If you’re looking for an alternative, consider using the WHERE clause. Consider the following queries:

SELECT state, COUNT(*) 
FROM Drivers 
WHERE state IN ('GA', 'TX') 
GROUP BY state 
ORDER BY state


SELECT state, COUNT(*) 
FROM Drivers 
GROUP BY state 
HAVING state IN ('GA', 'TX') 
ORDER BY state


The first query uses the WHERE clause to restrict the number of rows that need to be summed, whereas the second query sums up all the rows in the table and then uses HAVING to throw away the sums it calculated. In these types of cases, the alternative with the WHERE clause is obviously the better one, as you don’t waste any resources.

You see that this is not about limiting the result set, rather about limiting the intermediate number of records within a query.

Note that the difference between these two clauses lies in the fact that the WHERE clause introduces a condition on individual rows, while the HAVINGclause introduces a condition on aggregations or results of a selection where a single result, such as MINMAXSUM,… has been produced from multiple rows.

You see, evaluating the quality, writing and rewriting of queries is not an easy job when you take into account that they need to be as performant as possible; Avoiding anti-patterns and considering alternatives will also be a part of responsibility when you write queries that you want to run on databases in a professional environment.

This list was just a small overview of some anti-patterns and tips that will hopefully help beginners; If you’d like to get an insight into what more senior developers consider the most frequent anti-patterns, check out this discussion.

 
Bio: Karlijn Willems is a data science journalist and writes for the DataCamp community, focusing on data science education, the latest news and the hottest trends. She holds degrees in Literature and Linguistics and Information Management.

Original. Reposted with permission.

Related: