How To Write Better SQL Queries: The Definitive Guide – Part 1
Most forget that SQL isn’t just about writing queries, which is just the first step down the road. Ensuring that queries are performant or that they fit the context that you’re working in is a whole other thing. This SQL tutorial will provide you with a small peek at some steps that you can go through to evaluate your query.
3. Don’t Make Queries More Complex Than They Need To Be
The data type conversions bring you to a next point: you should not over-engineer your queries. Try to keep them simple and efficient. This might seem too simple or stupid to even be a tip, especially because queries can get complex.
However, you’ll see in the examples that are mentioned in the next sections that you can easily start making simple queries more complex than they need to be.
The OR
Operator
When you use the OR
operator in your query, it’s likely that you’re not using an index.
Remember that an index is a data structure that improves the speed of the data retrieval in your database table, but it comes at a cost: there will be additional writes and additional storage space is needed to maintain the index data structure. Indexes are used to quickly locate or look up data without having to search every row in a database every time the database table is accessed. Indexes can be created by using one or more columns in a database table.
If you don’t make use of the indexes that the database includes, your query will inevitably take longer to run. That’s why it’s best to look for alternatives to using the OR
operator in your query;
Consider the following query:
SELECT driverslicensenr, name
FROM Drivers
WHERE driverslicensenr = 123456
OR driverslicensenr = 678910
OR driverslicensenr = 345678;
You can replace the operator by:
- A condition with IN; or
SELECT driverslicensenr, name
FROM Drivers
WHERE driverslicensenr IN (123456, 678910, 345678);
- Two
SELECT
statements with aUNION
.
Tip: here, you need to be careful not to unnecessarily use the UNION
operation because you go through the same table multiple times. At the same time, you have to realize that when you use a UNION
in your query, the execution time will increase. Alternatives to the UNION
operation are: reformulating the query in such a way that all conditions are placed in one SELECT
instruction, or using an OUTER JOIN
instead of UNION
.
Tip: keep in mind also here that, even though OR
-and also the other operators that will be mentioned in the following sections- likely isn’t using an index, index lookups aren’t always preferred!
The NOT
Operator
When your query contains the NOT
operator, it’s likely that the index is not used, just like with the OR
operator. This will inevitably slow down your query. If you don’t know what is meant here, consider the following query:
SELECT driverslicensenr, name
FROM Drivers
WHERE NOT (year > 1980);
This query will definitely run slower than you would maybe expect, mainly because it’s formulated a lot more complex than it could be: in cases like this one, it’s best to look for an alternative. Consider replacing NOT
by comparison operators, such as >
, <>
or !>
; The example above might indeed be rewritten and become something like this:
SELECT driverslicensenr, name FROM Drivers WHERE year <= 1980;
That already looks way neater, doesn’t it?
The AND
Operator
The AND
operator is another operator that doesn’t make use of the index and that can slow your query down if used in an overly complex and inefficient way, like in the example below:
SELECT driverslicensenr, name
FROM Drivers
WHERE year >= 1960 AND year <= 1980;
It’s better to rewrite this query and use BETWEEN
operator:
SELECT driverslicensenr, name
FROM Drivers
WHERE year BETWEEN 1960 AND 1980;
The ANY and ALL Operators
Also the ALL
and ALL
operators are some that you should be careful with because, by including these into your queries, the index won’t be used. Alternatives that will come in handy here are aggregation functions like MIN
or MAX
.
Tip: in cases where you make use of the proposed alternatives, you should be aware of the fact that all aggregation functions like SUM
, AVG
, MIN
, MAX
over many rows can result in a long-running query. In such cases, you can try to either minimize the amount of rows to handle or pre-calculate these values. You see once again that it’s important to be aware of your environment, your query goal, … when you make decisions on which query to use!
Isolate Columns in Conditions
Also in cases where a column is used in a calculation or in a scalar function, the index isn’t used. A possible solution would be to simply isolate the specific column so that it no longer is a part of the calculation or the function. Consider the following example:
SELECT driverslicensenr, name
FROM Drivers
WHERE year + 10 = 1980;
This is looks funky, huh? Instead, try to reconsider the calculation and rewrite the query to something like this:
SELECT driverslicensenr, name
FROM Drivers
WHERE year = 1970;
4. No Brute Force
This last tip actually means that you shouldn’t try to restrict the query too much because it can affect its performance. This is especially true for joins and for the HAVING
clause.
Joins
- The Order Of Tables
When you join two tables, it can be important to consider the order of the tables in your join. If you notice that one table is considerably larger than the other one, you might want to rewrite your query so that the biggest table is placed last in the join.
- Redundant Conditions on Joins
When you add too many conditions to your joins, you basically obligate SQL to choose a certain path. It could be, though, that this path isn’t always the more performant one.
The HAVING
Clause
The HAVING
clause was originally added to SQL because the WHERE
keyword could not be used with aggregate functions. HAVING
is typically used with the GROUP BY
clause to restrict the groups of returned rows to only those that meet certain conditions. However, if you use this clause in your query, the index is not used, which -as you already know- can result in a query that doesn’t really perform all that well.
If you’re looking for an alternative, consider using the WHERE
clause. Consider the following queries:
SELECT state, COUNT(*)
FROM Drivers
WHERE state IN ('GA', 'TX')
GROUP BY state
ORDER BY state
SELECT state, COUNT(*)
FROM Drivers
GROUP BY state
HAVING state IN ('GA', 'TX')
ORDER BY state
The first query uses the WHERE
clause to restrict the number of rows that need to be summed, whereas the second query sums up all the rows in the table and then uses HAVING
to throw away the sums it calculated. In these types of cases, the alternative with the WHERE
clause is obviously the better one, as you don’t waste any resources.
You see that this is not about limiting the result set, rather about limiting the intermediate number of records within a query.
Note that the difference between these two clauses lies in the fact that the WHERE
clause introduces a condition on individual rows, while the HAVING
clause introduces a condition on aggregations or results of a selection where a single result, such as MIN
, MAX
, SUM
,… has been produced from multiple rows.
You see, evaluating the quality, writing and rewriting of queries is not an easy job when you take into account that they need to be as performant as possible; Avoiding anti-patterns and considering alternatives will also be a part of responsibility when you write queries that you want to run on databases in a professional environment.
This list was just a small overview of some anti-patterns and tips that will hopefully help beginners; If you’d like to get an insight into what more senior developers consider the most frequent anti-patterns, check out this discussion.
Bio: Karlijn Willems is a data science journalist and writes for the DataCamp community, focusing on data science education, the latest news and the hottest trends. She holds degrees in Literature and Linguistics and Information Management.
Original. Reposted with permission.
Related: