How To Write Better SQL Queries: The Definitive Guide – Part 1
Most forget that SQL isn’t just about writing queries, which is just the first step down the road. Ensuring that queries are performant or that they fit the context that you’re working in is a whole other thing. This SQL tutorial will provide you with a small peek at some steps that you can go through to evaluate your query.
3. Don’t Make Queries More Complex Than They Need To Be
The data type conversions bring you to a next point: you should not over-engineer your queries. Try to keep them simple and efficient. This might seem too simple or stupid to even be a tip, especially because queries can get complex.
However, you’ll see in the examples that are mentioned in the next sections that you can easily start making simple queries more complex than they need to be.
When you use the
OR operator in your query, it’s likely that you’re not using an index.
Remember that an index is a data structure that improves the speed of the data retrieval in your database table, but it comes at a cost: there will be additional writes and additional storage space is needed to maintain the index data structure. Indexes are used to quickly locate or look up data without having to search every row in a database every time the database table is accessed. Indexes can be created by using one or more columns in a database table.
If you don’t make use of the indexes that the database includes, your query will inevitably take longer to run. That’s why it’s best to look for alternatives to using the
OR operator in your query;
Consider the following query:
SELECT driverslicensenr, name FROM Drivers WHERE driverslicensenr = 123456 OR driverslicensenr = 678910 OR driverslicensenr = 345678;
You can replace the operator by:
- A condition with IN; or
SELECT driverslicensenr, name FROM Drivers WHERE driverslicensenr IN (123456, 678910, 345678);
SELECTstatements with a
Tip: here, you need to be careful not to unnecessarily use the
UNIONoperation because you go through the same table multiple times. At the same time, you have to realize that when you use a
UNION in your query, the execution time will increase. Alternatives to the
UNION operation are: reformulating the query in such a way that all conditions are placed in one
SELECT instruction, or using an
OUTER JOIN instead of
Tip: keep in mind also here that, even though
OR -and also the other operators that will be mentioned in the following sections- likely isn’t using an index, index lookups aren’t always preferred!
When your query contains the
NOT operator, it’s likely that the index is not used, just like with the
OR operator. This will inevitably slow down your query. If you don’t know what is meant here, consider the following query:
SELECT driverslicensenr, name FROM Drivers WHERE NOT (year > 1980);
This query will definitely run slower than you would maybe expect, mainly because it’s formulated a lot more complex than it could be: in cases like this one, it’s best to look for an alternative. Consider replacing
NOT by comparison operators, such as
!>; The example above might indeed be rewritten and become something like this:
SELECT driverslicensenr, name FROM Drivers WHERE year <= 1980;
That already looks way neater, doesn’t it?
AND operator is another operator that doesn’t make use of the index and that can slow your query down if used in an overly complex and inefficient way, like in the example below:
SELECT driverslicensenr, name FROM Drivers WHERE year >= 1960 AND year <= 1980;
It’s better to rewrite this query and use
SELECT driverslicensenr, name FROM Drivers WHERE year BETWEEN 1960 AND 1980;
The ANY and ALL Operators
ALL operators are some that you should be careful with because, by including these into your queries, the index won’t be used. Alternatives that will come in handy here are aggregation functions like
Tip: in cases where you make use of the proposed alternatives, you should be aware of the fact that all aggregation functions like
MAXover many rows can result in a long-running query. In such cases, you can try to either minimize the amount of rows to handle or pre-calculate these values. You see once again that it’s important to be aware of your environment, your query goal, … when you make decisions on which query to use!
Isolate Columns in Conditions
Also in cases where a column is used in a calculation or in a scalar function, the index isn’t used. A possible solution would be to simply isolate the specific column so that it no longer is a part of the calculation or the function. Consider the following example:
SELECT driverslicensenr, name FROM Drivers WHERE year + 10 = 1980;
This is looks funky, huh? Instead, try to reconsider the calculation and rewrite the query to something like this:
SELECT driverslicensenr, name FROM Drivers WHERE year = 1970;
4. No Brute Force
This last tip actually means that you shouldn’t try to restrict the query too much because it can affect its performance. This is especially true for joins and for the
- The Order Of Tables
When you join two tables, it can be important to consider the order of the tables in your join. If you notice that one table is considerably larger than the other one, you might want to rewrite your query so that the biggest table is placed last in the join.
- Redundant Conditions on Joins
When you add too many conditions to your joins, you basically obligate SQL to choose a certain path. It could be, though, that this path isn’t always the more performant one.
HAVING clause was originally added to SQL because the
WHERE keyword could not be used with aggregate functions.
HAVING is typically used with the
GROUP BY clause to restrict the groups of returned rows to only those that meet certain conditions. However, if you use this clause in your query, the index is not used, which -as you already know- can result in a query that doesn’t really perform all that well.
If you’re looking for an alternative, consider using the
WHERE clause. Consider the following queries:
SELECT state, COUNT(*) FROM Drivers WHERE state IN ('GA', 'TX') GROUP BY state ORDER BY state
SELECT state, COUNT(*) FROM Drivers GROUP BY state HAVING state IN ('GA', 'TX') ORDER BY state
The first query uses the
WHERE clause to restrict the number of rows that need to be summed, whereas the second query sums up all the rows in the table and then uses
HAVING to throw away the sums it calculated. In these types of cases, the alternative with the
WHERE clause is obviously the better one, as you don’t waste any resources.
You see that this is not about limiting the result set, rather about limiting the intermediate number of records within a query.
Note that the difference between these two clauses lies in the fact that the
WHERE clause introduces a condition on individual rows, while the
HAVINGclause introduces a condition on aggregations or results of a selection where a single result, such as
SUM,… has been produced from multiple rows.
You see, evaluating the quality, writing and rewriting of queries is not an easy job when you take into account that they need to be as performant as possible; Avoiding anti-patterns and considering alternatives will also be a part of responsibility when you write queries that you want to run on databases in a professional environment.
This list was just a small overview of some anti-patterns and tips that will hopefully help beginners; If you’d like to get an insight into what more senior developers consider the most frequent anti-patterns, check out this discussion.
Bio: Karlijn Willems is a data science journalist and writes for the DataCamp community, focusing on data science education, the latest news and the hottest trends. She holds degrees in Literature and Linguistics and Information Management.
Original. Reposted with permission.