10 Essential SQL Commands for Data Science
Learn SQL commands for filtering, string operations, alias, joining tables, if-else statements, and grouping.
Image by Author
It is a new day to learn SQL commands that will help you in your data science career. You will use SQL query to extract, save, and modify the database to your requirements.
So, what is stopping you from learning the most used command for extracting the data and performing a quick analysis on a SQL database?
In this post, we will learn essential SQL commands with code examples using Online SQL Editor. It allows users to run online queries without any issues. The platform comes with an editor and 3 SQL tables for practice. The database consists of three tables: Customers, Orders, and Shippings.
- ORDER BY
- GROUP BY
The SELECT command allows us to extract the selective or all columns from the table. It is the basic building block of SQL queries.
In our case, we are displaying all of the columns using `*` from the Customer table.
SELECT * FROM Customers;
In the second example, we are displaying only three columns from the Customers table. You can type column names separated by commas “,” and end it with FROM command and the table name.
SELECT first_name, last_name, country FROM Customers;
The DISTINCT is used to display unique values from the column. Instead of showing two Johns, it is now showing one John.
SELECT DISTINCT first_name FROM Customers;
The WHERE command is used for conditional and filtering. We will be filtering the table to show customers with age greater than 25. Apart from greater than you can also use lesser than <, lesser than or equal to <=, greater than or equal to >=, and equal =.
SELECT * FROM Customers WHERE age > 25;
The results show that we have only two customers with the age greater than 25.
You can also combine two or more than two using AND, OR, BETWEEN, and IN. In our case, we are looking for customers based in the USA with the age greater than 25.
SELECT * FROM Customers WHERE age > 25 AND country == 'USA';
The LIKE command is used for string filtering. You will provide the expression and it will use it to find the values that are matching the expression. For example, if you want all the names that start with J use “J%”. To find the name that ends with J we will use “%J”. It is pretty simple.
SELECT * FROM Customers WHERE first_name LIKE "J%";
Let’s try finding the customer whose first_name starts with “J” and last_name ends with “e”. You can check out all of the other ways to create expressions for your particular situation by reading the SQL LIKE Operator Examples tutorial.
SELECT * FROM Customers WHERE first_name LIKE "J%" OR last_name LIKE "%e";
The ORDER BY is used for sorting the query results. We have used ORDER BY to sort the result based on first_name in ascending order.
SELECT * FROM Customers ORDER BY first_name;
You can try to sort the results in decreasing order by using DESC at the end.
SELECT * FROM Customers ORDER BY first_name DESC;
The AS command is used for creating aliases or renaming the column names. In the example below, we are renaming “customer_id” with “ID” and “first_name” with “Name”.
SELECT customer_id AS ID, first_name AS Name FROM Customers;
You will be joining multiple tables for data analysis and it is quite easy. Just use LEFT JOIN, INNER JOIN, RIGHT JOIN, or FULL JOIN after the first table. Write the second table name followed by ON to join tables on certain columns. In our case, both tables have a common column called customer_id. We will join the table using both column names with “==” sign.
SELECT first_name, item, amount FROM Customers LEFT JOIN Orders ON Customers.customer_id == Orders.customer_id
UNION is used for displaying two query results together. In the example, we are displaying “first_name” as “Name_item” from Customer table and “item” from Orders table together.
Note: make sure both queries are displaying the same number of columns.
SELECT first_name AS Name_item FROM Customers UNION SELECT item FROM Orders
As you can see, the results have both first_name from the Customers and item from Orders. The column is sorted from A-Z.
GROUP By command is frequently used in data analytical tasks. You can group any columns and get a better understanding of data distribution.
The GROUP BY requires aggregate functions:
- COUNT: total number of rows
- SUM: sum of all the values
- MAX: maximum value
- MIN: minimum value
- AVG: average value
You can use aggregate functions and GROUP BY to combine values into categories within the columns. In the example below, we will SUM the amount and group it by item names.
SELECT item, SUM(amount) FROM Orders GROUP BY item
A single keyboard cost 400 and we have sold 2 keyboards. It has summed it to display 800. I know it is a simple example, but this is a start.
The CASE command is similar to if-else statements in Python or any other language. We will use it to create categories based on the conditions.
If the amount is less than 1000 return ‘Low’ else return ‘High’.
The command starts with CASE and ends with END and the column name.
If is replaced with WHEN and else is replaced with ELSE.
SELECT item, amount, CASE WHEN amount < 1000 THEN 'Low' ELSE 'High' END AS Priority FROM Orders;
Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who loves building machine learning models. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a Master's degree in Technology Management and a bachelor's degree in Telecommunication Engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.