10 Essential SQL Commands for Data Science

Learn SQL commands for filtering, string operations, alias, joining tables, if-else statements, and grouping.



10 Essential SQL Commands for Data Science
Image by Author

 

It is a new day to learn SQL commands that will help you in your data science career. You will use SQL query to extract, save, and modify the database to your requirements. 

So, what is stopping you from learning the most used command for extracting the data and performing a quick analysis on a SQL database?  

In this post, we will learn essential SQL commands with code examples using Online SQL Editor. It allows users to run online queries without any issues. The platform comes with an editor and 3 SQL tables for practice. The database consists of three tables: Customers, Orders, and Shippings.

  1. SELECT
  2. DISTINCT
  3. WHERE
  4. LIKE
  5. ORDER BY
  6. AS
  7. JOIN
  8. UNION
  9. GROUP BY
  10. CASE

 

SELECT

 

The SELECT command allows us to extract the selective or all columns from the table. It is the basic building block of SQL queries. 

In our case, we are displaying all of the columns using `*` from the Customer table.

SELECT *
FROM Customers;


customer_id first_name last_name age country
1 John Doe 31 USA
2 Robert Luna 22 USA
3 David Robinson 22 UK
4 John Reinhardt 25 UK
5 Betty Doe 28 UAE

 

In the second example, we are displaying only three columns from the Customers table. You can type column names separated by commas “,” and end it with FROM command and the table name. 

SELECT first_name,
       last_name,
       country
FROM Customers;


first_name last_name country
John Doe USA
Robert Luna USA
David Robinson UK
John Reinhardt UK
Betty Doe UAE

 

DISTINCT

 

The DISTINCT is used to display unique values from the column. Instead of showing two Johns, it is now showing one John. 

SELECT DISTINCT first_name
FROM Customers;


first_name
John
Robert
David
Betty

 

WHERE

 

The WHERE command is used for conditional and filtering. We will be filtering the table to show customers with age greater than 25. Apart from greater than you can also use lesser than <, lesser than or equal to <=, greater than or equal to >=, and equal =.

SELECT *
FROM Customers
WHERE age > 25;


The results show that we have only two customers with the age greater than 25. 

customer_id first_name last_name age country
1 John Doe 31 USA
5 Betty Doe 28 UAE

 

You can also combine two or more than two using AND, OR, BETWEEN, and IN. In our case, we are looking for customers based in the USA with the age greater than 25.  

SELECT *
FROM Customers
WHERE age > 25
  AND country == 'USA';


customer_id first_name last_name age country
1 John Doe 31 USA

 

LIKE

 

The LIKE command is used for string filtering. You will provide the expression and it will use it to find the values that are matching the expression. For example, if you want all the names that start with J use “J%”. To find the name that ends with J we will use “%J”. It is pretty simple.  

SELECT *
FROM Customers
WHERE first_name LIKE "J%";


customer_id first_name last_name age country
1 John Doe 31 USA
4 John Reinhardt 25 UK

 

Let’s try finding the customer whose first_name starts with “J” and last_name ends with “e”. You can check out all of the other ways to create expressions for your particular situation by reading the SQL LIKE Operator Examples tutorial. 

SELECT *
FROM Customers
WHERE first_name LIKE "J%"
  OR last_name LIKE "%e";


customer_id first_name last_name age country
1 John Doe 31 USA
4 John Reinhardt 25 UK
5 Betty Doe 28 UAE

 

ORDER BY

 

The ORDER BY is used for sorting the query results. We have used ORDER BY to sort the result based on first_name in ascending order. 

SELECT *
FROM Customers
ORDER BY first_name;


customer_id first_name last_name age country
5 Betty Doe 28 UAE
3 David Robinson 22 UK
1 John Doe 31 USA
4 John Reinhardt 25 UK
2 Robert Luna 22 USA

 

You can try to sort the results in decreasing order by using DESC at the end. 

SELECT *
FROM Customers
ORDER BY first_name DESC;


customer_id first_name last_name age country
2 Robert Luna 22 USA
1 John Doe 31 USA
4 John Reinhardt 25 UK
3 David Robinson 22 UK
5 Betty Doe 28 UAE

 

AS

 

The AS command is used for creating aliases or renaming the column names. In the example below, we are renaming “customer_id” with “ID” and “first_name” with “Name”. 

SELECT customer_id AS ID,
       first_name AS Name
FROM Customers;


ID Name
1 John
2 Robert
3 David
4 John
5 Betty

 

JOIN

 

You will be joining multiple tables for data analysis and it is quite easy. Just use LEFT JOIN, INNER JOIN, RIGHT JOIN, or FULL JOIN after the first table. Write the second table name followed by ON to join tables on certain columns. In our case, both tables have a common column called customer_id. We will join the table using both column names with “==” sign. 

SELECT first_name,
       item,
       amount
FROM Customers
LEFT JOIN Orders 
ON Customers.customer_id == Orders.customer_id


first_name item amount
John Keyboard 400
Robert Mousepad 250
David Monitor 12000
John Keyboard 400
John Mouse 300
Betty

 

UNION

 

UNION is used for displaying two query results together. In the example, we are displaying “first_name” as “Name_item” from Customer table and “item” from Orders table together. 

Note: make sure both queries are displaying the same number of columns. 

SELECT first_name AS Name_item
FROM Customers
UNION
SELECT item
FROM Orders


As you can see, the results have both first_name from the Customers and item from Orders. The column is sorted from A-Z.   

Name_item
Betty
David
John
Keyboard
Monitor
Mouse
Mousepad
Robert

 

GROUP BY

 

GROUP By command is frequently used in data analytical tasks. You can group any columns and get a better understanding of data distribution. 

The GROUP BY requires aggregate functions:

  • COUNT: total number of rows
  • SUM: sum of all the values
  • MAX: maximum value
  • MIN: minimum value
  • AVG: average value

You can use aggregate functions and GROUP BY to combine values into  categories within the columns. In the example below, we will SUM the amount and group it by item names. 

SELECT item,
       SUM(amount)
FROM Orders
GROUP BY item


A single keyboard cost 400 and we have sold 2 keyboards. It has summed it to display 800. I know it is a simple example, but this is a start. 

item SUM(amount)
Keyboard 800
Monitor 12000
Mouse 300
Mousepad 250

 

CASE

 

The CASE command is similar to if-else statements in Python or any other language. We will use it to create categories based on the conditions. 

If the amount is less than 1000 return ‘Low’ else return ‘High’. 

The command starts with CASE and ends with END and the column name.  

If is replaced with WHEN and else is replaced with ELSE.

SELECT item,
       amount,
       CASE
           WHEN amount < 1000 THEN 'Low'
           ELSE 'High'
       END AS Priority
FROM Orders;


item amount Priority
Keyboard 400 Low
Mouse 300 Low
Monitor 12000 High
Keyboard 400 Low
Mousepad 250 Low

 
 
Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who loves building machine learning models. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a Master's degree in Technology Management and a bachelor's degree in Telecommunication Engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.