Practising SQL without your own database
SQL is a very important skill for data analysts and data scientists. However, when you are just starting out learning in the field, how can you practice querying with SQL if you don’t have any data stored in a database?
By Hui XiangChua, Data Scientist.
Many organisations have data stored in databases, and SQL is a querying language commonly used to extract the data from these databases. In fact, many large organisations such as Facebook and Amazon have included SQL as a component of their technical skill tests. This is because being able to put together the relevant data is important before performing any necessary data analysis.
One challenge to practising SQL is that we need databases to begin with, which is something we often do not have. However, data.world allows us to do exactly that without having our own database. We can perform SQL queries on publicly available datasets on data.world directly. It is free to create an account on data.world, and you do not need to incur any costs performing SQL queries!
For those not familiar, data.world is a platform that offers a lot of open data relating to many different domains and industries (you can also upload your own datasets). Their mission is to:
- Build the most meaningful, collaborative, and abundant data resource in the world in order to maximize data’s societal problem-solving utility;
- Advocate publicly for improving the adoption, usability, and proliferation of open data and linked data;
- Serve as an accessible historical repository of the world’s data.
Here, I outline an example using a dataset on Singapore’s Covid-19 trends. The query feature is sometimes hidden under the dropdown of the three dots button, or it might appear at the top right corner of the dataset represented with a >_ symbol.
The following SQL computes the moving 7-day average of the number of confirmed cases, with the output ordered by descending date order. A link to the query can be found here.
On the other hand, the SQL below computes the percentage increase day-on-day in the cumulative number of individuals who’ve completed their vaccination, with the output ordered by descending date order. This requires a self-join, i.e., joining a table with itself. I’ve made use of the DATE_SUB function as part of the self-join to get the data from the previous day against the current day. Clicking on the text DATE_SUB also returns more information about the function and the parameters necessary. A link to the query can be found here.
There’s also an SQL tutorial on data.world that covers basic, intermediate, and advanced SQL concepts. Hence it is a good open-source platform to learn and practise SQL.
On a separate note, if you’re in a hiring position and exploring alternatives to how SQL tests can be conducted, this might also be a good platform to consider.
- Data context and how to get started with understanding COVID-19 data
- How to Get Practical Data Science Experience to be Career-Ready
- Pandas vs SQL: When Data Scientists Should Use Each Tool