7 Steps to Mastering SQL for Data Science

Follow these 7 steps to go from SQL data science newbie to seasoned practitioner quickly. No nonsense, just the necessities.



Step 4: Creating, Dropping, Deleting

 
Our second set of commands include those used to CREATE and DROP tables, as well as to DELETE records. With an understanding of this growing collection of commands, suddenly much of what could be referred to as regular data management and query is attainable (with practice, of course).

Create

Drop

Delete

Step 5: Views and Joins

 
On to some slightly more advanced SQL topics. First, we have a look at views, which can be thoughts of as virtual tables populated by the results of queries, useful for a number of different scenarios including application development, data security, and eased data sharing.

First, get the details on what views are:

For beginner SQL practitioners in the data science realm, I would call views a "nice to have." Since the focus is likely to be more on data exploration, I would say that the next topic, joins, is a "must have."

Read about what joins are, and why they are important (and get some examples):

Joins come in different flavors, and likely one of the more complex topics you will cover while learning SQL is getting them straight. That's really more of a testament to the ease of SQL than the actual difficulty of learning about joins.

Watch a video explaining inner joins, then check out a video on outer and cross joins:


Visual representation of SQL joins

A visual representation of SQL joins.

See this visual representation of SQL joins:

Finally, this tutorial reviews both joins and views:

Step 6: SQL for Data Science

 
OK, so you've made some headway with learning SQL. You can query some data, create and manage some tables, make a view if you had to, and even use joins in some more complex querying. But why are you learning all this again? For data science, right? Let's take a bit of a break from the technical to get an idea of this very topic.

Here are a couple of discussions regarding what SQL can be used for in data science:

Step 7: SQL Integration with Python, R

 
Often we will find that SQL is embedded in, or called from within, software written in other programming languages as part of a larger system. For example, in web development you may find PHP or Ruby or some other language making calls to a database via SQL to input, modify, or retrieve application-related data. In data science, you may see SQL being called as part of some application written in, perhaps, Python or R. To that end, having an understanding of how these languages play with SQL is not a bad idea.

SQLite Python
SQL queries with Python and SQLite.

Python

To gain an understanding of how Python and SQL can work together, read Sebastian Raschka's fantastic and detailed post on using SQLite in Python:

R

Here are a pair of resources for achieving R and SQL integration, which approach the topic from different directions:

Further

If you feel like a relentless regimen of reading about an SQL topic and following up with exercises, I recommend going through the following (freely-available) book:

Related: