KDnuggets Home » News » 2015 » Aug » Tutorials, Overviews, How-Tos » A Beginner’s Guide to SQL ( 15:n29 )

A Beginner’s Guide to SQL


SQL is one of the core skills of a data engineer and data scientist. This mini-tutorial explains the four fundamental SQL functions: Create, Read, Update, and Delete using a fun example of movie quotes database.



By Udemy (Original.)

Introduction

Maybe it is hard to believe, but SQL is used everywhere around us. Every application that is manipulating any kind of data needs to store that data somewhere. Whether it’s Big Data or just a table with few simple rows, a government or a small startups, or a big database that spans over multiple servers or a mobile phone that runs its own small database, SQL is ubiquitous.

But what is a SQL? SQL stands for Structured Query Language, and usually is pronounced as “ess-que-el”. SQL is the language of databases, and is specifically built to communicate with databases. SQL is a simple language and is similar to the English language, as commands are structured almost like English sentences. Those sentences are structured like declared statements, thus SQL is also called a declarative language.

image02

Why learn a whole new language when there are many available tools for writing SQL queries visually? When working with some SQL tools, it is important to know SQL language and to understand what the visual tools are doing, and why. Sometimes there are needs to write few SQL statements manually, not only because it is the fastest way but because it is more powerful and often the only way to achieve targeted goals.

What is a Database

We have mentioned that SQL is the language of databases. What is a database? Databases are a storage mechanism designed to offer access to stored information and their manipulation. Information in the database is stored in objects called tables. Tables are uniquely identified by their names and are comprised of columns and rows. Columns contain the column name, column data type, and any other attributes for the column. Rows contain the records or data for the columns. Many of the tables in a database will have relationships, or links between them, either in a one-to-one or a one-to-many relationship. This is why this kind of databases is called relational model databases.

The easiest way to describe a database structure is by comparing it with an Excel spreadsheet, with which many are familiar. A database is one spreadsheet file. Sheets in the spreadsheet are tables, each one with a given name. Columns and rows are the same in both. SQL language can be used to create new tables, or alter existing ones, and to fetch data, update data, or delete data.

Say we have a big collection of famous movie quotes stored in random separate text files. Even if we are more organized and use Excel spreadsheet, the problem we have is the same. Having quotes stored in that way, we can’t quickly get all quotes from one movie, or get all quotes from one character. If we move our text files or spreadsheet into a database, and create tables with relations between them, all this becomes possible. What does relational really mean? The relational model is a way to describe the data and the relationship between those data entities. In our example, a relation is a connection between every single quote with a table where movie titles are stored, or all characters are stored.

Here is a bit simplified example, with only one sample table called “Movie_quotes”. It has four columns, one for quote text, one for character that said that quote, and one for a movie and a year. We have collected eight movie quotes, and our sample table looks like this:

Movie_quotes
Q_TEXT Q_CHARACTER Q_MOVIE Q_YEAR
I’ll be back The Terminator The Terminator 1984
I find your lack of faith disturbing. Darth Vader Star Wars 1977
It’s a trap! Admiral Ackbar Star Wars 1983
Never tell me the odds. Han Solo Star Wars 1980
Do. Or do not. There is no try. Yoda Star Wars 1980
Stupid is as stupid does. Forrest Gump Forrest Gump 1994
My mama always said: Life was like a box of chocolates.You never know what you’re gonna get. Forrest Gump Forrest Gump 1994
Run, Forrest! Run! Jenny Curran Forrest Gump 1994

When talking about databases, it is worth mentioning that there is now a whole new set of databases and a sort of movement among people who need to store data, that’s called NoSQL. They are documented based systems, and while they are becoming very popular, there are still a large number of relational model databases in use today that are using SQL language. Even the NoSQL databases have some sort of query language, and many of them (because almost all of them were invented after SQL) have some similarity to SQL.

Four Basic SQL Operations, or CRUD

There are many SQL commands, but there are four general SQL operations that can do things to tables and its data:

  • Create – Filling data into tables.
  • Read – Query data out of a table.
  • Update – Change data already in a table.
  • Delete – Remove data from the table.

First letters of these basic SQL operations give the acronym “CRUD”, and they are considered as a fundamental set of four basic functions or features that every database must have.

By covering this for basic features, we will cover basic and most important SQL commands: `CREATE`, `INSERT`, `SELECT`, `UPDATE`, `DELETE`, and `DROP`.

Creating Data

F, we need to create our table in the database. To create a new table, `CREATE TABLE` statement is used. The format of a simple and `CREATE TABLE` statement syntax is as follows:

CREATE TABLE table_name
(column_1 data_type,
column_2 data_type,
column_3 data_type);

First are the keywords `CREATE TABLE` followed by the table name. This is a perfect example of the SQL simplicity and similarity with English language. Keywords are followed by an open parenthesis, where additional parameters are defined, column name and column data type, followed by a closing parenthesis. It is important to note that all SQL statements should end with a `;`.

There are few rules that need to be followed. The table and column names must start with a letter and can be followed by letters, numbers, or underscores. They must not exceed a total of 30 characters in length. SQL reserved keywords as names for tables or column names (such as `select`, `create`, `insert`, etc.) are forbidden.

In our example, the most simple column names would be `TEXT`, `CHARACTER`, `MOVIE`, and `YEAR`. But the problem is that most of those names are reserved words. To avoid any possible conflicts, in our example we will create column names with a prefix `Q_`. Data types differentiate between different databases, but here are the most common ones:

  • `char(size)` – Fixed-length character string, with parameter size specified in parenthesis.
  • `varchar(size)` – Variable-length character string, with parameter size specified in parenthesis.
  • `number(size)` – Number value with a max number of column digits specified in parenthesis.
  • `date` – Date value.
  • `number(size, d)` –  Number value with a maximum number of digits of `size` total, with a maximum number of `d` digits to the right of the decimal.

Data types specify what type of data can be stored in that particular column. If a column called `Q_CHARACTER` is to be used to hold names, then that particular column should have a `varchar` (variable-length character) data type. A column that will hold year of the movie will be type `number`, in our example column `Q_YEAR`. Final SQL command that will create our table with desired table structure is as follows:


CREATE TABLE Movie_quotes
(‘Q_TEXT’ varchar(200),
‘Q_CHARACTER’ varchar(20),
‘Q_MOVIE’ varchar(20),
‘Q_YEAR’ number(4));

Result of this SQL command will create an empty table with columns:

  • `Q_TEXT` that can accept a 200 character long string.
  • `Q_CHARACTER` that can accept a 20 character long string.
  • `Q_MOVIE` that can accept a 20 character long string
  • `Q_YEAR`that can accept four numbers for a year.

image03


Sign Up