SQL (Structured Query Language) is used to extract information from data stored in a Relational Database (RLDB). A RLDB consists of multiple tables, each containing rows and columns to record information. Think of it like an Excel workbook with spreadsheets, but with defined columns and tight relationships between tables.
Here’s an example of a RLDB for a music company:
Image source: sqlite tutorial
Each table in the database is connected through 1-to-1, 1-to-many, or many-to-1 relationships, important for writing queries, especially using JOIN functions. Each row or record is unique due to the primary key, and columns have constraints and data types (e.g., Playlistid: INTEGER).
In this post, basic functions for analytics projects are covered, categorized into manipulating functions, queries, and aggregate functions.
Manipulating Functions
CREATE
Creates new tables, views, or databases.
Purpose
To create a new table in the database from scratch.
Pre-requisite
No same table exists in the database. This is checked by SELECT * FROM table_name
.
Statement Example
CREATE TABLE students (
StudentID INTEGER PRIMARY KEY,
First_Name VARCHAR(40),
Last_Name VARCHAR(40)
);
Notes
- Table’s name always in plural form.
- No space should be in column name, words are connected by an underscore (‘_’).
INSERT INTO/ VALUES
Purpose
To insert a new row/record into a table.
Pre-requisite
None.
Statement Example
INSERT INTO students (StudentID, First_Name, Last_Name)
VALUES (1, 'James', 'Bonds');
-- Option for multiple rows
INSERT INTO students (StudentID, First_Name, Last_Name)
VALUES
(1, 'James', 'Bonds'),
(2, 'Katniss', 'Everdeen'),
(3, 'Sarah', 'Connor');
Notes:
- The
INSERT INTO
statement is used to add new rows to a table in the database. - The columns being inserted into must be specified in parentheses after the table name.
- The values being inserted into each column must be specified in the same order as the columns were listed in the parentheses.
- For multiple rows insertion, each set of values is separated by commas, and the entire set of values is enclosed within parentheses.
SELECT/ FROM students
Purpose
To fetch data from the students table.
Pre-requisite: Table data exist in the database. This is checked by SELECT * FROM table_name
.
Statement Example
SELECT *
FROM students;
Notes
- ‘*’ means showing all columns and rows in the table.
- By default, the rows will be selected in the order they appear in the table. To set a certain order for how the rows will be sorted, use the ORDER BY clause.
ALTER TABLE/ ADD COLUMN
Purpose
To add a new column into a table.
Statement Example
ALTER TABLE students
ADD COLUMN email TEXT;
Notes
- After the data are added for column
email
, NULL values are recorded for all rows underemail
. - It is unable to specify the position of the column in the table because the new added column will be at the end of the table by default. However, using SELECT functions you can specify the order in which columns are presented in the result.
UPDATE/SET/WHERE
Purpose
To edit a row.
Statement Example
UPDATE students
SET email = 'james.bond007@gmail.com'
WHERE StudentID = 1;
Notes
- This statement can not remove or add rows, but it can update an existing row.
- The ALTER statement can be used to modify, add, or remove columns. For example:
ALTER TABLE table_name DROP COLUMN column_name;
- The UPDATE statement only updates rows.
DELETE FROM/WHERE
Purpose
To remove one or more rows.
Statement Example
DELETE FROM students
WHERE email IS NULL;
-- Delete specific number of rows
DELETE FROM students
WHERE email IS NULL
LIMIT 5;
-- Delete a number of rows under a specific condition
DELETE FROM students
WHERE StudentID in (1,2,5,7) AND email IS NULL;
Queries
SELECT/AS
Purpose
To rename a column in the result.
Statement Example
SELECT StudentID AS 'Student_Number'
FROM students;
Notes
- The new name can be anything you want as long as you put it in single quotes.
SELECT DISTINCT
Purpose
To return only unique values from the table. It filters out all duplicate values in specified columns.
Pre-requisite
Table data must exist in the database.
Statement Example
SELECT DISTINCT First_Name, email
FROM students;
SELECT/FROM/WHERE
Purpose
To filter the results which only include rows satisfying a specific condition.
Statement Example
SELECT StudentID
FROM students
WHERE StudentID > 2;
Notes
- Comparison operators used with WHERE:
=
equal to!=
not equal to>
greater than<
less than>=
greater than or equal to<=
less than or equal to
IN
keyword can be used to define a set of satisfying values of the condition. E.g.WHERE First_Name IN ('Bond', 'Bob')
.NOT
keyword can be used:
SELECT *
FROM students
WHERE NOT email = 'James.bond007@gmail.com';
SELECT/FROM/WHERE/LIKE
Purpose
To return result which only include rows satisfying the condition that they contain characters following LIKE
.
Statement Example
SELECT First_Name
FROM students
WHERE First_Name LIKE 'Bo_';
Notes
LIKE
operator can be applied to numerical values.- Whether to match a number or a string, the pattern should always be wrapped within a pair of quotations.
'Bo_'
matches exactly two characters after'Bo'
.'Bo%'
matches unlimited characters after'Bo'
.- Actual characters
%
or_
can be differentiated using escape character\
. E.g:'100\%'
. LIKE
is not case-sensitive.
SELECT/FROM/WHERE/BETWEEN
Purpose
To filter the result within a certain range, the range can be defined by numbers, dates, or texts.
Statement Example
SELECT *
FROM students
WHERE Graduation_Year BETWEEN 2011 AND 2015;
Notes
- Above query will result all records from year 2011 up to and including year 2015 or
[2011, 2015]
. - It is different from a text range, which does not return the record associated with the upper limit value. E.g
[A, J)
. - All text values must be wrapped by a pair of quotations. E.g:
..BETWEEN 'A' AND 'J'
. - Text range can be case sensitive since the comparison follows lexicographical order. E.g:
…ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz…
.
SELECT/FROM/WHERE/AND
Purpose
To combine multiple conditions.
Statement Example
SELECT *
FROM students
WHERE First_Name = 'Bond' AND Last_Name = 'James';
SELECT/FROM/WHERE/OR
Purpose
To return a row if either of the conditions is true.
Notes
()
should be used to show the order of operation ifOR
,AND
are used multiple times. The order of precedence isAND
first thenOR
.- To avoid using
OR
multiple times, we can useWHERE/IN (set of values)
instead.
SELECT/FROM/WHERE/../ORDER BY
Purpose
To list the data in our result set in a particular order, either alphabetically or numerically.
Statement Example
SELECT *
FROM students
WHERE NOT First_Name = 'Bond'
ORDER BY StudentID DESC;
Notes
ASC
is a keyword used inORDER BY
to sort the results in ascending order (low to high or A-Z).DESC
is a keyword used inORDER BY
to sort the results in descending order (high to low or Z-A).ORDER BY
always goes afterWHERE
andGROUP BY
, but beforeLIMIT
.- When ordering by more than one column, it will first order the data on the first column, then, keeping the previous column order, it will order on the next column, and so on.
- Sorting rows that contain
NULL
is done by usingNULLS FIRST
(null before other non-null values) orNULLS LAST
(null after other non-null values) afterORDER BY
. - The
ORDER BY
statement with theDESC
option uses theNULLS FIRST
by default. - The
ORDER BY
statement with theASC
option uses theNULLS FIRST
by default.
SELECT/CASE…/FROM
Purpose
To create different results or outputs according to different cases.
Statement Example
SELECT *,
CASE
WHEN Grade > 8 THEN 'Excellent'
WHEN Grade > 7 THEN 'Good'
WHEN Grade > 6 THEN 'OK'
WHEN Grade > 5 THEN 'Pass'
ELSE 'Fail'
END
FROM students;
Notes
- A good practice is that you shorten the column name using
AS
clause. - After
THEN
, the results can be in any type: Integer, text, or date, etc. ELSE
clause is optional.- The order of
WHEN
clauses is important because it follows precedence from top to bottom. If the topWHEN
is not satisfied then it moves to the nextWHEN
, continued to the bottomWHEN
.
Aggregate Functions
SELECT COUNT
Purpose
Takes the name of a column as an argument and counts the number of non-empty values in that column.
Statement Example
SELECT COUNT(*) FROM students;
SELECT COUNT(DISTINCT subject) FROM students;
Notes
COUNT()
function on a column in SQL will include duplicate values by default.- If you want to count only the unique values in a column, use
DISTINCT
clause within theCOUNT()
function.
SELECT SUM
Purpose
Takes the name of a column as an argument and returns the sum of all the values in that column.
Statement Example
SELECT SUM(grade) FROM students;
SELECT MAX/MIN
Purpose
MAX
takes the name of a column as an argument and returns the largest value in that column.MIN
takes the name of a column as an argument and returns the smallest value in that column.
Notes
- When you have more than one row that contains the minimum or maximum value in a column, the topmost row containing that value will be returned in the result.
- To make the output show all rows that contain the maximum price instead of just the top one:
SELECT name, grade FROM students
WHERE grade = (SELECT max(grade) FROM students);
SELECT AVERAGE
Purpose
Takes the name of a column as an argument and calculates the average value of a particular column.
Statement Example
SELECT AVG(DISTINCT grade) FROM students;
Notes:
- To run the
AVG()
function on a column such that it only averages the unique values in the column, we could use theDISTINCT
clause right before the column name.
SELECT ROUND
Purpose:
The ROUND()
function takes two arguments inside the parenthesis: a column name, and an integer. It rounds the values in the column to the number of decimal places specified by the integer.
Statement Example:
SELECT ROUND(AVG(grade), 2)
FROM students;
SELECT / FROM / GROUP BY
Purpose:
GROUP BY
is normally used with aggregate functions, in collaboration with the SELECT
statement to arrange identical data into groups.
Statement Example:
SELECT year,
AVG(grade)
FROM students
GROUP BY year
ORDER BY year;
Notes:
- The
GROUP BY
statement comes after anyWHERE
statements, but beforeORDER BY
orLIMIT
. - It’s recommended to include the grouped-by column in the
SELECT
for more clarity, so that it’s easier to see what rows belong to which group. GROUP BY
takes input include several rows and turns them into one row, hence we should indicate what should be the result for those combined rows (e.g: average values, or sum value, etc.).GROUP BY
therefore is normally being used with aggregate functions.GROUP BY
can also be used without aggregate functions when all rows are unique under selected attributes.
SELECT / FROM / GROUP BY / HAVING
Purpose:
HAVING
is used after GROUP BY
to filter which groups are included in the output and which groups are not.
Statement Example:
SELECT subject,
grade,
COUNT(name)
FROM students
GROUP BY 1, 2
HAVING COUNT(name) > 10;
Notes:
HAVING
is different fromWHERE
.WHERE
is used to filter rows, whileHAVING
is used together withGROUP BY
to filter groups.HAVING
statement always comes afterGROUP BY / FROM / WHERE
, but beforeSELECT / DISTINCT / ORDER BY
andLIMIT
.- Since
HAVING
is evaluated beforeSELECT
, you cannot use column aliases in theHAVING
clause. HAVING
statement can replaceWHERE
conditions when we want to cross join two tables.
Functions for multiple tables
SELECT / FROM / JOIN / ON
Purpose:
Join multiple tables on certain attributes.
Statement Example:
SELECT *
FROM orders
JOIN customers
ON orders.customer_id = customers.customer_id;
# USING is used as another option for ON:
SELECT *
FROM orders
JOIN customers
USING(customer_id);
# Another way to do a cross join using WHERE:
SELECT person.name AS name, email.email AS email
FROM person, email
WHERE person.id = email.person_id;
Notes:
- When NULL values are there in a column, they are not matched to other NULL values since NULL means the absence of any value and it will never be equal to anything. NULL = NULL results in FALSE.
~LEFT JOIN / ON
Purpose:
Join multiple tables and keep unmatched rows of the attributes on the left-hand side.
Notes:
- If one primary key on the left table has multiple matches on the right table, the result table will contain duplicated primary key of the left table.
- The most common types of joins will be joining a foreign key from one table with the primary key from another table. For instance, when we join
orders
andcustomers
, we join oncustomer_id
, which is a foreign key inorders
and the primary key incustomers
. - A column that can uniquely identify a record of data is known as a “Candidate Key”. Tables can have multiple “Candidate Key”s, each of which could potentially be the “Primary Key”, but there must only be one “Primary Key” per table. The column chosen as the “Primary Key” follows the naming convention like
customer_id
orproduct_id
.
~CROSS JOIN / ON
Purpose:
To combine all rows of one table with all rows of another table. For instance, if we had a table of shirts and a table of pants, we might want to know all the possible combinations to create different outfits.
Statement Example:
SELECT *
FROM shirts
CROSS JOIN pants;
SELECT shirts.shirt_color,
pants.pants_color
FROM shirts
CROSS JOIN pants;
# alternative way:
SELECT shirts.shirt_color,
pants.pants_color
FROM shirts, pants;
Notes:
- The
CROSS JOIN
operation combines each row from theshirts
table with every row from thepants
table, resulting in a Cartesian product of the two tables. - In the alternative way using
,
, SQL performs an implicitCROSS JOIN
, resulting in the same output as explicitly specifyingCROSS JOIN
. However, it’s generally recommended to useCROSS JOIN
for clarity and explicitness.
~UNION/SELECT/FROM
Purpose: To stack one dataset on top of the other.
Pre-requisite:
- Tables must have the same number of columns.
- The columns must have the same data types in the same order as the first table.
Statement example:
SELECT *
FROM table1
UNION
SELECT *
FROM table2;
Notes:
- When you combine tables with
UNION
, duplicate rows will be excluded. If we performUNION
on two sets of data (tables), say A and B, then the data returned in the result will essentially be A + B – (A intersect B). - If, however, you wanted to include duplicates, certain versions of SQL provide the
UNION ALL
operator.
WITH/AS
Purpose: To re-use the result table of the previous query by putting a whole first query inside the parentheses () and giving it a name. After that, we can use this name as if it’s a table and write a new query using the first query.
Statement example:
WITH previous_results AS (
SELECT ...
...
...
...
)
SELECT *
FROM previous_results
JOIN customers
ON _____ = _____;
Notes:
WITH
can be used for more than one nested query. For example:
WITH
query1 AS (SELECT column1 FROM table1 WHERE condition1),
query2 AS (SELECT column2 FROM table2 WHERE condition2)
SELECT...
UNION
can be used instead of a nested query:
SELECT *
FROM table1
WHERE number1 < 5
UNION
SELECT *
FROM table2
WHERE number2 < 5;
very informative