SQL BASICS : How to manipulate and query relational database?

SQL (Structured Query Language) is used to extract information from data stored in a Relational Database (RLDB). A RLDB consists of multiple tables, each containing rows and columns to record information. Think of it like an Excel workbook with spreadsheets, but with defined columns and tight relationships between tables.

Here’s an example of a RLDB for a music company:

Image source: sqlite tutorial

Each table in the database is connected through 1-to-1, 1-to-many, or many-to-1 relationships, important for writing queries, especially using JOIN functions. Each row or record is unique due to the primary key, and columns have constraints and data types (e.g., Playlistid: INTEGER).

In this post, basic functions for analytics projects are covered, categorized into manipulating functions, queries, and aggregate functions.

Manipulating Functions

CREATE

Creates new tables, views, or databases.

Purpose

To create a new table in the database from scratch.

Pre-requisite

No same table exists in the database. This is checked by SELECT * FROM table_name.

Statement Example

CREATE TABLE students (
   StudentID  INTEGER PRIMARY KEY,
   First_Name VARCHAR(40),
   Last_Name VARCHAR(40)
);

Notes

  • Table’s name always in plural form.
  • No space should be in column name, words are connected by an underscore (‘_’).

INSERT INTO/ VALUES

Purpose

To insert a new row/record into a table.

Pre-requisite

None.

Statement Example

INSERT INTO students (StudentID, First_Name, Last_Name)
VALUES (1, 'James', 'Bonds');
-- Option for multiple rows
INSERT INTO students (StudentID, First_Name, Last_Name)
VALUES 
    (1, 'James', 'Bonds'),
    (2, 'Katniss', 'Everdeen'),
    (3, 'Sarah', 'Connor');

Notes:

  • The INSERT INTO statement is used to add new rows to a table in the database.
  • The columns being inserted into must be specified in parentheses after the table name.
  • The values being inserted into each column must be specified in the same order as the columns were listed in the parentheses.
  • For multiple rows insertion, each set of values is separated by commas, and the entire set of values is enclosed within parentheses.

SELECT/ FROM students

Purpose

To fetch data from the students table.

Pre-requisite: Table data exist in the database. This is checked by SELECT * FROM table_name.

Statement Example

SELECT *
FROM students;

Notes

  • ‘*’ means showing all columns and rows in the table.
  • By default, the rows will be selected in the order they appear in the table. To set a certain order for how the rows will be sorted, use the ORDER BY clause.

ALTER TABLE/ ADD COLUMN

Purpose

To add a new column into a table.

Statement Example

ALTER TABLE students 
ADD COLUMN email TEXT;

Notes

  • After the data are added for column email, NULL values are recorded for all rows under email.
  • It is unable to specify the position of the column in the table because the new added column will be at the end of the table by default. However, using SELECT functions you can specify the order in which columns are presented in the result.

UPDATE/SET/WHERE

Purpose

To edit a row.

Statement Example

UPDATE students 
SET email = 'james.bond007@gmail.com'
WHERE StudentID = 1;

Notes

  • This statement can not remove or add rows, but it can update an existing row.
  • The ALTER statement can be used to modify, add, or remove columns. For example: ALTER TABLE table_name DROP COLUMN column_name;
  • The UPDATE statement only updates rows.

DELETE FROM/WHERE

Purpose

To remove one or more rows.

Statement Example

DELETE FROM students 
WHERE email IS NULL;

-- Delete specific number of rows
DELETE FROM students 
WHERE email IS NULL
LIMIT 5;

-- Delete a number of rows under a specific condition
DELETE FROM students 
WHERE StudentID in (1,2,5,7) AND email IS NULL;

Queries


SELECT/AS

Purpose

To rename a column in the result.

Statement Example

SELECT StudentID AS 'Student_Number'
FROM students;

Notes

  • The new name can be anything you want as long as you put it in single quotes.

SELECT DISTINCT

Purpose

To return only unique values from the table. It filters out all duplicate values in specified columns.

Pre-requisite

Table data must exist in the database.

Statement Example

SELECT DISTINCT First_Name, email 
FROM students;

SELECT/FROM/WHERE

Purpose

To filter the results which only include rows satisfying a specific condition.

Statement Example

SELECT StudentID 
FROM students
WHERE StudentID > 2;

Notes

  • Comparison operators used with WHERE:
    • = equal to
    • != not equal to
    • > greater than
    • < less than
    • >= greater than or equal to
    • <= less than or equal to
  • IN keyword can be used to define a set of satisfying values of the condition. E.g. WHERE First_Name IN ('Bond', 'Bob').
  • NOT keyword can be used:
SELECT *
FROM students
WHERE NOT email = 'James.bond007@gmail.com';

SELECT/FROM/WHERE/LIKE

Purpose

To return result which only include rows satisfying the condition that they contain characters following LIKE.

Statement Example

SELECT First_Name 
FROM students
WHERE First_Name LIKE 'Bo_';

Notes

  • LIKE operator can be applied to numerical values.
  • Whether to match a number or a string, the pattern should always be wrapped within a pair of quotations.
  • 'Bo_' matches exactly two characters after 'Bo'.
  • 'Bo%' matches unlimited characters after 'Bo'.
  • Actual characters % or _ can be differentiated using escape character \. E.g: '100\%'.
  • LIKE is not case-sensitive.

SELECT/FROM/WHERE/BETWEEN

Purpose

To filter the result within a certain range, the range can be defined by numbers, dates, or texts.

Statement Example

SELECT *
FROM students
WHERE Graduation_Year BETWEEN 2011 AND 2015;

Notes

  • Above query will result all records from year 2011 up to and including year 2015 or [2011, 2015].
  • It is different from a text range, which does not return the record associated with the upper limit value. E.g [A, J).
  • All text values must be wrapped by a pair of quotations. E.g: ..BETWEEN 'A' AND 'J'.
  • Text range can be case sensitive since the comparison follows lexicographical order. E.g: …ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz….

SELECT/FROM/WHERE/AND

Purpose

To combine multiple conditions.

Statement Example

SELECT * 
FROM students
WHERE First_Name = 'Bond' AND Last_Name = 'James';

SELECT/FROM/WHERE/OR

Purpose

To return a row if either of the conditions is true.

Notes

  • () should be used to show the order of operation if OR, AND are used multiple times. The order of precedence is AND first then OR.
  • To avoid using OR multiple times, we can use WHERE/IN (set of values) instead.

SELECT/FROM/WHERE/../ORDER BY

Purpose

To list the data in our result set in a particular order, either alphabetically or numerically.

Statement Example

SELECT * 
FROM students
WHERE NOT First_Name = 'Bond'
ORDER BY StudentID DESC;

Notes

  • ASC is a keyword used in ORDER BY to sort the results in ascending order (low to high or A-Z).
  • DESC is a keyword used in ORDER BY to sort the results in descending order (high to low or Z-A).
  • ORDER BY always goes after WHERE and GROUP BY, but before LIMIT.
  • When ordering by more than one column, it will first order the data on the first column, then, keeping the previous column order, it will order on the next column, and so on.
  • Sorting rows that contain NULL is done by using NULLS FIRST (null before other non-null values) or NULLS LAST (null after other non-null values) after ORDER BY.
  • The ORDER BY statement with the DESC option uses the NULLS FIRST by default.
  • The ORDER BY statement with the ASC option uses the NULLS FIRST by default.

SELECT/CASE…/FROM

Purpose

To create different results or outputs according to different cases.

Statement Example

SELECT *,
CASE 
WHEN Grade > 8 THEN 'Excellent'
WHEN Grade > 7 THEN 'Good'
WHEN Grade > 6 THEN 'OK'
WHEN Grade > 5 THEN 'Pass'
ELSE 'Fail'
END
FROM students;

Notes

  • A good practice is that you shorten the column name using AS clause.
  • After THEN, the results can be in any type: Integer, text, or date, etc.
  • ELSE clause is optional.
  • The order of WHEN clauses is important because it follows precedence from top to bottom. If the top WHEN is not satisfied then it moves to the next WHEN, continued to the bottom WHEN.

Aggregate Functions

SELECT COUNT

Purpose

Takes the name of a column as an argument and counts the number of non-empty values in that column.

Statement Example

SELECT COUNT(*) FROM students;

SELECT COUNT(DISTINCT subject) FROM students;

Notes

  • COUNT() function on a column in SQL will include duplicate values by default.
  • If you want to count only the unique values in a column, use DISTINCT clause within the COUNT() function.

SELECT SUM

Purpose

Takes the name of a column as an argument and returns the sum of all the values in that column.

Statement Example

SELECT SUM(grade) FROM students;

SELECT MAX/MIN

Purpose

  • MAX takes the name of a column as an argument and returns the largest value in that column.
  • MIN takes the name of a column as an argument and returns the smallest value in that column.

Notes

  • When you have more than one row that contains the minimum or maximum value in a column, the topmost row containing that value will be returned in the result.
  • To make the output show all rows that contain the maximum price instead of just the top one:
SELECT name, grade FROM students 
WHERE grade = (SELECT max(grade) FROM students);

SELECT AVERAGE

Purpose

Takes the name of a column as an argument and calculates the average value of a particular column.

Statement Example

SELECT AVG(DISTINCT grade) FROM students;

Notes:

  • To run the AVG() function on a column such that it only averages the unique values in the column, we could use the DISTINCT clause right before the column name.

SELECT ROUND

Purpose:

The ROUND() function takes two arguments inside the parenthesis: a column name, and an integer. It rounds the values in the column to the number of decimal places specified by the integer.

Statement Example:

SELECT ROUND(AVG(grade), 2)
FROM students;

SELECT / FROM / GROUP BY

Purpose:

GROUP BY is normally used with aggregate functions, in collaboration with the SELECT statement to arrange identical data into groups.

Statement Example:

SELECT year,
       AVG(grade)
FROM students
GROUP BY year
ORDER BY year;

Notes:

  • The GROUP BY statement comes after any WHERE statements, but before ORDER BY or LIMIT.
  • It’s recommended to include the grouped-by column in the SELECT for more clarity, so that it’s easier to see what rows belong to which group.
  • GROUP BY takes input include several rows and turns them into one row, hence we should indicate what should be the result for those combined rows (e.g: average values, or sum value, etc.). GROUP BY therefore is normally being used with aggregate functions.
  • GROUP BY can also be used without aggregate functions when all rows are unique under selected attributes.

SELECT / FROM / GROUP BY / HAVING

Purpose:

HAVING is used after GROUP BY to filter which groups are included in the output and which groups are not.

Statement Example:

SELECT subject,
       grade,
       COUNT(name)
FROM students
GROUP BY 1, 2
HAVING COUNT(name) > 10;

Notes:

  • HAVING is different from WHERE. WHERE is used to filter rows, while HAVING is used together with GROUP BY to filter groups.
  • HAVING statement always comes after GROUP BY / FROM / WHERE, but before SELECT / DISTINCT / ORDER BY and LIMIT.
  • Since HAVING is evaluated before SELECT, you cannot use column aliases in the HAVING clause.
  • HAVING statement can replace WHERE conditions when we want to cross join two tables.

Functions for multiple tables

SELECT / FROM / JOIN / ON

Purpose:

Join multiple tables on certain attributes.

Statement Example:

SELECT *
FROM orders
JOIN customers
    ON orders.customer_id = customers.customer_id;

# USING is used as another option for ON:
SELECT *
FROM orders
JOIN customers
    USING(customer_id);

# Another way to do a cross join using WHERE:
SELECT person.name AS name, email.email AS email
FROM person, email
WHERE person.id = email.person_id;

Notes:

  • When NULL values are there in a column, they are not matched to other NULL values since NULL means the absence of any value and it will never be equal to anything. NULL = NULL results in FALSE.

~LEFT JOIN / ON

Purpose:

Join multiple tables and keep unmatched rows of the attributes on the left-hand side.

Notes:

  • If one primary key on the left table has multiple matches on the right table, the result table will contain duplicated primary key of the left table.
  • The most common types of joins will be joining a foreign key from one table with the primary key from another table. For instance, when we join orders and customers, we join on customer_id, which is a foreign key in orders and the primary key in customers.
  • A column that can uniquely identify a record of data is known as a “Candidate Key”. Tables can have multiple “Candidate Key”s, each of which could potentially be the “Primary Key”, but there must only be one “Primary Key” per table. The column chosen as the “Primary Key” follows the naming convention like customer_id or product_id.

~CROSS JOIN / ON

Purpose:

To combine all rows of one table with all rows of another table. For instance, if we had a table of shirts and a table of pants, we might want to know all the possible combinations to create different outfits.

Statement Example:

SELECT *
FROM shirts
CROSS JOIN pants;

SELECT shirts.shirt_color,
       pants.pants_color
FROM shirts
CROSS JOIN pants;
# alternative way: 
SELECT shirts.shirt_color,
       pants.pants_color
FROM shirts, pants;

Notes:

  • The CROSS JOIN operation combines each row from the shirts table with every row from the pants table, resulting in a Cartesian product of the two tables.
  • In the alternative way using ,, SQL performs an implicit CROSS JOIN, resulting in the same output as explicitly specifying CROSS JOIN. However, it’s generally recommended to use CROSS JOIN for clarity and explicitness.

~UNION/SELECT/FROM

Purpose: To stack one dataset on top of the other.

Pre-requisite:

  • Tables must have the same number of columns.
  • The columns must have the same data types in the same order as the first table.

Statement example:

SELECT *
FROM table1
UNION
SELECT *
FROM table2;

Notes:

  • When you combine tables with UNION, duplicate rows will be excluded. If we perform UNION on two sets of data (tables), say A and B, then the data returned in the result will essentially be A + B – (A intersect B).
  • If, however, you wanted to include duplicates, certain versions of SQL provide the UNION ALL operator.

WITH/AS

Purpose: To re-use the result table of the previous query by putting a whole first query inside the parentheses () and giving it a name. After that, we can use this name as if it’s a table and write a new query using the first query.

Statement example:

WITH previous_results AS (
    SELECT ...
    ...
    ...
    ...
)
SELECT *
FROM previous_results
JOIN customers
    ON _____ = _____;

Notes:

  • WITH can be used for more than one nested query. For example:
WITH 
query1 AS (SELECT column1 FROM table1 WHERE condition1), 
query2 AS (SELECT column2 FROM table2 WHERE condition2)
SELECT...

UNION can be used instead of a nested query:

SELECT *
FROM table1
WHERE number1 < 5
UNION
SELECT *
FROM table2
WHERE number2 < 5;

By Kat

One thought on “SQL BASICS : How to manipulate and query relational database?”

Leave a Reply

Your email address will not be published. Required fields are marked *