DATABASE DESIGN: Evaluate the quality of the design? [Part 2]

So far, we have discussed the top-down approach, which focuses on decomposing raw information of different forms into attributes on data tables and setting criteria that we believe constitute a good database design. We now dive deeper into the process of analyzing and minimizing redundancies and anomalies in data. This process is called normalization. Normalization includes techniques to decompose data while still preserving functional dependencies of attributes, maintaining the original relationships and constraints of the data in the decomposed tables.

Normalization evaluates relation schema through a series of up to 6 tests. Each test has a criteria and a corresponding normal form. The relation schema must meet each criteria to achieve each level of normalisation. The first 4 tests are the most important ones basing on functional dependencies among attributes. The last 2 tests can be optional considering their certain disadvantages, they are based on multivalued dependencies and join dependencies. In the coming part, we will see how the first 4 test are performed to purify and improve the database design .

Table of Contents

Assumptions

While performing normalisation process, it is important to note that we are assuming functional dependencies exist within given database tables (relations). This assumption is made based on the fact that database designer follows top-down approach and performs an appropriate design process in previous conceptual stage to produce entity-relationship model.

First Normal Form(1NF)

Criteria : Only atomic/indivisible values are permitted

Multiple values or composite value ( such as array, list, etc.,) are not allowed in a table cell. This is because of the constrain we set for database table that each cell of table contain only one single value.

Three main techniques to achieve the first normal form:

Remove the non-atomic attribute from the table and place it in a separate table. The new table also include the primary key or the original table for reference. This results in two 1NF relations.

For example, we have table Persons and each person can have multiple address locations:

ID	Name	Locations
1	John	California, Texas, Florida
2	Mary	New York, Pennsylvania, Georgia
3	Peter	Michigan , Illinois, Ohio

New table and original table after removing the non-atomic attribute:

Person_ID	Location
1	California
1	Texas
1	Florida
2	New York
2	Pennsylvania
2	Georgia
3	Michigan
3	Illinois
3	Ohio

Table: Person_Location

ID	Name
1	John
2	Mary
3	Peter

Table: Persons

2. Put each location in to a separate rows, other attributes are the same. This has disadvantages of introducing redundancy, hence it is not recommended

ID	Name	Location
1	John	California
1	John	Texas
1	John	Florida
2	Mary	New York
2	Mary	Pennsylvania
2	Mary	Georgia
3	Peter	Michigan
3	Peter	Illinois
3	Peter	Ohio

Table: Persons

3. The third approach is creating more columns and place each location in one column : Location1, location2, ect. This has disadvantage of introducing NULL in case that not all people have three address location. For example, we add to the table another person name Katie with only 1 location, our table will look like this:

ID	Name	Location1	Location2	Location3
3	Peter	Michigan	Illinois	Ohio
4	Katie	New York	NULL	NULL
…	…	…	…	…

Among three solutions proposed above, the first one is considered the best because it have less redundancy compared to other two.

It is also noted that sometimes atomic data can include the entire object or file stored as a binary large object (BLOB) or character large object (CLOB).

Second Normal Form (2NF)

Criteria: Only fully functional dependency exists in the table

The goal of second normalization step is to achieve a relation in which all attributes are fully and functionally dependent on the primary key. In other words, this process removes all possible partial dependencies in the table to make all non-prime attribute fully functionally dependent only on the primary key.

An attribute is non-prime if it is not a part of any key ( either primary key or candidate key). A partial dependency happens when a group of non-prime attributes only functionally depends on a part of the primary key. For example, we have table Person_Project with 3 attributes: PersonID, ProjectID, Project_Location. If Project_Location ony functional depends on ProjectID but not PersonID, we have a partial dependency in this table and we need to remove that to achieve the second normal form.

It is noted that we only consider this test if primary key contains more than 1 attribute.

To achieve the second normal form, we will move the partial dependency out to a separate table (e.g., Project). Only the reference attribute( e.g., ProjectID) remains in the original table.

Third Normal Form(3NF)

Criteria: No transitive dependency exists in the table

Another scenario is that we have certain non-prime attributes, though functional depends on the primary key, but indirectly through one or more other non-prime attributes.

For example, we have 3NF-violated table below:

Table: EmployeeDetails

EmployeeID	EmployeeName	DepartmentID	DepartmentName	DepartmentLocation
1	John Doe	D01	HR	New York
2	Jane Smith	D02	Finance	San Francisco
3	Alice Johnson	D01	HR	New York
4	Bob Brown	D03	IT	Chicago

The table violates 3NF because there are transitive dependencies. Specifically:

EmployeeID → DepartmentID
DepartmentID → DepartmentName, DepartmentLocation

This means that DepartmentName and DepartmentLocation are indirectly dependent on EmployeeID through DepartmentID.

To achieve 3NF, we can decompose the above table into two tables: EmployeeDetails which have only 3 attributes: EmployeeID, EmployeeName, and DepartmentID, while DepartmentName and DepartmentLocation are placed in a new table with Primary Key is DepartmentID.

Corrected Tables in 3NF

To correct this and achieve 3NF, we should decompose the table into two separate tables.

Table 1: Employees

EmployeeID	EmployeeName	DepartmentID
1	John Doe	D01
2	Jane Smith	D02
3	Alice Johnson	D01
4	Bob Brown	D03

Table 2: Departments

DepartmentID	DepartmentName	DepartmentLocation
D01	HR	New York
D02	Finance	San Francisco
D03	IT	Chicago

Now, the Employees table and the Departments table are in 3NF because there are no transitive dependencies. Each non-key attribute is fully functionally dependent on the primary key of its respective table.

Boyce-Codd Normal Form

Criteria: Every functional dependency in a relation, the determinant (left-hand side) always is a candidate key

The third normal form (3NF) can still be achieved if a group of non-prime attributes functionally depends on a prime attribute. This prime attribute cannot be part of primary key ( because it will violate 2NF), but it can be a part of a candidate key. In other case, there is a chance that a prime attribute depends on a non-prime attribute as well. Hence 3NF can achieve if in every dependency A ->B ( indicating that the value of A determines the value of B) either:

A is a prime attribute
B is a prime attribute

The critical difference is that BCNF does not allow the exceptions that 3NF allows for functional dependencies where B is a prime attribute.

BCNF enforces stricter rules than 3NF by ensuring that every functional dependency’s determinant is a super key, thereby eliminating potential anomalies and redundancies that 3NF might still allow.

Normal forms do not guarantee a good database design. It must confirm additional properties: guarantees that the spurious tuple generation does not occur after decomposition(critical, must be achieved at any cost), and each functional dependency is preserve after decomposition ( sometime sacrificed)

Breaking

DATABASE DESIGN: Evaluate the quality of the design? [Part 2]

Assumptions

First Normal Form(1NF)

Criteria : Only atomic/indivisible values are permitted

Second Normal Form (2NF)

Criteria: Only fully functional dependency exists in the table

Third Normal Form(3NF)

Criteria: No transitive dependency exists in the table

Boyce-Codd Normal Form

Criteria: Every functional dependency in a relation, the determinant (left-hand side) always is a candidate key

By Kat

Related Post

Leave a Reply Cancel reply

You Missed

AI 900 PREP : Understanding essential AI concepts

AI-900 PREP: Cloud Computing Fundamentals

DATABASE DESIGN: Evaluate the quality of the design? [Part 2]

DATABASE DESIGN: Evaluate the quality of the design? [Part 1]

DATABASE DESIGN: Evaluate the quality of the design? [Part 2]

Assumptions

First Normal Form(1NF)

Criteria : Only atomic/indivisible values are permitted

Second Normal Form (2NF)

Criteria: Only fully functional dependency exists in the table

Third Normal Form(3NF)

Criteria: No transitive dependency exists in the table

Boyce-Codd Normal Form

Criteria: Every functional dependency in a relation, the determinant (left-hand side) always is a candidate key

By Kat

Related Post

DATABASE DESIGN: Evaluate the quality of the design? [Part 1]

DATABASE DESIGN: what are database constraints ?

DATA MODELING: How to build Entity-Relationship Diagram?

Leave a Reply Cancel reply

You Missed

AI 900 PREP : Understanding essential AI concepts

AI-900 PREP: Cloud Computing Fundamentals

DATABASE DESIGN: Evaluate the quality of the design? [Part 2]

DATABASE DESIGN: Evaluate the quality of the design? [Part 1]