DATABASE DESIGN: Evaluate the quality of the design? [Part 2]

So far, we have discussed the top-down approach, which focuses on decomposing raw information of different forms into attributes on data tables and setting criteria that we believe constitute a good database design. We now dive deeper into the process of analyzing and minimizing redundancies and anomalies in data. This process is called normalization. Normalization includes techniques to decompose data while still preserving functional dependencies of attributes, maintaining the original relationships and constraints of the data in the decomposed tables.

Normalization evaluates relation schema through a series of up to 6 tests. Each test has a criteria and a corresponding normal form. The relation schema must meet each criteria to achieve each level of normalisation. The first 4 tests are the most important ones basing on functional dependencies among attributes. The last 2 tests can be optional considering their certain disadvantages, they are based on multivalued dependencies and join dependencies. In the coming part, we will see how the first 4 test are performed to purify and improve the database design .

Assumptions

While performing normalisation process, it is important to note that we are assuming functional dependencies exist within given database tables (relations). This assumption is made based on the fact that database designer follows top-down approach and performs an appropriate design process in previous conceptual stage to produce entity-relationship model.

First Normal Form(1NF)

Criteria : Only atomic/indivisible values are permitted

Multiple values or composite value ( such as array, list, etc.,) are not allowed in a table cell. This is because of the constrain we set for database table that each cell of table contain only one single value.

Three main techniques to achieve the first normal form:

  1. Remove the non-atomic attribute from the table and place it in a separate table. The new table also include the primary key or the original table for reference. This results in two 1NF relations.

For example, we have table Persons and each person can have multiple address locations:

IDNameLocations
1JohnCalifornia, Texas, Florida
2MaryNew York, Pennsylvania, Georgia
3PeterMichigan , Illinois, Ohio


New table and original table after removing the non-atomic attribute:

Person_IDLocation
1California
1Texas
1Florida
2New York
2Pennsylvania
2Georgia
3Michigan
3Illinois
3Ohio
Table: Person_Location
IDName
1John
2Mary
3Peter
Table: Persons

2. Put each location in to a separate rows, other attributes are the same. This has disadvantages of introducing redundancy, hence it is not recommended

IDNameLocation
1JohnCalifornia
1JohnTexas
1JohnFlorida
2MaryNew York
2MaryPennsylvania
2MaryGeorgia
3PeterMichigan
3PeterIllinois
3PeterOhio
Table: Persons

3. The third approach is creating more columns and place each location in one column : Location1, location2, ect. This has disadvantage of introducing NULL in case that not all people have three address location. For example, we add to the table another person name Katie with only 1 location, our table will look like this:

IDNameLocation1Location2Location3
3PeterMichiganIllinoisOhio
4KatieNew YorkNULLNULL

Among three solutions proposed above, the first one is considered the best because it have less redundancy compared to other two.

It is also noted that sometimes atomic data can include the entire object or file stored as a binary large object (BLOB) or character large object (CLOB).

Second Normal Form (2NF)

Criteria: Only fully functional dependency exists in the table

The goal of second normalization step is to achieve a relation in which all attributes are fully and functionally dependent on the primary key. In other words, this process removes all possible partial dependencies in the table to make all non-prime attribute fully functionally dependent only on the primary key.

An attribute is non-prime if it is not a part of any key ( either primary key or candidate key). A partial dependency happens when a group of non-prime attributes only functionally depends on a part of the primary key. For example, we have table Person_Project with 3 attributes: PersonID, ProjectID, Project_Location. If Project_Location ony functional depends on ProjectID but not PersonID, we have a partial dependency in this table and we need to remove that to achieve the second normal form.

It is noted that we only consider this test if primary key contains more than 1 attribute.

To achieve the second normal form, we will move the partial dependency out to a separate table (e.g., Project). Only the reference attribute( e.g., ProjectID) remains in the original table.

Third Normal Form(3NF)

Criteria: No transitive dependency exists in the table

Another scenario is that we have certain non-prime attributes, though functional depends on the primary key, but indirectly through one or more other non-prime attributes.

For example, we have 3NF-violated table below:

Table: EmployeeDetails

EmployeeIDEmployeeNameDepartmentIDDepartmentNameDepartmentLocation
1John DoeD01HRNew York
2Jane SmithD02FinanceSan Francisco
3Alice JohnsonD01HRNew York
4Bob BrownD03ITChicago

The table violates 3NF because there are transitive dependencies. Specifically:

  • EmployeeID → DepartmentID
  • DepartmentID → DepartmentName, DepartmentLocation

This means that DepartmentName and DepartmentLocation are indirectly dependent on EmployeeID through DepartmentID.

To achieve 3NF, we can decompose the above table into two tables: EmployeeDetails which have only 3 attributes: EmployeeID, EmployeeName, and DepartmentID, while DepartmentName and DepartmentLocation are placed in a new table with Primary Key is DepartmentID.

Corrected Tables in 3NF

To correct this and achieve 3NF, we should decompose the table into two separate tables.

Table 1: Employees

EmployeeIDEmployeeNameDepartmentID
1John DoeD01
2Jane SmithD02
3Alice JohnsonD01
4Bob BrownD03

Table 2: Departments

DepartmentIDDepartmentNameDepartmentLocation
D01HRNew York
D02FinanceSan Francisco
D03ITChicago

Now, the Employees table and the Departments table are in 3NF because there are no transitive dependencies. Each non-key attribute is fully functionally dependent on the primary key of its respective table.

Boyce-Codd Normal Form

Criteria: Every functional dependency in a relation, the determinant (left-hand side) always is a candidate key

The third normal form (3NF) can still be achieved if a group of non-prime attributes functionally depends on a prime attribute. This prime attribute cannot be part of primary key ( because it will violate 2NF), but it can be a part of a candidate key. In other case, there is a chance that a prime attribute depends on a non-prime attribute as well. Hence 3NF can achieve if in every dependency A ->B ( indicating that the value of A determines the value of B) either:

  • A is a prime attribute
  • B is a prime attribute

The critical difference is that BCNF does not allow the exceptions that 3NF allows for functional dependencies where B is a prime attribute.

BCNF enforces stricter rules than 3NF by ensuring that every functional dependency’s determinant is a super key, thereby eliminating potential anomalies and redundancies that 3NF might still allow.

Normal forms do not guarantee a good database design. It must confirm additional properties: guarantees that the spurious tuple generation does not occur after decomposition(critical, must be achieved at any cost), and each functional dependency is preserve after decomposition ( sometime sacrificed)

By Kat

Leave a Reply

Your email address will not be published. Required fields are marked *