Data modeling is a critical step of database design. It involves translating business requirements about data into a technical blueprint. Data modeling is applied to build new databases or to reengineer applications to improve their performance and scalability. There are three levels of data modeling:
1) High level or Conceptual level : focuses on analyzing the requirements of business, how data should be stored and processed. The output of this level is a detailed description of the data with relationships and constrains between those relationships. This can be summarized into an ERD ( Entity-Relationship Diagram) which includes major entities ( e.g., Customer Name, Order Date), their attributes( CustomerID, Name, Email, etc.), and their associations (e.g., Customers place Orders)
2) Logical level : focuses on converting the conceptual model into a relational schema in the chosen data model in DBMS ( Database Management System). This involves data normalization to reduce redundancy and improve integrity of data and identifying Primary and Foreign Keys. The output of this step will be a more refined ERD model
3) Low level or Physical Database design: focuses on data types, how each type is actually stored, how to improve its performance, etc. This involves defining actual database tables and columns, data types ( e.g., CustomerID – INT, Primary Key, AUTO_INCREMENT, Name – VARCHAR(100), etc.)
Data modeling ensures that the database is well-structured, efficient and aligned with business requirements.
Entity-Relationship Model
As mentioned above, the center of Data modeling is developing an Entity-Relationship Model. It acts as a bridge between business requirements and the technical implementation of DB.
In this part, we are going to discuss deeper about properties of ER model and steps taken to build an ER model.
Entities
Entity in database represents a person, a place, an event or an object in real-world. For example, a teacher, a cinema, a show, an album, etc. Entity can be tangible or intangible( e.g., department, course).
To make sure that an entity is relevant to DB design context, it is important to make sure that the business requires information about this entity in its operation. In other words, an entity in DB design is any object in the system that you want to store information about.
In ERD, entities are presented by rectangle boxes. They are database tables. In each table, each row represents an instance of that entity.
Attributes
Attributes are properties of an entity. For example, ‘student’ entity has name, DOB, email address as its attributes.
At the conceptual level, attributes are represented by ellipses. A double eclipse represents attributes that can have multiple values. A dash ellipse means the attribute can be calculated.
Identifier (Keys)
These are one or more attributes that uniquely define the entity ( e.g., StudentID, CustomerID). These are also keys used to connect tables in a DB.
Characteristics of an Primary Key include : not changing in value, not null
Relationships
Relationships in DB describe logical associations between entities. A relationship in DB is described by two characteristics: the number of entities involved in the relationship (Degree), and the number of entities to which another entity can be associated ( Cardinality)
In the conceptual design level, relationships can be presented by diamond shapes with lines connecting between entities
- Degree : includes Unary, Binary and Ternary
Unary( Recursive) is relationship between instances of a single entity set ( e.g., an employee supervises another employee)
Binary is relationship between two entities
Ternary is relationship among three entities.
- Cardinality: includes One-to-One, One-to-Many, and Many-to-Many
Steps to create an ER model
In previous parts, we have discussed about different levels or phases of data modelling, properties of an ER model. Now, we are going to summarize that with specific steps needed to create an ER model.
STEP 1: Entity Identification
From the detailed description collected in the beginning of conceptual design phase, we highlight all nouns in that description, draft out a list of potential identities and select entities which we want to store information.
STEP 2: Relationship Identification
Base on the list of entities selected in step 1, in this step, we identify relationships between entities by describing them in one sentence. For example:
- Student is assigned a module
- Lecture delivers a module
STEP 3: Cardinality Identification
From previous steps, we have all entities and their relationships. In this step, we continue to describe those relationships in more details. For example:
- A student can be assigned to multiple modules
- A lecture can deliver only one module
STEP 4: Attribute Identification
Now we are going to look into entities that we have selected in more details : what information about each entity that we need to store?
For example, information about student entity we need includes: Name of the student, StudentID, courses they can enroll in, their marks, etc.
At the end of this step, we can draft out the preliminary design of our ER model as below:
STEP 5: Revise Model
Base on the draft of ER model above, we can identify Many-To-Many relationships which leads to an additional bridge entity ( e.g., Student_take_Module entity), and further refine our model.