Database Normalisation

Define your priorities

When designing a database, you need to choose which information goes in which table. Base your choice on your priorities:

your priority is to retrieve data as fast as possible.
or
your priority is to save disk space and to update data fast.

In the second case, you can use database normalisation rules. These rules will help you verifying you made the good choices. This post presents the most important normalisation rules.

One field has one piece of information.

The rule:

Put only one piece of information in one field. This rule prevents also putting two pieces of information of the same nature in one field.

Examples:

Employees can work on several projects. We should not put two project IDs in one field of the employees table. Instead, we should have a separate table where you assign projects to employees. This table will have an employee ID column and a project ID column.
If an employee is assigned to two projects:

  • each assignment will be on a separate row.
  • both rows will have the same employee ID in the "employee ID column".

Moreover, we should not put the fixed phone, mobile and fax numbers in the same column. In that case, you should have a fixed phone number column, a mobile number column and a fax number column.

Effect of the rule:

This rule is not going to help you saving space nor to change data faster. One the other side:

  • in our example, looking for all employees working on a project will be more convenient.
  • updating one of the values will be more convenient.
  • it is a good practice officially part of normalisation rules.

Information not directly related to the person or thing.

The rule:

A table has information about a person or thing. Put columns with information not directly related to this person or thing in a separate table.

Example:

A table has information about rooms in an office. The room’s name, number, floor number, number of seats are directly related to this room.

On the other side, the office manager’s name is directly related to the office. It is not directly related to this room. As a result, put this first and last name in another table than the rooms table.

Effect of the rule:

Data is not duplicated. As a result:

  • when updating a piece of information, you need to update only one field.
  • if you had duplicates, you could by mistake omit to update the information in all fields.
  • there is less data in the table. As a result, the table takes less place on the disk.

In our example: you avoid having the office manager’s name on each row corresponding to a room in this office. You would need to be change each of these rows when there is a new office manager. Moreover, if you omit one row when executing this update, then this row will not have the same office manager name as the others. We call this data corruption.

Information directly related to a part of the person’s or thing’s identifier.

Rule:

This rule applies when you use more than one attribute to identify a person or thing. You should put in another table columns that contain information directly related to only one of these attributes.

Example:

You identify an employee with the combination of his first name, middle name and last name. In such a case, the employee’s names day would depend only on the first name. As a result, put “names day” in another table.

Effect of the rule:

Same effect as for rule 2)

Information that can be calculated or derived.

The rule:

There are some piece of information that you can calculate or derive (with a mathematical formula or string manipulation) based on other pieces of information stored in the table. You should not store them in any table.

Examples:

You have the column “net price” in a table. You should not have any column representing gross price in the database. You can calculate gross price by multiplying the net price with the tax rate.
If you have the columns first name and last name, you should not store the full name. This full name is the concatenation of the first name and last name.

Effect of the rule:

We avoid storing data that is not necessary to store. This saves disk space. Also, it avoids executing unnecessary updates. Example: if the tax rate is changing, you would have to update all fields containing a gross price calculated with this tax rate.

What about de-normalisation?

When de-normalising your database, you group your information in as less tables as possible, even if it creates duplicates.
Consequence:
you avoid using joins to get information from several tables. As a result, read operations are faster.
as you will have duplicate information, updates will be slower and data will use more space.

 
Please help this site survive, share on:



Author:
Last udpated: September 10, 2017
Created: November 21, 2014

Something not clear? Ask your question here:

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>