Home‎ > ‎6 pm class pages‎ > ‎

Neuwelt,Gabriel Jacob

Normalization Basics

Database Normalization
                Database normalization is the efficient organizing of data in a database this is usually done through a reduction of data redundancy  in a database. Wikipedia’s definition of Database normalization is improving the overall accuracy and consistency of data in a database through efficient design. It is an excellent way to test if a database has data redundancy or not. By not normalizing a database it can be slowed down through to much memory usage or it can become cluttered in confusing for those who manage the database.

                Normalizing a database is done through a series of steps also called normal forms. There are five different forms: First normal form, second normal form, thirds normal form, fourth normal form or Boyce-Codd normal form and fifth normal form. When a database goes past first normal form each form that follows needs to have the database in the previous form for it to work.  E.g. If you are trying to normalize a database into third normal form the database must already be in second normal form and first normal form for it to work.

 First normal form (1NF) is the simplest form and easiest to implement. It removes columns that have duplicated data in them and it then condenses the data into one group which is linked to a primary key. This is the main form that most databases employ to become normalized.  An Example:  You decide to make a table with one chef and with many dishes that he makes under him.

 Dish 1

This creates a problem since you have duplicating columns with dishes and you may need to hire another chef. Instead simplify it by adding unique identifies and condensing the dishes into one column.

 Chef ID
 Dish ID

This way each chef has a unique identifier that can relate to a specific dish. This speeds up the database through the use of less memory usage and reduces complexity.

The second normal form (2NF) requires the database to already be in first normal form. It removes data that is not directly related to the primary key. It removes this data and puts it into its own table and relates the two tables with a foreign key. An example is that you have a small business and you have a customer table which stores addresses.

Those addresses are used in many different tables within your database e.g. invoices, shipping. By moving those addresses into a separate table you reduce the redundancy of multiple address in multiple tables and the possibility of updating one address and leaving the other address the same which would cause a mistake in the system. You can then link the address table to other tables so when you change an address in the address table it automatically changes all the addresses in the other tables. This is one main goal of database normalization, when you change or modify a database you want to be able to change it from one location.



                Third normal form (3NF) requires that the database comply with both first normal form and second normal form, additionally it requires the removal of columns not dependant on the primary key. For example: You have a table with student ID, advisor name and advisor room number. The advisors room number is not dependant on the advisor name since there can be many advisors in different rooms. The solution would be to move the room number to a different table.

  •  Student ID
     Advisor Name
     Room Number

The two tables would now look like this

 Student ID
 Advisor Name
 Advisor Name
 Room number

  Boyce-Codd normal form (BCNF) takes third normal form one step further by making every dependent piece of data into a candidate key. A candidate key is group of characters that can be used to identify a part of a database. Usually a character key is selected as a primary key. 

                Fourth normal form (4NF) and fifth normal form (5NF) are usually not required because they are hard to implement into a database. This is not a problem though since once a database is in third normal form going beyond that won’t have much of an effect on the memory usage or organization of your database. The whole point of database normalization is to make it easier for the programmer to easily manipulate the data in the system without ruining the principle of the database. A database is in fourth normal form once if it is in all the previous forms and has no rows that repeat themselves. For example if a furniture makes the same couch but in different colors each row will have the same basic information for the couch just a different color. These last two forms are only used if a programmer wants a perfect database with no flaws in it.

 Data redundancy can make looking up items in a database very complicated and confusing. Database normalization takes out these confusing and repeating factors and makes the database very easy to read and understand. It also makes it flow better and more accurately. Most databases do not need all five forms of normalization; most databases only need up to the third normal form to fix all data redundancy issues. Data normalization is used by programmers who want more consistent and accurate data in their tables. It does not necessarily need to be used since ideally most data should be normalized before its put into the database.  The businesses that mainly use database normalization are data warehouses and transaction databases like Amazon or PayPal. These databases need normalization because they deal with millions of pieces of information that can easily compromise their systems if they are left un normalized. Their systems need to be very easy to manage with that much information since if it was unmanageable their systems could be severely slowed down which would cause a loss of sales on their sites.