Home‎ > ‎3 pm class page‎ > ‎

Blumenfeld2

Database Normalization

DATABASE NORMALIZATION: Is the process by which databases are efficiently organized.  When it comes to Database Normalization, there are two main goals:  eliminating redundant data as well as ensuring that database dependencies make sense. By ensuring that these two goals be met, overcrowding and non-logical storage options are greatly reduced resulting in a concise and efficient data base system. In order to ensure that a set of guideline be followed by the data-basing community, a set of "normal forms" were created and are numbered 1NF-5NF. While there are 5 different forms, the first three are most commonly used with the fourth being added in some exceptions. Very rarely will the 5th form be utilized. These forms start with a minimum level of normalization and ascend as the numbers rise making each step more thorough. Material paraphrased from http://databases.about.com/od/specificproducts/a/Should-I-Normalize-My-Database.htm  

Normal Forms 1-4: 

NF 1: The first normal form is a very basic set of guidelines and rules used for the initial phases of the database process. The two main goals of this form are as follows; to eliminate redundant information from multiple tables in the same database, as well as to create separate tables for each group of related data and relative each row with a unique column or in some cases a set of columns. This is also known as the primary key.

 

NF2: The second form continues the process of helping to eliminate duplicate data. This form also continues to remove redundant data that shows up on multiple tables. The difference between the first and second is that the second will usually create a new table for these subsets so as to differentiate between the clean data versus data with redundancies. Last, this table begins to create the relationships between these new tables and their predecessors through the utilization of foreign keys. 

Normal Forms 1-3

 

NF3: The third form takes a larger step than the step taken between forms 1 and 2 in that it removes all of the columns that are not dependent on the primary key. Additionally, every determinant must be a candidate key. What a candidate key is, is a combination of attributes that can be used to identify a unique database record without any extras or unnecessary information. Further, one of these candidate keys is still chosen as the table’s primary key.

 

NF4: The fourth form meets all of the requirements that the third form does while in addition the fourth form has no multi-valued dependencies. A multi-valued dependency occurs when the presence of one or more rows in a table implies the presence of one or more other rows in that same table. Once again, NF4 is able to meet all of the requirements of NF1-3 in addition to the elimination of muti-valued dependencies. 

 

Now that you know about the different forms, when should you normalize?

 

When given an opportunity, make an attempt to normalize your database. Once again it helps you reduce redundant information which optimizes performance. Make sure that when you begin the process that you follow the correct procedure and understand that the forms (NF1-4) build on each other. In circumstances in which databases are to be accesses by larger numbers of people or where time and money can be allocated freely, the process of normalization is recommended. Overall, if the means are there to normalize, however if there are confounding factors that prevent it, de-normalization is not always a recipe for disaster.

Normalization Example

When should you not normalize?

While it is recommended that databases be normalized for the purpose of efficiency and ease of use, the process of normalization does take time and effort. Sometimes normalization causes vast amounts of tables and relationships to be created. The process of interweaving all the information as well as removing redundant data can take time, energy and monetary resources out of a person or a business. If the database still works in the de-normalization phase, then some suggest moving on and not spending these resources. Additionally, for smaller projects and for work that doesn’t require in depth analysis the normalization process is sometimes unnecessary. Realize however that this may equate to having to start the process over or correcting errors as they occur throughout the process. 







Normalizing an Example Table: example taken from http://support.microsoft.com/kb/283878

This example highlights the process of going through the first three Normal Forms.

1.        Unnormalized table: 

Student#

Advisor

Adv-Room

Class1

Class2

Class3

1022

Jones

412

101-07

143-01

159-02

4123

Smith

216

201-01

211-02

214-01

2.        First Normal Form: No Repeating Groups
Tables should have only two dimensions. Since one student has several classes, these classes should be listed in a separate table. Fields Class1, Class2, and Class3 in the above records are indications of design trouble. 

Spreadsheets often use the third dimension, but tables should not. Another way to look at this problem is with a one-to-many relationship, do not put the one side and the many side in the same table. Instead, create another table in first normal form by eliminating the repeating group (Class#), as shown below:

Student#

Advisor

Adv-Room

Class#

1022

Jones

412

101-07

1022

Jones

412

143-01

1022

Jones

412

159-02

4123

Smith

216

201-01

4123

Smith

216

211-02

4123

Smith

216

214-01

3.        Second Normal Form: Eliminate Redundant Data

Note the multiple Class# values for each Student# value in the above table. Class# is not functionally dependent on Student# (primary key), so this relationship is not in second normal form.

The following two tables demonstrate second normal form: 

Students:

Student#

Advisor

Adv-Room

1022

Jones

412

4123

Smith

216

4.       

Registration:

Student#

Class#

1022

101-07

1022

143-01

1022

159-02

4123

201-01

4123

211-02

4123

214-01

5.        Third Normal Form: Eliminate Data Not Dependent On Key

In the last example, Adv-Room (the advisor's office number) is functionally dependent on the Advisor attribute. The solution is to move that attribute from the Students table to the Faculty table, as shown below:

Students:

Student#

Advisor

1022

Jones

4123

Smith

6.       

Faculty:

Name

Room

Dept

Jones

412

42

Smith

216

42

Comments