DATABASE NORMALIZATION: Is the process by which databases are efficiently organized. When it comes to Database Normalization, there are two main goals: eliminating redundant data as well as ensuring that database dependencies make sense. By ensuring that these two goals be met, overcrowding and non-logical storage options are greatly reduced resulting in a concise and efficient data base system. In order to ensure that a set of guideline be followed by the data-basing community, a set of "normal forms" were created and are numbered 1NF-5NF. While there are 5 different forms, the first three are most commonly used with the fourth being added in some exceptions. Very rarely will the 5th form be utilized. These forms start with a minimum level of normalization and ascend as the numbers rise making each step more thorough. Material paraphrased from http://databases.about.com/od/specificproducts/a/Should-I-Normalize-My-Database.htm
Normal Forms 1-4:
NF 1: The first normal form is a very basic set of guidelines and rules used for the initial phases of the database process. The two main goals of this form are as follows; to eliminate redundant information from multiple tables in the same database, as well as to create separate tables for each group of related data and relative each row with a unique column or in some cases a set of columns. This is also known as the primary key.
NF2: The second form continues the process of helping to eliminate duplicate data. This form also continues to remove redundant data that shows up on multiple tables. The difference between the first and second is that the second will usually create a new table for these subsets so as to differentiate between the clean data versus data with redundancies. Last, this table begins to create the relationships between these new tables and their predecessors through the utilization of foreign keys.
NF3: The third form takes a larger step than the step taken between forms 1 and 2 in that it removes all of the columns that are not dependent on the primary key. Additionally, every determinant must be a candidate key. What a candidate key is, is a combination of attributes that can be used to identify a unique database record without any extras or unnecessary information. Further, one of these candidate keys is still chosen as the table’s primary key.
NF4: The fourth form meets all of the requirements that the third form does while in addition the fourth form has no multi-valued dependencies. A multi-valued dependency occurs when the presence of one or more rows in a table implies the presence of one or more other rows in that same table. Once again, NF4 is able to meet all of the requirements of NF1-3 in addition to the elimination of muti-valued dependencies.
Now that you know about the different forms, when should you normalize?
When given an opportunity, make an attempt to normalize your database. Once again it helps you reduce redundant information which optimizes performance. Make sure that when you begin the process that you follow the correct procedure and understand that the forms (NF1-4) build on each other. In circumstances in which databases are to be accesses by larger numbers of people or where time and money can be allocated freely, the process of normalization is recommended. Overall, if the means are there to normalize, however if there are confounding factors that prevent it, de-normalization is not always a recipe for disaster.
When should you not normalize?
While it is recommended that databases be normalized for the purpose of efficiency and ease of use, the process of normalization does take time and effort. Sometimes normalization causes vast amounts of tables and relationships to be created. The process of interweaving all the information as well as removing redundant data can take time, energy and monetary resources out of a person or a business. If the database still works in the de-normalization phase, then some suggest moving on and not spending these resources. Additionally, for smaller projects and for work that doesn’t require in depth analysis the normalization process is sometimes unnecessary. Realize however that this may equate to having to start the process over or correcting errors as they occur throughout the process.
Normalizing an Example Table: example taken from http://support.microsoft.com/kb/283878
This example highlights the process of going through the first three Normal Forms.
1. Unnormalized table:
First Normal Form: No Repeating Groups
Second Normal Form: Eliminate Redundant Data
Third Normal Form: Eliminate Data Not Dependent On Key