
Database Normalization

Definition

Normalization is the process of organizing data in a database.  [1]


Normalization identifies and eliminates redundant information by applying a set of rules to your tables to confirm that they are structured properly.


If you create a database without normalization, you may run into a few complications. For example, your tables may not relate to one another properly, which can prevent you from producing a report you need. There are five rules included in database normalization, and databases can be classified by the number of rules they follow. A database that follows the first rule is in first normal form (1NF). A database that follows rules one and two has achieved second normal form (2NF). Once you have also implemented rule three, the database has achieved third normal form (3NF). 3NF is considered the most common level of normalization that databases achieve. A database can reach a higher level of normalization by implementing rules four and five. The rules are listed below.


Example of Normalized Databases

The database featured below attains the goals of normalization and meets normal form standards.











Rules of Database Normalization


Rule #1 [First Normal Form; 1NF]:

  • Eliminate repeating groups in individual tables.
  • Create a separate table for each set of related data.
  • Identify each set of related data with a primary key.

Do not use multiple fields in a single table to store similar data. For example, to track an inventory item that may come from two possible sources, an inventory record may contain fields for Vendor Code 1 and Vendor Code 2.

What happens when you add a third vendor? Adding a field is not the answer; it requires program and table modifications and does not smoothly accommodate a dynamic number of vendors. Instead, place all vendor information in a separate table called Vendors, then link inventory to vendors with an item number key, or vendors to inventory with a vendor code key.[4]
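The vendor example above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table and column names (Inventory, Vendors, ItemVendors, item_number, vendor_code) and the sample rows are hypothetical, and the linking table ItemVendors is one possible way to connect items and vendors, not the only one.

```python
import sqlite3


def build_inventory_db():
    """Create an in-memory database with no repeating vendor fields."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Inventory (
            item_number INTEGER PRIMARY KEY,
            description TEXT NOT NULL
        );
        CREATE TABLE Vendors (
            vendor_code TEXT PRIMARY KEY,
            vendor_name TEXT NOT NULL
        );
        -- Hypothetical linking table: one row per (item, vendor) pair
        -- replaces the fields Vendor Code 1, Vendor Code 2, and so on.
        CREATE TABLE ItemVendors (
            item_number INTEGER NOT NULL REFERENCES Inventory(item_number),
            vendor_code TEXT NOT NULL REFERENCES Vendors(vendor_code),
            PRIMARY KEY (item_number, vendor_code)
        );
    """)
    conn.execute("INSERT INTO Inventory VALUES (1, 'Widget')")
    conn.executemany(
        "INSERT INTO Vendors VALUES (?, ?)",
        [("V1", "Acme"), ("V2", "Globex"), ("V3", "Initech")],
    )
    # Adding a third vendor is just another row, not a new column.
    conn.executemany(
        "INSERT INTO ItemVendors VALUES (?, ?)",
        [(1, "V1"), (1, "V2"), (1, "V3")],
    )
    conn.commit()
    return conn


def vendors_for_item(conn, item_number):
    """Return the vendor names for one item, ordered by vendor code."""
    rows = conn.execute(
        """
        SELECT v.vendor_name
        FROM ItemVendors AS iv
        JOIN Vendors AS v ON v.vendor_code = iv.vendor_code
        WHERE iv.item_number = ?
        ORDER BY v.vendor_code
        """,
        (item_number,),
    ).fetchall()
    return [row[0] for row in rows]
```

Calling vendors_for_item(build_inventory_db(), 1) lists all three vendors for the item without any change to the table definitions.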


Rule #2 [Second Normal Form]:

  • Create separate tables for sets of values that apply to multiple records.
  • Relate these tables with a foreign key.

Records should not depend on anything other than a table's primary key (a compound key, if necessary). For example, consider a customer's address in an accounting system. The address is needed by the Customers table, but also by the Orders, Shipping, Invoices, Accounts Receivable, and Collections tables. Instead of storing the customer's address as a separate entry in each of these tables, store it in one place, either in the Customers table or in a separate Addresses table.[4]
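As a rough sketch of the address example, the following Python code (using the built-in sqlite3 module) stores the address once in a Customers table and lets an Orders table reach it through a foreign key. The column names, the sample customer, and the order numbers are assumptions made for illustration only.

```python
import sqlite3


def build_accounting_db():
    """Create an in-memory database that stores each address once."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript("""
        CREATE TABLE Customers (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            address     TEXT NOT NULL
        );
        -- Orders holds only the key, not a copy of the address.
        CREATE TABLE Orders (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES Customers(customer_id)
        );
    """)
    conn.execute("INSERT INTO Customers VALUES (1, 'Ada', '1 Old Road')")
    conn.executemany("INSERT INTO Orders VALUES (?, ?)", [(100, 1), (101, 1)])
    conn.commit()
    return conn


def order_addresses(conn):
    """Return (order_id, address) pairs by joining on the foreign key."""
    return conn.execute(
        """
        SELECT o.order_id, c.address
        FROM Orders AS o
        JOIN Customers AS c ON c.customer_id = o.customer_id
        ORDER BY o.order_id
        """
    ).fetchall()
```

Because the address lives in one place, a single UPDATE on Customers changes what every related order reports; there are no stale copies to track down in other tables.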


Rule #3 [Third Normal Form; 3NF]: 

  • Eliminate fields that do not depend on the key.

Values in a record that do not depend on that record's key do not belong in the table. In general, any time the contents of a group of fields may apply to more than a single record in the table, consider placing those fields in a separate table.

For example, in an Employee Recruitment table, a candidate's university name and address may be included. But you need a complete list of universities for group mailings. If university information is stored in the Candidates table, there is no way to list universities with no current candidates. Create a separate Universities table and link it to the Candidates table with a university code key.
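The university example might look like the sketch below, again using Python's built-in sqlite3 module. The schema, the university_code values, and the sample names are hypothetical; the point is only that a separate Universities table can list a university even when no candidate refers to it.

```python
import sqlite3


def build_recruitment_db():
    """Create an in-memory database with universities split from candidates."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Universities (
            university_code TEXT PRIMARY KEY,
            name            TEXT NOT NULL,
            address         TEXT NOT NULL
        );
        -- Candidates keeps only the university code key.
        CREATE TABLE Candidates (
            candidate_id    INTEGER PRIMARY KEY,
            name            TEXT NOT NULL,
            university_code TEXT REFERENCES Universities(university_code)
        );
    """)
    conn.executemany(
        "INSERT INTO Universities VALUES (?, ?, ?)",
        [
            ("U1", "North University", "1 Campus Way"),
            ("U2", "South College", "9 College Ave"),
        ],
    )
    conn.execute("INSERT INTO Candidates VALUES (1, 'Lee', 'U1')")
    conn.commit()
    return conn


def all_universities(conn):
    """Return every university name, whether or not it has candidates."""
    rows = conn.execute(
        "SELECT name FROM Universities ORDER BY university_code"
    ).fetchall()
    return [row[0] for row in rows]


def universities_without_candidates(conn):
    """Return universities that no current candidate refers to."""
    rows = conn.execute(
        """
        SELECT u.name
        FROM Universities AS u
        LEFT JOIN Candidates AS c ON c.university_code = u.university_code
        WHERE c.candidate_id IS NULL
        ORDER BY u.university_code
        """
    ).fetchall()
    return [row[0] for row in rows]
```

With university data stored only inside the Candidates table, the second query would have nothing to return, because a university with no candidates would not exist anywhere in the database.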

EXCEPTION: Adhering to the third normal form, while theoretically desirable, is not always practical. If you have a Customers table and you want to eliminate all possible interfield dependencies, you must create separate tables for cities, ZIP codes, sales representatives, customer classes, and any other factor that may be duplicated in multiple records. In theory, normalization is worth pursuing. However, many small tables may degrade performance or exceed open file and memory capacities.

It may be more feasible to apply third normal form only to data that changes frequently. If some dependent fields remain, design your application to require the user to verify all related fields when any one is changed. [4]


Rule #4: Isolate Independent Multiple Relationships. See Rules of Data Normalization for more information.


Rule #5: Isolate Semantically Related Multiple Relationships. See Rules of Data Normalization for more information.
