- Database normalization is a technique of organizing the data in the database.
Normalization of data can be considered a process of analysing the given relation schemas based on their Functional Dependencies and primary keys to achieve the following properties:
i. Minimizing redundancy
ii. Minimizing the insertion, deletion, and update anomalies
iii. Ensuring data is stored in correct table
It can be considered as a filtering process to make the design have successively better quality
- It is a multi-step process that puts data into tabular form by removing duplicated data from the relation tables.
- Without normalization it becomes difficult to handle and update database without facing data loss.
- The various forms of normalization are described below:
I. First Normal Form (1NF):
- First normal form (1NF) states that the domain of an attribute must include only atomic values and that the value of any attribute in a tuple must be a single value from the domain of that attribute.
- A relation is said to be in 1NF if it contains no non-atomic values and each row can provide a unique combination of values.
- 1NF disallows having a set of values, a tuple of values, or a combination of both as an attribute value for a single tuple.
- The only attribute values permitted by 1NF are single atomic values.
- Example: Un-Normalized Table-
Normalized Table: Any Row must not have a column in which more than one value is saved, instead data is separated in multiple rows as shown below.
II. Second Normal Form (2NF):
- A relation is said to be in 2NF, if it is already in 1NF and each and every attribute fully depends on the primary key of the relation.
- There must not be any partial dependency of any column on the primary key.
- Second normal form (2NF) is based on the concept of full functional dependency. A functional dependency X -> Y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more.
- A functional dependency X->Y is a partial dependency if some attribute A belongs to X can be removed from X and the dependency still holds.
Stud_Name depends on Stud_ID and Proj_Name depends on Proj_ID
The above table can be normalized to 2NF as shown below.
Student Table in 2NF
Project Table in 2NF
III. Third Normal Form (3NF):
- A relation is said to be in 3NF, if it is already in 2NF and there exists no transitive dependency in that relation.
- If a table contains transitive dependency, then it is not in 3NF, and the table must be split to bring it into 3NF.
What is a transitive dependency?
A -> B [B depends on A] & B -> C [C depends on B]
Then A -> C[C depends on A] can be derived.
- Example:Below table not in 3NF
Stud_ID is the only prime key attribute. City can be identified by Stu_ID as well as Zip. Neither Zip is a superkey nor City is a prime attribute.
Stud_ID -> Zip -> City, so there exists transitive dependency. Hence 3NF table is below
IV. Boyce-Codd Normal Form (BCNF):
- BCNF is an extension of Third Normal Form in strict way.
- A relationship is said to be in BCNF if it is already in 3NF and for any non-trivial functional dependency, X -> A, then X must be a super-key.
- A 3NF table which does not have multiple overlapping candidate keys is said to be in BCNF.
In the 3NF example,Stud_ID is super-key in Student_Detail relation and Zip is super-key in ZipCodes relation.
So Stud_ID ->Stud_Name, Zip and
Confirms, that both relations are in BCNF.