What is Normalization in DBMS and why we need it?
Let's first discuss Normalization. It is the process of organizing data in a database. Where database structure is organized by splitting up the larger tables into smaller and meaningful tables.
The database is an important component of software projects where database operations are required. The efficiency and the accuracy of the software application depend upon the database performance. During the database design process, the database designers aim to build a database that stores the data most efficiently and robustly.
Database normalization is an important step that helps the database designers to eliminate the duplication of data which leads to database anomalies.
Problems without Normalization
Data Redundancy Problem
Data Redundancy is where you store the same data many times (duplicate data) in your table. This repeated data needs to be typed in over and over again which takes a long time.
The way to avoid the data redundancy problems we use the database normalization concept.
Database Anomalies
The problems caused due to redundant data which is a duplication data field in multiple records are referred to as database anomalies. This creates many database anomalies that include :
- Insertion anomaly: This happens when inserting vital data into the database is not possible because other data is not already there. For example, if a system is designed to require that a customer is on file before a sale can be made to that customer, but you cannot add a customer until they have bought something, then you have an insert anomaly.
- Deletion Anomalies: This happens when the deletion of unwanted information causes desired information to be deleted as well. For example, if a single database record contains information about a particular product along with information about a salesperson for the company and the salesperson quits, then information about the product is deleted along with salesperson information.
- Update Anomalies: This happens when the person charged with the task of keeping all the records current and accurate, is asked, for example, to change an employee’s title due to a promotion. If the data is stored redundantly in the same table, and the person misses any of them, then there will be multiple titles associated with the employee. The end-user has no way of knowing which is the correct title.
Normalization Rules
First Normal Form: The table must satisfy the following rules.
- Each table cell should have an atomic value.
- Each column (attribute ) name should be unique.
- No repeating column groups allowed.
- The table must have a primary key.
Second Normal Form: The table must satisfy the following rules.
- The table must be in the first normal form.
- The table should not have any partial dependencies.
Third Normal Form: The table must satisfy the following rules.
- The table must be in the second normal form.
- The table should not have any transitive dependencies.
Boyce-Codd Normal Form(BCNF): The BCNF standard is a higher version of the third normal form (3NF). The BCNF was developed to handle the remaining redundancies that may still exist in the database. The table must satisfy the following rules.
- The table should be in the third normal form (3NF).
- For functional dependency, A -> B, A should be a Super key.
Conclusion: Through the process of database normalization we bring our schema’s tables into conformance with progressive normal forms. As a result, our tables each represent a single entity (a book, an author, a subject, etc) and we benefit from decreased redundancy, fewer anomalies, and improved efficiency.