(b) (i) What is Normalization? Explain 1NF, 2NF, 3NF and BCNF with suitable example

Normalization of a data is a process of analyzing the given relation schema based on their FDs and primary keys to achieve the desirable properties of

Minimizing redundancy
Minimizing the insertion, deletion and update anomalies.

First Normal Form:

First Normal form imposes a very basic requirement on relations; unlike the other normal forms, it does not require additional information such as functional dependencies.

A domain is atomic if elements of the domain are considered to be indivisible units. We say that a relation schema R is in first normal form (1NF) if the domains of all attributes of R are atomic.

A set of names is an example of a nonatomic value. For example, if the schema of a relation employee included an attribute children whose domain elements are sets of names, the schema would not be in first normal form.

Composite attributes, such as an attribute address with component attributes street and city, also have nonatomic domains.

Second Normal Form:

2NF is based on the concept of full functional dependency.

A functional dependency X -> Y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more.

A relational schema R is in 2NF every non prime attribute A in R is fully functionally dependent on the PK of R.

The relation schema can be 2NF normalized into a number of 2NF relations in which non-prime attributes are associated only with the part of the PK on which they are fully functionally dependent.

Third Normal Form:

BCNF requires that all nontrivial dependencies be of the form α -> ß, where α is a superkey. 3NF relaxes this constraint slightly by allowing nontrivial functional dependencies whose left side is not a superkey.

A relation schema R is in third normal form (3NF) with respect to a set F of functional dependencies if, for all functional dependencies in F+ of the form α -> ß.

α -> ß is a trivial functional dependency.
α is a superkey of R.
Each attribute A in ß – α is contained in a candidate key for R.

The definition of 3NFallows
Certain functional dependencies that are not allowed in BCNF. A dependency α -> ß that satisfies only the third alternative of the 3NF definition is not allowed in BCNF, but is allowed in 3NF.

Boyce Code Normal Form:

A relation schema R is in BCNF if whenever a non trivial functional dependency X->A holds in R then X is a super key of R.

13_b_1

The only difference between the definition BCNF and 3NF is that which allows A to br prime, is absent from BCNF.

Suppose that we have thousands of lots in the relation but the lots are from only two countries XX and YY.

Lots sizes in XX country are 0.5, 0.6, 0.7, 0.8, 0.9, 0.10
Lots sizes in YY country are 1.1, 1.2… 2.0
We have additional FD,

Area->country name

Then, the relation schema is still in 3NF because country name is prime attribute.

The area of a LOT that determines the country, as specified by FD5, can be represented by a separate relation R (Area, country name). This representation reduces the redundancy of repeating the same information in thousand of LOTS A tuples.

LOTS 1A

FD5 violates BCNF in LOTS 1A because Area is not a super key of LOTS 1A.

FD5 satisfies 3NF in LOTS 1A because country name is a prime attribute, but this condition does not exist in the definition of BCNF.