Why is pre-processing required?


What do you mean by pre-processing? Why is it required?

1 Answer

• Data preprocessing is a data mining technique that involves transforming raw data into an understandable format.

• Real-world data is often incomplete, inconsistent, and/or lacking in certain behaviors or trends, and is likely to contain many errors.

• Data preprocessing is the standard way of resolving such issues before the data is analyzed.

Real-world data is often:

• Incomplete:

o Data is incomplete when it lacks attribute values, lacks certain attributes of interest, or contains only aggregate data.

o e.g., occupation = "" (the value is missing).

• Noisy/Dirty:

o Noisy data contains errors or outliers.

o e.g., salary = "-10" (a salary cannot be negative).

• Inconsistent:

o Inconsistent data contains discrepancies in codes or names, i.e., two or more facts that are illogically or unexpectedly incompatible with each other.

o e.g., Age = "43" but Birthday = "07/04/1996".

o e.g., ratings were once recorded as "1, 2, 3" but are now recorded as "A, B, C".
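The three kinds of problems listed above can often be caught with simple rule-based checks. The following is a minimal Python sketch, assuming hypothetical records with the fields occupation, salary, age, birthday and rating, birthdays written as MM/DD/YYYY, and an assumed reference date for computing ages; real data would need rules specific to its own attributes.

```python
from datetime import date, datetime

# Hypothetical records; field names and values are illustrative only.
records = [
    {"id": 1, "occupation": "", "salary": "52000", "age": "27", "birthday": "07/04/1996", "rating": "2"},
    {"id": 2, "occupation": "engineer", "salary": "-10", "age": "29", "birthday": "01/15/1994", "rating": "3"},
    {"id": 3, "occupation": "teacher", "salary": "48000", "age": "43", "birthday": "07/04/1996", "rating": "B"},
]

REFERENCE_DATE = date(2024, 1, 1)  # assumed "as of" date for checking ages


def age_on(birthday: date, on: date) -> int:
    """Whole years elapsed between birthday and `on`."""
    years = on.year - birthday.year
    if (on.month, on.day) < (birthday.month, birthday.day):
        years -= 1  # birthday has not happened yet in that year
    return years


def find_problems(rec: dict) -> list[str]:
    problems = []
    # Incomplete: any attribute with an empty value.
    for field, value in rec.items():
        if value == "":
            problems.append(f"incomplete: {field} is empty")
    # Noisy: a negative salary cannot be correct.
    if rec["salary"] and float(rec["salary"]) < 0:
        problems.append(f"noisy: salary={rec['salary']}")
    # Inconsistent: stated age disagrees with birthday (assumes MM/DD/YYYY).
    born = datetime.strptime(rec["birthday"], "%m/%d/%Y").date()
    expected = age_on(born, REFERENCE_DATE)
    if int(rec["age"]) != expected:
        problems.append(f"inconsistent: age={rec['age']} but birthday implies {expected}")
    # Inconsistent: rating codes should all come from one scheme (assumed 1-3).
    if rec["rating"] not in {"1", "2", "3"}:
        problems.append(f"inconsistent: rating code {rec['rating']!r}")
    return problems


for rec in records:
    print(rec["id"], find_problems(rec))
```

Each record is checked independently here; checks that compare records against each other (such as duplicate detection) would need a pass over the whole dataset.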

Reasons preprocessing is required:

• Real-world data tend to be dirty, incomplete, and inconsistent.

• Data preprocessing techniques can improve the quality of the data, thereby helping to improve the accuracy and efficiency of the mining process.

• Quality decisions must be based on quality data, and preprocessing is what supplies that quality data.

• Data preprocessing detects data anomalies, rectifies them early, and reduces the amount of data to be analyzed, which can pay off substantially in decision making.
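The rectify-and-reduce steps above can be sketched as a small cleaning pass. This is a minimal Python sketch using only the standard library; the records, the choice to treat a negative salary as missing and fill it with the median, the A→1, B→2, C→3 mapping between the two rating schemes, and the set of attributes kept are all assumptions made for illustration, not a prescribed method.

```python
from statistics import median

# Hypothetical raw records; "rating" mixes an old numeric scheme with a new letter scheme.
raw = [
    {"id": 1, "name": "Asha", "salary": 52000, "rating": "2", "notes": "x"},
    {"id": 2, "name": "Ben", "salary": -10, "rating": "3", "notes": "y"},
    {"id": 3, "name": "Chen", "salary": 48000, "rating": "B", "notes": "z"},
    {"id": 3, "name": "Chen", "salary": 48000, "rating": "B", "notes": "z"},  # duplicate
]

LETTER_TO_NUMBER = {"A": "1", "B": "2", "C": "3"}  # assumed mapping between schemes
KEEP = ("id", "salary", "rating")  # attributes assumed relevant downstream


def preprocess(rows):
    # 1. Remove exact duplicate records (first occurrence wins); copy so input is untouched.
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))
    # 2. Rectify noise: a negative salary is treated as missing.
    for row in unique:
        if row["salary"] is not None and row["salary"] < 0:
            row["salary"] = None
    # 3. Fill missing salaries with the median of the valid ones.
    fill = median(r["salary"] for r in unique if r["salary"] is not None)
    for row in unique:
        if row["salary"] is None:
            row["salary"] = fill
    # 4. Make rating codes consistent by mapping letters onto numbers.
    for row in unique:
        row["rating"] = LETTER_TO_NUMBER.get(row["rating"], row["rating"])
    # 5. Reduce: keep only the attributes needed for analysis.
    return [{k: row[k] for k in KEEP} for row in unique]


cleaned = preprocess(raw)
print(cleaned)
```

The median is used for filling because, unlike the mean, it is not pulled around by the very outliers that noisy data tends to contain; other fill strategies are equally possible.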

Data pre-processing is important because:

• A data warehouse needs consistent integration of quality data.

• If there is no quality data, there will be no quality mining results.

e.g., duplicate or inconsistent data may cause incorrect or even misleading statistics.
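A small worked example of how duplicates distort a statistic. In this Python sketch the order data is hypothetical, one order is assumed to have been loaded twice by mistake, and deduplication is assumed to be by order id; the average order value is computed before and after removing the duplicate.

```python
from statistics import mean

# Hypothetical orders as (order_id, amount); order 102 was loaded twice.
orders = [
    ("101", 20.0),
    ("102", 500.0),
    ("102", 500.0),  # duplicate of the row above
    ("103", 30.0),
]


def dedupe(rows):
    """Keep only the first row seen for each order id."""
    seen, out = set(), []
    for order_id, amount in rows:
        if order_id not in seen:
            seen.add(order_id)
            out.append((order_id, amount))
    return out


raw_avg = mean(amount for _, amount in orders)
clean_avg = mean(amount for _, amount in dedupe(orders))
print(f"with duplicates: {raw_avg:.2f}, without: {clean_avg:.2f}")
# prints "with duplicates: 262.50, without: 183.33"
```

The single duplicated large order inflates the average from about 183.33 to 262.50, which is exactly the kind of misleading statistic described above.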
