Explain Data Mining as a step in KDD. Give the architecture of a typical Data Mining system.

KDD Process (Knowledge Discovery in Databases):

  • The term KDD refers to the broad process of finding knowledge in data, and emphasizes the high-level application of particular data mining methods.
  • The goal of the KDD process is to extract knowledge from data in the context of large databases.

[Figure: the KDD process]

  • The overall process of finding and interpreting patterns from data involves the repeated application of the following steps:
  1. Developing an understanding of:

    • The application domain
    • The relevant prior knowledge
    • The goals of the end user
  2. Creating a target data set:

    • Selecting a data set or focusing on a subset of variables or data samples on which discovery is to be performed.
  3. Data cleaning and preprocessing:

    • Removal of noise or outliers.
    • Strategies for handling missing data fields.
  4. Data reduction and projection:

    • Finding useful features to represent the data depending on the goal of the task.
  5. Choosing the data mining task:

    • Deciding whether the goal of the KDD process is classification, regression, clustering, etc.
  6. Choosing the data mining algorithm:

    • Selecting the methods to be used for searching for patterns in the data.
    • Deciding which models and parameters may be appropriate.
    • Matching a particular data mining method with the overall criteria of the KDD process.
  7. Data mining:

    • Searching for patterns of interest in a particular representational form or a set of such representations, such as classification rules or trees, regression, clustering, and so forth.
  8. Interpreting mined patterns

  9. Consolidating discovered knowledge
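
The steps above can be sketched as a small pipeline. This is only an illustration, not a standard implementation: the customer records, the function names (`select_target`, `clean`, `project`, `two_means`, `interpret`, `run_kdd`), the outlier threshold, and the choice of a one-dimensional 2-means clustering are all assumptions made up for the example.

```python
from statistics import mean, median

# Hypothetical records: (customer_id, age, monthly_spend, region)
RAW = [
    (1, 23, 40.0, "north"),
    (2, 25, 42.0, "north"),
    (3, 24, None, "south"),    # missing data field
    (4, 61, 210.0, "south"),
    (5, 58, 190.0, "north"),
    (6, 60, 5000.0, "south"),  # outlier
    (7, 22, 38.0, "south"),
]

def select_target(records):
    """Step 2: keep only the variables of interest (age, spend)."""
    return [(age, spend) for _cid, age, spend, _region in records]

def clean(rows, max_spend=1000.0):
    """Step 3: fill missing spend with the median, then drop outliers."""
    known = [s for _age, s in rows if s is not None and s <= max_spend]
    fill = median(known)
    filled = [(age, fill if s is None else s) for age, s in rows]
    return [(age, s) for age, s in filled if s <= max_spend]

def project(rows):
    """Step 4: represent each record by a single feature, the spend."""
    return [s for _age, s in rows]

def two_means(values, iterations=10):
    """Steps 5-7: task = clustering, algorithm = 1-D k-means with k = 2."""
    centers = [min(values), max(values)]
    groups = ([], [])
    for _ in range(iterations):
        groups = ([], [])
        for v in values:
            # Assign each value to the nearer of the two centers.
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

def interpret(centers, groups):
    """Step 8: turn the raw clusters into a readable summary."""
    labels = ["low spenders", "high spenders"]
    return {label: {"center": c, "size": len(g)}
            for label, c, g in zip(labels, centers, groups)}

def run_kdd(records):
    """Chain selection, cleaning, projection, mining, and interpretation."""
    values = project(clean(select_target(records)))
    return interpret(*two_means(values))

print(run_kdd(RAW))
```

Data mining proper is only the `two_means` call; everything before it prepares the data and everything after it interprets the result, which is the sense in which data mining is one step of KDD. Steps 1 and 9 (understanding the domain and consolidating the knowledge) are human activities and are not modelled here.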

Architecture of a Typical Data Mining System

[Figure: architecture of a typical data mining system]

  • The architecture of a typical data mining system may have the following major components, as shown in the figure:
  1. Database, data warehouse, or other information repository:

    • This is the information repository that holds the data to be mined.
    • Data cleaning and data integration techniques may be performed on the data.
  2. Database or data warehouse server:

    • It fetches the relevant data needed for the data mining task, based on the user's request.
  3. Knowledge base:

    • This is the domain knowledge used to guide the search and to evaluate how interesting the resulting patterns are, so that interesting, hidden patterns can be found in the data.
  4. Data mining engine:

    • It performs data mining tasks such as characterization, association, classification, cluster analysis, etc.
  5. Pattern evaluation module:

    • It is integrated with the mining module and focuses the search on only the interesting patterns.
  6. Graphical user interface:

    • This module handles communication between the user and the data mining system, and allows users to browse database or data warehouse schemas.
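
One way to picture how these components fit together is the rough sketch below. It is not a real system's API: the class names, the `mine` function standing in for the user interface, the toy transactions, and the use of a minimum-support threshold as the "knowledge" are all hypothetical choices for illustration, and the mining task shown is a simple count of frequently co-occurring item pairs (a basic form of association analysis).

```python
from collections import Counter
from itertools import combinations

class Repository:
    """Component 1: holds the (already cleaned and integrated) data."""
    def __init__(self, transactions):
        self.transactions = transactions

class DatabaseServer:
    """Component 2: fetches only the data relevant to the user's request."""
    def __init__(self, repository):
        self.repository = repository

    def fetch(self, required_item=None):
        rows = self.repository.transactions
        if required_item is None:
            return list(rows)
        return [t for t in rows if required_item in t]

class KnowledgeBase:
    """Component 3: domain knowledge, here just a minimum-support threshold."""
    def __init__(self, min_support):
        self.min_support = min_support

class PatternEvaluator:
    """Component 5: keeps only patterns the knowledge base rates as interesting."""
    def __init__(self, knowledge_base):
        self.kb = knowledge_base

    def is_interesting(self, support):
        return support >= self.kb.min_support

class MiningEngine:
    """Component 4: performs the mining task, here counting item pairs."""
    def __init__(self, evaluator):
        self.evaluator = evaluator

    def frequent_pairs(self, transactions):
        counts = Counter()
        for t in transactions:
            # Count every unordered pair of distinct items in the transaction.
            counts.update(combinations(sorted(set(t)), 2))
        n = len(transactions)
        # Support = fraction of transactions that contain the pair.
        return {pair: c / n for pair, c in counts.items()
                if self.evaluator.is_interesting(c / n)}

def mine(server, engine, required_item=None):
    """Component 6 (stand-in for the user interface): one user request."""
    return engine.frequent_pairs(server.fetch(required_item))

# Hypothetical market-basket data.
repo = Repository([
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "tea"],
])
server = DatabaseServer(repo)
engine = MiningEngine(PatternEvaluator(KnowledgeBase(min_support=0.5)))
print(mine(server, engine))
```

The evaluator is passed into the engine rather than run afterwards, which mirrors the point that the pattern evaluation module is integrated with the mining module so that uninteresting patterns are filtered during the search.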