There are two issues in classification:
DATA PREPARATION: The following preprocessing steps may be applied to the data for classification and prediction: data cleaning, data transformation, and relevance analysis (feature selection).
• Data cleaning: This preprocesses the data in order to reduce noise and handle missing values.
• Data transformation: This is used to generalize or normalize the data.
• Relevance analysis: Removes irrelevant or redundant attributes.
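The three preparation steps above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names (clean, normalize, drop_redundant), the use of mean imputation and min-max scaling, and the toy data are all hypothetical choices, and "relevance analysis" is reduced here to dropping constant or duplicated columns.

```python
from statistics import mean

def clean(rows):
    """Data cleaning (assumed strategy): replace missing values (None) with the column mean."""
    columns = list(zip(*rows))
    means = [mean(v for v in col if v is not None) for col in columns]
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]

def normalize(rows):
    """Data transformation (assumed strategy): min-max scale each column to [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def drop_redundant(rows):
    """Relevance analysis (very simplified): drop constant columns and exact copies of kept columns."""
    columns = list(zip(*rows))
    keep = []
    for j, col in enumerate(columns):
        constant = len(set(col)) == 1
        duplicate = any(col == columns[k] for k in keep)
        if not constant and not duplicate:
            keep.append(j)
    return [[row[j] for j in keep] for row in rows]

# Hypothetical toy data: column 1 has a missing value, column 2 is constant,
# and column 3 is an exact copy of column 0.
raw = [
    [1.0, 10.0, 5.0, 1.0],
    [2.0, None, 5.0, 2.0],
    [4.0, 30.0, 5.0, 4.0],
]
prepared = drop_redundant(normalize(clean(raw)))
print(prepared)
```

In practice each step would likely use a more careful method (for example, correlation-based attribute selection instead of exact duplicate detection), so the sketch should be read as a shape of the pipeline rather than a recommended implementation.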
EVALUATING CLASSIFICATION METHODS:
The learned hypothesis (the model) is used to infer the class of each example in the test set.
Accuracy is the percentage of examples in the test set that are classified correctly.
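As a small illustration of the accuracy measure just described, the following sketch compares predicted labels with the true labels of a test set. The function name accuracy and the example labels are assumptions made for this illustration.

```python
def accuracy(true_labels, predicted_labels):
    """Return the percentage of test examples whose predicted label matches the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("test set must not be empty")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)

# Hypothetical test set of five examples; four are classified correctly.
y_true = ["yes", "no", "yes", "yes", "no"]
y_pred = ["yes", "no", "no", "yes", "no"]
print(accuracy(y_true, y_pred))  # 80.0
```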
Other criteria used to evaluate classification methods:
• Speed: This refers to the time to construct the model and the time to use the model.
• Robustness: This is the ability of the classifier to make correct predictions given noisy data or data with missing values.
• Scalability: This refers to the ability to construct the classifier efficiently given large amounts of data.
• Interpretability: This refers to the level of understanding and insight that is provided by the classifier.
• Goodness of rules: This refers to the decision tree size and the compactness of the classification rules.
Classification methods include K Nearest Neighbor and Bayesian classifiers.
Bayesian classifiers:
• Bayesian classifiers are statistical classifiers.
• They can predict class membership probabilities, such as the probability that a given tuple belongs to a particular class.
• Each training example can incrementally increase or decrease the probability that a hypothesis is correct, so prior knowledge can be combined with observed data.
• Bayesian classification is based on Bayes' theorem.
• Bayesian classifiers have also exhibited high accuracy and speed when applied to large databases.
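The idea that each example can raise or lower the probability of a hypothesis can be sketched as a repeated application of Bayes' theorem. The example below is an assumption made purely for illustration: two hypothetical hypotheses about a coin ("fair" and "biased"), made-up likelihoods, and a hypothetical helper named update.

```python
def update(prior, likelihoods):
    """One Bayesian update: posterior is proportional to likelihood times prior."""
    unnormalized = {h: likelihoods[h] * p for h, p in prior.items()}
    total = sum(unnormalized.values())
    return {h: v / total for h, v in unnormalized.items()}

# Assumed probability of heads under each hypothesis.
p_heads = {"fair": 0.5, "biased": 0.8}

# Prior knowledge: both hypotheses start out equally likely.
belief = {"fair": 0.5, "biased": 0.5}
history = []

for flip in "HHTH":  # observed data, one example at a time
    likelihoods = {h: (p if flip == "H" else 1.0 - p) for h, p in p_heads.items()}
    belief = update(belief, likelihoods)
    history.append(belief)
    print(flip, belief)
```

Each observed head increases the probability of the "biased" hypothesis and each tail decreases it, which is the incremental behavior described in the list above.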
Naïve Bayesian classifiers:
• These assume that the effect of an attribute value on a given class is independent of the values of the other attributes.
• This assumption is called class conditional independence.
• It is made to simplify the computations involved, and in this sense the classifier is considered “naïve.”
• Bayes' theorem is used to predict the class label for a given tuple.
• Let X be a data tuple.
• In Bayesian terms, X is considered “evidence.”
• It is described by measurements made on a set of n attributes.
• Let H be some hypothesis, such as that the data tuple X belongs to a specified class C.
• For classification problems, we are looking for the probability that tuple X belongs to class C, given that we know the attribute description of X.
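In the usual notation, the probability sought above is the posterior P(H|X), and Bayes' theorem gives P(H|X) = P(X|H) P(H) / P(X). Under class conditional independence, P(X|C) is the product of P(x_k|C) over the n attribute values x_k of X. The following is a minimal sketch of a naïve Bayesian classifier for categorical attributes built on these two formulas. The function names (train, posteriors), the toy weather-style data, and the choice to estimate probabilities by simple counting without smoothing are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def train(tuples, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(class, attribute index)][value] = number of occurrences
    value_counts = defaultdict(Counter)
    for x, c in zip(tuples, labels):
        for k, value in enumerate(x):
            value_counts[(c, k)][value] += 1
    return class_counts, value_counts

def posteriors(x, class_counts, value_counts):
    """Return P(C | X) for every class C, assuming class conditional independence."""
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / total  # prior P(C)
        for k, value in enumerate(x):
            score *= value_counts[(c, k)][value] / n_c  # estimate of P(x_k | C)
        scores[c] = score  # proportional to P(X | C) P(C)
    evidence = sum(scores.values())  # P(X), summed over all classes
    if evidence == 0:
        raise ValueError("tuple has zero likelihood under every class; smoothing would be needed")
    return {c: s / evidence for c, s in scores.items()}

# Hypothetical training data: each tuple is (outlook, wind), the class is "play" or "stay".
tuples = [
    ("sunny", "calm"), ("sunny", "windy"), ("rainy", "windy"),
    ("rainy", "calm"), ("sunny", "calm"), ("rainy", "calm"),
]
labels = ["play", "stay", "stay", "play", "play", "stay"]

class_counts, value_counts = train(tuples, labels)
result = posteriors(("sunny", "calm"), class_counts, value_counts)
predicted = max(result, key=result.get)
print(predicted, result)
```

For the tuple ("sunny", "calm"), the unnormalized scores are 1/2 · 2/3 · 3/3 = 1/3 for "play" and 1/2 · 1/3 · 1/3 = 1/18 for "stay", so the posterior for "play" is 6/7 and the predicted class label is "play". Because no smoothing is applied, an attribute value never seen with a class drives that class's score to zero, which a practical implementation would typically avoid.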