Write a short note on issues in classification and explain any one classification technique.


written 8.1 years ago by

There are two main issues in classification: data preparation and the evaluation of classification methods.

DATA PREPARATION: The following preprocessing steps may be applied to the data before classification and prediction: data cleaning, data transformation, and relevance analysis (feature selection).

• Data cleaning: Preprocesses the data to reduce noise and handle missing values.

• Data transformation: Generalizes or normalizes the data, for example by scaling attribute values to a common range.

• Relevance analysis: Removes irrelevant or redundant attributes.
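The three preparation steps above can be sketched on a tiny data set. This is only an illustration: the rows, the column meanings, and the helper names (`fill_missing`, `drop_constant_columns`, `min_max_normalize`) are assumptions made for the example, and "drop attributes that never vary" is a deliberately simplistic stand-in for real relevance analysis.

```python
from statistics import mean

# Hypothetical toy data: each row is (age, income, constant_flag).
# None marks a missing value.
rows = [
    (25, 30000.0, 1),
    (35, None, 1),
    (45, 90000.0, 1),
]

def fill_missing(rows):
    """Data cleaning: replace each missing value with its column mean."""
    cols = list(zip(*rows))
    means = [mean(v for v in col if v is not None) for col in cols]
    return [tuple(m if v is None else v for v, m in zip(row, means))
            for row in rows]

def drop_constant_columns(rows):
    """Relevance analysis (simplistic): drop attributes that never vary."""
    cols = list(zip(*rows))
    keep = [i for i, col in enumerate(cols) if len(set(col)) > 1]
    return [tuple(row[i] for i in keep) for row in rows]

def min_max_normalize(rows):
    """Data transformation: rescale every attribute to the range [0, 1].

    Assumes constant columns were already removed, so max > min.
    """
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [tuple((v - l) / (h - l) for v, l, h in zip(row, lo, hi))
            for row in rows]

prepared = min_max_normalize(drop_constant_columns(fill_missing(rows)))
print(prepared)
```

The order matters in this sketch: missing values are filled first so that the later steps never see `None`, and constant columns are dropped before normalizing to avoid dividing by zero.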

EVALUATING CLASSIFICATION METHODS:

A hypothesis (the learned model) is used to infer the class of each example in the test set.

Accuracy is the percentage of examples in the test set that are classified correctly.
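As a minimal sketch of this definition, accuracy can be computed by comparing the predicted labels with the true labels of the test set. The label lists below are made up for illustration, and the function name `accuracy` is an assumption rather than anything from a specific library.

```python
def accuracy(true_labels, predicted_labels):
    """Return the percentage of test examples classified correctly."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)

# Hypothetical test-set labels and model predictions.
y_true = ["yes", "no", "yes", "yes", "no"]
y_pred = ["yes", "no", "no", "yes", "no"]

acc = accuracy(y_true, y_pred)  # 4 of the 5 predictions match
print(acc)
```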

Other criteria used to evaluate classification methods:

• Speed: The time needed to construct the model and the time needed to use it.

• Robustness: The ability of the classifier to make correct predictions given noisy data or data with missing values.

• Scalability: The ability to construct the classifier efficiently given large amounts of data.

• Interpretability: The level of understanding and insight that the classifier provides.

• Goodness of rules: The size of the decision tree or the compactness of the classification rules.

Classification methods include:

• Decision tree
• Bayesian classification
• Rule-based classification
• K-nearest neighbor

Bayesian Classification:

• Bayesian classifiers are statistical classifiers.

• They can predict class membership probabilities, such as the probability that a given tuple belongs to a particular class.

• Each training example can incrementally increase or decrease the probability that a hypothesis is correct, so prior knowledge can be combined with observed data.

• Bayesian classification is based on Bayes' theorem.

• Bayesian classifiers have also exhibited high accuracy and speed when applied to large databases.

Naïve Bayesian classifiers:

• These assume that the effect of an attribute value on a given class is independent of the values of the other attributes.

• This assumption is called class conditional independence.

• It is made to simplify the computations involved, because the probability of each attribute value given the class can then be estimated separately.
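A minimal sketch of a naïve Bayesian classifier for categorical attributes follows. It relies on class conditional independence by multiplying the per-attribute probabilities for each class. The training data, attribute meanings, and the function names `train` and `predict` are all assumptions for illustration. No smoothing is applied, so an attribute value never seen with a class gives that class a score of zero.

```python
from collections import Counter, defaultdict

def train(rows, labels):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(class, attribute_index)][value] = number of occurrences
    value_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
    return class_counts, value_counts

def predict(model, row):
    """Return the most probable class and the unnormalized score per class."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / total  # prior P(C)
        for i, value in enumerate(row):
            # P(x_i | C), estimated independently for each attribute
            score *= value_counts[(c, i)][value] / n_c
        scores[c] = score
    return max(scores, key=scores.get), scores

# Hypothetical training tuples: (outlook, windy) with class label "play".
rows = [("sunny", "no"), ("sunny", "yes"), ("rainy", "yes"),
        ("rainy", "no"), ("sunny", "no")]
labels = ["yes", "no", "no", "yes", "yes"]

model = train(rows, labels)
label, scores = predict(model, ("sunny", "no"))
print(label, scores)
```

The scores are proportional to the class probabilities given the tuple. Dividing each score by the sum of all scores would turn them into class membership probabilities.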

Bayes' Theorem:

• In classification, Bayes' theorem is used to predict the class label for a given tuple.

• Let X be a data tuple.

• In Bayesian terms, X is considered “evidence.”

• X is described by measurements made on a set of n attributes.

• Let H be some hypothesis, such as that the data tuple X belongs to a specified class C.

• For classification problems, we are looking for the probability that tuple X belongs to class C, given that we know the attribute description of X. This is the posterior probability P(H|X).

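• Bayes' theorem gives this probability as P(H|X) = P(X|H) P(H) / P(X), where P(H) is the prior probability of H, P(X|H) is the probability of observing X given that H holds, and P(X) is the prior probability of X.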
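As a small worked example, the posterior P(H|X) can be computed from the prior P(H), the likelihood P(X|H), and the evidence P(X). All the numbers below are invented for illustration, and the function name `posterior` is an assumption. P(X) is obtained here by summing over the two cases "H holds" and "H does not hold".

```python
def posterior(prior_h, likelihood_x_given_h, evidence_x):
    """Bayes' theorem: P(H|X) = P(X|H) * P(H) / P(X)."""
    return likelihood_x_given_h * prior_h / evidence_x

# Hypothetical probabilities.
p_h = 0.3              # P(H): prior probability that a tuple belongs to class C
p_x_given_h = 0.8      # P(X|H): probability of the evidence within class C
p_x_given_not_h = 0.2  # P(X|not H): probability of the evidence outside class C

# Total probability of the evidence: 0.8 * 0.3 + 0.2 * 0.7 = 0.38
p_x = p_x_given_h * p_h + p_x_given_not_h * (1 - p_h)

p_h_given_x = posterior(p_h, p_x_given_h, p_x)  # 0.24 / 0.38, roughly 0.63
print(p_h_given_x)
```

In this example, observing X raises the probability of H from the prior 0.3 to roughly 0.63.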
