Write a short note on issues in classification and explain any one technique of classification.
1 Answer

There are two main issues in classification: preparing the data and evaluating classification methods.

DATA PREPARATION: The following preprocessing steps may be applied to the data before classification and prediction: data cleaning, data transformation, and relevance analysis (feature selection).

• Data cleaning: This preprocesses the data in order to reduce noise and handle missing values.

• Data transformation: This is used to generalize or normalize the data.

• Relevance analysis: Removes irrelevant or redundant attributes.
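
The three preparation steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`fill_missing`, `min_max_normalize`, `drop_constant_columns`) and the toy table are made up for this example, mean imputation and min-max scaling are just one possible choice for each step, and dropping constant columns is a very crude stand-in for real relevance analysis.

```python
from statistics import mean

def fill_missing(column):
    """Data cleaning: replace missing values (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]

def min_max_normalize(column):
    """Data transformation: rescale numeric values to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def drop_constant_columns(table):
    """Relevance analysis (crude): drop attributes whose value never varies."""
    return {name: col for name, col in table.items() if len(set(col)) > 1}

# Hypothetical toy data: one missing age, one attribute that carries no information.
raw = {
    "age": [20, None, 40, 60],
    "income": [30.0, 50.0, 70.0, 50.0],
    "country": ["IN", "IN", "IN", "IN"],
}

cleaned = dict(raw, age=fill_missing(raw["age"]))
relevant = drop_constant_columns(cleaned)
prepared = {name: min_max_normalize(col) for name, col in relevant.items()}
print(prepared)
```

After these steps the "country" attribute is gone and the remaining attributes are on a common scale, which is the form most classifiers expect.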


EVALUATING CLASSIFICATION METHODS: The learned hypothesis (the classifier) is used to infer the class of the examples in the test set.

Accuracy gives the percentage of examples in the test set that are classified correctly.
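
As a small illustration of the accuracy measure, the sketch below compares predicted labels with the true labels of a test set. The function name `accuracy` and the example labels are assumptions made for this example only.

```python
def accuracy(true_labels, predicted_labels):
    """Return the percentage of test examples whose predicted label matches the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("the test set must not be empty")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)

# Hypothetical test set: 4 of the 5 predictions match the true labels.
y_true = ["yes", "no", "yes", "yes", "no"]
y_pred = ["yes", "no", "no", "yes", "no"]
print(accuracy(y_true, y_pred))
```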

Other criteria used to evaluate classification methods:

• Speed: The time needed to construct the model and the time needed to use it.

• Robustness: This is the ability of the classifier to make correct predictions given noisy data or data with missing values.

• Scalability: This refers to the ability to construct the classifier efficiently given large amounts of data.

• Interpretability: This refers to the level of understanding and insight that is provided by the classifier.

• Goodness of rules: Decision tree size and compactness of the classification rules.

Classification methods include:

  1. Decision tree

  2. Bayesian classification

  3. Rule based

  4. K Nearest Neighbor

Bayesian Classification:

• Bayesian classifiers are statistical classifiers.

• They can predict class membership probabilities, such as the probability that a given tuple belongs to a particular class.

• Each training example can incrementally increase or decrease the probability that a hypothesis is correct; prior knowledge can be combined with observed data.

• Bayesian classification is based on Bayes' theorem.

• Bayesian classifiers have also exhibited high accuracy and speed when applied to large databases.

Naïve Bayesian classifiers:

• These assume that the effect of an attribute value on a given class is independent of the values of the other attributes.

• This assumption is called class conditional independence.

• It is made to simplify the computations involved, and in this sense the classifier is called "naïve".
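
To make class conditional independence concrete, here is a minimal sketch of a naïve Bayesian classifier for categorical attributes. It is not taken from the answer above: the function names (`train`, `posterior`, `predict`) and the weather-style toy data are hypothetical, and no smoothing is applied, so an attribute value never seen with a class gives that class probability zero. Under the independence assumption, the score of a class C is P(C) multiplied by P(x_i | C) for each attribute value x_i.

```python
from collections import Counter, defaultdict

def train(rows, labels):
    """Count how often each class occurs and how often each attribute value occurs per class."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # key: (attribute index, class label)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
    return class_counts, value_counts

def posterior(model, row):
    """Return P(class | row) for every class, assuming class conditional independence."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # prior P(C)
        for i, value in enumerate(row):
            score *= value_counts[(i, label)][value] / n  # P(x_i | C)
        scores[label] = score
    evidence = sum(scores.values())  # P(X), used only to normalize
    if evidence == 0:
        raise ValueError("every class has zero probability; smoothing would be needed")
    return {label: s / evidence for label, s in scores.items()}

def predict(model, row):
    """Return the class with the highest posterior probability."""
    probs = posterior(model, row)
    return max(probs, key=probs.get)

# Hypothetical training tuples: (outlook, temperature) -> play?
rows = [
    ("sunny", "hot"), ("sunny", "mild"), ("rainy", "hot"), ("rainy", "mild"),
    ("sunny", "mild"), ("rainy", "mild"), ("rainy", "hot"), ("sunny", "hot"),
]
labels = ["no", "no", "no", "yes", "yes", "yes", "yes", "no"]

model = train(rows, labels)
print(predict(model, ("sunny", "hot")))
print(posterior(model, ("sunny", "hot")))
```

Because each attribute is handled separately, training only needs one pass of counting, which is why the independence assumption keeps the computation cheap.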

Bayes' Theorem:

• In classification, Bayes' theorem is used to compute the probabilities from which the class label of a given tuple is predicted.

• Let X be a data tuple.

• In Bayesian terms, X is considered “evidence.”

• It is described by measurements made on a set of n attributes.

• Let H be some hypothesis, such as that the data tuple X belongs to a specified class C.

• For classification problems, we are looking for the probability that tuple X belongs to class C, given that we know the attribute description of X.
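
In the notation introduced above, the standard statement of Bayes' theorem can be written as follows. Here P(H|X) is the posterior probability of H given X, P(H) is the prior probability of H, P(X|H) is the probability of observing X when H holds, and P(X) is the prior probability of X.

```latex
P(H \mid X) = \frac{P(X \mid H)\, P(H)}{P(X)}
```

For classification, P(X) is the same for every class, so the predicted class is the one for which P(X|H) P(H) is largest.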
