Explain various outlier detection methods.

written 2.3 years ago by

binitamayekar ★ 6.5k

Outlier

An outlier is a data object that lies exceptionally far from the bulk of the data: it deviates significantly from the other objects and behaves differently.
Outliers can arise from measurement or execution errors.
Outlier detection can therefore be defined as the process of detecting and then excluding outliers from a given set of data.
However, there is no single standardized method for identifying outliers, because the appropriate choice depends largely on the dataset.

Extreme Value Analysis -

This is a basic method that is useful for 1-dimensional data.
Values that are too large or too small are considered outliers.
Examples of this method are the z-test and the t-test.
It is generally used as the final step for interpreting the outputs of other outlier detection methods.
It is also a good heuristic for the initial analysis of data, but it has little value in multivariate settings.
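
As a rough illustration of extreme value analysis, the sketch below flags values whose z-score exceeds a cutoff. The function name zscore_outliers and the default cutoff of 3.0 are assumptions made for this example, not part of any standard.

```python
from statistics import mean, stdev


def zscore_outliers(values, threshold=3.0):
    """Return the values lying more than `threshold` sample standard
    deviations away from the mean (threshold is an assumed default)."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        # All values are identical, so nothing can be extreme.
        return []
    return [x for x in values if abs(x - mu) / sigma > threshold]


# Hypothetical 1-dimensional data: nineteen ordinary readings and one extreme.
readings = [10.0] * 19 + [50.0]
print(zscore_outliers(readings))
```

Note that with very small samples a single extreme value inflates the standard deviation, so a cutoff of 3.0 may never be reached; a lower cutoff or a robust statistic such as the median may be needed in that case.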

Linear Approach -

In this outlier detection method, the data is projected onto a lower-dimensional subspace by exploiting linear correlations.
Then the distance of each data point from the hyperplane that fits the subspace is calculated.
Points with a large distance are treated as outliers.
An example of this method is Principal Component Analysis (PCA).
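
The idea can be sketched for the simplest case, 2-dimensional data reduced to a 1-dimensional subspace (a line). The code below is a minimal sketch, assuming 2-D points only: it finds the first principal direction in closed form from the 2x2 covariance matrix and returns each point's perpendicular distance to the fitted line. The function name pca_line_distances is hypothetical.

```python
import math


def pca_line_distances(points):
    """Perpendicular distance of each 2-D point to the first principal axis."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Entries of the 2x2 covariance matrix.
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Angle of the leading eigenvector of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)
    # Distance to the line through the mean along (ux, uy):
    # magnitude of the 2-D cross product with the unit direction.
    return [abs((x - mx) * uy - (y - my) * ux) for x, y in points]


# Hypothetical data: points on the line y = x plus one point far off it.
pts = [(float(i), float(i)) for i in range(10)] + [(2.0, 8.0)]
distances = pca_line_distances(pts)
suspect = max(range(len(pts)), key=lambda i: distances[i])
print(pts[suspect])
```

For higher-dimensional data the same distance is usually computed with a numerical library from the eigenvectors of the covariance matrix; the closed form above only works in two dimensions.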

Probabilistic and Statistical Methods -

These methods assume that the data follows a particular distribution.
They generally use the expectation-maximization (EM) algorithm to estimate the parameters of that distribution.
Finally, the probability of each data object under the fitted distribution is computed.
A data object with a low probability is considered an outlier.
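
The steps above can be sketched for 1-dimensional data with a mixture of two Gaussians. This is a simplified illustration, assuming exactly two components, a crude initialization at the minimum and maximum of the data, and a fixed number of iterations; the function names are hypothetical.

```python
import math


def normal_pdf(x, mu, var):
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, n_iter=50, min_var=1e-6):
    """Fit a two-component 1-D Gaussian mixture with EM (sketch)."""
    n = len(data)
    mean_all = sum(data) / n
    var_all = max(sum((x - mean_all) ** 2 for x in data) / n, min_var)
    mus = [min(data), max(data)]      # assumed initialization
    variances = [var_all, var_all]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            joint = [w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, mus, variances)]
            total = sum(joint)
            resp.append([j / total for j in joint] if total > 0.0 else [0.5, 0.5])
        # M-step: re-estimate weight, mean and variance of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk == 0.0:
                continue
            weights[k] = nk / n
            mus[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - mus[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, min_var)  # floor avoids collapse
    return weights, mus, variances


def mixture_density(x, weights, mus, variances):
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, mus, variances))


# Hypothetical data: two groups near 0 and 10, plus one value in between.
data = [-0.6, -0.4, -0.2, 0.0, 0.2, 0.4, 0.6,
        9.4, 9.6, 9.8, 10.0, 10.2, 10.4, 10.6, 5.0]
params = fit_gmm_1d(data)
# The object with the lowest density under the fitted model is the candidate outlier.
print(min(data, key=lambda x: mixture_density(x, *params)))
```

In practice a density cutoff has to be chosen to decide how low is "low"; that choice depends on the dataset and is left out of this sketch.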

Proximity Methods -

In these methods, outliers are objects that are isolated from the rest of the data.
In distance-based (nearest-neighbor) variants, an object is considered an outlier if its neighborhood does not contain enough other points.
In density-based variants, an object is considered an outlier if its density is much lower than that of its neighbors.
Examples of this type of method are cluster analysis, density-based analysis, and nearest-neighbor analysis.
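
A minimal nearest-neighbor sketch of this idea: score each object by the distance to its k-th nearest neighbor, so isolated objects get large scores. The function name knn_distance_scores and the choice k=2 are assumptions for illustration, and the brute-force search is only practical for small data sets.

```python
import math


def knn_distance_scores(points, k=2):
    """Distance from each point to its k-th nearest neighbor (brute force)."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Hypothetical data: a tight group of points and one isolated point.
pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (10.0, 10.0)]
scores = knn_distance_scores(pts)
print(pts[max(range(len(pts)), key=lambda i: scores[i])])
```

Density-based variants compare each object's score with the scores of its neighbors instead of using the raw distance, which helps when clusters have different densities.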

Information-Theoretic Methods -

In these methods, outliers are the objects that increase the minimum code length needed to describe a data set.
The methods measure the regularity of audit data and perform appropriate data transformations.
Datasets used with this method should have high regularity.
Relative entropy is used to determine whether the approach is suitable for a new dataset.
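
One way to make the code-length idea concrete is to approximate the description length of a categorical data set by n times its Shannon entropy, and to measure how many bits are saved when each object is removed. This is only an illustrative sketch under that assumption; the function names are hypothetical, and real information-theoretic detectors use more elaborate encodings.

```python
import math
from collections import Counter


def entropy(symbols):
    """Shannon entropy in bits per symbol."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def code_length(symbols):
    """Approximate total bits needed to encode the data set: n * H."""
    return len(symbols) * entropy(symbols)


def code_length_savings(symbols):
    """Bits saved by removing each object in turn; large savings suggest an outlier."""
    total = code_length(symbols)
    return [total - code_length(symbols[:i] + symbols[i + 1:])
            for i in range(len(symbols))]


# Hypothetical highly regular data set with one irregular record.
records = ["a"] * 9 + ["z"]
savings = code_length_savings(records)
print(records[max(range(len(records)), key=lambda i: savings[i])])
```

The approach relies on the data being regular: if every symbol were equally rare, removing any single object would save about the same number of bits and nothing would stand out.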
