Explain various outlier detection methods.

## Outlier

• An outlier is a data object that is exceptionally far from the mainstream of data.
• An outlier is an object that deviates significantly from the rest of the objects and behaves differently.
• Outliers can arise from measurement or execution errors.
• Outlier detection can therefore be defined as the process of detecting and then excluding outliers from a given set of data.
• However, there is no standardized outlier identification method, because the right approach is mostly dataset-dependent.

## Outlier Detection Methods

• Many methods and approaches are used to detect abnormalities in data.
• Outlier detection methods can be broadly categorized as follows:

Extreme Value Analysis -

• This is a basic method and useful for 1-dimensional data.
• In this method, values that are too large or too small are considered outliers.
• Examples of this method are statistical tail tests such as the z-test and the t-test.
• It is a good heuristic for the initial analysis of data, but it has little value on its own in multivariate settings.
• For that reason, it is generally used as the final step for interpreting the outputs of other outlier detection methods.
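
As a rough illustration of extreme value analysis on 1-dimensional data, the sketch below flags values by their z-score. The function name `zscore_outliers`, the sample `readings`, and the threshold of 2.5 are assumptions chosen for this example, not part of any standard.

```python
from statistics import mean, stdev

def zscore_outliers(values, threshold=2.5):
    """Return the values whose absolute z-score exceeds `threshold`."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # all values identical: nothing can be extreme
    return [x for x in values if abs(x - mu) / sigma > threshold]

# Hypothetical sensor readings with one suspiciously large value.
readings = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 10.1, 9.9, 25.0]
flagged = zscore_outliers(readings)
print(flagged)
```

The threshold is deliberately below the common value of 3: in a sample of only 10 values, an extreme point inflates the standard deviation so much that no z-score can exceed about 2.85.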

Linear Approach -

• In this outlier detection method, linear correlations are used to model the data in a lower-dimensional subspace.
• The distance of each data point to the plane that fits this subspace is then calculated.
• Points with an unusually large distance are treated as outliers.
• An example of this method is Principal Component Analysis (PCA).
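
The idea can be sketched for 2-dimensional data, where the "plane" is simply a line along the first principal component. The sketch below uses the closed-form angle of the principal axis of a 2x2 covariance matrix instead of a general PCA routine; the helper name `pca_line_residuals` and the sample points are assumptions made for illustration.

```python
import math

def pca_line_residuals(points):
    """Distance of each 2-D point to the first principal axis."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    # Angle of the direction of largest variance (closed form in 2-D).
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)
    # Perpendicular distance to the line through the mean along (ux, uy).
    return [abs(-(x - mx) * uy + (y - my) * ux) for x, y in points]

# Hypothetical points lying close to y = 2x, plus one point far off that line.
points = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8),
          (5, 10.1), (6, 12.0), (7, 13.9), (8, 16.1), (4.5, 2.0)]
residuals = pca_line_residuals(points)
worst = max(range(len(points)), key=residuals.__getitem__)
print(points[worst], round(residuals[worst], 2))
```

Note that the outlier itself pulls the fitted line toward it, so on real data a robust fit or an iterative refit may be preferable.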

Probabilistic and Statistical Methods -

• These methods assume a particular probability distribution for the data.
• The parameters of the distribution are generally estimated with the expectation-maximization (EM) algorithm.
• Finally, the probability of each data object under the fitted distribution is computed.
• A data object with a low probability is considered an outlier.
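
The steps above can be sketched with a small EM loop that fits a mixture of two 1-dimensional Gaussians and then ranks points by their density under the fitted model. The two-component assumption, the initialization, the fixed iteration count, and all names (`fit_two_gaussians`, `mixture_density`) are choices made for this illustration only.

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def mixture_density(x, weights, mus, sigmas):
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))

def fit_two_gaussians(data, iterations=50):
    """Fit a two-component 1-D Gaussian mixture with plain EM."""
    # Assumed initialization: means at the extremes, wide shared spread.
    mus = [min(data), max(data)]
    sigmas = [(max(data) - min(data)) / 4] * 2
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
            total = max(sum(p), 1e-300)  # guard against underflow
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weight, mean, and spread of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mus[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - mus[k]) ** 2 for r, x in zip(resp, data)) / nk
            sigmas[k] = max(math.sqrt(var), 1e-3)  # avoid a collapsed component
            weights[k] = nk / len(data)
    return weights, mus, sigmas

# Hypothetical data: two clusters and one point stranded between them.
data = [-0.5, -0.2, 0.0, 0.1, 0.3, 0.4, 9.6, 9.8, 10.0, 10.1, 10.3, 10.5, 5.0]
params = fit_two_gaussians(data)
least_likely = min(data, key=lambda x: mixture_density(x, *params))
print(least_likely)
```

The stranded point sits almost exactly at the overall mean, so a plain z-score would not flag it; the mixture model does because the point has low density under both components.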

Proximity Methods -

• In these methods, outliers are treated as objects that are isolated from the rest of the data.
• In distance-based variants, an object is considered an outlier if its neighborhood does not contain enough other points.
• In density-based variants, an object is considered an outlier if its density is much lower than that of its neighbors.
• Examples of this type of method are cluster analysis, density-based analysis, and nearest-neighbor analysis.
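
A minimal nearest-neighbor sketch of the proximity idea: each point is scored by the distance to its k-th nearest neighbor, so isolated points receive high scores. The function name `knn_distance_scores`, the choice k = 2, and the sample points are assumptions for illustration; the brute-force search is only suitable for small datasets.

```python
import math

def knn_distance_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Hypothetical 2-D points: one tight cluster and one isolated point.
pts = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2), (0.8, 0.8), (6.0, 6.0)]
scores = knn_distance_scores(pts, k=2)
most_isolated = pts[max(range(len(pts)), key=scores.__getitem__)]
print(most_isolated)
```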

Information-Theoretic Methods -

• These methods rely on the idea that outliers increase the minimum code length needed to describe a data set.
• They measure the regularity of audit data and perform appropriate data transformations.
• They work best on datasets that have high regularity.
• Relative entropy is used to determine whether an approach built on one dataset is suitable for a new dataset.
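
To make the code-length idea concrete, the sketch below uses the empirical Shannon entropy as a stand-in for the minimum code length of a categorical dataset, and measures how many bits are saved by removing one occurrence of each value. This proxy, the function names, and the sample `events` list are assumptions for illustration, not a prescribed algorithm.

```python
import math
from collections import Counter

def code_length_bits(items):
    """Total bits needed to encode `items` with an ideal entropy code."""
    n = len(items)
    counts = Counter(items)
    return sum(c * math.log2(n / c) for c in counts.values())

def code_length_savings(items):
    """Bits saved by removing one occurrence of each distinct value."""
    total = code_length_bits(items)
    savings = {}
    for value in set(items):
        remaining = list(items)
        remaining.remove(value)  # drop a single occurrence
        savings[value] = total - code_length_bits(remaining)
    return savings

# Hypothetical audit log: two common events and one rare event.
events = ["login"] * 8 + ["read"] * 6 + ["root_shell"]
savings = code_length_savings(events)
suspect = max(savings, key=savings.get)
print(suspect, round(savings[suspect], 2))
```

The value whose removal shortens the description the most is the strongest outlier candidate under this measure.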