Explain various outlier detection methods.

Outlier

  • An outlier is a data object that is exceptionally far from the mainstream of data.
  • An outlier is an object that deviates significantly from the rest of the objects and behaves differently.
  • Outliers can arise from measurement or execution errors.
  • Outlier detection can therefore be defined as the process of detecting, and then excluding, outliers from a given set of data.
  • There is no standardized method for identifying outliers, because the right choice is mostly dataset-dependent.

Outlier Detection Methods

  • Many methods are used to detect outliers.
  • They can be grouped into the following categories:

Extreme Value Analysis -

  • This is a basic method and useful for 1-dimensional data.
  • In this method, values that are too large or too small are considered outliers.
  • Examples of this method are the z-test and t-test.
  • It is a good heuristic for the initial analysis of data, but it has little value in multivariate settings.
  • It is also commonly used as the final step for interpreting the outputs of other outlier detection methods.
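
As a rough illustration of extreme value analysis on 1-dimensional data, the sketch below flags values whose z-score is too large. The function name, the sample readings, and the threshold of 3 are assumptions chosen for the example, not part of the answer above.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds `threshold`."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical, so nothing can be extreme
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Hypothetical sensor readings with one value that is far too large.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 9.7, 10.1, 10.0, 9.9, 10.2, 25.0]
flagged = zscore_outliers(readings, threshold=3.0)
print(flagged)
```

The threshold is a judgment call: a lower value flags more points, and the mean and standard deviation are themselves pulled toward the extreme values they are meant to detect.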

Linear Approach -

  • In this method, the data is modeled in a lower-dimensional sub-space by using linear correlations.
  • Then the distance of each data point to a plane that fits the sub-space is calculated.
  • This distance is used to find outliers.
  • An example of this method is Principal Component Analysis (PCA).
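
The linear approach can be sketched for the simplest case, 2-dimensional data reduced to a 1-dimensional sub-space (a line). The example below finds the first principal axis from the covariance matrix and uses each point's perpendicular distance to that axis as its outlier score. The function name and sample points are assumptions for illustration; a real analysis would normally use a linear algebra library and more dimensions.

```python
import math

def pca_line_distances(points):
    """Distance of each 2-D point to the first principal axis of the data."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Entries of the 2x2 covariance matrix.
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Orientation of the leading eigenvector (closed form for the 2x2 case).
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    dx, dy = math.cos(theta), math.sin(theta)
    # Perpendicular distance to the axis through the mean.
    return [abs((x - mx) * dy - (y - my) * dx) for x, y in points]

# Hypothetical points lying close to a line, plus one that does not.
points = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.9), (5, 5.1), (6, 6.0), (3, 7.0)]
distances = pca_line_distances(points)
suspect = points[distances.index(max(distances))]
print(suspect)
```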

Probabilistic and Statistical Methods -

  • These methods assume a particular distribution for the data.
  • The expectation-maximization (EM) algorithm is generally used to estimate the parameters of the model.
  • Finally, the probability of each data object under the fitted distribution is calculated.
  • The data object with a low probability is considered an outlier.
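
A minimal sketch of the idea, assuming the data follows a single normal distribution: in that special case the parameters have a closed-form maximum-likelihood estimate, so no EM iterations are needed (EM becomes necessary for mixture models). The function name and the sample heights are made up for the example.

```python
import math
import statistics

def gaussian_densities(values):
    """Fit one normal distribution and return the density of each value."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)  # maximum-likelihood estimate
    coef = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return [coef * math.exp(-0.5 * ((v - mu) / sigma) ** 2) for v in values]

# Hypothetical heights in centimetres; one value is unusually large.
heights = [168, 172, 170, 169, 171, 173, 167, 170, 171, 169, 198]
densities = gaussian_densities(heights)
least_likely = heights[densities.index(min(densities))]
print(least_likely)
```

In practice a cut-off on the density, rather than just the single lowest value, decides which objects are reported as outliers.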

Proximity Methods -

  • In these methods, outliers are objects that are isolated from the rest of the data.
  • In a distance-based view, an object is considered an outlier if its neighborhood does not contain enough other points.
  • In a density-based view, an object is considered an outlier if its density is much lower than that of its neighbors.
  • Examples of this type of method are cluster analysis, density-based analysis, and nearest-neighbor analysis.
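
One simple proximity score is the distance from each object to its k-th nearest neighbor: isolated objects get large scores. The sketch below is a brute-force version, with the function name, the sample points, and k = 2 chosen as assumptions for the example.

```python
import math

def knn_distance_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(points):
        # Distances to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Two hypothetical clusters and one point that belongs to neither.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (9.0, 0.5)]
scores = knn_distance_scores(points, k=2)
most_isolated = points[scores.index(max(scores))]
print(most_isolated)
```

This compares every pair of points, so it is only practical for small data sets; larger ones usually need a spatial index.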

Information-Theoretical Methods -

  • In these methods, outliers are the objects that increase the minimum code length needed to describe a data set.
  • These methods measure the regularity of audit data and perform appropriate data transformations.
  • They work best on datasets that have high regularity.
  • Relative entropy is used to determine whether the approach is suitable for a new dataset.
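
The code-length idea can be sketched on a small categorical data set. The example estimates how many bits an ideal code built from the data's own symbol frequencies would need, then measures how much shorter that code becomes when each item is removed. The item whose removal saves the most bits is the strongest outlier candidate. The event names and function names are assumptions for illustration, and the sketch ignores the cost of storing the code table itself.

```python
import math
from collections import Counter

def code_length_bits(items):
    """Bits needed to encode `items` using their own empirical frequencies."""
    n = len(items)
    counts = Counter(items)
    return -sum(c * math.log2(c / n) for c in counts.values())

def removal_savings(items):
    """Bits saved by removing each item in turn from the data set."""
    total = code_length_bits(items)
    return [total - code_length_bits(items[:i] + items[i + 1:])
            for i in range(len(items))]

# Hypothetical audit log: regular events plus one rare event.
events = ["login", "read", "read", "logout", "login", "read",
          "logout", "login", "read", "logout", "format_disk"]
savings = removal_savings(events)
suspect = events[savings.index(max(savings))]
print(suspect)
```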