Explain various outlier detection methods.

**1 Answer**

written 2.0 years ago by |

- An outlier is a data object that is exceptionally far from the mainstream of data.
- An outlier is an object that deviates significantly from the rest of the objects and behaves differently.
- Outliers can arise from measurement or execution errors.
- Outlier detection can therefore be defined as the process of detecting and then excluding outliers from a given set of data.
- However, there is no standardized outlier identification method, because the right choice is mostly dataset-dependent.

- Many methods or approaches are used to detect such abnormalities.

*Based on the approach used, outlier detection methods can be categorized as follows:*

**Extreme Value Analysis -**

- This is a basic method and useful for 1-dimensional data.
- In this method, values that are too large or too small are considered outliers.
- **Examples** of this method are the *z-test* and *t-test.*
- This method is generally used as the final step for interpreting outputs of other outlier detection methods.
- It is a good heuristic for the initial analysis of data, but it has little value in multivariate settings.
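
As a rough illustration of extreme value analysis, the sketch below flags values whose z-score is unusually large. The function name `zscore_outliers`, the sample readings, and the threshold of 2.5 are assumptions made for this example, not part of any standard.

```python
from statistics import mean, stdev

def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds `threshold`."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # all values are identical, so nothing is extreme
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical 1-dimensional readings with one suspicious value.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.1, 9.7, 10.0, 25.0]

# With only 10 samples a z-score cannot exceed (n - 1) / sqrt(n), about 2.85,
# so a cutoff below the common default of 3 is used here.
flagged = zscore_outliers(readings, threshold=2.5)
print(flagged)
```

The extreme value itself inflates the mean and standard deviation, which is one reason this check is usually treated as a first-pass heuristic.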

**Linear Approach -**

- In this outlier detection method, data is projected onto a lower-dimensional sub-space by using linear correlations.
- Then the distance of each data point to a plane that fits the sub-space is calculated.
- This distance is used to find outliers.
- An *example* of this method is *Principal Component Analysis (PCA).*
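
The idea can be sketched for the 2-D case, where the lower-dimensional sub-space is just a line: fit the first principal axis and measure how far each point lies from it. This is a simplified special case using the closed-form orientation of a 2x2 covariance matrix; real PCA on higher-dimensional data would normally rely on a linear algebra library. The function name and sample points are assumptions for illustration.

```python
import math

def pca_line_distances(points):
    """Distance of each 2-D point to the first principal axis."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Orientation of the leading eigenvector of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    # Unit normal to the principal axis; projecting onto it gives the distance.
    nx, ny = -math.sin(theta), math.cos(theta)
    return [abs((x - mx) * nx + (y - my) * ny) for x, y in points]

# Hypothetical points lying close to the line y = x, plus one that does not.
points = [(1, 1.1), (2, 1.9), (3, 3.0), (4, 4.1), (5, 5.0), (6, 5.9), (3, 6.0)]
distances = pca_line_distances(points)
suspect = max(range(len(points)), key=lambda i: distances[i])
print(points[suspect], round(distances[suspect], 2))
```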

**Probabilistic and Statistical Methods -**

- These methods assume that the data follows a particular distribution.
- The parameters of the assumed model are generally estimated with the expectation-maximization (EM) algorithm.
- Finally, the probability of each data object under the fitted distribution is computed.
- A data object with a low probability is considered an outlier.
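
A minimal sketch of this idea, assuming the data follows a single normal distribution. For one Gaussian the parameters can be estimated directly from the samples, so EM is not needed here; it becomes relevant for mixture models. The function name, the sample heights, and the density cutoff are assumptions for illustration, and `statistics.NormalDist` requires Python 3.8 or later.

```python
from statistics import NormalDist

def low_probability_points(values, density_cutoff):
    """Fit a normal distribution and return values with low density under it."""
    model = NormalDist.from_samples(values)  # estimates mean and std deviation
    return [v for v in values if model.pdf(v) < density_cutoff]

# Hypothetical heights in centimetres with one implausible entry.
heights_cm = [168, 171, 170, 169, 172, 167, 170, 171, 169, 210]
flagged = low_probability_points(heights_cm, density_cutoff=0.005)
print(flagged)
```

The cutoff has to be chosen per dataset, since density values depend on the scale of the data.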

**Proximity Methods -**

- In these methods, outliers are considered to be objects that are isolated from the rest of the data set.
- In distance-based variants, an object is considered an outlier if its neighborhood does not contain enough other points.
- In density-based variants, an object is considered an outlier if its density is much lower than that of its neighbors.
- **Examples** of this type of method are *cluster analysis, density-based analysis,* and *nearest neighbor analysis.*
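
One simple way to sketch the nearest neighbor variant is to score each point by the distance to its k-th nearest neighbor: isolated points get large scores. The function name, the choice of `k=2`, and the sample points are assumptions for illustration; `math.dist` requires Python 3.8 or later.

```python
import math

def knn_distance_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Hypothetical cluster near the origin plus one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (8, 8)]
scores = knn_distance_scores(points, k=2)
outlier = points[scores.index(max(scores))]
print(outlier)
```

This brute-force version compares every pair of points, so it is only practical for small datasets.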

**Information-Theoretic Methods -**

- In these methods, outliers are the objects that increase the minimum code length needed to describe a data set.
- These methods measure the regularity of audit data and perform appropriate data transformations.
- A dataset suited to these methods has high regularity.
- Relative entropy is used to determine whether the approach is suitable for a new dataset.
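
The code-length idea can be sketched with Shannon entropy as a stand-in for the minimum code length: the item whose removal saves the most bits is the most surprising one. This is only one possible reading of the approach; the function names and the sample event log are assumptions for illustration.

```python
import math
from collections import Counter

def code_length_bits(symbols):
    """Shannon lower bound on the total bits needed to encode `symbols`."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum(c * math.log2(c / n) for c in counts.values())

def code_length_savings(symbols):
    """Bits saved by removing each item in turn from the data set."""
    total = code_length_bits(symbols)
    return [total - code_length_bits(symbols[:i] + symbols[i + 1:])
            for i in range(len(symbols))]

# Hypothetical audit log of operations; one event is rare.
events = ["read", "read", "write", "read", "write", "read", "format_disk", "read"]
savings = code_length_savings(events)
most_surprising = events[savings.index(max(savings))]
print(most_surprising)
```

On a highly regular dataset most removals save very little, so a rare item stands out clearly.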
