Define Outlier and different types of Outlier

What Are Outliers?

  • Outlier. A data object that deviates significantly from the normal objects as if it were generated by a different mechanism

  • Outliers are different from noise

  • Noise is random error or variance in a measured variable

  • Noise should be removed before outlier detection

  • Outliers are interesting: they violate the mechanism that generates the normal data

  • Outlier detection vs. novelty detection: a novel pattern is flagged as an outlier at an early stage, but is later merged into the model


Applications of outlier detection:

  • Credit card fraud detection

  • Telecom fraud detection

  • Customer segmentation

  • Medical analysis

Types of Outliers

Three kinds: global, contextual and collective outliers

1. Global outlier (or point anomaly)

  • An object is a global outlier if it deviates significantly from the rest of the data set.

  • Ex. Intrusion detection in computer networks

    • Issue: Find an appropriate measurement of deviation
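One possible measurement of deviation is the z-score: how many standard deviations a value lies from the mean. The sketch below is a minimal illustration under that assumption; the function name `global_outliers`, the threshold of 2.5 and the sample readings are made up for the example, and other measures (distance-based, density-based) may suit real data better.

```python
from statistics import mean, stdev

def global_outliers(values, threshold=2.5):
    """Return the values whose absolute z-score exceeds `threshold`.

    Hypothetical helper: the z-score is only one possible deviation measure.
    """
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        # All values are identical, so nothing deviates.
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Made-up sensor readings with one value far from the rest.
readings = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 10.1, 9.9, 10.0, 10.2, 25.0]
flagged = global_outliers(readings)
print(flagged)  # [25.0]
```

Note that a single extreme value inflates the mean and standard deviation it is judged against, so on small samples a lower threshold or a median-based measure may be needed.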

2. Contextual outlier (or conditional outlier)

  • An object is a contextual outlier if it deviates significantly with respect to a selected context

  • Ex. 80 °F in Urbana: outlier? (It depends on whether it is summer or winter.)

  • Attributes of data objects should be divided into two groups

    • Contextual attributes: define the context, e.g. time & location

    • Behavioral attributes: characteristics of the object, used in outlier evaluation, e.g. temperature

  • Can be viewed as a generalization of local outliers, whose density significantly deviates from their local area

  • Issue: How to define or formulate a meaningful context?
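The temperature example can be sketched by treating the season as the contextual attribute and the temperature as the behavioral attribute, then scoring each value only against others from the same context. This is a minimal sketch, not a standard algorithm: `contextual_outliers`, the threshold of 2.0 and the temperature records are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev

def contextual_outliers(records, threshold=2.0):
    """Flag (context, value) pairs whose value is unusual within its own context.

    `records` is a list of (contextual attribute, behavioral attribute) pairs.
    """
    by_context = defaultdict(list)
    for context, value in records:
        by_context[context].append(value)

    flagged = []
    for context, value in records:
        group = by_context[context]
        if len(group) < 2:
            continue  # not enough data to judge this context
        sigma = stdev(group)
        if sigma == 0:
            continue  # no variation within this context
        if abs(value - mean(group)) / sigma > threshold:
            flagged.append((context, value))
    return flagged

# Made-up temperatures in °F; 80 appears in both seasons.
records = (
    [("summer", t) for t in [78, 82, 85, 80, 79, 83, 81, 80]]
    + [("winter", t) for t in [28, 31, 25, 30, 27, 33, 29, 80]]
)
flagged = contextual_outliers(records)
print(flagged)  # [('winter', 80)]
```

The same reading of 80 is normal in summer and an outlier in winter, which is the point of separating contextual from behavioral attributes.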

3. Collective outliers

  • A subset of data objects collectively deviates significantly from the whole data set, even if the individual data objects may not be outliers

  • Applications: e.g. intrusion detection

    • When a number of computers keep sending denial-of-service packets to each other

  • Detection of collective outliers

    • Consider not only the behavior of individual objects, but also that of groups of objects

    • Need background knowledge on the relationship among data objects, such as a distance or similarity measure on objects
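As a rough illustration of looking at groups rather than individual objects, the sketch below uses time order as the relationship among objects and scores each sliding window by how far its mean lies from the overall mean. It assumes roughly independent observations, so that a window mean has standard error sigma / sqrt(window); the function name `collective_outliers`, the window size, the threshold and the packet counts are hypothetical.

```python
from statistics import mean, stdev

def collective_outliers(series, window=8, threshold=2.0):
    """Return merged (start, end) index spans, end exclusive, of unusual windows."""
    mu = mean(series)
    sigma = stdev(series)
    if sigma == 0:
        return []
    # Approximate standard error of a window mean, assuming independence.
    std_err = sigma / window ** 0.5

    spans = []
    for start in range(len(series) - window + 1):
        window_mean = mean(series[start:start + window])
        if abs(window_mean - mu) / std_err > threshold:
            end = start + window
            if spans and start <= spans[-1][1]:
                # Overlaps or touches the previous flagged span: extend it.
                spans[-1] = (spans[-1][0], end)
            else:
                spans.append((start, end))
    return spans

# Made-up packet counts per interval: 2 and 8 are both common values,
# but a sustained run of 8s is not.
series = [2, 8] * 10 + [8] * 10 + [2, 8] * 10
spans = collective_outliers(series)
print(spans)  # [(19, 30)]
```

No single value here is more than 2 standard deviations from the mean, so a point-wise check flags nothing; only the group of consecutive 8s stands out.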

