Compare and contrast the different techniques for anomaly detection that were presented in Section 10.1.2. In particular, try to identify circumstances in which the definitions of anomalies used in the different techniques might be equivalent or situations in which one might make sense, but another would not. Be sure to consider different types of data.

What will be an ideal response?

First, note that the proximity- and density-based anomaly detection techniques are related. Specifically, high density in the neighborhood of a point implies that many points are close to it, and vice versa. In practice, density is often defined in terms of distance, although it can also be defined using a grid-based approach.
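The link between proximity and density can be illustrated with a minimal sketch. It assumes two common but not unique definitions: the proximity score is the distance to the k-th nearest neighbor, and the density is the inverse of the mean distance to the k nearest neighbors. The function names and the toy data are hypothetical.

```python
import math

def knn_distances(points, i, k):
    """Sorted distances from points[i] to its k nearest neighbors."""
    dists = sorted(math.dist(points[i], q) for j, q in enumerate(points) if j != i)
    return dists[:k]

def proximity_score(points, i, k):
    """Distance to the k-th nearest neighbor; larger means more anomalous."""
    return knn_distances(points, i, k)[-1]

def density(points, i, k):
    """Inverse of the mean distance to the k nearest neighbors.
    Smaller means more anomalous. Assumes no k+1 identical points."""
    return k / sum(knn_distances(points, i, k))

# Toy data: five points packed together and one point far away (index 5).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (5.0, 5.0)]
k = 2

most_distant = max(range(len(points)), key=lambda i: proximity_score(points, i, k))
least_dense = min(range(len(points)), key=lambda i: density(points, i, k))
print(most_distant, least_dense)  # both select index 5, the far-away point
```

Under these assumed definitions the two scores pick out the same point, because the density is computed directly from the neighbor distances.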

The model-based approach can be used with virtually any underlying technique that fits a model to the data. However, a particular model, statistical or otherwise, must be assumed. Consequently, model-based approaches are restricted in terms of the data to which they can be applied. For example, if the model assumes a Gaussian distribution, then it should not be applied to data with a clearly non-Gaussian distribution, since the resulting anomaly scores would not be meaningful.
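As a rough illustration of the Gaussian case, the sketch below fits a single Gaussian (mean and standard deviation) to bimodal data and scores each value by its absolute z-score. The data and helper names are made up for the example. The value lying between the two modes sits at the fitted mean, so the Gaussian model rates it as the most normal value, while a simple nearest-neighbor distance rates it as the most isolated.

```python
import statistics

def z_scores(values):
    """Absolute z-score of each value under a single fitted Gaussian."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [abs(v - mu) / sigma for v in values]

def nn_distance(values, i):
    """Distance from values[i] to its nearest neighbor (1-D data)."""
    return min(abs(values[i] - v) for j, v in enumerate(values) if j != i)

# Hypothetical bimodal data: two tight groups plus one value between them.
left = [-0.2, -0.1, 0.0, 0.1, 0.2]
right = [9.8, 9.9, 10.0, 10.1, 10.2]
values = left + right + [5.0]  # the isolated value is at index 10

scores = z_scores(values)
isolated_score = scores[-1]
most_isolated = max(range(len(values)), key=lambda i: nn_distance(values, i))

print(isolated_score == min(scores))  # the Gaussian model finds it least anomalous
print(most_isolated)                  # the proximity view singles out index 10
```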

On the other hand, the proximity- and density-based approaches do not make any particular assumption about the data, although the definition of an anomaly does vary from one proximity- or density-based technique to another. Proximity-based approaches can be used for virtually any type of data, although the proximity metric used must be chosen appropriately. For example, Euclidean distance is typically used for dense, low-dimensional data, while the cosine similarity measure is used for sparse, high-dimensional data. Since density is typically defined in terms of proximity, density-based approaches can also be used for virtually any type of data. However, the meaning of density is less clear in a non-Euclidean data space.
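The effect of the proximity measure can be seen in a small sketch with hypothetical term-count vectors. Here `doc_b` is `doc_a` scaled by ten (same terms, longer document), while `doc_c` shares no terms with `doc_a`. Euclidean distance treats `doc_c` as the closer of the two, whereas cosine similarity treats `doc_b` as identical in direction and `doc_c` as unrelated.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical term-count vectors over a four-word vocabulary.
doc_a = [2, 1, 0, 0]
doc_b = [20, 10, 0, 0]  # same direction as doc_a, ten times longer
doc_c = [0, 0, 2, 1]    # no terms in common with doc_a

# Euclidean distance: doc_c looks closer to doc_a than doc_b does.
print(math.dist(doc_a, doc_b) > math.dist(doc_a, doc_c))

# Cosine similarity: doc_b matches doc_a, doc_c does not.
print(cosine_similarity(doc_a, doc_b), cosine_similarity(doc_a, doc_c))
```

Which point counts as far from the rest, and therefore anomalous, depends on this choice.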

Proximity- and density-based anomaly detection techniques can often produce similar results. However, results can differ significantly when a technique does not account for the variations in density throughout a data set, or when different proximity measures are used for the same data set. Model-based methods will often differ significantly from one another and from proximity- and density-based approaches.
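The role of varying density can be sketched on one-dimensional toy data. The example compares a global score (distance to the k-th nearest neighbor) with a simplified relative-density score (the average density of a point's neighbors divided by its own density). The second score is in the spirit of local, density-aware methods but is not a specific published algorithm, and all names and values are illustrative assumptions.

```python
def neighbors(values, i, k):
    """(distance, index) pairs for the k nearest neighbors of values[i]."""
    pairs = sorted((abs(values[i] - v), j) for j, v in enumerate(values) if j != i)
    return pairs[:k]

def kth_distance(values, i, k):
    """Global score: distance to the k-th nearest neighbor."""
    return neighbors(values, i, k)[-1][0]

def density(values, i, k):
    """Inverse of the mean distance to the k nearest neighbors."""
    return k / sum(d for d, _ in neighbors(values, i, k))

def relative_density_score(values, i, k):
    """Average neighbor density divided by the point's own density.
    Values well above 1 mean the point is sparser than its neighborhood."""
    avg = sum(density(values, j, k) for _, j in neighbors(values, i, k)) / k
    return avg / density(values, i, k)

# A tight cluster, one point just outside it (index 5), and a loose cluster.
values = [0.0, 0.1, 0.2, 0.3, 0.4, 1.0, 10.0, 12.0, 14.0, 16.0, 18.0]
k = 2
indices = range(len(values))

global_top = max(indices, key=lambda i: kth_distance(values, i, k))
relative_top = max(indices, key=lambda i: relative_density_score(values, i, k))
print(global_top, relative_top)
```

On this data the global score ranks a point at the edge of the loose cluster highest, while the relative score ranks the point just outside the tight cluster highest, so the two techniques disagree about which point is the strongest anomaly.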
