(Code Lab)

Anomaly detection is an unsupervised learning algorithm for finding unusual events.

What's an anomaly?

An example $x$ is considered an anomaly if its probability $p(x)$ is lower than a given small threshold $\epsilon$, i.e. $p(x) < \epsilon$

($x$ is a vector of features)

Assumptions

  • Anomalies in data occur only very rarely
  • The features of data anomalies are significantly different from those of normal instances

Applications

Anomalous data is linked to some sort of problem or rare event such as

  • Hacking
  • Fraud
  • New, unseen defects (previously seen defects can be detected using supervised learning)
  • Textual errors

Algorithm

(ML Spec) Given a training set $\{x^{(1)}, \dots, x^{(m)}\}$ with $m$ examples; each example $x^{(i)}$ has $n$ features

Choose features for anomaly detection

(How to choose features) Choose features, denoted $x_j$, that might be indicative of anomalies

They should

  • be normally distributed. When doing exploratory data analysis, plot their histograms.
  • take on unusually large or small values in the event of anomalies
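If a histogram shows a feature is heavily skewed rather than bell-shaped, it is common to transform it before fitting the Gaussian. A minimal sketch (the transformations and data below are conventional illustrations, not from the note itself):

```python
import numpy as np

# Skewed, non-negative feature (e.g. exponentially distributed)
rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=10_000)

# Candidate transformations to make the histogram more Gaussian-looking
x_log = np.log1p(x)    # log(1 + x): handles zeros safely
x_sqrt = np.sqrt(x)    # milder alternative
x_pow = x ** 0.3       # tunable power transform

def skewness(v):
    """Sample skewness: 0 for a symmetric (e.g. Gaussian) distribution."""
    return np.mean((v - v.mean()) ** 3) / v.std() ** 3

# The transformed versions are noticeably less skewed than the raw feature
```

Plot the histogram of each candidate and keep the transformation that looks closest to a bell curve.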

Fit parameters

  1. Estimate parameters $\mu_j$, $\sigma_j^2$ of the normal distribution for each feature $x_j$ using maximum likelihood estimation: $$\mu_j = \frac{1}{m}\sum_{i=1}^{m} x_j^{(i)} \qquad \sigma_j^2 = \frac{1}{m}\sum_{i=1}^{m} \left(x_j^{(i)} - \mu_j\right)^2$$
```python
import numpy as np

def estimate_gaussian(X):
    """
    Calculates mean and variance of all features
    in the dataset

    Args:
        X (ndarray): (m, n) Data matrix

    Returns:
        mu (ndarray): (n,) Mean of all features
        var (ndarray): (n,) Variance of all features
    """
    m, n = X.shape
    mu = 1 / m * np.sum(X, axis=0)
    var = 1 / m * np.sum(np.square(X - mu), axis=0)
    return mu, var
```
  2. Fit parameters $\mu_1, \dots, \mu_n, \sigma_1^2, \dots, \sigma_n^2$ for ALL $n$ features

Density estimation

  1. Compute $p(x) = \prod_{j=1}^{n} p(x_j; \mu_j, \sigma_j^2)$ for each new example $x$
  2. Anomaly if $p(x) < \epsilon$
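The two steps above can be sketched as follows (a minimal sketch with made-up data; `mu` and `var` are the per-feature parameters fitted earlier, e.g. by `estimate_gaussian`):

```python
import numpy as np

def compute_p(X, mu, var):
    """Product of per-feature univariate Gaussian densities:
    p(x) = prod_j p(x_j; mu_j, var_j)."""
    densities = np.exp(-np.square(X - mu) / (2 * var)) / np.sqrt(2 * np.pi * var)
    return np.prod(densities, axis=1)

# Fit on (non-anomalous) training data
X_train = np.array([[1.0, 2.0], [1.1, 1.9], [0.9, 2.1], [1.0, 2.05]])
mu = np.mean(X_train, axis=0)
var = np.var(X_train, axis=0)

# Score new examples; the second row is clearly unusual
X_new = np.array([[1.0, 2.0], [5.0, -3.0]])
p = compute_p(X_new, mu, var)

epsilon = 1e-3            # small probability threshold
anomalies = p < epsilon   # -> flags the second example
```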

Evaluation

Real-number evaluation

(ML Spec) When developing a learning algorithm (choosing features, etc.), making decisions is much easier if we have a way of evaluating our learning algorithm.

  1. Fit the model $p(x)$ on a non-anomalous labeled training set
  2. Tune parameters (e.g. the threshold $\epsilon$) on a cross-validation set containing a few anomalies
$$y=\begin{cases} 1 \quad \text{if } p(x) <\epsilon \; \text{(anomaly)} \\ 0 \quad \text{if } p(x) \geq \epsilon \; \text{(normal)} \end{cases}$$
  3. Test on a test set (including a few anomalies)

> [!example]- Eg. Detect flawed aircraft engines
>
> ![](https://i.imgur.com/H8tQVVn.jpg)

An alternative is to have no test set, with all anomalous examples in the cross-validation set. However, this should only be used when there are very few labeled anomalous examples, because having no test set leads to a higher risk of [[overfitting]].

Error analysis

Common problem:

  • We want $p(x)$ to be large for normal examples $x$ and small for anomalous examples, but $p(x)$ is comparable (e.g. both large) for both types ➡️ the algorithm fails to flag some examples as anomalies

[[error analysis]]:

  • Manually look through should-be anomalous examples
  • **Identify additional, unused features that distinguish anomalies from normal examples**
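One common way to tune $\epsilon$ on the labeled cross-validation set is to sweep candidate thresholds and keep the one with the best F1 score (since anomalies are rare, plain accuracy is misleading). A sketch, assuming `p_val` holds $p(x)$ for the cross-validation examples and `y_val` their 0/1 labels:

```python
import numpy as np

def select_threshold(y_val, p_val):
    """Sweep thresholds between min(p_val) and max(p_val) and return
    the epsilon with the best F1 score on the cross-validation set."""
    best_epsilon, best_f1 = 0.0, 0.0
    for epsilon in np.linspace(p_val.min(), p_val.max(), 1000):
        preds = (p_val < epsilon).astype(int)  # 1 = predicted anomaly
        tp = np.sum((preds == 1) & (y_val == 1))
        fp = np.sum((preds == 1) & (y_val == 0))
        fn = np.sum((preds == 0) & (y_val == 1))
        if tp == 0:
            continue
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_f1, best_epsilon = f1, epsilon
    return best_epsilon, best_f1
```

For example, with `p_val = [0.9, 0.8, 0.7, 0.01, 0.02]` and `y_val = [0, 0, 0, 1, 1]`, any threshold between 0.02 and 0.7 separates the two groups perfectly, so the sweep returns an epsilon in that range with F1 = 1.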