Practical anomaly detection

An introduction to unsupervised ML techniques for anomaly detection and their strengths and weaknesses in different application areas.

Anomaly detection matters in many industries: it helps identify safety and security risks, ensure production quality, and find new business opportunities. It is also difficult, because anomalies are, almost by definition, rare in the available data.

Learning outcomes

  • Understanding qualitative and quantitative definitions of anomalies
  • Gaining an overview of the theoretical foundations and practical implementations of several anomaly detection algorithms
  • Learning how to evaluate and compare the performance of different algorithms

Structure of the workshop

Part 1: introduction, algorithms, exercises

  • Informal notion of an anomaly and the settings in which anomaly detection arises: unsupervised learning, one-class problems, class imbalance, etc.
  • Concise introductions to several algorithms, each chosen to represent a certain approach to anomaly detection.
  • Contamination framework and the assumptions of the different algorithms.
  • Anomaly Detection via Density Estimation.
  • Anomaly Detection via Isolation.
  • Awareness of relevant problem parameters such as the degree of contamination, the clusteredness of anomalies, irrelevant dimensions, etc.
  • Evaluate and compare the algorithms' performance.
  • Anomaly Detection via reconstruction error.
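
To make the contamination idea concrete, here is a minimal sketch in plain Python. It is not taken from the workshop material: the k-nearest-neighbour distance used as a crude stand-in for a density-based score, the toy data, and the helper names (knn_scores, flag_top_fraction, precision_recall) are all assumptions chosen for illustration. It scores every point, flags the assumed fraction of contaminated points, and compares the flags with the known labels.

```python
import math

def knn_scores(points, k=3):
    """Score each point by the distance to its k-th nearest neighbour.

    Large scores indicate sparse regions (a crude density-based score).
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

def flag_top_fraction(scores, contamination):
    """Return the indices of the highest-scoring fraction of points."""
    n_flag = max(1, round(contamination * len(scores)))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:n_flag])

def precision_recall(flagged, true_anomalies):
    """Compare flagged indices with the known anomaly indices."""
    tp = len(flagged & true_anomalies)
    return tp / len(flagged), tp / len(true_anomalies)

# Toy data (hypothetical): a regular grid stands in for a dense "normal"
# region so that the example is fully deterministic.
inliers = [(0.5 * x, 0.5 * y) for x in range(-5, 5) for y in range(-5, 5)]
outliers = [(8, 8), (-8, 8), (8, -8), (-8, -8), (0, 12)]
points = inliers + outliers
true_anomalies = set(range(len(inliers), len(points)))

scores = knn_scores(points, k=3)

# Contamination set to the true fraction of anomalies.
flagged_exact = flag_top_fraction(scores, contamination=5 / len(points))
p_exact, r_exact = precision_recall(flagged_exact, true_anomalies)

# Contamination over-estimated: recall stays high, precision drops.
flagged_over = flag_top_fraction(scores, contamination=0.2)
p_over, r_over = precision_recall(flagged_over, true_anomalies)

print(p_exact, r_exact, p_over, r_over)
```

In this toy setting the anomalies are isolated and far from the normal region, which is the easy case. Clustered anomalies or many irrelevant dimensions, as listed above, would make a distance-based score like this one much less reliable.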

Part 2: anomaly detection in time series

  • Anomaly types: Point, context and pattern anomalies.
  • Preprocessing techniques for anomaly detection in time series.
  • Context anomalies, regimes and the hidden Markov model.
  • Pattern anomalies and maximal discords.
  • Extreme Value Theory and GEV distributions.
  • Exercises: detecting low and high values; exploring, studying and detecting anomalies in ride-share data.
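
As an illustration of the pattern-anomaly item above, the following is a minimal brute-force sketch of a discord search, assuming the common definition of a discord as the subsequence whose nearest non-overlapping neighbour is farthest away. The function name maximal_discord, the use of raw (not normalised) Euclidean distance, and the toy series are assumptions made for this example, not the workshop's implementation.

```python
import math

def maximal_discord(series, m):
    """Return (start, distance) of the length-m window whose nearest
    non-overlapping neighbour is farthest away (brute force, O(n^2 * m))."""
    n_windows = len(series) - m + 1
    best_start, best_dist = None, -1.0
    for i in range(n_windows):
        nearest = math.inf
        for j in range(n_windows):
            # Skip overlapping windows, which would be trivial matches.
            if abs(i - j) >= m:
                d = math.dist(series[i:i + m], series[j:j + m])
                nearest = min(nearest, d)
        if nearest != math.inf and nearest > best_dist:
            best_start, best_dist = i, nearest
    return best_start, best_dist

# Toy series (hypothetical): a repeating pattern with one distorted value.
series = [0, 1, 2, 1] * 10
series[22] = 5

start, dist = maximal_discord(series, m=4)
print(start, dist)
```

The reported window contains the distorted value at index 22. The quadratic cost of this brute-force search is acceptable only for short series; longer series need a faster search strategy.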

