Isolation Forests: The good, the bad and the ugly

Anomaly detection underlies numerous real-life machine learning use cases, such as predictive maintenance, network intrusion detection, system health monitoring, fraud detection and novelty detection. Because many of these application areas are both highly relevant and sensitive, robustness and reliability are primary concerns when designing anomaly detection systems. A good understanding of the mathematical principles behind the algorithms used in practice is therefore highly desirable.

In this talk we investigate a simple algorithm called isolation forest, which has gained wide popularity over the last decade. Despite its success, the reasons for its good performance are currently only partially understood. We review some of the recent literature, which shows strengths and weaknesses of the algorithm, and conclude with a few observations that might lead to further research directions.
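For readers unfamiliar with the algorithm, the following is a minimal sketch of the isolation idea as described in [Liu08I]: points are recursively separated by random axis-parallel splits, and points that become isolated after few splits receive a high anomaly score. The function names (`build_forest`, `anomaly_score`, and so on) are illustrative and not taken from any library, and the handling of a constant attribute is a simplification. It is not the implementation discussed in the talk.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c(n):
    """Average path length of an unsuccessful binary search tree lookup
    over n points; used as the normalising constant in [Liu08I]."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split points with a random feature and threshold."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        # Simplification (assumption): stop if the chosen attribute is constant.
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(x, tree, depth=0):
    """Number of edges from the root to the leaf containing x, plus an
    adjustment c(size) for the points left unsplit in that leaf."""
    if tree[0] == "leaf":
        return depth + c(tree[1])
    _, dim, split, left, right = tree
    child = left if x[dim] < split else right
    return path_length(x, child, depth + 1)


def build_forest(data, n_trees=100, sample_size=256, seed=0):
    """Build n_trees isolation trees, each on a random subsample of data."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi)) if psi > 1 else 0
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, psi


def anomaly_score(x, trees, psi):
    """Score in (0, 1]; values close to 1 indicate likely anomalies."""
    mean_path = sum(path_length(x, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_path / c(psi))
```

A typical use would be `trees, psi = build_forest(data)` on a list of equal-length tuples, followed by `anomaly_score(x, trees, psi)` for each query point `x`. The defaults of 100 trees and a subsample size of 256 follow the values suggested in [Liu08I].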

References

[Guh16R]

Robust Random Cut Forest Based Anomaly Detection on Streams, Sudipto Guha, Nina Mishra, Gourav Roy, Okke Schrijvers.

Jun 2016

In this paper we focus on the anomaly detection problem for dynamic data streams through the lens of random cut forests. We investigate a robust random cut data structure that can be used as a sket...

[Har19E]

Extended Isolation Forest, Sahand Hariri, Matias Carrasco Kind, Robert J. Brunner.

2019

We present an extension to the model-free anomaly detection algorithm, Isolation Forest. This extension, named Extended Isolation Forest (EIF), resolves issues with assignment of anomaly score to given data points. We motivate the problem using heat maps for anomaly scores. These maps suffer from artifacts generated by the criteria for branching operation of the binary tree. We explain this …

[Liu08I]

Isolation Forest, Fei Tony Liu, Kai Ming Ting, Zhi-Hua Zhou.

Dec 2008

Most existing model-based approaches to anomaly detection construct a profile of normal instances, then identify instances that do not conform to the normal profile as anomalies. This paper proposes a fundamentally different model-based method that explicitly isolates anomalies instead of profiles normal points. To our best knowledge, the concept of isolation has not been explored in current …

[Sid16F]

Finite Sample Complexity of Rare Pattern Anomaly Detection, Amran Siddiqui, Alan Fern, Thomas G Dietterich, Shubhomoy Das.

2016

Anomaly detection is a fundamental problem for which a wide variety of algorithms have been developed. However, compared to supervised learning, there has been very little work aimed at understanding the sample complexity of anomaly detection. In this paper, we take a step in this direction by introducing a Probably Approximately Correct (PAC) framework for anomaly detection based on the …
