Isolation Forests: The good, the bad and the ugly

Anomaly detection underlies numerous real-world machine learning use cases, such as predictive maintenance, network intrusion detection, system health monitoring, fraud detection and novelty detection. Because many of these application areas are both highly relevant and sensitive, robustness and reliability are primary concerns when designing anomaly detection systems. A good understanding of the mathematical principles behind the algorithms used in practice is therefore highly desirable.

In this talk, we will investigate a simple algorithm called isolation forest, which has become very popular over the last decade. Despite its success, the reasons for the good performance of isolation forest are currently only partially understood. We review some of the recent literature that shows the strengths and weaknesses of the algorithm, and conclude with a few observations that might lead to further research directions.

References

  • Robust Random Cut Forest Based Anomaly Detection on Streams, Sudipto Guha, Nina Mishra, Gourav Roy, Okke Schrijvers. International Conference on Machine Learning (2016)
  • Extended Isolation Forest, Sahand Hariri, Matias Carrasco Kind, Robert J. Brunner. IEEE Transactions on Knowledge and Data Engineering (2019)
  • Isolation Forest, Fei Tony Liu, Kai Ming Ting, Zhi-Hua Zhou. Eighth IEEE International Conference on Data Mining (2008)
  • Finite Sample Complexity of Rare Pattern Anomaly Detection, Amran Siddiqui, Alan Fern, Thomas G. Dietterich, Shubhomoy Das. (2016)
