The calibrated classifier

Informed decision-making based on classifiers requires that the confidence in their predictions reflect the actual error rates. A model with this property is said to be calibrated. Recent work has shown that expressive neural networks can overfit the cross-entropy loss without losing accuracy, which yields overconfident (i.e. miscalibrated) models. We analyse several definitions of calibration and the relationships between them, examine related empirical measures and their usefulness, and explore several algorithms for improving calibration.
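One widely used empirical measure of this kind is the binned expected calibration error (ECE), reported for instance in Guo et al. (2017). The following is only a minimal sketch, assuming NumPy, equal-width confidence bins, and top-label confidences. The function name `expected_calibration_error` and its parameters are illustrative choices, not taken from the talk or from any particular library.

```python
import numpy as np


def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean of |accuracy - mean confidence| per bin.

    confidences: predicted probability of the predicted class, in [0, 1].
    correct:     1 if the prediction was right, 0 otherwise.
    n_bins:      number of equal-width bins (an assumption of this sketch).
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)

    # Interior bin edges; right-closed bins so that confidence 1.0
    # falls into the last bin.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.digitize(confidences, edges[1:-1], right=True)

    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            accuracy = correct[mask].mean()
            avg_confidence = confidences[mask].mean()
            # mask.mean() is the fraction of samples that fall in this bin.
            ece += mask.mean() * abs(accuracy - avg_confidence)
    return ece


# Overconfident toy model: always 90% confident, right half of the time.
overconfident = expected_calibration_error([0.9] * 10, [1, 0] * 5)

# Calibrated toy model: 80% confident, right 8 times out of 10.
calibrated = expected_calibration_error([0.8] * 10, [1] * 8 + [0] * 2)
```

In this toy setting the overconfident model has a gap of about 0.4 between confidence and accuracy, while the calibrated one has a gap of about zero. Note that the estimate depends on the binning scheme, which is one reason the usefulness of such measures deserves scrutiny.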


  • Calibration for Anomaly Detection, Adrian Benton. 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining - KDD '19: Workshop on Anomaly Detection in Finance (2019)
  • The Well-Calibrated Bayesian, A. P. Dawid. Journal of the American Statistical Association (1982)
  • Setting decision thresholds when operating conditions are uncertain, Cèsar Ferri, José Hernández-Orallo, Peter Flach. Data Mining and Knowledge Discovery (2019)
  • On Calibration of Modern Neural Networks, Chuan Guo, Geoff Pleiss, Yu Sun, Kilian Q. Weinberger. Proceedings of the 34th International Conference on Machine Learning (2017)
  • Beyond temperature scaling: Obtaining well-calibrated multi-class probabilities with Dirichlet calibration, Meelis Kull, Miquel Perello Nieto, Markus Kängsepp, Telmo Silva Filho, Hao Song, Peter Flach. Advances in Neural Information Processing Systems 32 (2019)
  • Trainable Calibration Measures for Neural Networks from Kernel Mean Embeddings, Aviral Kumar, Sunita Sarawagi, Ujjwal Jain. International Conference on Machine Learning (2018)
  • Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles, Balaji Lakshminarayanan, Alexander Pritzel, Charles Blundell. Advances in Neural Information Processing Systems 30 (2017)
  • Focal Loss for Dense Object Detection, Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, Piotr Dollár. arXiv:1708.02002 [cs] (2017)
  • Calibrating Deep Neural Networks using Focal Loss, Jishnu Mukhoti, Viveka Kulharia, Amartya Sanyal, Stuart Golodetz, Philip Torr, Puneet Dokania. Advances in Neural Information Processing Systems 33 (2020)
  • Predicting Good Probabilities with Supervised Learning, Alexandru Niculescu-Mizil, Rich Caruana. Proceedings of the 22nd International Conference on Machine Learning - ICML '05 (2005)
  • Regularizing Neural Networks by Penalizing Confident Output Distributions, Gabriel Pereyra, George Tucker, Jan Chorowski, Łukasz Kaiser, Geoffrey Hinton. arXiv:1701.06548 [cs] (2017)
  • Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods, John C. Platt. Advances in Large Margin Classifiers (1999)
  • Behaviour Policy Estimation in Off-Policy Policy Evaluation: Calibration Matters, Aniruddh Raghu, Omer Gottesman, Yao Liu, Matthieu Komorowski, Aldo Faisal, Finale Doshi-Velez, Emma Brunskill. arXiv:1807.01066 [cs, stat] (2018)
  • Evaluating model calibration in classification, Juozas Vaicenavicius, David Widmann, Carl Andersson, Fredrik Lindsten, Jacob Roll, Thomas Schön. The 22nd International Conference on Artificial Intelligence and Statistics (2019)
  • Calibration tests in multi-class classification: A unifying framework, David Widmann, Fredrik Lindsten, Dave Zachariah. Advances in Neural Information Processing Systems 32 (2019)
