On Calibration of Modern Neural Networks, Proceedings of the 34th International Conference on Machine Learning(2017)
Conﬁdence calibration – the problem of predicting probability estimates representative of the true correctness likelihood – is important for classiﬁcation models in many applications. We discover that modern neural networks, unlike those from a decade ago, are poorly calibrated. Through extensive experiments, we observe that depth, width, weight decay, and Batch Normalization are important factors inﬂuencing calibration. We evaluate the performance of various post-processing calibration methods on state-ofthe-art architectures with image and document classiﬁcation datasets. Our analysis and experiments not only offer insights into neural network learning, but also provide a simple and straightforward recipe for practical settings: on most datasets, temperature scaling – a singleparameter variant of Platt Scaling – is surprisingly effective at calibrating predictions.