kyle - A toolkit for classifier calibration

A number of methods exist for calibrating models and for measuring and visualizing calibration, but reliable, maintained libraries implementing them are scarce. To fill this gap, appliedAI is developing and open-sourcing the library kyle.
The library is still in the alpha stage and under heavy development; breaking changes may happen at any time. Download the code and submit issues in our GitHub repository.

Kyle is a Python library containing utilities for measuring and visualizing the calibration of probabilistic classifiers, as well as for recalibrating them. Currently, only recalibration through post-processing is supported, although we plan to include calibration-specific training algorithms in the future.
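To make the notion of "measuring calibration" concrete, here is a self-contained sketch of the expected calibration error (ECE), a standard metric in this area. This is an illustration in plain Python, not kyle's API; the function name and the toy data source are our own:

```python
import random

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: the weighted average, over confidence bins, of the
    absolute gap between mean confidence and empirical accuracy."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        bins[min(int(conf * n_bins), n_bins - 1)].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(ok for _, ok in b) / len(b)
        ece += len(b) / n * abs(accuracy - avg_conf)
    return ece

# Toy source that is perfectly calibrated by construction: a prediction
# is correct with probability equal to its stated confidence.
random.seed(0)
confs = [random.uniform(0.5, 1.0) for _ in range(10_000)]
hits = [random.random() < c for c in confs]
print(expected_calibration_error(confs, hits))  # small, only sampling noise
```

A well-calibrated classifier thus yields an ECE near zero, while a systematically overconfident one (say, always reporting 0.99 while being wrong) pushes it toward one.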

Kyle is model-agnostic: any probabilistic classifier can be wrapped in a thin wrapper called CalibratableModel, which supports multiple calibration algorithms.
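The wrapping idea can be sketched in a few lines of plain Python. Apart from the name CalibratableModel, everything below is an assumption for illustration, not kyle's actual API: we pair a toy binary classifier with a simple histogram-binning calibrator as the post-processing step.

```python
import random

class HistogramBinning:
    """Simple post-hoc calibrator: each score bin is mapped to the
    positive rate observed in that bin on the calibration set."""
    def __init__(self, n_bins=10):
        self.n_bins = n_bins
        self.bin_rate = [0.5] * n_bins

    def _bin(self, s):
        return min(int(s * self.n_bins), self.n_bins - 1)

    def fit(self, scores, labels):
        bins = [[] for _ in range(self.n_bins)]
        for s, y in zip(scores, labels):
            bins[self._bin(s)].append(y)
        for i, b in enumerate(bins):
            if b:
                self.bin_rate[i] = sum(b) / len(b)
        return self

    def predict(self, scores):
        return [self.bin_rate[self._bin(s)] for s in scores]

class CalibratableModel:
    """Model-agnostic wrapper (illustrative): any object exposing
    predict_proba(X) -> P(y=1) can be recalibrated by a pluggable
    post-processing calibrator."""
    def __init__(self, model, calibrator=None):
        self.model = model
        self.calibrator = calibrator or HistogramBinning()

    def calibrate(self, X, y):
        self.calibrator.fit(self.model.predict_proba(X), y)

    def predict_proba(self, X):
        return self.calibrator.predict(self.model.predict_proba(X))

class OverconfidentModel:
    """Toy binary classifier whose scores are systematically too extreme."""
    def predict_proba(self, X):
        return [min(0.999, max(0.001, 0.5 + 2 * (x - 0.5))) for x in X]

# Synthetic data where the true P(y=1 | x) is simply x.
random.seed(1)
X = [random.random() for _ in range(5_000)]
y = [int(random.random() < x) for x in X]
wrapped = CalibratableModel(OverconfidentModel())
wrapped.calibrate(X, y)
```

After `calibrate`, the wrapper's `predict_proba` returns post-processed scores that track the empirical positive rates instead of the model's overconfident raw outputs; swapping in a different calibrator changes only the constructor argument.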

Apart from tools for analysing models, kyle also offers support for developing and testing custom calibration metrics, algorithms and decision processes. To avoid depending on evaluation data sets and trained models for obtaining labels and confidence vectors, kyle can construct custom samplers based on fake classifiers. These samplers can also be fitted to an arbitrary data set (the outputs of a classifier together with the labels) in case one wants to mimic a real classifier with a fake one.

Using fake classifiers, an arbitrary number of ground-truth labels and miscalibrated confidence vectors can be generated, streamlining the analysis of calibration-related algorithms (common use cases include analysing the variance and bias of calibration metrics, or the sensitivity of decision processes to miscalibration).
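The fake-classifier idea can be illustrated with a self-contained sketch (not kyle's API; the function name and the temperature-based distortion are our own choices). Ground-truth labels are drawn from a calibrated distribution, while the reported confidences are distorted by a temperature, so the degree of miscalibration is fully under our control:

```python
import math
import random

def fake_classifier_sample(n, n_classes=3, temperature=1.0, seed=0):
    """Draw (label, confidence vector) pairs from a synthetic classifier.

    Per sample, random logits define a calibrated distribution p from
    which the ground-truth label is drawn; the *reported* confidences q
    apply a temperature to the same logits (t < 1 -> overconfident,
    t > 1 -> underconfident, t = 1 -> perfectly calibrated).
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        logits = [rng.gauss(0, 1) for _ in range(n_classes)]
        zs = [math.exp(l) for l in logits]
        total = sum(zs)
        p = [z / total for z in zs]          # calibrated probabilities
        r, cum, label = rng.random(), 0.0, n_classes - 1
        for i, pi in enumerate(p):           # sample ground-truth label
            cum += pi
            if r < cum:
                label = i
                break
        zt = [math.exp(l / temperature) for l in logits]
        tt = sum(zt)
        q = [z / tt for z in zt]             # reported confidences
        samples.append((label, q))
    return samples

# An overconfident sample: mean top-class confidence should exceed
# the top-1 accuracy by a clear margin.
samples = fake_classifier_sample(5_000, temperature=0.5)
conf = sum(max(q) for _, q in samples) / len(samples)
acc = sum(q.index(max(q)) == y for y, q in samples) / len(samples)
```

With thousands of such draws, one can study how a calibration metric behaves under a known, tunable amount of miscalibration, or how a downstream decision rule degrades as the temperature moves away from one.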
