Reference

It’s Just Not That Simple: An Empirical Study of the Accuracy-Explainability Trade-off in Machine Learning for Public Policy. Andrew Bell, Ian Solano-Kamaiko, Oded Nov, and Julia Stoyanovich. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’22).

Abstract

To achieve high accuracy in machine learning (ML) systems, practitioners often use complex “black-box” models that are not easily understood by humans. The opacity of such models has raised public concerns about their use in high-stakes contexts and has given rise to two conflicting arguments about the nature, and even the existence, of the accuracy-explainability trade-off. One side postulates that model accuracy and explainability are inversely related, leading practitioners to use black-box models when high accuracy is important. The other side holds that the trade-off is rarely observed in practice and that simpler interpretable models should therefore always be preferred. Both sides operate under the assumption that some types of models, such as low-depth decision trees and linear regression, are more explainable, while others, such as neural networks and random forests, are inherently opaque. Our main contribution is an empirical quantification of the trade-off between model accuracy and explainability in two real-world policy contexts. We quantify explainability in terms of how well a model is understood by a human-in-the-loop (HITL), using a combination of objectively measurable criteria, such as a human’s ability to anticipate a model’s output or to identify its most important feature, and subjective measures, such as a human’s perceived understanding of the model. Our key finding is that explainability is not directly determined by whether a model is black-box or interpretable; the relationship is more nuanced than previously thought. We find that black-box models may be as explainable to a HITL as interpretable models, and we identify two possible reasons: (1) there are weaknesses in the intrinsic explainability of interpretable models, and (2) more information about a model may confuse users, leading them to perform worse on objectively measurable explainability tasks. In summary, contrary to both positions in the literature, we neither observed a direct trade-off between accuracy and explainability nor found interpretable models to be superior in terms of explainability. It’s just not that simple!
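
To make the comparison concrete, here is a minimal sketch, not the authors’ experimental code, of the kind of model pairing the abstract contrasts: a low-depth decision tree (commonly treated as interpretable) against a random forest (commonly treated as a black-box), compared on test accuracy and on the most important feature, which is one of the objective HITL tasks the study describes. The dataset, hyperparameters, and use of impurity-based importances are illustrative assumptions, not details from the paper.

```python
# Illustrative sketch only (assumed dataset and settings, not the paper's setup).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# A standard public dataset, chosen here purely for illustration.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

models = {
    "low-depth decision tree (interpretable)": DecisionTreeClassifier(
        max_depth=2, random_state=0
    ),
    "random forest (black-box)": RandomForestClassifier(
        n_estimators=200, random_state=0
    ),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    accuracy = model.score(X_test, y_test)
    # Impurity-based importance: the feature a HITL would be asked to identify.
    top_feature = X.columns[model.feature_importances_.argmax()]
    print(f"{name}: accuracy={accuracy:.3f}, most important feature={top_feature!r}")
```

On many tabular datasets the two accuracies land close together, which is consistent with the abstract’s point that the trade-off is not automatic; the study’s actual measurements, of course, come from human-subject experiments in two policy contexts, not from scripts like this one.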