The weeks around the turn of the year have been quite busy for us. Among other activities, we created our new training "Methods and issues in explainable AI" and put considerable effort into the development of our open-source libraries, such as pyDVL, the Python Data Valuation Library. Nevertheless, we also managed to find some time to scout the literature. Here is a summary of what we found interesting.
Offline reinforcement learning
Critic Regularized Regression
A simple but powerful algorithm for offline reinforcement learning, which can be seen as a combination of behavior cloning and Q-learning, and which sets a new state of the art on many tasks.
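The core idea can be sketched as advantage-weighted behavior cloning: dataset actions are cloned with a weight derived from the critic's advantage estimate, using either a hard indicator ("binary" CRR) or an exponential weighting ("exp" CRR). A minimal NumPy sketch, with function names of our own choosing (the full method also trains the critic and a policy network, omitted here):

```python
import numpy as np

def crr_weights(q_values: np.ndarray, actions: np.ndarray,
                mode: str = "binary", beta: float = 1.0) -> np.ndarray:
    """Per-sample weights for advantage-weighted behavior cloning.

    q_values: (batch, n_actions) critic estimates Q(s, .)
    actions:  (batch,) actions taken in the offline dataset
    """
    # Advantage of the dataset action over the mean action value.
    v = q_values.mean(axis=1)                           # baseline V(s)
    a = q_values[np.arange(len(actions)), actions] - v  # A(s, a)
    if mode == "binary":
        # Clone only actions the critic deems better than average.
        return (a > 0).astype(float)
    # "exp" variant: softer, temperature-controlled weighting.
    return np.exp(a / beta)

# Example: two states, three discrete actions each.
q = np.array([[1.0, 2.0, 3.0],
              [0.5, 0.5, 0.5]])
acts = np.array([2, 0])
print(crr_weights(q, acts, mode="binary"))  # [1. 0.]
```

The policy is then trained on the weighted negative log-likelihood of the dataset actions, so samples with zero weight are simply ignored.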
Simulation-based inference
The Frontier of Simulation-based Inference
An overview and schematic comparison of recent developments in simulation-based inference and their enabling factors. Advancements in ML, Active Learning and Augmentation are named as the three driving forces in the field.
Interpretable machine learning
Mixture of Decision Trees for Interpretable Machine Learning
A linear gating function and multiple expert decision trees are trained jointly via expectation-maximization, yielding a new, fully interpretable model that works well on several (simple) data sets.
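The inference step can be sketched as follows: a softmax over linear functions of the input routes each sample to one expert tree, and hard routing keeps the overall model interpretable, since every prediction is explained by a single tree. This is a sketch under our own naming, with trivial callables standing in for fitted decision trees:

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def modt_predict(x, gate_W, gate_b, experts):
    """Route each sample to an expert via a linear softmax gate.

    x: (batch, d) inputs; gate_W: (n_experts, d); gate_b: (n_experts,)
    experts: list of callables mapping x -> (batch,) predictions
    (stand-ins here for trained decision trees).
    """
    gates = softmax(x @ gate_W.T + gate_b)             # (batch, n_experts)
    preds = np.stack([e(x) for e in experts], axis=1)  # (batch, n_experts)
    # Hard routing: exactly one tree is responsible per sample.
    chosen = gates.argmax(axis=1)
    return preds[np.arange(len(x)), chosen], chosen

# Toy example with two hypothetical 1-D "trees" (not fitted trees).
experts = [lambda x: (x[:, 0] > 0).astype(int),   # expert 0
           lambda x: np.ones(len(x), dtype=int)]  # expert 1
W = np.array([[1.0], [-1.0]])
b = np.zeros(2)
x = np.array([[2.0], [-2.0]])
y, routed = modt_predict(x, W, b, experts)
```

During training, the paper's EM procedure alternates between soft sample-to-expert assignments and refitting the gate and trees, which this sketch omits.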
Representation learning with BYOL and SimSiam
BYOL was the first work to show how useful low-dimensional representations can be when learned in an unsupervised way without negative sampling. It inspired a series of simpler architectures, with SimSiam among them.
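SimSiam's objective is particularly compact: a symmetrized negative cosine similarity between the predictor output of one augmented view and the (stop-gradient) projection of the other. A minimal NumPy sketch of just the loss, with the encoder and predictor networks omitted (stop-gradient is only noted in a comment, since NumPy has no autograd):

```python
import numpy as np

def neg_cosine(p: np.ndarray, z: np.ndarray) -> float:
    """D(p, z) = -cos(p, z), averaged over the batch.

    In SimSiam, z comes from the stop-gradient branch: gradients
    flow only through p (a no-op here without autograd).
    """
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    return -(p * z).sum(axis=1).mean()

def simsiam_loss(p1, z1, p2, z2) -> float:
    # Symmetrized over the two augmented views of each image.
    return 0.5 * neg_cosine(p1, z2) + 0.5 * neg_cosine(p2, z1)

# Perfectly aligned representations give the minimum loss of -1.
v = np.array([[1.0, 0.0]])
print(simsiam_loss(v, v, v, v))  # -1.0
```

The surprising empirical finding of the paper is that this setup avoids representational collapse without negative pairs, relying only on the stop-gradient and the predictor.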
Data valuation
What Neural Networks Memorize and Why: Discovering the Long Tail via Influence Estimation
When data samples are difficult to learn, neural networks tend to memorize their labels rather than infer useful features. This has been shown to improve, rather than limit, their accuracy. A recent paper introduces a few key concepts that help investigate this phenomenon.
Beyond neural scaling laws: beating power law scaling via data pruning
Large neural networks are very “data hungry”, but data pruning holds the promise to alleviate this dependence. A recent paper studies the use of different pruning strategies and metrics, with encouraging results.
CS-Shapley: Class-wise Shapley Values for Data Valuation in Classification
Using in-class accuracy to up-weight the value of a data point and out-of-class accuracy as a discounting factor, the authors define a new utility function that is better suited for valuation in classification tasks.
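The utility can be sketched as follows: split test accuracy into an in-class part (on the target class) and an out-of-class part (on the rest), then combine the two. The function name and the default discount `exp(-a_out)` below are our own illustrative choices; the exact functional form used by CS-Shapley is specified in the paper:

```python
import numpy as np

def class_wise_utility(y_true, y_pred, target_class, discount=None):
    """Utility of a model for one class, discounted by its
    performance on the remaining classes.

    `discount` maps the out-of-class accuracy to a factor in (0, 1];
    exp(-a_out) is an illustrative default, not the paper's choice.
    """
    in_mask = y_true == target_class
    a_in = np.mean(y_pred[in_mask] == y_true[in_mask])    # in-class acc.
    a_out = np.mean(y_pred[~in_mask] == y_true[~in_mask]) # out-of-class acc.
    discount = discount or (lambda a: np.exp(-a))
    return a_in * discount(a_out)

# Hypothetical predictions of a model trained on some subset S.
y_true = np.array([0, 0, 1, 1])
y_pred = np.array([0, 1, 1, 1])
u = class_wise_utility(y_true, y_pred, target_class=0)
```

Plugging such a class-conditional utility into the usual Shapley machinery then yields per-point values that reflect a point's contribution to its own class rather than to overall accuracy.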