The holiday season has been quite busy for us. Among other activities, we created our new training “Methods and issues in explainable AI” and put considerable effort into the development of our open-source libraries, such as pyDVL, the Python Data Valuation Library. Nevertheless, we also managed to find some time to scout the literature. Here is a summary of what we found interesting.
Offline reinforcement learning
Critic Regularized Regression
A simple but powerful algorithm for offline reinforcement learning, which can be seen as a combination of behavior cloning and Q-learning, and sets a new state of the art in many tasks.
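To give a flavor of the method, here is a minimal sketch of a CRR-style update for discrete actions: the critic is trained with a standard TD loss, while the policy is trained by behavior cloning weighted by the critic's advantage estimate, so that only actions the critic deems good are cloned. The tiny networks, the binary weighting variant, and all hyperparameters are our own illustrative choices, not the paper's exact setup:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical small networks for a discrete-action task (illustrative only).
obs_dim, n_actions = 8, 4
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
critic = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
opt = torch.optim.Adam(list(policy.parameters()) + list(critic.parameters()), lr=3e-4)

def crr_step(obs, act, rew, next_obs, done, gamma=0.99):
    q = critic(obs)                                   # Q(s, .) for all actions
    q_sa = q.gather(1, act.unsqueeze(1)).squeeze(1)   # Q(s, a) for dataset actions

    # Standard TD target for the critic (no target network, for brevity).
    with torch.no_grad():
        probs = F.softmax(policy(next_obs), dim=1)
        v_next = (probs * critic(next_obs)).sum(dim=1)   # E_{a'~pi}[Q(s', a')]
        target = rew + gamma * (1.0 - done) * v_next
    critic_loss = F.mse_loss(q_sa, target)

    # Advantage of the dataset action over the policy's expected value.
    with torch.no_grad():
        pi = F.softmax(policy(obs), dim=1)
        adv = q_sa - (pi * q).sum(dim=1)
        weight = (adv > 0).float()   # binary variant; exp(adv / beta) is also used

    # Behavior cloning weighted by the critic: clone only actions it likes.
    log_pi = F.log_softmax(policy(obs), dim=1).gather(1, act.unsqueeze(1)).squeeze(1)
    policy_loss = -(weight * log_pi).mean()

    opt.zero_grad()
    (critic_loss + policy_loss).backward()
    opt.step()

# Usage with random stand-in batch data.
obs = torch.randn(32, obs_dim)
act = torch.randint(0, n_actions, (32,))
crr_step(obs, act, torch.randn(32), torch.randn(32, obs_dim), torch.zeros(32))
```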
Simulation-based inference
The Frontier of Simulation-based Inference
An overview and schematic comparison of recent developments in simulation-based inference and their enabling factors. Advances in ML, active learning, and augmentation are named as the three driving forces in the field.
Figure: SBI approaches.
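For readers new to the field, the basic setting is easy to demonstrate: we can sample from a simulator but cannot evaluate its likelihood. Below is a minimal rejection-ABC toy example (our own illustration, not one of the neural methods surveyed in the paper; the Gaussian simulator, uniform prior, and tolerance are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulator(theta, n=50):
    """Black-box simulator: we can sample from it, but treat its likelihood as intractable."""
    return rng.normal(loc=theta, scale=1.0, size=n)

observed = simulator(theta=1.5)  # pretend this is real data with unknown theta

def abc_posterior(n_draws=100_000, eps=0.05):
    # Rejection ABC: draw parameters from the prior, keep those whose
    # simulations produce summary statistics close to the observed ones.
    thetas = rng.uniform(-5, 5, size=n_draws)   # prior samples
    sims = rng.normal(loc=thetas[:, None], scale=1.0, size=(n_draws, observed.size))
    distance = np.abs(sims.mean(axis=1) - observed.mean())  # summary statistic: the mean
    return thetas[distance < eps]

posterior = abc_posterior()
print(f"posterior mean ~ {posterior.mean():.2f} from {posterior.size} accepted samples")
```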
Interpretable machine learning
Mixture of Decision Trees for Interpretable Machine Learning
Figure: classification with a decision tree ensemble.
A linear gating function and multiple expert decision trees are trained jointly by expectation-maximization, yielding a new, fully interpretable model that works well on several (simple) data sets.
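The general EM recipe behind such mixtures can be sketched in a few lines of scikit-learn (our own simplification, not the paper's exact algorithm): the E-step computes each expert's responsibility for each sample, and the M-step refits the trees with those responsibilities as sample weights and refits the linear gate. Initialization and hyperparameters are placeholders:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
n_experts, n_iter = 3, 10
rng = np.random.default_rng(0)

experts = [DecisionTreeClassifier(max_depth=3, random_state=k) for k in range(n_experts)]
gate = LogisticRegression(max_iter=1000)                 # the linear gating function
resp = rng.dirichlet(np.ones(n_experts), size=len(X))    # initial responsibilities

def gate_probs(X):
    # Map gate outputs back to all experts (some may end up with no samples).
    p = np.zeros((len(X), n_experts))
    p[:, gate.classes_] = gate.predict_proba(X)
    return p

for _ in range(n_iter):
    # M-step: refit each expert with its responsibilities as sample weights,
    # and refit the gate to route samples to their most responsible expert.
    for k, tree in enumerate(experts):
        tree.fit(X, y, sample_weight=resp[:, k])
    gate.fit(X, resp.argmax(axis=1))

    # E-step: responsibility of expert k for sample i is proportional to the
    # gate's probability for k times the expert's likelihood of the true label.
    lik = np.stack([t.predict_proba(X)[np.arange(len(X)), y] for t in experts], axis=1)
    resp = gate_probs(X) * lik
    resp /= resp.sum(axis=1, keepdims=True) + 1e-12

# Predict with the gate-weighted mixture of the experts.
mix = np.einsum("nk,nkc->nc", gate_probs(X),
                np.stack([t.predict_proba(X) for t in experts], axis=1))
print(f"train accuracy: {(mix.argmax(axis=1) == y).mean():.2f}")
```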
Representation learning
Representation learning with BYOL and SimSiam
BYOL was the first work to show how useful low-dimensional representations can be when learned in an unsupervised way without negative sampling. It inspired a series of simpler architectures, with SimSiam among them.
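The core of SimSiam fits in a few lines: two augmented views, an encoder, a predictor head, and a negative cosine similarity loss in which the target branch is detached. Below is a minimal sketch with a toy MLP encoder and additive noise in place of real image augmentations (both are our own stand-ins):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins for a real backbone and image augmentations (assumptions).
dim = 32
encoder = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, dim))
predictor = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, dim))
opt = torch.optim.SGD(list(encoder.parameters()) + list(predictor.parameters()), lr=0.05)

def augment(x):
    return x + 0.1 * torch.randn_like(x)  # placeholder for real augmentations

def simsiam_step(x):
    z1, z2 = encoder(augment(x)), encoder(augment(x))   # two views of each sample
    p1, p2 = predictor(z1), predictor(z2)

    # Negative cosine similarity; the stop-gradient (detach) on the target
    # branch is what prevents collapse despite the absence of negative pairs.
    def d(p, z):
        return -F.cosine_similarity(p, z.detach(), dim=-1).mean()

    loss = 0.5 * d(p1, z2) + 0.5 * d(p2, z1)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

for _ in range(5):
    print(simsiam_step(torch.randn(64, 128)))
```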
Data valuation
What Neural Networks Memorize and Why: Discovering the Long Tail via Influence Estimation
When data samples are difficult to learn, neural networks tend to memorize their labels rather than infer useful features. This has been shown to improve, rather than limit, their accuracy. A recent paper introduces a few key concepts that help investigate this phenomenon.
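One of those concepts, the memorization score, can be estimated with a simple subsampling scheme: train many models on random subsets and compare, for each example, how often its label is predicted correctly when it is in the training subset versus when it is left out. A rough sketch, with a linear model standing in for a neural network and illustrative trial counts:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=20, flip_y=0.1, random_state=0)
rng = np.random.default_rng(0)
n_trials, subset_frac = 200, 0.7

hits_in = np.zeros(len(X));  count_in = np.zeros(len(X))
hits_out = np.zeros(len(X)); count_out = np.zeros(len(X))

for _ in range(n_trials):
    idx = rng.random(len(X)) < subset_frac          # random training subset
    model = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
    correct = model.predict(X) == y
    hits_in += correct * idx;   count_in += idx
    hits_out += correct * ~idx; count_out += ~idx

# Memorization estimate: accuracy on an example when it is in the training
# set minus its accuracy when it is held out (a subsampled estimator).
mem = hits_in / np.maximum(count_in, 1) - hits_out / np.maximum(count_out, 1)
print("most memorized examples:", np.argsort(mem)[-5:])
```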
Beyond neural scaling laws: beating power law scaling via data pruning
Large neural networks are very “data hungry”, but data pruning holds the promise of alleviating this dependence. A recent paper studies the use of different pruning strategies and metrics, with encouraging results.
Figure: is keeping hard examples or keeping easy examples better when doing data pruning?
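A toy version of metric-based pruning makes the moving parts clear: score each example with a difficulty metric, then keep only the hardest (or easiest) fraction. The distance-to-class-centroid metric below is a simple stand-in for the self-supervised prototype metric used in the paper, whose headline observation is that the right choice between keeping hard and keeping easy examples depends on how much data is available:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Difficulty metric: distance to the class centroid (a stand-in for the
# self-supervised prototype metric used in the paper).
centroids = np.stack([X[y == c].mean(axis=0) for c in np.unique(y)])
difficulty = np.linalg.norm(X - centroids[y], axis=1)

def prune(frac_keep, keep_hard=True):
    order = np.argsort(difficulty)   # easy -> hard
    n_keep = int(frac_keep * len(X))
    return order[-n_keep:] if keep_hard else order[:n_keep]

for keep_hard in (True, False):
    kept = prune(0.3, keep_hard)
    acc = LogisticRegression(max_iter=1000).fit(X[kept], y[kept]).score(X, y)
    print(f"keep_hard={keep_hard}: accuracy {acc:.3f}")
```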
CS-Shapley: Class-wise Shapley Values for Data Valuation in Classification
Using in-class accuracy to up-weight the value of a data point and out-of-class accuracy as a discounting factor, the authors define a new utility function that is better suited for valuation in classification tasks.
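The mechanics can be sketched with a permutation-based Monte Carlo Shapley estimator into which a class-wise utility is plugged. One caveat: the utility below is a hypothetical stand-in with the qualitative behavior described above (reward in-class accuracy, discount by out-of-class performance); the paper's exact functional form differs, so treat this strictly as an illustration of the idea:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=60, n_features=10, random_state=0)
X_tr, y_tr, X_te, y_te = X[:40], y[:40], X[40:], y[40:]
rng = np.random.default_rng(0)

def class_wise_utility(subset, target_class=0):
    """Hypothetical class-wise utility: in-class accuracy discounted by an
    exponential of the out-of-class error. Illustrative only; see the paper
    for the exact functional form."""
    if len(set(y_tr[subset])) < 2:
        return 0.0
    model = LogisticRegression(max_iter=1000).fit(X_tr[subset], y_tr[subset])
    correct = model.predict(X_te) == y_te
    in_mask = y_te == target_class
    a_in, a_out = correct[in_mask].mean(), correct[~in_mask].mean()
    return a_in * np.exp(-(1 - a_out))   # assumed discounting, not the paper's formula

def mc_shapley(utility, n_perms=20):
    # Plain Monte Carlo Shapley over random permutations: each point's value
    # is its average marginal contribution to the utility.
    n = len(X_tr)
    values = np.zeros(n)
    for _ in range(n_perms):
        perm = rng.permutation(n)
        prev = utility(np.array([], dtype=int))
        for k in range(n):
            cur = utility(perm[: k + 1])
            values[perm[k]] += cur - prev
            prev = cur
    return values / n_perms

print(mc_shapley(class_wise_utility)[:10])
```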