The library is still under active development and will soon be released as open source.
pyDVL is a library for data valuation (see our review). It explores and implements:
- Shapley values:
  - Exact (combinatorial)
  - Exact, for KNN
  - Truncated Monte Carlo Shapley
  - Sparsity aware
  - KNN-Shapley surrogates
  - Distributional Shapley
  - Optimizations for linear regression, binary classification, kernel density estimation
- Influence functions:
  - Fast Hessian-vector products with stochastic optimization for large models
  - (Approximate) Maximum Influence Perturbation
In addition, we provide analyses of the strengths and weaknesses of key methods, as well as detailed examples.
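
To make the first item in the list concrete, the following is a minimal sketch of exact (combinatorial) Shapley values written in plain Python. It does not use or represent pyDVL's API: the names `exact_shapley` and `toy_utility` are hypothetical, and the utility is a toy stand-in for what would normally be the performance of a model trained on a subset of the data. The value of point `i` is the average of its marginal contribution `u(S ∪ {i}) - u(S)` over all subsets `S` of the remaining points, weighted by `1 / (n * C(n-1, |S|))`.

```python
from itertools import combinations
from math import comb


def exact_shapley(n, utility):
    """Exact Shapley value of each of n data points.

    `utility` maps a frozenset of point indices to a float, e.g. the test
    score of a model trained on those points (an assumption of this sketch).
    The cost is exponential in n, so this is only feasible for tiny datasets.
    """
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # subsets of the other points have size 0..n-1
            weight = 1.0 / (n * comb(n - 1, size))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of point i to the subset s
                values[i] += weight * (utility(s | {i}) - utility(s))
    return values


# Hypothetical per-point contributions used by the toy utility below.
POINT_WEIGHTS = [1.0, 2.0, 3.0]


def toy_utility(subset):
    """Additive toy utility: the sum of fixed per-point contributions."""
    return sum(POINT_WEIGHTS[j] for j in subset)


values = exact_shapley(len(POINT_WEIGHTS), toy_utility)
print(values)
```

Because the toy utility is additive, each point's Shapley value equals its own contribution, and the values sum to the utility of the full dataset minus that of the empty set. With a real model-based utility every call to `utility` involves a training run, which is why approximations such as Truncated Monte Carlo Shapley are used in practice.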