Reference

Leveraging exploration in off-policy algorithms via normalizing flows. Bogdan Mazoure, Thang Doan, Audrey Durand, Joelle Pineau, R. Devon Hjelm. Conference on Robot Learning (2020)

Abstract

The ability to discover approximately optimal policies in domains with sparse rewards is crucial to applying reinforcement learning (RL) in many real-world scenarios. Approaches such as neural dens...