The task of performing a sequence of decisions in an optimal way is ubiquitous in industrial applications. From controlling plasma in a fusion reactor to driving cars or playing games, automatically performing sequences of decisions is required in many domains. In recent years, more and more tasks that were not traditionally thought of as sequential decision-making have been formulated as such, for example chip design or video compression.
This training gives an introduction to deep reinforcement learning (RL) for practitioners, with particular focus on learning safe policies (relevant in applications with constrained control spaces) and on sample-efficient algorithms. We will discuss the different forms a decision process can take, how a variety of tasks can be formulated as such processes, and a selection of the many RL methods that can be used to solve them optimally.
We will also analyse stability during training as well as the gap between simulation and reality (sim2real). For the latter, we will apply our algorithms to a real robot, such as a Franka Emika arm.
Participants will learn about the most important recent advances in deep RL and develop an intuition for when and how to use RL techniques (and, perhaps more importantly, when not to). Topics include:
- Basics of deep model-free and model-based RL
- Which problems are suitable for an RL approach
- Exploration vs. exploitation
- On-policy, off-policy and offline learning
- Various deep RL algorithms
- Effects of reward engineering
- Sample efficiency, safety and robustness in RL
- “Classical” deep RL: different variants of Q-learning
- Combining Q-learning with planning: self-play
- Continuous actions: DDPG and SAC
- Learning safe policies in constrained control spaces
- Sample efficiency in model-free RL
- Imitation learning and offline RL
- Software for RL 1: designing environments and evaluation scenarios
- Software for RL 2: parallelization, vectorization
- Basics of deep model-based RL: an introduction to MuZero
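As a taste of the “classical” model-free material above, here is a minimal tabular Q-learning sketch with epsilon-greedy exploration. The 5-state chain environment is a made-up toy example for illustration, not part of the training material:

```python
import random

# Toy 5-state chain MDP (a made-up example): start in state 0; action 1 moves
# right, action 0 moves left; reaching state 4 gives reward 1 and ends the episode.
N_STATES, GOAL = 5, 4

def step(state, action):
    """Deterministic transition; returns (next_state, reward, done)."""
    nxt = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    random.seed(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-table: one row per state
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy: explore with probability epsilon, else act greedily.
            if random.random() < epsilon:
                a = random.randrange(2)
            else:
                a = max((0, 1), key=lambda act: q[s][act])
            s2, r, done = step(s, a)
            # Off-policy Q-learning update: bootstrap from the greedy value of
            # the next state; no bootstrap term at terminal states.
            target = r + (0.0 if done else gamma * max(q[s2]))
            q[s][a] += alpha * (target - q[s][a])
            s = s2
    return q

q = train()
# Greedy policy over the nonterminal states: it should always move right.
policy = [max((0, 1), key=lambda act: q[s][act]) for s in range(GOAL)]
print(policy)  # → [1, 1, 1, 1]
```

The same update rule underlies the deep variants covered in the course, where the Q-table is replaced by a neural network and exploration, replay and target networks become the main design questions.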