Self-supervised state space geometry learning and goal generation for reinforcement learning

Reinforcement learning methods can solve difficult tasks when provided with human-engineered learning signals such as shaped rewards or goal curricula. Because these learning signals are time-consuming to engineer by hand, we want to generate them automatically. When the state space is infinite, methods for generating learning signals often fail because they inappropriately assume that the state space has a Euclidean geometry (e.g. the noisy TV problem).

The approach of this project is two-fold:

1. Learn a model of the state space geometry (a distance function or "state embedding") based on the heuristic that consecutive states in agent trajectories should be similar.
2. Use the learned geometry to automatically generate goals based on the heuristic that goals should not be too easy or too hard to reach.
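To make the second heuristic concrete, here is a minimal sketch of goal filtering by intermediate difficulty. It is an illustration under stated assumptions, not the project's method: `select_goals`, the thresholds `d_min` and `d_max`, and the Euclidean stand-in for the learned distance are all hypothetical. In the project, the distance would come from the learned geometry model.

```python
import math


def select_goals(candidates, current, distance, d_min, d_max):
    """Keep candidate goals whose distance from `current` lies in [d_min, d_max].

    Goals closer than d_min are treated as too easy and goals farther than
    d_max as too hard. Both thresholds are hypothetical tuning parameters.
    """
    return [g for g in candidates if d_min <= distance(current, g) <= d_max]


def placeholder_distance(a, b):
    """Stand-in for a learned distance function (assumption: plain Euclidean
    distance between hypothetical 2-D state embeddings)."""
    return math.dist(a, b)


current_state = (0.0, 0.0)
candidates = [(0.1, 0.0), (1.0, 1.0), (5.0, 5.0)]
goals = select_goals(candidates, current_state, placeholder_distance,
                     d_min=0.5, d_max=3.0)
print(goals)
```

Passing the distance as a function argument keeps the filter independent of how the geometry is learned, so the placeholder can be swapped for a trained model.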

Our group already has a concrete idea of how to do this. Roughly speaking, a supervised learning signal for a geometry model can be computed from a graph of agent trajectories. By construction, the learned geometry is formally related to probabilities of reaching states and therefore can be used to generate goal states.
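As a rough illustration of how a graph of agent trajectories could yield pairwise training targets, the sketch below builds a directed graph from consecutive states and computes all-pairs shortest-path lengths with the Floyd-Warshall algorithm. This is an assumption made for illustration only: the announcement does not specify the group's actual learning signal, and hop counts are used here merely as a simple stand-in. The function names and the toy trajectories are hypothetical. The triple loop shows why a naive all-pairs computation over n states costs O(n^3) time.

```python
import math


def trajectory_graph(trajectories):
    """Build a directed graph with a unit-cost edge for each pair of
    consecutive states observed in any trajectory."""
    states = sorted({s for traj in trajectories for s in traj})
    index = {s: i for i, s in enumerate(states)}
    n = len(states)
    # dist[i][j] is 0 on the diagonal, 1 for observed transitions, inf otherwise.
    dist = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for traj in trajectories:
        for s, t in zip(traj, traj[1:]):
            if s != t:
                dist[index[s]][index[t]] = 1.0
    return states, dist


def all_pairs_shortest_paths(dist):
    """Floyd-Warshall: three nested loops over n states, hence O(n^3) time."""
    n = len(dist)
    d = [row[:] for row in dist]  # copy so the input matrix is not modified
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


# Toy trajectories over discrete, already-merged states (hypothetical data).
trajectories = [["a", "b", "c"], ["c", "d"], ["b", "d"]]
states, adjacency = trajectory_graph(trajectories)
targets = all_pairs_shortest_paths(adjacency)
```

Here `targets[i][j]` is the minimum number of observed transitions from `states[i]` to `states[j]`, or infinity if no path was observed. The sketch assumes identical states have already been merged into single nodes, which in continuous state spaces is itself a nontrivial step.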

Some key problems to be solved include:

A. Computing the supervised learning signal naively has an O(n^3) time complexity.
B. Bootstrapping is required to merge similar states in trajectory graphs (a geometry model is required to learn the geometry model).
C. The geometry model and learning method that uses the supervised learning signal must be engineered (likely using deep learning).
D. Several gaps in the goal generation method must be filled.

We are looking for 1-3 highly motivated master's students interested in algorithm development and deep reinforcement learning. Students must have strong programming skills (Python) and solid fundamental knowledge of linear algebra, probability theory, algorithmics and machine learning. Any student working on (C) will benefit from strong deep learning engineering skills. Due to the nature of learning algorithm development, students will need tenacity, curiosity and creativity. The project will be a close collaboration in a team comprising the students and the supervisor (Alexander Nedergaard), so teamwork and communication are essential.

If you are interested, please send a brief (!) email to anederga@ethz.ch including your interest, relevant experience and CV.
