RL theory: adapting value representations in a non-stationary world

Reinforcement learning (RL) algorithms have received massive attention, in part due to their mastery of games such as chess and Go. However, these games have stationary or quasi-stationary dynamics, i.e. the agent's environment does not change or changes only very slowly, which allows RL algorithms to converge.
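To make the role of stationarity concrete, the following minimal sketch compares two textbook value estimators on a single reward stream: the incremental sample average (step size 1/n), which converges when the reward distribution is fixed, and a constant step size, which keeps weighting recent rewards. The reward streams and parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def sample_average(rewards):
    """Incremental sample average: V <- V + (1/n) * (r - V)."""
    v = 0.0
    for n, r in enumerate(rewards, start=1):
        v += (r - v) / n  # step size 1/n shrinks, so V converges to the overall mean
    return v

def constant_step(rewards, alpha=0.1):
    """Recency-weighted average: V <- V + alpha * (r - V)."""
    v = 0.0
    for r in rewards:
        v += alpha * (r - v)  # fixed step size keeps tracking recent rewards
    return v

rng = np.random.default_rng(0)
# Stationary stream: mean reward 1.0 throughout (hypothetical).
stationary = 1.0 + 0.1 * rng.standard_normal(2000)
# Non-stationary stream: mean jumps from 1.0 to -1.0 halfway (hypothetical).
jump = np.concatenate([1.0 + 0.1 * rng.standard_normal(1000),
                       -1.0 + 0.1 * rng.standard_normal(1000)])

print(sample_average(stationary))  # close to 1.0: converged
print(sample_average(jump))        # close to 0.0: stuck between the two regimes
print(constant_step(jump))         # close to -1.0: follows the current regime
```

The 1/n rule is what makes convergence possible in a stationary world, and it is exactly this property that prevents it from following a sudden change.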
In the real world, however, sudden and dramatic changes in the environment can occur. Such changes are studied, for example, as self-organized criticality in economics, seismology, and evolutionary dynamics. How should an agent optimally adapt to such events? It has previously been suggested (1, 2) that, for example, the learning rate should be adapted optimally in volatile environments.
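One common way to formalize a volatility-dependent learning rate is a scalar Kalman filter that tracks a reward mean drifting as a Gaussian random walk: the Kalman gain then acts as the learning rate of a delta rule, and it grows with the assumed volatility. The sketch below is this textbook filter with a fixed, known volatility `q`; it is a simpler stand-in, not the models of (1) or (2), which additionally infer the volatility from data. The function name `kalman_delta_rule` and all parameter values are illustrative assumptions.

```python
import numpy as np

def kalman_delta_rule(observations, q, r=1.0, m0=0.0, p0=1.0):
    """Scalar Kalman filter for a reward mean following a Gaussian random walk.

    q: volatility (process-noise variance); r: observation-noise variance.
    Returns the posterior means and the per-trial Kalman gains (learning rates).
    """
    m, p = m0, p0
    means, gains = [], []
    for y in observations:
        p = p + q            # prediction step: volatility inflates uncertainty
        k = p / (p + r)      # Kalman gain, i.e. the effective learning rate
        m = m + k * (y - m)  # delta rule driven by the prediction error
        p = (1.0 - k) * p    # posterior variance after the observation
        means.append(m)
        gains.append(k)
    return np.array(means), np.array(gains)

rng = np.random.default_rng(0)
# Hypothetical noisy rewards whose mean jumps from 1.0 to -1.0 after 200 trials.
obs = np.concatenate([1.0 + rng.standard_normal(200),
                      -1.0 + rng.standard_normal(200)])

means_static, gains_static = kalman_delta_rule(obs, q=0.0)
means_volatile, gains_volatile = kalman_delta_rule(obs, q=0.1)
print(gains_static[-1])            # 1/401: the shrinking sample-average step size
print(gains_volatile[-1])          # settles at a constant value of about 0.27
print(means_static[-1])            # near 0: averages over both regimes
print(means_volatile[-50:].mean()) # near -1: tracks the post-jump mean
```

With `q=0` the gain reduces to 1/(n+1), the sample-average rule, while any `q>0` keeps the gain bounded away from zero so the estimate can follow a change. In this scalar filter the gains depend only on `q`, `r` and the prior, not on the observed rewards.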
In our lab, we study vocal learning in songbirds and try to elucidate the algorithmic principles that govern it. As shown in the figure below, taken from Ravbar et al. (3), zebra finches exhibit elevated variance in the phonological features of song syllables while their own song still deviates from the target tutor song.
We hypothesize that this transient modulation of phonological variance is accompanied by a modulation of the internal value representation in phonological space. The aim of the project is to investigate possible function approximation mechanisms by which the value function approaches the true (volatile) reward function. Comparing the learning dynamics and the simulated dopaminergic neuron firing rates produced by such mechanisms could yield valuable theoretical insights and experimental predictions.
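As one candidate mechanism, a value function over phonological space can be approximated with Gaussian radial basis functions (RBFs) whose weights are updated by the reward prediction error (a delta rule). The sketch below assumes a single phonological feature normalized to [0, 1], a hypothetical Gaussian-shaped reward centred on the tutor target, and uniform sampling of feature values in place of the bird's own exploration; the target jumps midway to mimic a volatile reward function. All names (`CENTERS`, `features`, `reward`, `train`, `value_peak`) and parameter values are illustrative assumptions, not the lab's model.

```python
import numpy as np

CENTERS = np.linspace(0.0, 1.0, 21)  # hypothetical RBF centres over one feature
WIDTH = 0.05                         # hypothetical RBF width

def features(x):
    """Gaussian RBF activations for feature value x."""
    return np.exp(-(x - CENTERS) ** 2 / (2 * WIDTH ** 2))

def reward(x, target, tol=0.1):
    """Hypothetical reward: similarity of feature value x to the tutor target."""
    return np.exp(-(x - target) ** 2 / (2 * tol ** 2))

def train(w, target, n_steps, alpha, rng):
    """Delta-rule learning of RBF weights from reward prediction errors."""
    for _ in range(n_steps):
        x = rng.uniform(0.0, 1.0)           # uniform exploration (assumption)
        phi = features(x)
        rpe = reward(x, target) - w @ phi   # reward prediction error
        w = w + alpha * rpe * phi           # move value estimate toward reward
    return w

def value_peak(w):
    """Feature value at which the learned value function is largest."""
    grid = np.linspace(0.0, 1.0, 201)
    values = np.array([w @ features(g) for g in grid])
    return grid[np.argmax(values)]

rng = np.random.default_rng(1)
w = np.zeros_like(CENTERS)
w = train(w, target=0.3, n_steps=5000, alpha=0.1, rng=rng)
peak_before = value_peak(w)
w = train(w, target=0.7, n_steps=5000, alpha=0.1, rng=rng)  # target jumps
peak_after = value_peak(w)
print(peak_before)  # near 0.3
print(peak_after)   # near 0.7
```

Logging `rpe` inside `train` around the jump would give a crude analogue of simulated dopaminergic responses for comparison between mechanisms.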
In particular, we would like to incorporate the reward prediction error (RPE) into a pre-existing Gaussian process (GP) regression framework.
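The pre-existing GP framework is not described here, so the following is only a generic sketch of how an RPE could be read out of GP regression: the GP posterior mean at a newly produced feature value serves as the predicted value, and the RPE is the observed reward minus that prediction. The squared-exponential kernel, its hyperparameters, the noise level, the hypothetical `reward` function and the helper names (`rbf_kernel`, `gp_posterior`, `gp_rpe`) are assumptions; the posterior follows the standard Cholesky-based GP regression equations.

```python
import numpy as np

def rbf_kernel(a, b, length=0.1, amp=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return amp * np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=0.05, length=0.1):
    """Posterior mean and variance of GP regression at x_test."""
    K = rbf_kernel(x_train, x_train, length) + noise ** 2 * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test, length)       # shape (n_train, n_test)
    L = np.linalg.cholesky(K)                       # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(rbf_kernel(x_test, x_test, length)) - np.sum(v ** 2, axis=0)
    return mean, var

def gp_rpe(x_train, y_train, x_new, r_new, **kwargs):
    """RPE = observed reward minus GP-predicted value; also returns predictive SD."""
    mean, var = gp_posterior(x_train, y_train, np.array([x_new]), **kwargs)
    return r_new - mean[0], np.sqrt(max(var[0], 0.0))

def reward(x, target, tol=0.1):
    """Hypothetical reward: similarity of feature value x to the tutor target."""
    return np.exp(-((x - target) ** 2) / (2 * tol ** 2))

x_train = np.linspace(0.0, 1.0, 21)    # past renditions (feature values)
y_train = reward(x_train, target=0.3)  # rewards experienced so far

rpe_same, sd_same = gp_rpe(x_train, y_train, 0.3, reward(0.3, target=0.3))
rpe_shift, sd_shift = gp_rpe(x_train, y_train, 0.7, reward(0.7, target=0.7))
print(rpe_same)   # close to 0: the reward was predicted
print(rpe_shift)  # close to 1: unexpected reward after a hypothetical target shift
```

Because the GP here still holds observations gathered under the old target, a sudden target shift produces a large RPE; how such stale observations should be discounted in a volatile setting is left open in this sketch.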
Methods: Mathematical modelling and literature search.
Supervision/Questions:
Kanghwi Lee (kanlee (at) ini.ethz.ch), Hahnloser group
1. Behrens TEJ, Woolrich MW, Walton ME, Rushworth MFS. Learning the value of information in an uncertain world. Nat Neurosci. 2007 Sep;10(9):1214–1221.
2. Piray P, Daw ND. A simple model for learning in volatile environments. PLoS Comput Biol. 2020 Jul 1;16(7):e1007963.
3. Ravbar P, Lipkind D, Parra LC, Tchernichovski O. Vocal exploration is locally regulated during song learning. J Neurosci. 2012 Mar 7;32(10):3422–3432.

© 2023 Institut für Neuroinformatik