Behavioral Cloning on chips

Figure 1: A recurrent spiking network in a continuous control task. The spiking network is able to control the joints of a simple 2D bipedal walker.
Supervisor: Giacomo Indiveri
Co-supervisor: Cristiano Capone
January 20, 2023

Abstract
Recurrent spiking neural networks (RSNNs) in the brain learn to perform a wide range of perceptual, cognitive and motor tasks very efficiently in terms of energy consumption, and their training requires very few examples. This motivates the search for biologically inspired learning rules for RSNNs, with the aim of improving our understanding of brain computation and the efficiency of artificial intelligence. Several spiking models and learning rules have been proposed, but it remains a challenge to design RSNNs that learn through biologically plausible mechanisms and are capable of solving complex temporal tasks. In a target-based learning scheme, a learning rule derived from likelihood maximization can be used to mimic a specific spatio-temporal spike pattern that encodes the solution to a complex temporal task. This method makes learning extremely rapid and precise, outperforming state-of-the-art algorithms for RSNNs. Whereas error-based approaches (e.g. e-prop) optimize the internal sequence of spikes trial after trial in order to progressively minimize the mean squared error (MSE), we assume that a signal randomly projected from an external origin (e.g. from other brain areas) directly defines the target sequence.
This framework naturally lends itself to behavioral cloning and allows relevant closed-loop tasks to be solved efficiently, such as tasks that require retaining memory for a long time (Button and Food) and motor tasks (the 2D Bipedal Walker).

Background
Behavioral cloning is a method of training artificial intelligence (AI) systems that involves mimicking the actions of a human or another AI agent. By observing the behavior of the demonstration agent and learning from it, the AI system can learn to perform the same tasks without the need for explicit programming.
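As a minimal illustration of this idea, the following Python sketch clones a hypothetical expert by supervised regression on its state-action pairs. The expert (`expert_policy`), the two-dimensional state, and the linear student policy are all assumptions chosen for brevity; the project itself targets recurrent spiking networks, not linear models.

```python
import random


def expert_policy(state):
    """Hypothetical expert: a fixed linear controller (assumption)."""
    position, velocity = state
    return -1.5 * position - 0.5 * velocity


def collect_demonstrations(n_samples, rng):
    """Record (state, action) pairs by observing the expert."""
    demos = []
    for _ in range(n_samples):
        state = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        demos.append((state, expert_policy(state)))
    return demos


def clone_policy(demos, epochs=200, lr=0.1):
    """Fit a linear student policy to the demonstrations.

    Uses online gradient descent on the squared error between the
    student's action and the expert's action.
    """
    w = [0.0, 0.0]
    for _ in range(epochs):
        for (x0, x1), action in demos:
            error = w[0] * x0 + w[1] * x1 - action
            w[0] -= lr * error * x0
            w[1] -= lr * error * x1
    return w


rng = random.Random(0)
demos = collect_demonstrations(200, rng)
weights = clone_policy(demos)
print(weights)  # should be close to the expert's gains (-1.5, -0.5)
```

The student never sees the expert's internal parameters, only its behavior, which is the defining feature of behavioral cloning.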
One of the key advantages of behavioral cloning is that it has the potential to transfer policies to neuromorphic chips.
Neuromorphic electronic circuits have been shown to be a promising technology for the implementation of spiking neural network models. These circuits are typically designed using mixed-mode analog/digital transistors and fabricated using standard VLSI processes to emulate the physics of real neurons and synapses in real time. Similar to the neural processes they model, neuromorphic systems process information using energy-efficient, asynchronous, event-driven methods. They are adaptive, fault-tolerant, and can be flexibly configured to display complex behaviors by combining multiple instances of simpler elements. The most striking difference between neuromorphic processing systems and standard computing ones is in their unconventional (beyond von Neumann) architecture: rather than implementing one or more digital, time-multiplexed, central processing units physically separated from the main memory areas, they are characterized by parallel processing with co-localized memory and computation. This fundamental architectural difference is the main reason why neuromorphic systems can perform bio-signal processing using orders of magnitude less power (by a factor of 10× to 1000×) than conventional AI computing systems.

Project goals
The goal of the project is to transfer a desired policy onto a neuromorphic chip. The idea is to take an optimal policy (from a human expert or an already trained AI) and replicate it on low-energy hardware. The great advantage of this approach is that an agent can replicate a desirable policy on a portable and energy-efficient piece of hardware.
Neuromorphic chips are specialized hardware that mimic the structure and function of biological neurons and synapses, and are well-suited for implementing AI systems. Because behavioral cloning focuses on mimicking the behavior of an agent, rather than explicitly programming it, the resulting AI system can be more easily ported to neuromorphic hardware. This is because the policies learned through behavioral cloning are more likely to be computationally efficient and biologically plausible, which makes them well-suited for implementation on neuromorphic chips.
The first phase of the proposal is to implement on-chip a recurrent spiking network capable of learning in a target-based fashion (see [1]). In the second phase, this architecture will be used to perform behavioral cloning and to transfer a target policy onto a chip. This technology will be benchmarked on a standard motor task (such as the 2D bipedal walker) and a navigation task (see [2]).
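To make the target-based idea of the first phase concrete, the sketch below trains a toy network of stochastic binary neurons to reproduce a given spike pattern by gradient ascent on its log-likelihood. It is a simplified illustration, not the rule of [1]: the discrete-time sigmoid neuron model, the random target pattern, and all names and parameter values are assumptions.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def spike_probabilities(weights, prev_spikes):
    """Probability that each neuron spikes at the next step (assumed model)."""
    return [sigmoid(sum(w * s for w, s in zip(row, prev_spikes)))
            for row in weights]


def log_likelihood(weights, target):
    """Log-likelihood of the target pattern, with the network clamped to it."""
    total = 0.0
    for t in range(len(target) - 1):
        probs = spike_probabilities(weights, target[t])
        for p, s in zip(probs, target[t + 1]):
            p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
            total += math.log(p) if s else math.log(1.0 - p)
    return total


def train_epoch(weights, target, lr):
    """One pass of online gradient ascent on the log-likelihood."""
    for t in range(len(target) - 1):
        probs = spike_probabilities(weights, target[t])
        for i, row in enumerate(weights):
            error = target[t + 1][i] - probs[i]  # target spike minus prediction
            for j, s in enumerate(target[t]):
                row[j] += lr * error * s  # local update: error times presynaptic spike


rng = random.Random(0)
n_neurons, n_steps = 8, 30  # toy sizes (assumption)
target = [[1 if rng.random() < 0.3 else 0 for _ in range(n_neurons)]
          for _ in range(n_steps)]
weights = [[0.0] * n_neurons for _ in range(n_neurons)]

ll_before = log_likelihood(weights, target)
for _ in range(50):
    train_epoch(weights, target, lr=0.05)
ll_after = log_likelihood(weights, target)
print(ll_before, ll_after)  # the log-likelihood should increase with training
```

In this simplified setting the update is local: each weight changes in proportion to the presynaptic spike and to the mismatch between the target spike and the predicted spike probability, which is the kind of property that makes such rules attractive for on-chip learning.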

Relevant literature
Links to related work:
1. The paper describing behavioral cloning in RSNNs is available at https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1010221
2. The paper describing target-based supervised learning in RSNNs is available at https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1010221
3. A paper describing the general domain of spiking neural network “neuromorphic” chips is available at https://arxiv.org/abs/1506.03264
4. The paper describing the specific SNN DYNAP-SE chip that will be used in the project is available at https://arxiv.org/abs/1708.04198
5. A paper describing how to use populations of neurons to deal with noise and low precision in SNNs is available at https://www.biorxiv.org/content/10.1101/2022.10.26.513846v1

Requirements

The student is expected to have a good understanding of recurrent spiking neural networks and network dynamics. The project will make extensive use of Python code and Jupyter notebooks. Neuromorphic hardware will be used with a computer in the loop, so experience with neuromorphic circuits is a plus (but not a requirement).