Meta-Learning with Hypernetworks and Multi-modal embeddings

FS 2021/2022 Master’s Thesis project

Supervisor: Prof. Benjamin Grewe - INI
Co-supervisor: Elvis Nava - ETH AI Center, INI, SRL - elvis.nava@ai.ethz.ch

Keywords: Machine Learning, Meta-Learning, Language Modeling, Hypernetworks, Multi-modal embeddings

Short Abstract

The aim of this Master’s thesis project is to extend the meta-learning hypernetwork architecture so that it can interpret tasks expressed as language embeddings, representing task similarities in an actionable way without needing to know all tasks in advance.

Description

Meta-Learning (Hospedales et al. 2020) is a sub-field of machine learning concerned with learning on, and transferring knowledge between, multiple related tasks. Meta-learning procedures usually run over multiple training episodes, with the goal of making future learning of unseen tasks more data- and/or compute-efficient. Beyond efficiency gains, such a learning framework more closely resembles human and animal learning, which is not constrained to static single-task learning.

Recent works (Zhao et al. 2020; Henning et al. 2021; von Oswald et al. 2021) have explored hypernetworks for a variety of problems, including conditioning on task embeddings to produce the parameters of a downstream model, which then learns the task in an inner loop.
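To make the setup concrete, the following is a minimal sketch (not the architecture of any of the cited papers) of a hypernetwork that maps a task embedding to the weights of a small downstream linear model, which is then applied to the task's inputs. All names and dimensions here are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HyperLinear(nn.Module):
    """Hypernetwork: maps a task embedding to the parameters of a
    linear target layer, then runs the target's forward pass."""
    def __init__(self, embed_dim, in_features, out_features, hidden=128):
        super().__init__()
        self.in_f, self.out_f = in_features, out_features
        n_params = in_features * out_features + out_features  # W and b
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_params),
        )

    def forward(self, task_emb, x):
        flat = self.net(task_emb)                        # flat parameter vector
        W = flat[: self.in_f * self.out_f].view(self.out_f, self.in_f)
        b = flat[self.in_f * self.out_f:]
        return F.linear(x, W, b)                         # target model forward

hnet = HyperLinear(embed_dim=16, in_features=32, out_features=10)
task_emb = torch.randn(16)     # stand-in for a learned task embedding
x = torch.randn(4, 32)         # a batch of task inputs
logits = hnet(task_emb, x)     # shape (4, 10)
```

In the meta-learning setting, the outer loop would update the hypernetwork (and possibly the task embeddings), while the inner loop adapts to each task through the generated parameters.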

Explicitly learning task embeddings is useful and can inform our understanding of how tasks are related to each other, leading our hypernetwork model to efficiently share information required to solve similar tasks. However, such embeddings cannot be instantly produced for completely novel tasks, and require some adaptation.
A key insight towards overcoming this limitation is that tasks can be described with language (in the form of questions, instructions, etc.), so a language-based task embedding can in principle relate a novel task to previously learned ones. For this purpose, it may be useful to leverage CLIP embeddings (Radford et al. 2021) both for embedding textual task descriptions and for embedding the labels of the underlying classification tasks, so that the inner-loop model outputs meaningful embedding vectors for classification rather than one-hot labels.
In this project, we aim to leverage such language-based task representation and use it to build meta-learning models that generalize to unseen but describable tasks.
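As a sketch of the embedding-based classification idea above: instead of producing logits over one-hot labels, the inner-loop model emits an embedding vector that is compared against label embeddings by cosine similarity. Here, random vectors stand in for the label-name embeddings that would in practice come from a text encoder such as CLIP's; everything below is illustrative, not the project's final design.

```python
import torch
import torch.nn.functional as F

num_classes, dim = 5, 64

# Stand-in for label-name embeddings from a text encoder (e.g. CLIP's);
# random unit vectors are used here purely for illustration.
label_embs = F.normalize(torch.randn(num_classes, dim), dim=-1)

# Inner-loop model output: one embedding vector per example, not logits.
model_out = F.normalize(torch.randn(3, dim), dim=-1)

# Classification = nearest label embedding under cosine similarity.
sims = model_out @ label_embs.T   # (3, num_classes) similarity matrix
pred = sims.argmax(dim=-1)        # predicted class index per example
```

Because labels live in the same embedding space as language, a trained model can in principle be evaluated against label sets it never saw during meta-training, simply by embedding the new label names.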

Goal

The goal of this project is to construct a hypernetwork architecture based on multimodal task and label embeddings and test its generalization properties on unseen tasks. This will be part of a broader project within the lab to explore this topic.
As part of the project, the student will familiarize themselves with the existing literature on meta-learning, hypernetworks and multimodal embeddings, help in the implementation and training of the models, and obtain/structure data sources, such as the VQA dataset (Agrawal et al. 2016).

Responsibilities

  •  Research into meta-learning models and techniques, hypernetworks, and multimodal embeddings

  •  Assistance in the design of relevant experiments

  •  Retrieval and structuring of data sources

  •  Implementation and training of deep neural network models

What you offer

A motivated Master’s student with an excellent background in machine learning, knowledge of PyTorch (or JAX), and experience with training large neural network models.

What we offer

  •  The interdisciplinary, collaborative environment at the intersection of machine learning, computer science and neuroscience offered by the ETH AI Center and the Institute of Neuroinformatics.

  •  A Master’s thesis project on a novel and popular topic in neural networks.

Thesis start: September/October

Duration: 6 months

Please attach a CV, short motivation and background (<0.5 page). If you have any questions about the project, do not hesitate to contact us.

Bibliography

Agrawal, Aishwarya, Jiasen Lu, Stanislaw Antol, Margaret Mitchell, C. Lawrence Zitnick, Dhruv Batra, and Devi Parikh. 2016. “VQA: Visual Question Answering.” ArXiv:1505.00468 [Cs], October. http://arxiv.org/abs/1505.00468.

Henning, Christian, Maria R. Cervera, Francesco D’Angelo, Johannes von Oswald, Regina Traber, Benjamin Ehret, Seijin Kobayashi, Benjamin F. Grewe, and João Sacramento. 2021. “Posterior Meta-Replay for Continual Learning.” ArXiv:2103.01133 [Cs], June. http://arxiv.org/abs/2103.01133.

Hospedales, Timothy, Antreas Antoniou, Paul Micaelli, and Amos Storkey. 2020. “Meta-Learning in Neural Networks: A Survey.” ArXiv:2004.05439 [Cs, Stat], November. http://arxiv.org/abs/2004.05439.

Radford, Alec, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, et al. 2021. “Learning Transferable Visual Models From Natural Language Supervision.” ArXiv:2103.00020 [Cs], February. http://arxiv.org/abs/2103.00020.

von Oswald, Johannes, Seijin Kobayashi, Alexander Meulemans, Christian Henning, Benjamin F. Grewe, and João Sacramento. 2021. “Neural Networks with Late-Phase Weights.” ArXiv:2007.12927 [Cs, Stat], April. http://arxiv.org/abs/2007.12927.

Zhao, Dominic, Seijin Kobayashi, João Sacramento, and Johannes von Oswald. 2020. “Meta-Learning via Hypernetworks.” https://meta-learn.github.io/2020/papers/38_paper.pdf.
