Project Proposal: COGITAO as a Reinforcement Learning Environment for Object-Centric Planning

Figure: Overview of the COGITAO generator. Given the configuration requested by the user and the selected transformations, COGITAO builds a transformation sequence, randomly samples objects, and positions them in the input grid. Each transformation is then applied sequentially to every object.
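To make this pipeline concrete, the sketch below walks through the same three steps in miniature. It is a toy illustration only: the single-cell object representation and the two stand-in transformations are assumptions and do not correspond to the actual COGITAO code base or its 28 transformations.

```python
# Toy sketch of the generation loop described in the figure caption above.
# Objects are lists of (row, col, color) cells; the two transformations are
# illustrative stand-ins for the 28 COGITAO transformations.
import random
import numpy as np

def translate(cells, dr=0, dc=1):
    """Shift every cell of an object by (dr, dc)."""
    return [(r + dr, c + dc, col) for r, c, col in cells]

def recolor(cells, new_color=2):
    """Assign a new color to every cell of an object."""
    return [(r, c, new_color) for r, c, _ in cells]

TRANSFORMATIONS = [translate, recolor]

def sample_object(height, width, color=1):
    """Sample a single-cell 'object' at a random position (toy version)."""
    return [(random.randrange(height), random.randrange(width), color)]

def render(objects, height, width):
    """Rasterize a list of objects into a grid of color indices."""
    grid = np.zeros((height, width), dtype=int)
    for cells in objects:
        for r, c, col in cells:
            grid[r % height, c % width] = col
    return grid

def generate_task(height=10, width=10, num_objects=3, seq_len=2):
    # 1. Build and select a transformation sequence.
    sequence = [random.choice(TRANSFORMATIONS) for _ in range(seq_len)]
    # 2. Randomly sample objects and position them in the input grid.
    objects = [sample_object(height, width) for _ in range(num_objects)]
    input_grid = render(objects, height, width)
    # 3. Apply each transformation to every object, one after another.
    for transform in sequence:
        objects = [transform(cells) for cells in objects]
    output_grid = render(objects, height, width)
    return input_grid, output_grid, [t.__name__ for t in sequence]
```

In the supervised COGITAO setting, (input_grid, output_grid) pairs like these form the training data; the RL extension proposed below instead exposes the intermediate transformation steps as actions.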

1. Background

Recent research introduced COGITAO [1], a visual reasoning framework that systematically evaluates compositionality and generalization in AI models. By generating millions of grid-world tasks from 28 interoperable transformations, the framework revealed a critical limitation in state-of-the-art models: while they show strong in-domain performance, they consistently fail to generalize to novel combinations of familiar rules, relying on pattern recognition rather than systematic reasoning. The original COGITAO paper established it as a powerful tool for analyzing these shortcomings in a supervised setting. However, its static input-output format does not explore an agent's ability to learn, plan, and execute sequences of actions to achieve a goal, which is a hallmark of intelligent behavior.

 

2. Motivation & Project Objectives

The original study revealed that even powerful vision transformers struggle with compositional generalization. This suggests that simply scaling existing architectures is insufficient. A more promising direction is to investigate models that can build internal representations of the world and use them to plan future actions. Reinforcement Learning (RL) provides the natural paradigm for such an investigation.

This project aims to extend the COGITAO framework to serve as a testbed for sequential decision-making and planning, with a special focus on object-centric learning. The primary objectives are:

  1. Develop a COGITAO-RL Environment: The core objective is to adapt the existing COGITAO generator into a fully interactive Reinforcement Learning environment. Instead of predicting a final state, an agent will be tasked with selecting the sequence of transformations that maps an initial state to a goal state. This shifts the challenge from recognition to active, sequential problem-solving (a minimal environment sketch follows this list).

  2. Benchmark Sequential Decision-Making Models: The new environment will be used to benchmark a range of methods for sequential decision-making. A key focus will be on World Models, which learn a compressed model of the environment's dynamics and use it for planning. This allows us to test whether learning to predict future states helps agents generalize better than purely reactive policies (see the object-centric world-model sketch after this list).

  3. Integrate Object-Centric Inductive Biases: We will explore how incorporating object-centric biases into world models can improve their performance and generalization. Since COGITAO tasks are inherently object-based, architectures that learn and reason over discrete object representations are hypothesized to be more sample-efficient and better at compositional generalization than models that operate on raw pixels.
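As a concrete illustration of objective 1, below is a minimal Gymnasium-style environment sketch. It reuses the toy helpers from the generator sketch above (TRANSFORMATIONS, sample_object, render); the observation layout, sparse reward, and step limit are illustrative assumptions rather than a fixed design, and the class name CogitaoRLEnv is a placeholder.

```python
# Sketch of a possible COGITAO-RL interface: at each step the agent picks one
# transformation, which is applied to every object in the scene; the episode
# succeeds when the rendered grid matches the goal grid.
import random
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class CogitaoRLEnv(gym.Env):
    def __init__(self, height=10, width=10, num_objects=3, seq_len=2, max_steps=4):
        super().__init__()
        self.height, self.width = height, width
        self.num_objects, self.seq_len, self.max_steps = num_objects, seq_len, max_steps
        self.action_space = spaces.Discrete(len(TRANSFORMATIONS))
        # Observation: current grid and goal grid stacked along a channel axis.
        self.observation_space = spaces.Box(0, 9, (2, height, width), dtype=np.int64)

    def _obs(self):
        current = render(self.objects, self.height, self.width)
        return np.stack([current, self.goal_grid])

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        # Sample initial objects and a hidden transformation sequence defining the goal.
        self.objects = [sample_object(self.height, self.width) for _ in range(self.num_objects)]
        goal_objects = self.objects
        for transform in [random.choice(TRANSFORMATIONS) for _ in range(self.seq_len)]:
            goal_objects = [transform(cells) for cells in goal_objects]
        self.goal_grid = render(goal_objects, self.height, self.width)
        self.steps = 0
        return self._obs(), {}

    def step(self, action):
        # Apply the chosen transformation to every object, then check the goal.
        transform = TRANSFORMATIONS[action]
        self.objects = [transform(cells) for cells in self.objects]
        self.steps += 1
        solved = np.array_equal(render(self.objects, self.height, self.width), self.goal_grid)
        terminated = bool(solved)
        truncated = self.steps >= self.max_steps
        reward = 1.0 if solved else 0.0
        return self._obs(), reward, terminated, truncated, {}
```

Design choices such as reward shaping, partial observability, or exposing the goal only implicitly are deliberately left open here and would be part of the project.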

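To illustrate how objectives 2 and 3 could fit together, the following is a minimal sketch of an object-centric world model: the grid is encoded into a fixed number of object slots, a single transition network shared across slots predicts how each slot changes under an action, and planning rolls out candidate action sequences in latent space. All layer sizes, the flat grid encoding, and the brute-force planner are placeholder assumptions, not a proposed final architecture.

```python
# Sketch of an object-centric world model (objectives 2 and 3): the grid is split
# into K object slots, one shared transition network models per-object dynamics,
# and planning happens entirely in the learned latent space.
import itertools
import torch
import torch.nn as nn
import torch.nn.functional as F

class ObjectCentricWorldModel(nn.Module):
    def __init__(self, grid_cells=100, num_slots=3, slot_dim=32, num_actions=2):
        super().__init__()
        self.num_slots, self.slot_dim, self.num_actions = num_slots, slot_dim, num_actions
        self.encoder = nn.Linear(grid_cells, num_slots * slot_dim)   # grid -> object slots
        self.transition = nn.Sequential(                             # (slot, action) -> next slot
            nn.Linear(slot_dim + num_actions, 64), nn.ReLU(), nn.Linear(64, slot_dim))
        self.decoder = nn.Linear(num_slots * slot_dim, grid_cells)   # slots -> predicted grid

    def encode(self, grid):
        # `grid` is a float tensor of shape (batch, height, width).
        return self.encoder(grid.flatten(1)).view(-1, self.num_slots, self.slot_dim)

    def imagine_step(self, slots, action):
        # The same transition network is applied to every slot independently:
        # this weight sharing across objects is the object-centric inductive bias.
        a = F.one_hot(action, self.num_actions).float()
        a = a.unsqueeze(1).expand(-1, self.num_slots, -1)
        return slots + self.transition(torch.cat([slots, a], dim=-1))

    def decode(self, slots):
        return self.decoder(slots.flatten(1))

def plan(model, grid, goal, horizon=2):
    """Brute-force planning in imagination: score every action sequence by how
    closely the imagined final grid matches the goal, and return the best one."""
    best_seq, best_err = None, float("inf")
    with torch.no_grad():
        for seq in itertools.product(range(model.num_actions), repeat=horizon):
            slots = model.encode(grid)
            for a in seq:
                slots = model.imagine_step(slots, torch.tensor([a]))
            err = ((model.decode(slots) - goal.flatten(1)) ** 2).mean().item()
            if err < best_err:
                best_seq, best_err = list(seq), err
    return best_seq
```

A Dreamer-style agent (Hafner et al., 2019) would replace the brute-force search with a policy learned in imagination, and an iterative multi-object encoder in the spirit of Greff et al. (2019) would replace the flat linear encoder.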
 

3. Requirements

This project is well-suited for a master's student with a strong interest in reinforcement learning, generative models, and the foundations of AI reasoning. The ideal candidate should have:

  - Strong Programming Skills: Proficiency in Python and significant experience with a major deep learning framework (PyTorch is preferred).

  - Experience with Reinforcement Learning: Practical experience implementing and training RL agents (e.g., policy gradient methods, Q-learning). Familiarity with standard RL environments (e.g., Gym/Gymnasium) is a plus.

  - Knowledge of Generative Models: Understanding of and hands-on experience with models like VAEs or Transformers, which form the basis of many world models.

  - Solid Mathematical Foundation: A good grasp of probability, linear algebra, and calculus is essential for understanding the underlying concepts.

 

4. Contact

  - Yassine Taoudi-Benchekroun (ytaoudi@ethz.ch): Yassine is a second-year PhD student under the supervision of Prof. Benjamin Grewe and Prof. Melika Payvand at the Institute of Neuroinformatics. His main research interests include compositionality, modularity, and reasoning. Read more: https://yassine.fyi

  - Pascal Sager (sage@zhaw.ch): Pascal is a PhD candidate at the Centre for Artificial Intelligence at ZHAW and a visiting student at the Institute of Neuroinformatics at UZH/ETH. His research focuses on model-based reinforcement learning, specifically on how AI systems can construct robust world models through structured latent representations. Before starting his PhD, he gained industry experience as a hardware and software engineer. Read more: linkedin.com/in/sagerpascal/ and sagerpascal.github.io

 

5. Starting date + Duration

Master's thesis. The starting date is flexible, from October 1st, 2025 onwards.

 

6. References

  - [1] Taoudi-Benchekroun, Y., Troyan, K., Sager, P., Gerber, S., Tuggener, L., & Grewe, B. (2025). COGITAO: A Visual Reasoning Framework To Study Compositionality & Generalization. arXiv:2509.05249.

  - [2] Hafner, D., et al. (2019). Dream to Control: Learning Behaviors by Latent Imagination. arXiv:1912.01603.

  - [3] Greff, K., et al. (2019). Multi-Object Representation Learning with Iterative Variational Inference. arXiv:1903.00450.