Uncovering shared cognitive representations in language models and biological agents


Background:

Recent advances in language models (LMs) have highlighted their remarkable capabilities across a broad spectrum of tasks. However, significant gaps remain, particularly in temporal probabilistic reasoning—a fundamental aspect of cognition effortlessly handled by humans and other primates. Small language models (≤ 1B parameters), while computationally efficient and interpretable, provide an ideal testbed to systematically study these capabilities. Recent studies indicate that these smaller models, when properly fine-tuned, can achieve state-of-the-art results, demonstrating advanced reasoning comparable to larger models. Additionally, the internal representations of LMs have shown striking similarities to cognitive and neural representations observed in biological brains, offering exciting opportunities for cross-disciplinary insights.

 

Master Thesis / Semester Project:

In this project, we aim to develop and expand the G1Bbon benchmark (ai.trt-bench.org) to systematically assess and improve the temporal reasoning capabilities of small (< 1B parameter) language models. By combining detailed behavioural data from human participants and non-human primates, we will investigate how these models internally represent and process a set of cognitive tasks. The project will focus on mechanistic interpretability methods to dissect the representational dynamics of these cognitive features within the models, on fine-tuning models with a range of techniques, and on developing novel brain-inspired architectures that improve the efficiency and performance of models on cognitive tasks.

 

Research Directions:

Your thesis can focus on one (or several) of the topics below:

  1. Expand Task Space: Develop new Temporal Reasoning Tasks (TRTs) inspired by human and primate cognitive experiments to robustly probe model reasoning capabilities.
  2. Adapt and Fine-tune Models: Fine-tune selected 1B-scale LLMs on human-derived behavioral data (e.g., via GRPO) to improve their performance; a minimal fine-tuning sketch is given after this list.
  3. Behavioral Comparison: Analyze and systematically compare model performance with human and primate behavioral data to uncover similarities and differences in cognitive processing.
  4. Develop Interpretability Methods: Implement novel mechanistic interpretability techniques (e.g., sparse probing, layer-wise activation dynamics, chain-of-thought analyses) to provide deep insights into the evolving representations within the models; a probing sketch is also given after this list.
  5. Optimize Model Size and Performance: Develop new techniques to train the smallest language model that achieves the highest possible performance.
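
As a rough illustration of direction 2, the sketch below shows how a sub-1B instruct model could be fine-tuned with GRPO using the Hugging Face TRL library. The model name, prompts, targets, and reward function are illustrative placeholders, not the project's actual behavioral data or reward design; treat it as a minimal starting point rather than the project pipeline.

```python
# Minimal GRPO fine-tuning sketch with Hugging Face TRL (assumed dependency).
# Model, prompts, targets, and the reward function are illustrative placeholders.
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Hypothetical temporal-reasoning prompts; real prompts would come from the
# G1Bbon task battery and the associated behavioral experiments.
train_dataset = Dataset.from_dict({
    "prompt": [
        "Event A happened before B, and B before C. Which event came last?",
        "A light flashes every 3 seconds starting at t=0. Does it flash at t=7?",
    ],
    "target": ["C", "No"],
})

def match_reference(completions, target, **kwargs):
    # Placeholder reward: 1.0 if the completion contains the reference answer.
    # A real reward could instead score agreement with human/primate choices.
    return [1.0 if t.lower() in c.lower() else 0.0 for c, t in zip(completions, target)]

training_args = GRPOConfig(output_dir="grpo-trt-sketch", logging_steps=10)
trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",   # any sub-1B instruct model could be used
    reward_funcs=match_reference,
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()
```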

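For direction 4, one concrete starting point is a sparse linear probe on layer-wise activations: extract hidden states from every layer of a small LM while it reads a task prompt, then fit an L1-regularized probe to read out a task feature. The snippet below is a minimal sketch assuming the transformers and scikit-learn libraries, a placeholder model, and made-up prompts and labels; a real analysis would use the project's task battery and held-out evaluation data.

```python
# Minimal sparse-probing sketch: decode a hypothetical binary task feature
# from layer-wise hidden states of a small language model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

model_name = "Qwen/Qwen2.5-0.5B"          # placeholder sub-1B model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

# Hypothetical prompts labeled by whether the stated temporal order is
# consistent (1) or not (0); real data would come from the TRT battery.
prompts = [
    "A happened, then B happened. B came after A.",
    "A happened, then B happened. B came before A.",
]
labels = [1, 0]

@torch.no_grad()
def last_token_states(text):
    """Return one vector per layer: the hidden state at the final token."""
    inputs = tok(text, return_tensors="pt")
    out = model(**inputs)
    # out.hidden_states: one (1, seq_len, d_model) tensor per layer (incl. embeddings)
    return torch.stack([h[0, -1] for h in out.hidden_states])    # (n_layers+1, d_model)

features = torch.stack([last_token_states(p) for p in prompts])  # (n_prompts, n_layers+1, d)

# Fit a sparse (L1-penalized) linear probe per layer; here accuracy is reported
# on the tiny illustrative training set itself, a real analysis would hold out data.
for layer in range(features.shape[1]):
    X = features[:, layer].numpy()
    probe = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
    probe.fit(X, labels)
    print(f"layer {layer:2d}  train acc = {probe.score(X, labels):.2f}")
```

Plotting the per-layer probe accuracy on held-out prompts would give a first picture of where the probed feature emerges across the network.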
 

Your benefits:

This project will allow you to explore the cognitive potential and limitations of language models, and your work will lead directly to publishable insights into temporal reasoning, model interpretability, and cross-species cognitive comparisons. You will gain hands-on experience with fine-tuning language models, deploying experiments on GPU clusters, designing sophisticated cognitive tasks, and developing state-of-the-art mechanistic interpretability methods. Moreover, your efforts will result in co-authorship on relevant publications and substantial contributions to the broader AI and cognitive science communities.

 

Related works / Preliminary readings:

 

Your profile:

We seek a motivated student who is proficient in Python and PyTorch and has some general knowledge of inference libraries such as unsloth, vLLM, and Ollama. Prior experience with language models, cognitive science, or interpretability methods is beneficial.

 

Supervisors:

  • Dr. Gonçalo Guiomar (ETH, UZH): goncalo (at) ini.uzh.ch
  • Dr. Mario Giulianelli (ETH)
  • Elia Torre (ETH, UZH)
  • Prof. Valerio Mante (ETH, UZH)

 

Starting date + Duration:

This project is currently available as a semester project or master thesis.