Action recognition of behaviors in zebra finches

Fig 1. A snapshot of the copulation experiment in the BirdPark setup. The male bird is mounting the female. In the BirdPark, behavioral interactions are recorded from three views, and vocalizations are recorded by wall microphones and individual accelerometers attached to the birds. These accelerometers not only capture vocalizations but also enable detection of brief copulation attempts, during which one bird flaps its wings to maintain balance on the other's back.
Keywords: action recognition, keypoint detection, computer vision, natural behavior, courtship display, songbird.
Research context:
Our lab studies the courtship and copulation behaviors of zebra finches, focusing on their vocal and behavioral dynamics. To capture these interactions, we recorded nine pairs of birds in our BirdPark system (Fig 1) [1] and identified 176 copulation attempts. Around these copulation events, we then manually annotated five core courtship behaviors and nine additional social behaviors, creating a ground-truth behavior dataset.
As manual annotation is time- and labor-intensive, we aim to implement machine learning models that automate behavior recognition for each bird.
Goal of the project:
This project aims to develop a machine learning pipeline for automated behavior recognition during courtship and copulation. We are currently extracting 2D keypoints for each animal, together with its identity, in each camera view. These multi-view keypoints will be combined to reconstruct 3D poses in BirdPark coordinates, allowing us to track each bird's posture and position in every frame. Your task will be to implement machine learning algorithms (e.g., a pretrained feature extractor such as DINO [2] or SAM [3] combined with a classification head) to classify a range of social behaviors from the 3D keypoints, including mounting, tail quivering, allopreening, and clumping. You are also encouraged to review the literature and propose your own approaches.
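The multi-view 3D reconstruction step mentioned above is commonly done with direct linear transformation (DLT) triangulation. The sketch below is a minimal illustration of that idea, not part of the BirdPark codebase: the function name and the assumption that each camera is described by a calibrated 3x4 projection matrix are ours.

```python
import numpy as np

def triangulate_dlt(proj_mats, points_2d):
    """Triangulate one 3D keypoint from its 2D observations in multiple views.

    proj_mats: list of 3x4 camera projection matrices (one per camera view).
    points_2d: list of (x, y) pixel coordinates, in the same order.
    Returns the 3D point in world (e.g., BirdPark) coordinates.
    """
    rows = []
    for P, (x, y) in zip(proj_mats, points_2d):
        # Each view contributes two linear constraints on the homogeneous 3D point.
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # de-homogenize
```

With three camera views, as in the BirdPark, the system is overdetermined, so the least-squares SVD solution also averages out small 2D detection errors.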
Your profile:
Ideal candidates should have a strong background in machine learning, preferably in computer vision, and enthusiasm for interdisciplinary research.
Contact:
Xueqian Ma (xueqma@ethz.ch), Maris Basha (maris@ini.uzh.ch), Luca Yapura (lyapura@ethz.ch), Prof. Richard Hahnloser (rich@ini.ethz.ch)
[1] L. Rüttimann, J. Rychen, T. Tomka, H. Hörster, M. Rocha, and R. Hahnloser. 2022. Multimodal system for recording individual-level behaviors in songbird groups. bioRxiv. doi: 10.1101/2022.09.23.509166.
[2] M. Caron, H. Touvron, I. Misra, H. Jégou, J. Mairal, P. Bojanowski, and A. Joulin. 2021. Emerging properties in self-supervised vision transformers. arXiv:2104.14294v2.
[3] A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W. Lo, P. Dollár, and R. Girshick. 2023. Segment Anything. arXiv:2304.02643.