Segmentation and Clustering of Vocal Behavior using Deep Learning

We monitor the communication of songbirds that are housed together and recorded with a stationary wall microphone. To know which bird utters which vocalization (a source separation problem), we attach miniature backpacks to these birds [1]. Each backpack carries an accelerometer that records vibration signals from the body of that individual bird. However, accelerometer signals are harder to analyze than microphone signals: accelerometers capture only low-frequency vibrations, and radio noise and wing flaps can mask vocal signals. The goal of this project is to segment and cluster vocalizations from such experiments, using the combined information from all channels and minimal training data. Segmentation and clustering of noise-affected longitudinal data into recurring patterns is a universal theme in science and medicine [2].

[1] Ter Maat, A., et al. (2014). Zebra finch mates use their forebrain song system in unlearned call communication. PLoS ONE 9(10): e109334.
[2] Coffey, K.R., Marx, R.G. & Neumaier, J.F. (2019). DeepSqueak: a deep learning-based system for detection and analysis of ultrasonic vocalizations. Neuropsychopharmacology 44, 859–868.

Idea/Approach: We have started to develop a semi-supervised machine learning system that can annotate a single channel using minimal expert-labelled data. To avoid annotation errors stemming from the various sources of noise in any single channel, we want to take advantage of the information contained in the other channels. Our aim is therefore to extend the input and output space of the machine learning system and to learn all channel annotations simultaneously.
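As an illustration of this multi-channel idea, the sketch below stacks all channels (microphone plus per-bird accelerometers) into a single input array and emits one vocal/non-vocal mask per channel, so cross-channel evidence can support each channel's annotation. The function, threshold, and pooling rule are hypothetical placeholders for the learned model, not the actual system:

```python
import numpy as np

def joint_segmentation(channels, threshold=0.5, win=4):
    """Toy multi-channel segmenter.

    Stacks all channels into one (n_channels, n_samples) array and marks a
    sample as vocal when its smoothed energy, boosted by the mean energy
    across channels, exceeds `threshold` times that channel's own peak.
    A stand-in for the learned model's joint input/output space.
    """
    x = np.stack(channels)                      # (n_channels, n_samples)
    energy = x ** 2
    kernel = np.ones(win) / win                 # simple moving-average smoothing
    smooth = np.array([np.convolve(e, kernel, mode="same") for e in energy])
    # Cross-channel evidence: activity in any channel supports the others.
    joint = smooth + 0.5 * smooth.mean(axis=0)
    peaks = joint.max(axis=1, keepdims=True)
    return joint > threshold * peaks            # boolean mask per channel
```

In the real system a neural network would replace this energy heuristic, but the interface is the point: all channels in, all channel annotations out.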

Soonest possible project start: immediately (August 2021)


Tomas Tomka (tomas (at) …), Hahnloser group

© 2022 Institut für Neuroinformatik