MSc projects: Enhancing Communication for Individuals with Complex Congenital Disorders - A Personalized Speech Recognition Approach

Supervisors: Roman Boehringer (roman@ethz.ch), Pehuen Moure (pehuen@ini.ethz.ch)

Motivation

For individuals, especially children, with complex congenital disorders such as cerebral palsy, muscular dystrophy, developmental coordination disorder, or Apert syndrome, fine motor skills and speech are frequently impaired to varying degrees. Despite many of these disabilities having little to no impact on cognitive abilities, individuals in these groups often encounter significant communication challenges, both in spoken and written forms. These communication difficulties can hinder their ability to function effectively in society, leading to various disadvantages.

Commonly used speech recognition tools, such as those from Google and Apple or Dragon, often fail to recognize speech that deviates from the norm, which makes them unsuitable for individuals with speech impairments. This is especially true for languages other than English. In this project, our primary objective is to develop a framework that streamlines and personalizes the training and fine-tuning of speech recognition models in German, thereby addressing the unique communication needs of these individuals.

Project 1: Early layer fine-tuning

One possible approach to dealing with this non-normative speech is early-layer fine-tuning, that is, adapting only the early layers of an existing base model, which has been shown to work [1, 2]. Because the base model is already trained, this approach has the advantage that no extensive training dataset is needed.
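As a rough illustration of the idea, the sketch below selects which parameters would stay trainable when only the first few encoder blocks are fine-tuned. It is a minimal sketch under stated assumptions: the parameter naming scheme `model.encoder.layers.<index>.` is assumed to match Hugging Face Whisper checkpoints, and the function name `select_early_layer_params` and the choice of two layers are hypothetical, not taken from [1, 2].

```python
import re

# Assumption: encoder blocks are named "model.encoder.layers.<index>.<...>",
# as in Hugging Face Whisper checkpoints. Adjust the pattern for other models.
ENCODER_LAYER = re.compile(r"^model\.encoder\.layers\.(\d+)\.")


def select_early_layer_params(param_names, num_early_layers):
    """Return the names of parameters inside the first `num_early_layers` encoder blocks."""
    trainable = []
    for name in param_names:
        match = ENCODER_LAYER.match(name)
        if match and int(match.group(1)) < num_early_layers:
            trainable.append(name)
    return trainable


# Hypothetical parameter names, for illustration only.
example_names = [
    "model.encoder.conv1.weight",
    "model.encoder.layers.0.self_attn.q_proj.weight",
    "model.encoder.layers.1.fc1.weight",
    "model.encoder.layers.31.fc2.weight",
    "model.decoder.layers.0.self_attn.q_proj.weight",
]
early = select_early_layer_params(example_names, num_early_layers=2)
print(early)
```

With a PyTorch model, the returned names could then be used to set `requires_grad` to true for the selected parameters and to false for all others, so that the optimizer updates only the early encoder blocks. How many layers to unfreeze would have to be determined experimentally.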

Goal of the Master's Thesis

The goal of the Master's Thesis is to replicate the published work [1, 2] by applying early-layer fine-tuning to an existing German speech recognition model (open-source model: bofenghuang/whisper-large-v2-cv11-german on Hugging Face) with data from a speech-impaired child. The Master's Thesis includes:

- Set up the model and test it on typical (unimpaired) speech.
- Collect a dataset featuring speech-impaired children (including tasks such as obtaining consent forms, data collection, preprocessing, and data annotation).
- Fine-tune the model [2] using the newly acquired dataset and validate the model's robustness.
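One way to compare the model before and after fine-tuning is the word error rate (WER), a standard metric for speech recognition. The sketch below is a minimal pure-Python version based on word-level edit distance; using WER as the validation metric is an assumption here, and the function name `word_error_rate` and the example sentences are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between the two texts, divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts: the recognizer dropped one of five words.
wer = word_error_rate("ich möchte ein glas wasser", "ich möchte glas wasser")
print(wer)
```

Computing this value on held-out recordings of both typical and impaired speech, before and after fine-tuning, would show whether personalization helps without degrading recognition of typical speech.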

Project 2: Training-Data Augmentation

This project revolves around an exploratory idea, primarily focused on training an initial model with limited data. Training such a model from scratch is believed to enhance its robustness, especially when dealing with speech that deviates significantly from the norm. The primary constraint is the availability of training data. Collecting speech samples from individuals with certain speech variations presents significant challenges. This is primarily attributed to two key factors: the scarcity of individuals affected by these specific speech disorders (often referred to as 'rare diseases'), and the increased effort required from individuals to provide speech samples. To address this challenge, our approach is to collect a limited amount of data and model the variations in speech. This model will then be used to transform an existing, pre-annotated corpus, enabling us to train a model from scratch.

Goal of the Master's Thesis

The goal of this project is to analyze speech variations and modify an existing German speech corpus so that it can serve as the foundation for training a new model. The Master's Thesis includes:

- Collect a dataset featuring a speech-impaired child, including tasks such as drafting consent forms, data collection, preprocessing, and data annotation.
- Analyze speech variations and adapt an existing corpus to capture these variations in speech.
- Train a new model and validate its robustness.
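To make the corpus transformation step more concrete, the sketch below models a single, very simple kind of variation: speaking rate. It estimates a rate factor from paired phrase durations and applies it to a waveform by linear-interpolation resampling. This is only an illustration under strong assumptions: the actual variation model is left open by the project, a real one would likely cover much more than tempo, this kind of resampling also shifts pitch, and the names `estimate_rate` and `speed_perturb` are hypothetical.

```python
def estimate_rate(typical_durations, impaired_durations):
    """Rate factor from durations (in seconds) of the same phrases spoken by both groups."""
    if not impaired_durations or sum(impaired_durations) <= 0:
        raise ValueError("impaired_durations must have a positive total")
    return sum(typical_durations) / sum(impaired_durations)


def speed_perturb(samples, rate):
    """Resample by linear interpolation; rate < 1 stretches the signal (slower speech)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    if len(samples) < 2:
        return list(samples)
    last = len(samples) - 1
    out_len = max(1, int(round(len(samples) / rate)))
    out = []
    for n in range(out_len):
        pos = min(n * rate, last)  # fractional read position in the input
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append((1 - frac) * samples[lo] + frac * samples[hi])
    return out


# Hypothetical measurements: the same two phrases take twice as long when impaired.
rate = estimate_rate([2.0, 1.0], [4.0, 2.0])
stretched = speed_perturb([0.0, 1.0, 2.0, 3.0], rate)
print(rate, stretched)
```

Applied to every utterance of an annotated German corpus, a transformation like this keeps the existing transcripts valid while moving the audio closer to the target speaker, which is the property the augmentation approach relies on.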

Requirements

Each project has the scope of an MSc thesis. For a semester project, we can discuss a partial scope.

Our ideal master's student for this project:

- Is self-driven and highly engaged.
- Possesses Python coding skills.
- Is proficient in the German language (ideally).
- Has some knowledge of neural networks.

References:

- [1] Green et al., "Automatic Speech Recognition of Disordered Speech: Personalized models outperforming human listeners on short phrases," Interspeech 2021, https://www.isca-speech.org/archive/pdfs/interspeech_2021/green21_interspeech.pdf
- [2] J. Tobin and K. Tomanek, "Personalized Automatic Speech Recognition Trained on Small Disordered Speech Datasets," ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 2022, pp. 6637-6641, doi: 10.1109/ICASSP43922.2022.9747516.

Interesting Publication:

- Murero et al., "Artificial Intelligence for Severe Speech Impairment: Innovative approaches to AAC and Communication," https://ceur-ws.org/Vol-2730/paper31.pdf
