Multilingual Automatic Speech Recognition for Low-resource Languages
Master’s Thesis-I (nationwide project: Bhashini, NLTM) at the Computational Speech and Language Technologies Lab, IIT Bombay. Guides: Prof. Preethi Jyothi, Prof. Pushpak Bhattacharyya
Summary: This project develops multilingual Automatic Speech Recognition (ASR) tools that are robust to mispronunciation, require low computation and little data, and specifically cater to low-resource languages.
Paper accepted at the 4th Multilingual Representation Learning (MRL) Workshop, EMNLP 2024
- Designed a novel strategy to incorporate paraphrase supervision into multimodal models, improving ASR on noisy speech
- Developed a novel sequential method that combines speech-based parameter-efficient fine-tuning (a LoRA-style sketch follows this list) with text-only adaptation for multimodal multilingual models such as SeamlessM4T, improving ASR on 10+ low-resource Indian and African languages
- Obtained a 40% WER reduction over the baseline on IndicVoices-Maithili, and identified a cross-lingual transfer technique that gives more than a 17% relative WER reduction for a low-resource language without using any speech from that language (relative-reduction arithmetic is sketched below)
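One common way to do speech-based parameter-efficient fine-tuning of SeamlessM4T is to attach LoRA adapters via the HuggingFace `peft` library while freezing the base model. The sketch below illustrates this setup; the checkpoint name, target modules, and hyperparameters are illustrative assumptions, not the project's actual configuration.

```python
# Minimal sketch: LoRA-based parameter-efficient fine-tuning of SeamlessM4T
# for speech-to-text. Checkpoint, target modules, and hyperparameters are
# illustrative assumptions, not the project's actual configuration.
from transformers import AutoProcessor, SeamlessM4TForSpeechToText
from peft import LoraConfig, get_peft_model

checkpoint = "facebook/hf-seamless-m4t-medium"
model = SeamlessM4TForSpeechToText.from_pretrained(checkpoint)
processor = AutoProcessor.from_pretrained(checkpoint)

# Wrap only the attention projections with low-rank adapters; the base
# model's weights stay frozen, so only a small fraction of parameters
# (typically well under 1%) is trained.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

After this wrapping, the model can be trained with a standard speech-to-text fine-tuning loop; only the adapter weights are updated, which keeps compute and memory low, in line with the project's low-resource focus.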
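The WER figures quoted above are reductions relative to a baseline. A small sketch of how such numbers are computed with the `jiwer` library follows; the transcripts are made-up placeholders, not project data.

```python
# Sketch of relative WER reduction, as quoted in the results above.
# Transcripts are made-up placeholders, not project data.
import jiwer

references    = ["the weather is nice today", "speech recognition is hard"]
baseline_hyps = ["the weather nice to day", "speed recognition is hard"]
adapted_hyps  = ["the weather is nice today", "speech recognition his hard"]

baseline_wer = jiwer.wer(references, baseline_hyps)
adapted_wer = jiwer.wer(references, adapted_hyps)

# Relative reduction: the share of the baseline's error that the
# adapted model removes.
relative_reduction = 100 * (baseline_wer - adapted_wer) / baseline_wer
print(f"baseline WER={baseline_wer:.3f}, adapted WER={adapted_wer:.3f}, "
      f"relative reduction={relative_reduction:.1f}%")
```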