Multilingual Automatic Speech Recognition for Low-resource Languages
Master’s Thesis-I (nationwide project: Bhashini, NLTM) at the Computational Speech and Language Technologies Lab, IIT Bombay. Guides: Prof. Preethi Jyothi, Prof. Pushpak Bhattacharyya
Summary: This project develops multilingual Automatic Speech Recognition (ASR) tools that are robust to mispronunciation, require low computation and little data, and specifically cater to low-resource languages.
Paper accepted at the 4th Multilingual Representation Learning (MRL) Workshop, EMNLP 2024
- Designed a novel strategy to incorporate paraphrase supervision into multimodal models, improving ASR on noisy speech
- Developed a novel sequential method that combines speech-based parameter-efficient fine-tuning (a LoRA-style sketch follows this list) with text-only adaptation for multimodal multilingual models such as SeamlessM4T, improving ASR on 10+ low-resource Indian and African languages
- Obtained a 40% WER reduction over the baseline on IndicVoices-Maithili, and identified a cross-lingual transfer technique that gives more than a 17% relative WER reduction for a low-resource language without using any speech from that language (relative-reduction arithmetic is sketched below)
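One common way to do speech-based parameter-efficient fine-tuning of SeamlessM4T is to attach LoRA adapters via the HuggingFace `peft` library while freezing the base model. The sketch below illustrates this setup; the checkpoint name, target modules, and hyperparameters are illustrative assumptions, not the project's actual configuration.

```python
# Minimal sketch: LoRA-based parameter-efficient fine-tuning of SeamlessM4T
# for speech-to-text. Checkpoint, target modules, and hyperparameters are
# illustrative assumptions, not the project's actual configuration.
from transformers import AutoProcessor, SeamlessM4TForSpeechToText
from peft import LoraConfig, get_peft_model

checkpoint = "facebook/hf-seamless-m4t-medium"
model = SeamlessM4TForSpeechToText.from_pretrained(checkpoint)
processor = AutoProcessor.from_pretrained(checkpoint)

# Wrap only the attention projections with low-rank adapters; the base
# model's weights stay frozen, so only a small fraction of parameters
# (typically well under 1%) is trained.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

After this wrapping, the model can be trained with a standard speech-to-text fine-tuning loop; only the adapter weights are updated, which keeps compute and memory low, in line with the project's low-resource focus.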
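The WER figures quoted above are reductions relative to a baseline. A small sketch of how such numbers are computed with the `jiwer` library follows; the transcripts are made-up placeholders, not project data.

```python
# Sketch of relative WER reduction, as quoted in the results above.
# Transcripts are made-up placeholders, not project data.
import jiwer

references    = ["the weather is nice today", "speech recognition is hard"]
baseline_hyps = ["the weather nice to day", "speed recognition is hard"]
adapted_hyps  = ["the weather is nice today", "speech recognition his hard"]

baseline_wer = jiwer.wer(references, baseline_hyps)
adapted_wer = jiwer.wer(references, adapted_hyps)

# Relative reduction: the share of the baseline's error that the
# adapted model removes.
relative_reduction = 100 * (baseline_wer - adapted_wer) / baseline_wer
print(f"baseline WER={baseline_wer:.3f}, adapted WER={adapted_wer:.3f}, "
      f"relative reduction={relative_reduction:.1f}%")
```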