Parameter-efficient Adaptation of Multilingual Multimodal Models for Low-resource ASR
Published in Proceedings of the Fourth Workshop on Multilingual Representation Learning (MRL 2024), EMNLP 2024
Automatic speech recognition (ASR) for low-resource languages remains challenging due to the limited availability of labeled training data. Parameter-efficient fine-tuning and text-only adaptation are two widely used approaches to address such data constraints. In this study, we explore how these techniques can be effectively combined using SeamlessM4T, a multilingual multimodal model. Multimodal models can leverage unlabeled text through text-only adaptation alongside parameter-efficient ASR fine-tuning, resulting in improved ASR performance. Additionally, we demonstrate cross-lingual transfer from a high-resource language, achieving up to a 17% relative reduction in word error rate (WER) in a zero-shot setting, without any labeled speech data for the target language.
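To illustrate the core idea behind parameter-efficient fine-tuning, here is a minimal NumPy sketch of a LoRA-style low-rank adapter on a single frozen weight matrix. This is a generic illustration, not the paper's actual configuration: the layer sizes, rank, and scaling factor below are arbitrary assumptions for demonstration.

```python
import numpy as np

# Hypothetical dimensions: hidden size 64, LoRA rank 4, scaling alpha = 8.
d_in, d_out, r, alpha = 64, 64, 4, 8

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight (not updated)
A = rng.standard_normal((r, d_in)) * 0.01   # trainable low-rank factor
B = np.zeros((d_out, r))                    # trainable; zero-init so the update starts at 0

def lora_forward(x):
    # y = W x + (alpha / r) * B (A x): frozen base path plus a trained low-rank delta
    return W @ x + (alpha / r) * (B @ (A @ x))

# Only A and B are trained, a small fraction of the full weight's parameters.
trainable = A.size + B.size
print(f"trainable fraction: {trainable / W.size:.3f}")  # 512 / 4096 = 0.125
```

Because `B` is zero-initialized, the adapted layer initially reproduces the frozen model exactly; training then moves only the small `A` and `B` factors, which is what keeps the adaptation cheap for low-resource languages.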
Recommended citation: Abhishek Gupta, Amruta Parulekar, Sameep Chattopadhyay, and Preethi Jyothi. 2024. Parameter-efficient Adaptation of Multilingual Multimodal Models for Low-resource ASR. In Proceedings of the Fourth Workshop on Multilingual Representation Learning (MRL 2024), pages 175–185, Miami, Florida, USA. Association for Computational Linguistics.
Download Paper | Download Slides