Speech and Text Generative Models for Automatic Dubbing of Videos

Master’s Thesis II (nationwide project: BharatGen) at the Computational Speech and Language Technologies Lab, IIT Bombay. Guides: Prof. Preethi Jyothi, Prof. Ganesh Ramakrishnan

Summary: This project aims to generate natural-sounding speech in low-resource languages for agriculture education videos.

  • Using non-autoregressive flow matching with continuous normalizing flows for text-guided multilingual speech generation
  • Adapting neural codec language models such as VALL-E and SpeechX for voice and emotion transfer in low-resource dialects