Voxify: Your Gateway to Multilingual Conversations

  • Unique Paper ID: 173329
  • Volume: 11
  • Issue: 10
  • PageNo: 112-123
  • Abstract: This paper presents the development of a speech-to-speech translation system built on a hybrid architecture that integrates on-device processing with cloud-based services. The system comprises three core modules: speech-to-text transcription, translation and speech synthesis, and language detection, working in tandem to convert spoken input into translated audio output. Initial speech is captured using the device’s microphone, with preprocessing techniques like noise reduction applied for clarity. On-device speech recognition ensures rapid transcription of spoken words into text, minimizing latency. The transcribed text is then sent to the cloud-based Bhashini API, which performs both text-to-text translation and text-to-speech synthesis. The system uses the Whisper speech-to-text model, fine-tuned for Indic languages, to detect the spoken language and ensure accurate translation. Error-handling mechanisms, data privacy protocols, and performance optimizations, such as compression techniques and encrypted communication, enhance the system's robustness. By combining fast on-device transcription with advanced cloud-based translation, the system delivers scalable, real-time translations, particularly suited for multilingual and culturally diverse contexts.
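
The flow described in the abstract (noise reduction, language detection, on-device transcription, then cloud translation and speech synthesis) can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the class name `SpeechTranslator`, its field names, and the call signatures are all hypothetical, and each stage is passed in as a plain callable so that no real Whisper or Bhashini API call is assumed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SpeechTranslator:
    """Hypothetical wiring of the stages named in the abstract.

    Every stage is an injected callable; none of these signatures
    come from the paper or from any real SDK.
    """
    denoise: Callable[[bytes], bytes]          # on-device preprocessing
    detect_language: Callable[[bytes], str]    # e.g. a Whisper-based detector
    transcribe: Callable[[bytes, str], str]    # on-device speech-to-text
    translate: Callable[[str, str, str], str]  # cloud text-to-text translation
    synthesize: Callable[[str, str], bytes]    # cloud text-to-speech

    def run(self, audio: bytes, target_lang: str) -> bytes:
        clean = self.denoise(audio)
        source_lang = self.detect_language(clean)
        text = self.transcribe(clean, source_lang)
        if not text.strip():
            # Simple error handling: refuse to send empty text to the cloud.
            raise ValueError("empty transcription")
        if source_lang == target_lang:
            # Nothing to translate; skip the translation call.
            translated = text
        else:
            translated = self.translate(text, source_lang, target_lang)
        return self.synthesize(translated, target_lang)


# Usage with stand-in stages (assumptions, for illustration only).
pipeline = SpeechTranslator(
    denoise=lambda audio: audio.strip(b"\x00"),
    detect_language=lambda audio: "hi",
    transcribe=lambda audio, lang: audio.decode(),
    translate=lambda text, src, dst: f"[{src}->{dst}] {text}",
    synthesize=lambda text, lang: text.encode(),
)
result = pipeline.run(b"\x00namaste\x00", "en")
```

Injecting the stages keeps the on-device and cloud parts swappable, which mirrors the hybrid split the abstract describes; a real system would replace the stand-in lambdas with actual model and service clients.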

Copyright & License

Copyright © 2025. The authors retain the copyright of this article. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{173329,
        author = {Prof. Krishnendu Nair and Siddhi Thoke and T T K Urshitha Sai and Pratham Yadav},
        title = {Voxify: Your Gateway to Multilingual Conversations},
        journal = {International Journal of Innovative Research in Technology},
        year = {2025},
        volume = {11},
        number = {10},
        pages = {112--123},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=173329},
        abstract = {This paper presents the development of a speech-to-speech translation system built on a hybrid architecture that integrates on-device processing with cloud-based services. The system comprises three core modules: speech-to-text transcription, translation and speech synthesis, and language detection, working in tandem to convert spoken input into translated audio output. Initial speech is captured using the device’s microphone, with preprocessing techniques like noise reduction applied for clarity. On-device speech recognition ensures rapid transcription of spoken words into text, minimizing latency. The transcribed text is then sent to the cloud-based Bhashini API, which performs both text-to-text translation and text-to-speech synthesis. The system uses the Whisper speech-to-text model, fine-tuned for Indic languages, to detect the spoken language and ensure accurate translation. Error-handling mechanisms, data privacy protocols, and performance optimizations, such as compression techniques and encrypted communication, enhance the system's robustness. By combining fast on-device transcription with advanced cloud-based translation, the system delivers scalable, real-time translations, particularly suited for multilingual and culturally diverse contexts.},
        keywords = {Machine Translation, Natural Language Processing, Automatic Speech Recognition, Speech-to-Text Translation, Text-to-Text Translation, Text-to-Speech Synthesis, Speech Detection},
        month = {February},
        }

