AI-Based Real-Time Sign Language Translator Using CNN, LSTM, MediaPipe, and Multi-Language TTS

  • Unique Paper ID: 201050
  • Page No.: 258–264
  • Abstract: This paper presents an AI-powered Sign Language Translator designed to bridge communication between deaf and mute individuals and the hearing population. The system leverages Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks to recognize hand gestures in real time and convert them into text and speech. Hand landmarks are extracted using Google's MediaPipe framework (21 keypoints per hand). The model is trained on the WLASL (Word-Level American Sign Language) dataset and extended with Indian Sign Language (ISL) data. Multi-language translation is supported via Google Translate and gTTS. The system achieves 27–30 fps on standard CPU hardware with an overall accuracy of 91.6%, offering a scalable and inclusive communication solution for accessibility.
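
Pipeline Sketch

The abstract describes an end-to-end flow: MediaPipe extracts 21 hand landmarks per frame, a sequence of landmark frames is classified by an LSTM, and the predicted word is spoken via gTTS. Below is a minimal Python sketch of that flow, not the authors' implementation: the sequence length, layer sizes, and gloss labels are illustrative assumptions, the paper's CNN stage and the Google Translate step are omitted for brevity, and the model would need trained WLASL/ISL weights to make meaningful predictions.

# Minimal sketch: MediaPipe hand landmarks -> fixed-length sequence
# -> LSTM classifier -> text-to-speech. Assumed, not the paper's exact setup.
import cv2
import numpy as np
import mediapipe as mp
from tensorflow.keras import layers, models
from gtts import gTTS

SEQ_LEN = 30                          # frames per gesture clip (assumed)
N_FEATURES = 21 * 3                   # 21 MediaPipe landmarks, each (x, y, z)
CLASSES = ["hello", "thanks", "yes"]  # placeholder gloss labels

def build_model() -> models.Model:
    # LSTM classifier over landmark sequences (illustrative architecture).
    return models.Sequential([
        layers.Input(shape=(SEQ_LEN, N_FEATURES)),
        layers.LSTM(64, return_sequences=True),
        layers.LSTM(64),
        layers.Dense(len(CLASSES), activation="softmax"),
    ])

def extract_landmarks(frame, hands) -> np.ndarray:
    # Return a flat (63,) landmark vector, or zeros when no hand is detected.
    result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if result.multi_hand_landmarks:
        lm = result.multi_hand_landmarks[0].landmark
        return np.array([[p.x, p.y, p.z] for p in lm]).flatten()
    return np.zeros(N_FEATURES)

def main():
    model = build_model()  # in practice, load trained WLASL/ISL weights here
    hands = mp.solutions.hands.Hands(max_num_hands=1)
    cap = cv2.VideoCapture(0)
    window = []
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        window.append(extract_landmarks(frame, hands))
        if len(window) == SEQ_LEN:
            probs = model.predict(np.expand_dims(window, axis=0), verbose=0)[0]
            word = CLASSES[int(np.argmax(probs))]
            gTTS(text=word, lang="en").save("out.mp3")  # speak the prediction
            window = []
    cap.release()

if __name__ == "__main__":
    main()

A multi-language variant would pass the predicted word through a translation API before gTTS and set the matching lang code; that step is left out here because the abstract does not specify which Google Translate interface the authors used.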

Copyright & License

Copyright © 2026. The authors retain the copyright of this article. This article is an open-access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{201050,
        author = {Veerendeswari J and Celin Julie B and Jayasri M and Ganiga M and Harini S},
        title = {AI-Based Real-Time Sign Language Translator Using CNN, LSTM, MediaPipe, and Multi-Language TTS},
        journal = {International Journal of Innovative Research in Technology},
        year = {2026},
        volume = {12},
        pages = {258-264},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=201050},
        abstract = {This paper presents an AI-powered Sign Language Translator designed to bridge communication between deaf and mute individuals and the hearing population. The system leverages Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks to recognize hand gestures in real time and convert them into text and speech. Hand landmarks are extracted using Google's MediaPipe framework (21 keypoints per hand). The model is trained on the WLASL (Word-Level American Sign Language) dataset and extended with Indian Sign Language (ISL) data. Multi-language translation is supported via Google Translate and gTTS. The system achieves 27–30 fps on standard CPU hardware with an overall accuracy of 91.6%, offering a scalable and inclusive communication solution for accessibility.},
        keywords = {Sign Language Recognition, CNN, LSTM, MediaPipe, Real-Time Gesture Recognition, WLASL Dataset, Multi-Language Translation, Text-to-Speech, Accessibility, Deep Learning, Indian Sign Language},
        month = {May},
        }

Cite This Article

Veerendeswari, J., Celin Julie, B., Jayasri, M., Ganiga, M., & Harini, S. (2026). AI-Based Real-Time Sign Language Translator Using CNN, LSTM, MediaPipe, and Multi-Language TTS. International Journal of Innovative Research in Technology (IJIRT), 12, 258–264.
