Dynamic translation of American Sign Language using CNN-LSTM Model

  • Unique Paper ID: 178120
  • PageNo: 2844-2846
  • Abstract:
  • A real-time web-based system for translating American Sign Language (ASL) gestures into spoken and written language is developed using advanced machine learning and computer vision techniques. This work integrates a live WebRTC video pipeline on a React front end with a Flask server backend to capture and process gesture videos. Computer vision libraries (OpenCV and MediaPipe) extract hand landmarks from each frame, while a hybrid deep learning model (convolutional neural network followed by Long Short-Term Memory (LSTM) layers in TensorFlow) interprets the temporal sequence of gestures to predict the intended sign. The recognized sign is then output as text and synthesized into speech using a text-to-speech API. We evaluate the system on an ASL gesture dataset, measuring recognition accuracy and latency. Experimental results demonstrate high accuracy and real-time performance, confirming the feasibility of ASL translation technology.
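
The abstract outlines a landmark-based CNN-LSTM pipeline. The following is a minimal sketch of the two core stages it names (MediaPipe hand-landmark extraction and a TensorFlow CNN-LSTM classifier). The hyperparameters (SEQUENCE_LENGTH, NUM_CLASSES, layer sizes) and the choice to apply a 1-D convolution over per-frame landmark vectors rather than raw frames are illustrative assumptions, not details taken from the paper.

# Sketch of the landmark-extraction and CNN-LSTM stages described in the abstract.
# Values such as SEQUENCE_LENGTH and NUM_CLASSES are assumed for illustration.

import cv2
import mediapipe as mp
import numpy as np
import tensorflow as tf

SEQUENCE_LENGTH = 30      # frames per gesture clip (assumed)
NUM_LANDMARKS = 21        # MediaPipe Hands returns 21 landmarks per hand
NUM_CLASSES = 26          # e.g. the ASL fingerspelling alphabet (assumed)

mp_hands = mp.solutions.hands

def extract_landmarks(video_path):
    """Return a (SEQUENCE_LENGTH, NUM_LANDMARKS * 3) array of hand landmarks."""
    frames = []
    cap = cv2.VideoCapture(video_path)
    with mp_hands.Hands(static_image_mode=False, max_num_hands=1) as hands:
        while len(frames) < SEQUENCE_LENGTH:
            ok, frame = cap.read()
            if not ok:
                break
            result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if result.multi_hand_landmarks:
                lm = result.multi_hand_landmarks[0].landmark
                frames.append([c for p in lm for c in (p.x, p.y, p.z)])
            else:
                frames.append([0.0] * (NUM_LANDMARKS * 3))  # no hand detected
    cap.release()
    # Pad short clips with zeros so every sequence has the same length.
    while len(frames) < SEQUENCE_LENGTH:
        frames.append([0.0] * (NUM_LANDMARKS * 3))
    return np.array(frames, dtype=np.float32)

def build_model():
    """1-D CNN over each frame's landmark vector, followed by LSTM layers."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(SEQUENCE_LENGTH, NUM_LANDMARKS * 3)),
        tf.keras.layers.Conv1D(64, kernel_size=3, activation="relu", padding="same"),
        tf.keras.layers.MaxPooling1D(2),
        tf.keras.layers.LSTM(128, return_sequences=True),
        tf.keras.layers.LSTM(64),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

model = build_model()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

Training, the React/WebRTC and Flask plumbing, and the text-to-speech step are omitted; in the architecture described, a model like this would sit behind a Flask route that receives frames from the browser and returns the predicted sign label for display and speech synthesis.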

Copyright & License

Copyright © 2026. The authors retain the copyright of this article. This article is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{178120,
        author = {K N Gautam and Dhoddareddy Jathin Reddy and Katikam Sreekanth Kumar and Elaiyaraja P},
        title = {Dynamic translation of American Sign Language using CNN-LSTM Model},
        journal = {International Journal of Innovative Research in Technology},
        year = {2025},
        volume = {11},
        number = {12},
        pages = {2844-2846},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=178120},
        abstract = {A real-time web-based system for translating American Sign Language (ASL) gestures into spoken and written language is developed using advanced machine learning and computer vision techniques. This work integrates a live WebRTC video pipeline on a React front end with a Flask server backend to capture and process gesture videos. Computer vision libraries (OpenCV and MediaPipe) extract hand landmarks from each frame, while a hybrid deep learning model (convolutional neural network followed by Long Short-Term Memory (LSTM) layers in TensorFlow) interprets the temporal sequence of gestures to predict the intended sign. The recognized sign is then output as text and synthesized into speech using a text-to-speech API. We evaluate the system on an ASL gesture dataset, measuring recognition accuracy and latency. Experimental results demonstrate high accuracy and real-time performance, confirming the feasibility of ASL translation technology.},
        keywords = {American Sign Language, Sign Language Recognition, Computer Vision, MediaPipe, CNN-LSTM, Real-Time WebApp, WebRTC, Flask, Text-to-Speech},
        month = {May},
}

Cite This Article

Gautam, K. N., Reddy, D. J., Kumar, K. S., & Elaiyaraja, P. (2025). Dynamic translation of American Sign Language using CNN-LSTM Model. International Journal of Innovative Research in Technology (IJIRT), 11(12), 2844–2846.
