Voxvisor

  • Unique Paper ID: 173613
  • Volume: 11
  • Issue: 10
  • PageNo: 862-870
  • Abstract:
  • The communication barrier between hearing and non-hearing individuals, particularly those with speech impairments, remains a significant challenge. Traditional sign language interpretation methods, often reliant on manual techniques, are time-consuming and limited in their ability to adapt to diverse signing styles. To address these limitations, we propose a deep learning-based system, Voxvisor, for real-time sign language recognition and translation into audible speech. Voxvisor incorporates advanced computer vision techniques, including key-point detection, optical flow, and YOLO (You Only Look Once) feature extraction, to accurately identify and classify sign language gestures. By leveraging deep learning architectures such as CNNs, RNNs, and LSTMs, Voxvisor can effectively learn from a comprehensive dataset of sign language videos, capturing both the spatial and temporal characteristics of gestures. Compared to existing manual methods, our approach offers several advantages: real-time recognition, adaptability to various signing styles, and improved accuracy. By bridging the communication gap between hearing and non-hearing individuals, Voxvisor has the potential to significantly improve the quality of life for those with speech impairments and promote social inclusion.
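The pipeline the abstract describes (per-frame spatial features from key-point detection, temporal features from optical flow, then a sequence model such as an LSTM) can be sketched at a high level. The helper names below are illustrative stand-ins, not the authors' implementation: `extract_keypoints` substitutes simple pooling for a real landmark detector, and `optical_flow_proxy` uses a frame difference in place of a true optical-flow algorithm.

```python
import numpy as np

def extract_keypoints(frame: np.ndarray, n_points: int = 21) -> np.ndarray:
    """Stand-in for a real key-point detector (e.g. a hand-landmark model).
    Pools the frame into n_points pseudo-(x, y) features."""
    chunks = np.array_split(frame.reshape(-1), n_points * 2)
    return np.array([c.mean() for c in chunks]).reshape(n_points, 2)

def optical_flow_proxy(prev: np.ndarray, curr: np.ndarray) -> np.ndarray:
    """Crude temporal signal: frame difference as a proxy for optical flow."""
    return curr - prev

def gesture_features(video: np.ndarray) -> np.ndarray:
    """Concatenate spatial (key-point) and temporal (flow) features per frame.
    Returns a (T-1, feature_dim) sequence ready for an RNN/LSTM classifier."""
    feats = []
    for t in range(1, len(video)):
        kp = extract_keypoints(video[t]).reshape(-1)        # spatial features
        flow = optical_flow_proxy(video[t - 1], video[t])   # temporal features
        feats.append(np.concatenate([kp, [flow.mean(), flow.std()]]))
    return np.stack(feats)

# Toy "video": 8 frames of 32x32 grayscale; a real system would feed the
# resulting sequence into a trained temporal model and then a speech engine.
video = np.random.rand(8, 32, 32)
seq = gesture_features(video)
print(seq.shape)  # (7, 44): 7 frame transitions, 42 key-point + 2 flow features
```

In a production system, the pooling stand-in would be replaced by the actual detector output and the frame difference by a dense optical-flow field; only the overall shape of the pipeline is meant to carry over.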

Copyright & License

Copyright © 2025 Authors retain the copyright of this article. This article is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{173613,
        author = {Varanasi Avinash and Kasarapu Dinesh Charan Raj and Chekka Naveena and Ayisetti Madhavika Santhoshi and Amiti Jaivardhan},
        title = {Voxvisor},
        journal = {International Journal of Innovative Research in Technology},
        year = {2025},
        volume = {11},
        number = {10},
        pages = {862-870},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=173613},
        abstract = {The communication barrier between hearing and non-hearing individuals, particularly those with speech impairments, remains a significant challenge. Traditional sign language interpretation methods, often reliant on manual techniques, are time-consuming and limited in their ability to adapt to diverse signing styles. To address these limitations, we propose a deep learning-based system, Voxvisor, for real-time sign language recognition and translation into audible speech. Voxvisor incorporates advanced computer vision techniques, including key-point detection, optical flow, and YOLO (You Only Look Once) feature extraction, to accurately identify and classify sign language gestures. By leveraging deep learning architectures such as CNNs, RNNs, and LSTMs, Voxvisor can effectively learn from a comprehensive dataset of sign language videos, capturing both the spatial and temporal characteristics of gestures. Compared to existing manual methods, our approach offers several advantages: real-time recognition, adaptability to various signing styles, and improved accuracy. By bridging the communication gap between hearing and non-hearing individuals, Voxvisor has the potential to significantly improve the quality of life for those with speech impairments and promote social inclusion.},
        keywords = {Deep learning, YOLOv5 (You Only Look Once), Optical flow, Real-time recognition, Sign gesture interpreter.},
        month = {March},
        }