Visual Speech recognition using lip movement for deaf people using deep learning

  • Unique Paper ID: 155001
  • Volume: 8
  • Issue: 12
  • PageNo: 998-1002
  • Abstract:
  • There has been growing interest in building automatic lip-reading (ALR) systems. As in other computer vision applications, methods based on deep learning (DL) have grown in popularity and have enabled significant improvements in performance. The audio-visual speech recognition approach aims to improve noise robustness in mobile settings by extracting lip movement from side-face images. Most earlier bimodal speech recognition systems used frontal face (lip) images, which are inconvenient because users must speak while holding a device with a camera in front of their face. Our proposed solution, which uses a small camera mounted in a phone to capture lip movement, is more natural, simple, and convenient. This approach also avoids degrading the signal-to-noise ratio (SNR) of the input speech. Optical-flow analysis extracts visual features, which are then combined with audio features in a DCNN-based recognizer. In this paper we employ a DCNN for audio-visual speech recognition; specifically, we apply deep learning to audio and visual features for noise-robust speech recognition. In our experimental analysis we achieved around 90% accuracy on real-time test data, higher than that of traditional deep learning algorithms.
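
The pipeline the abstract describes, optical-flow features computed over the mouth region and later fused with audio in a DCNN, can be sketched roughly as follows. This is a minimal illustration assuming OpenCV's Farneback dense optical flow on frames pre-cropped to the mouth region; the function name and the choice of feature summary are hypothetical, since the paper does not publish its exact code.

# Minimal sketch of optical-flow lip-feature extraction. Assumes OpenCV
# (cv2) and NumPy, and that `frames` are BGR images already cropped to the
# mouth region. The function name and feature summary are illustrative,
# not taken from the paper.
import cv2
import numpy as np

def extract_lip_flow_features(frames):
    """Compute dense optical flow between consecutive mouth-ROI frames
    and summarize each flow field as a small feature vector."""
    features = []
    prev = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    for frame in frames[1:]:
        curr = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Dense Farneback flow: one (dx, dy) vector per pixel.
        flow = cv2.calcOpticalFlowFarneback(
            prev, curr, None,
            pyr_scale=0.5, levels=3, winsize=15,
            iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
        mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
        # Summarize the field: mean/std of flow magnitude plus an
        # 8-bin histogram of flow directions weighted by magnitude.
        hist, _ = np.histogram(ang, bins=8, range=(0, 2 * np.pi), weights=mag)
        features.append(np.concatenate([[mag.mean(), mag.std()], hist]))
        prev = curr
    return np.stack(features)  # shape: (num_frames - 1, 10)

The resulting per-frame vectors could then be concatenated with audio features (for example, MFCCs) and passed to a DCNN classifier, which matches the audio-visual fusion strategy the abstract describes.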

Copyright & License

Copyright © 2025. Authors retain the copyright of this article. This article is an open-access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{155001,
        author = {G. J. Navale and Riya Satija and Mahima Oswal and Bhavesh Dalal and Niket Patil},
        title = {Visual Speech recognition using lip movement for deaf people using deep learning},
        journal = {International Journal of Innovative Research in Technology},
        year = {},
        volume = {8},
        number = {12},
        pages = {998-1002},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=155001},
        abstract = {There has been growing interest in building automatic lip-reading (ALR) systems. As in other computer vision applications, methods based on deep learning (DL) have grown in popularity and have enabled significant improvements in performance. The audio-visual speech recognition approach aims to improve noise robustness in mobile settings by extracting lip movement from side-face images. Most earlier bimodal speech recognition systems used frontal face (lip) images, which are inconvenient because users must speak while holding a device with a camera in front of their face. Our proposed solution, which uses a small camera mounted in a phone to capture lip movement, is more natural, simple, and convenient. This approach also avoids degrading the signal-to-noise ratio (SNR) of the input speech. Optical-flow analysis extracts visual features, which are then combined with audio features in a DCNN-based recognizer. In this paper we employ a DCNN for audio-visual speech recognition; specifically, we apply deep learning to audio and visual features for noise-robust speech recognition. In our experimental analysis we achieved around 90% accuracy on real-time test data, higher than that of traditional deep learning algorithms.},
        keywords = {deep learning, image processing, classification, feature extraction, feature selection},
        month = {},
        }

Cite This Article

  • ISSN: 2349-6002
  • Volume: 8
  • Issue: 12
  • PageNo: 998-1002
