Real time Image Captioning Enhancing Accessibility for Visually Impaired

  • Unique Paper ID: 173977
  • Pages: 2347–2352
  • Abstract:
  • The world becomes restricted for visually impaired people, who need additional help to access visual materials and make sense of visual elements. The proposed solution converts visual data into descriptive audio output in real time. Deep learning drives the system: image features are extracted with VGG16, ResNet, and DenseNet, which serve as pre-trained Convolutional Neural Networks (CNNs). A Gated Recurrent Unit (GRU) network then processes the extracted features to produce relevant verbal descriptions. A text-to-speech (TTS) engine converts the generated captions into spoken words, so users hear audio descriptions of their visual content. The integrated system combines computer vision techniques with natural language processing methods to generate accurate descriptions, enabling visually impaired people to better understand their environments. Users can interpret images independently and with minimal effort. By offering an inclusive interface, the technology improves quality of life for people with visual impairments, delivering easily understandable visual information through effective real-time audio description.
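The abstract describes a CNN-encoder/GRU-decoder pipeline: a pre-trained CNN produces an image feature vector, and a GRU recurrence turns it into a caption. As a minimal illustration of the GRU step at the heart of that decoder, here is a single-cell sketch in NumPy. All dimensions, parameter names, and initializations are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, params):
    """One GRU step: the update gate z decides how much of the previous
    hidden state to keep, the reset gate r how much of it feeds the
    candidate state h_tilde."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    z = sigmoid(Wz @ x + Uz @ h_prev + bz)               # update gate
    r = sigmoid(Wr @ x + Ur @ h_prev + br)               # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev) + bh)   # candidate state
    return (1 - z) * h_prev + z * h_tilde

rng = np.random.default_rng(0)
d_in, d_h = 512, 256   # hypothetical CNN feature size and hidden size

def layer(shape):
    return rng.standard_normal(shape) * 0.01

params = (layer((d_h, d_in)), layer((d_h, d_h)), np.zeros(d_h),
          layer((d_h, d_in)), layer((d_h, d_h)), np.zeros(d_h),
          layer((d_h, d_in)), layer((d_h, d_h)), np.zeros(d_h))

feature = rng.standard_normal(d_in)  # stand-in for a CNN image feature vector
h = gru_step(feature, np.zeros(d_h), params)
print(h.shape)  # (256,)
```

In the full system this step would run once per generated word, with the CNN feature conditioning the first state and each step's hidden state feeding a softmax over the vocabulary; the resulting word sequence is then handed to the TTS engine.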

Copyright & License

Copyright © 2026. The authors retain the copyright of this article. This is an open-access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{173977,
        author = {Chemudupati Venkata Vasundhara and Padala Naga Subhash Reddy and Komma Madhusudhana Rao and Nalli Jashuva},
        title = {Real time Image Captioning Enhancing Accessibility for Visually Impaired},
        journal = {International Journal of Innovative Research in Technology},
        year = {2025},
        volume = {11},
        number = {10},
        pages = {2347--2352},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=173977},
        abstract = {The world becomes restricted for visual impaired people because they need additional help with accessing visual materials and making sense of visual elements. The proposed solution consists of real-time image conversion through visual data into descriptive audio output. Deep learning algorithms enable the system to extract image features from VGG16 and ResNet and DenseNet which function as pre-trained Convolutional Neural Networks (CNNs). After extracting information from images the Gated Recurrent Unit (GRU) network processes this data to produce relevant verbal descriptions. The TTS engine converts the created captions into spoken words so users can hear descriptive audio descriptions of their visual content. The integrated system combines computer vision techniques with natural language processing methods to generate correct descriptions which enables visually impaired people to better understand their environments. Users can interpret images without help through this independent system which requires minimal effort. Through its ability to offer inclusive interfaces the technology improves quality of life opportunities for people with visual impairments. The system provides users with an effective real-time audio description method which delivers easily understandable visual information.},
        keywords = {Image Processing, Convolutional Neural Networks (CNN), Gated Recurrent Unit (GRU), VGG16, ResNet, DenseNet, Text-to-Speech (TTS), Assistive Technology, Natural Language Processing (NLP)},
        month = {March},
        }

Cite This Article

Vasundhara, C. V., Reddy, P. N. S., Rao, K. M., & Jashuva, N. (2025). Real time Image Captioning Enhancing Accessibility for Visually Impaired. International Journal of Innovative Research in Technology (IJIRT), 11(10), 2347–2352.
