AI-based Image Captioning and Scene Description

  • Unique Paper ID: 169286
  • Pages: 605–611
  • Abstract: AI-based image captioning and scene description have advanced considerably in recent years, with clear benefits for accessibility for visually impaired users and for content management. Recent work combines deep learning models, transformers, and multimodal techniques to produce high-quality, context-sensitive image descriptions. Building on these technologies, this paper proposes a device for visually impaired users that combines machine learning, large language models, and natural language processing to describe aloud the objects captured by a camera. Our device incorporates techniques proposed in recent studies, such as semantic and visual attention mechanisms, and delivers more accurate, contextually relevant descriptions. The work contributes to the field of assistive technologies and to the broader goal of making visual information more accessible and understandable to blind users.

Copyright & License

Copyright © 2026. Authors retain the copyright of this article. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{169286,
        author = {Priya Kumari and Astha Srivastava and Sonu Kumar Saw and Lakhan Singh and Puneet Kaur},
        title = {AI-based Image Captioning and Scene Description},
        journal = {International Journal of Innovative Research in Technology},
        year = {2024},
        volume = {11},
        number = {6},
        pages = {605--611},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=169286},
        abstract = {AI-based image captioning and scene description have advanced considerably in recent years, with clear benefits for accessibility for visually impaired users and for content management. Recent work combines deep learning models, transformers, and multimodal techniques to produce high-quality, context-sensitive image descriptions. Building on these technologies, this paper proposes a device for visually impaired users that combines machine learning, large language models, and natural language processing to describe aloud the objects captured by a camera. Our device incorporates techniques proposed in recent studies, such as semantic and visual attention mechanisms, and delivers more accurate, contextually relevant descriptions. The work contributes to the field of assistive technologies and to the broader goal of making visual information more accessible and understandable to blind users.},
        keywords = {AI-based image captioning, scene description, accessibility, visually impaired, machine learning, large language models (LLM), natural language processing (NLP), assistive technologies, deep learning, multimodal techniques},
        month = {November},
        }

Cite This Article

Kumari, P., Srivastava, A., Saw, S. K., Singh, L., & Kaur, P. (2024). AI-based Image Captioning and Scene Description. International Journal of Innovative Research in Technology (IJIRT), 11(6), 605–611.
