ENHANCING VISUAL QUESTION ANSWERING BRIDGING COMPUTER VISION AND NLP

  • Unique Paper ID: 175515
  • PageNo: 3400-3406
  • Abstract: This project focuses on developing a Visual Question Answering (VQA) system that integrates computer vision and natural language processing to enable machines to understand and respond to questions about visual content. The system leverages deep learning techniques, including Convolutional Neural Networks (CNNs) for image feature extraction and Recurrent Neural Networks (RNNs) for processing textual questions. The goal is to create an interactive platform where users can upload images, ask questions in English, and receive answers in multiple languages, including Hindi, Telugu, Urdu, and Kannada. This project aims to enhance human-AI interaction by making machines more intelligent and capable of understanding the world through both visual and textual information.

Copyright & License

Copyright © 2026. Authors retain the copyright of this article. This article is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{175515,
        author = {Soma Sekhara Rao Kadiyala and R Madhukanth and Aseervad Abhishek Sripathi and Krishna Babu Kondaimanchilli and Harsha Vardhan Attaluri and Mohammad Shahid},
        title = {ENHANCING VISUAL QUESTION ANSWERING BRIDGING COMPUTER VISION AND NLP},
        journal = {International Journal of Innovative Research in Technology},
        year = {2025},
        volume = {11},
        number = {11},
        pages = {3400--3406},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=175515},
        abstract = {This project focuses on developing a Visual Question Answering (VQA) system that integrates computer vision and natural language processing to enable machines to understand and respond to questions about visual content. The system leverages deep learning techniques, including Convolutional Neural Networks (CNNs) for image feature extraction and Recurrent Neural Networks (RNNs) for processing textual questions. The goal is to create an interactive platform where users can upload images, ask questions in English, and receive answers in multiple languages, including Hindi, Telugu, Urdu, and Kannada. This project aims to enhance human-AI interaction by making machines more intelligent and capable of understanding the world through both visual and textual information.},
        keywords = {Image analysis, User Interface, Descriptive Response Generation Along With Processed Google Generative AI APK Keys, Integrated Machine Learning, RNN, CNN, Text Processing And Natural Language Integration Key Words},
        month = {April},
}

Cite This Article

Kadiyala, S. S. R., Madhukanth, R., Sripathi, A. A., Kondaimanchilli, K. B., Attaluri, H. V., & Shahid, M. (2025). ENHANCING VISUAL QUESTION ANSWERING BRIDGING COMPUTER VISION AND NLP. International Journal of Innovative Research in Technology (IJIRT), 11(11), 3400–3406.
