Document-Based Question Answering Using Retrieval-Augmented Generation with Open-Source Language Model

  • Unique Paper ID: 174185
  • Page No.: 3138-3144
  • Abstract: Our project addresses the problem of extracting important information from a large collection of PDF documents by building a system that combines two methods: Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG). PDF files and other documents, such as user manuals, legal papers, academic articles, and technical guides, often hold vast amounts of text. Manually searching them for specific answers is time-consuming and difficult. Our system starts from the user's question and retrieves relevant information from external sources. LLMs have revolutionized natural language processing, but their responses are limited to the data they were trained on. RAG supplements a pre-trained LLM with retrieved external information, combining the user's query with the most recent available data, which can significantly improve the accuracy and relevance of responses and keep them up to date and contextually grounded. This approach improves response quality for a wide range of applications, from chatbots to information retrieval systems, and lets our system extract meaningful content from large collections of PDF documents and other texts.
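
The retrieve-then-generate flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the retriever, the chunking of PDF text, or the language model, so the bag-of-words cosine ranking, the function names (tokenize, cosine, retrieve, build_prompt), and the sample chunks below are all assumptions chosen to keep the example self-contained. The final step, sending the prompt to an open-source LLM, is deliberately left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question (hypothetical retriever)."""
    q_vec = Counter(tokenize(question))
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine(q_vec, Counter(tokenize(chunk))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, context_chunks):
    """Combine the retrieved chunks with the user's question into one prompt."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Illustrative chunks standing in for text extracted from PDF documents.
chunks = [
    "To reset the router, hold the reset button for ten seconds.",
    "The warranty covers manufacturing defects for two years.",
    "Firmware updates are installed from the admin page.",
]
question = "How do I reset the router?"
top_chunks = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

In a fuller system the term-count vectors would typically be replaced by learned embeddings and a vector index, and the prompt would be passed to the language model to generate the answer; the sketch only shows how the query and the retrieved text are combined.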

Copyright & License

Copyright © 2026. Authors retain the copyright of this article. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{174185,
        author = {Ramkishor Pondreti and Yashwanth Yarabati and Chandini Panga and Deepthi Korla and Rahul Perla and Harish Barakala},
        title = {Document-Based Question Answering Using Retrieval-Augmented Generation with Open-Source Language Model},
        journal = {International Journal of Innovative Research in Technology},
        year = {2025},
        volume = {11},
        number = {10},
        pages = {3138--3144},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=174185},
        abstract = {Our project addresses the problem of extracting important information from a large collection of PDF documents by building a system that combines two methods: Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG). PDF files and other documents, such as user manuals, legal papers, academic articles, and technical guides, often hold vast amounts of text. Manually searching them for specific answers is time-consuming and difficult. Our system starts from the user's question and retrieves relevant information from external sources. LLMs have revolutionized natural language processing, but their responses are limited to the data they were trained on. RAG supplements a pre-trained LLM with retrieved external information, combining the user's query with the most recent available data, which can significantly improve the accuracy and relevance of responses and keep them up to date and contextually grounded. This approach improves response quality for a wide range of applications, from chatbots to information retrieval systems, and lets our system extract meaningful content from large collections of PDF documents and other texts.},
        keywords = {Retrieval Augmented Generation, Question Answering, Large Language Models, Information Retrieval},
        month = {March},
        }

Cite This Article

Pondreti, R., Yarabati, Y., Panga, C., Korla, D., Perla, R., & Barakala, H. (2025). Document-Based Question Answering Using Retrieval-Augmented Generation with Open-Source Language Model. International Journal of Innovative Research in Technology (IJIRT), 11(10), 3138–3144.
