Open Domain Question Answering System Using Wikipedia

  • Unique Paper ID: 151185
  • PageNo: 165-173
  • Abstract:
  • The open-domain question answering task has recently been addressed using unstructured data such as websites and online encyclopedias. Here, open-domain questions are answered by making full use of Wikipedia as a knowledge source, accessed through its API. Because the system must handle many types of questions, it is critical to analyze each user question in terms of the nature of the answer being sought. The result of this question analysis has three components: Answer Format, Answer Theme, and Question Target. The next step, document retrieval, finds the documents or passages most relevant to the question using either word-embedding distances or deep learning. Finally, in the machine comprehension step, the answers are extracted from the retrieved passage. The PageRank technique can be used when computing the "document score" that assesses the relevance of a document to a query. A BERT or ALBERT architecture (a document reader trained on the SQuAD 2.0 dataset) can be used to improve machine comprehension performance. The Answer Ranker (a softmax function) extracts a one- to two-line answer for the query. Both the question and the generated answer are handled as audio, using the SpeechRecognition library and the gTTS API.
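
The retrieval and ranking steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. As a simple stand-in for word-embedding distances, it scores passages by cosine similarity over term counts, and it ranks candidate answers with a softmax over raw scores. The function names (`cosine_score`, `softmax`, `rank_answers`) and the sample passages and scores are hypothetical and chosen only for illustration. In the described system the raw scores would come from a BERT or ALBERT reader.

```python
import math
from collections import Counter


def cosine_score(question: str, passage: str) -> float:
    """Cosine similarity between term-count vectors (a stand-in for
    the embedding-based document score; an assumption, not the paper's method)."""
    q = Counter(question.lower().split())
    p = Counter(passage.lower().split())
    dot = sum(q[t] * p[t] for t in q)
    q_norm = math.sqrt(sum(v * v for v in q.values()))
    p_norm = math.sqrt(sum(v * v for v in p.values()))
    norm = q_norm * p_norm
    return dot / norm if norm else 0.0


def softmax(scores):
    """Numerically stable softmax over a non-empty list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_answers(candidates):
    """Take (answer_text, raw_score) pairs and return (answer_text, probability)
    pairs sorted from most to least probable."""
    if not candidates:
        return []
    probs = softmax([score for _, score in candidates])
    answers = [answer for answer, _ in candidates]
    return sorted(zip(answers, probs), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    question = "capital of france"
    # Hypothetical passages standing in for retrieved Wikipedia text.
    passages = [
        "paris is the capital of france",
        "the nile is a river in africa",
    ]
    best = max(passages, key=lambda p: cosine_score(question, p))
    print("Best passage:", best)

    # Hypothetical raw scores that a reader model might assign to answer spans.
    print(rank_answers([("Paris", 4.0), ("Lyon", 1.0)]))
```

The softmax turns the reader's raw scores into probabilities that sum to 1, so the top-ranked candidate can be reported together with a confidence value.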

Copyright & License

Copyright © 2026 Authors retain the copyright of this article. This article is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{151185,
        author = {Ria Singh and Preetam B. A. and Prince Zalavadia and Sharvari Govilkar},
        title = {Open Domain Question Answering System Using Wikipedia},
        journal = {International Journal of Innovative Research in Technology},
        year = {},
        volume = {7},
        number = {12},
        pages = {165-173},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=151185},
        abstract = {The open-domain question answering task has recently been addressed using unstructured data such as websites and online encyclopedias. Here, open-domain questions are answered by making full use of Wikipedia as a knowledge source, accessed through its API. Because the system must handle many types of questions, it is critical to analyze each user question in terms of the nature of the answer being sought. The result of this question analysis has three components: Answer Format, Answer Theme, and Question Target. The next step, document retrieval, finds the documents or passages most relevant to the question using either word-embedding distances or deep learning. Finally, in the machine comprehension step, the answers are extracted from the retrieved passage. The PageRank technique can be used when computing the "document score" that assesses the relevance of a document to a query. A BERT or ALBERT architecture (a document reader trained on the SQuAD 2.0 dataset) can be used to improve machine comprehension performance. The Answer Ranker (a softmax function) extracts a one- to two-line answer for the query. Both the question and the generated answer are handled as audio, using the SpeechRecognition library and the gTTS API.},
        keywords = {Question Answering System, Open Domain, Wikipedia, Deep Learning, Natural Language Processing, Document Retrieval, Document Reader, Answer Ranker},
        month = {},
        }

Cite This Article

Singh, R., A., P. B., Zalavadia, P., & Govilkar, S. (). Open Domain Question Answering System Using Wikipedia. International Journal of Innovative Research in Technology (IJIRT), 7(12), 165–173.
