Dense Passage Retrieval for Open-Domain Question Answering

  • Unique Paper ID: 197413
  • Volume: 12
  • Issue: 11
  • PageNo: 5929-5935
  • Abstract: Open-domain question answering relies on efficient passage retrieval to select candidate contexts, where traditional sparse vector space models, such as TF-IDF or BM25, are the de facto method. In this work, we show that retrieval can be practically implemented using dense representations alone, where embeddings are learned from a small number of questions and passages by a simple dual-encoder framework. When evaluated on a wide range of open-domain QA datasets, our dense retriever outperforms a strong Lucene-BM25 system by 9%–19% absolute in terms of top-20 passage retrieval accuracy, and helps our end-to-end QA system establish new state-of-the-art results on multiple open-domain QA benchmarks. The proposed system uses a BERT-based dual-encoder architecture with inner (dot) product similarity, achieving 94% accuracy (DPR) and 99% accuracy (Bi-Encoder extension) on the Web Questions dataset, against an 83% TF-IDF baseline.
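
The retrieval step and the top-k accuracy metric mentioned in the abstract can be illustrated with a minimal sketch. It assumes the question and passage embeddings have already been produced by the two encoders; the toy vectors, the `gold` relevance sets, and the function names `top_k` and `top_k_accuracy` are hypothetical and are not taken from the article.

```python
from typing import List, Sequence, Set

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner (dot) product between two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def top_k(question_vec: Vector, passage_vecs: List[Vector], k: int) -> List[int]:
    """Return indices of the k passages with the highest inner-product score."""
    scored = [(dot(question_vec, p), i) for i, p in enumerate(passage_vecs)]
    # Sort by descending score; ties are broken by the lower passage index.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scored[:k]]


def top_k_accuracy(
    question_vecs: List[Vector],
    passage_vecs: List[Vector],
    gold: List[Set[int]],
    k: int,
) -> float:
    """Fraction of questions with at least one relevant passage in the top k."""
    hits = 0
    for q, relevant in zip(question_vecs, gold):
        if relevant & set(top_k(q, passage_vecs, k)):
            hits += 1
    return hits / len(question_vecs)


# Toy 2-dimensional embeddings (illustrative only, not real encoder outputs).
passages = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
questions = [[0.9, 0.1], [0.1, 0.9]]
gold = [{0}, {2}]  # hypothetical relevant passage indices per question

print(top_k_accuracy(questions, passages, gold, k=1))
print(top_k_accuracy(questions, passages, gold, k=2))
```

In a real system the exhaustive scoring loop would typically be replaced by a vector index, but the metric is computed the same way: a question counts as a hit if any relevant passage appears among its k highest-scoring passages.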

Copyright & License

Copyright © 2026. The authors retain the copyright of this article. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{197413,
        author = {MOHAMMED TAJUDDIN and MOHAMMED ZAKI UZ ZAMA and MOHAMMED MUDASIR MUBEEN and RAYYAN MASOOD},
        title = {Dense Passage Retrieval for Open-Domain Question Answering},
        journal = {International Journal of Innovative Research in Technology},
        year = {2026},
        volume = {12},
        number = {11},
        pages = {5929--5935},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=197413},
        abstract = {Open-domain question answering relies on efficient passage retrieval to select candidate contexts, where traditional sparse vector space models, such as TF-IDF or BM25, are the de facto method. In this work, we show that retrieval can be practically implemented using dense representations alone, where embeddings are learned from a small number of questions and passages by a simple dual-encoder framework. When evaluated on a wide range of open-domain QA datasets, our dense retriever outperforms a strong Lucene-BM25 system by 9\%--19\% absolute in terms of top-20 passage retrieval accuracy, and helps our end-to-end QA system establish new state-of-the-art results on multiple open-domain QA benchmarks. The proposed system uses a BERT-based dual-encoder architecture with inner (dot) product similarity, achieving 94\% accuracy (DPR) and 99\% accuracy (Bi-Encoder extension) on the Web Questions dataset, against an 83\% TF-IDF baseline.},
        keywords = {Open-Domain Question Answering; Dense Passage Retrieval (DPR); BERT; Bi-Encoder; Sentence Transformer; Word Embedding; Cosine Similarity; TF-IDF; BM25; Semantic Search; Natural Language Processing (NLP); Accuracy; Feature Extraction; Machine Learning; Text Similarity},
        month = {April},
        }

Cite This Article

TAJUDDIN, M., ZAMA, M. Z. U., MUBEEN, M. M., & MASOOD, R. (2026). Dense Passage Retrieval for Open-Domain Question Answering. International Journal of Innovative Research in Technology (IJIRT), 12(11), 5929–5935.
