Modeling and Learning Continuous Word Embedding with Metadata for Question Retrieval

  • Unique Paper ID: 145876
  • PageNo: 1091-1102
  • Abstract:
  • Community question answering (cQA) has become an important research topic owing to the popularity of cQA archives on the Web. This paper addresses the lexical gap problem in question retrieval. Question retrieval in cQA archives aims to find existing questions that are semantically equivalent or relevant to a queried question. However, the lexical gap between a query and the archived questions makes this retrieval challenging. We propose to model and learn continuous word embeddings with category metadata from cQA pages, using two novel category-powered models: a basic model called MB-NET and an enhanced model called ME-NET, which can better learn the word embeddings and alleviate the lexical gap problem. Because questions contain different numbers of words and therefore yield variable-size sets of word embedding vectors, we employ the Fisher kernel framework to aggregate them into fixed-length vectors. Experimental results on large-scale English and Chinese cQA data sets show that the proposed approaches significantly outperform state-of-the-art translation models and topic-based models for question retrieval in cQA. We further evaluate our approaches in large-scale automatic evaluation experiments, and the results show promising and significant performance improvements.
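
The aggregation step mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the paper's exact Fisher kernel formulation, so the code below uses the commonly used mean-gradient Fisher vector under a diagonal-covariance Gaussian mixture model (GMM). The function name `fisher_vector`, the toy GMM parameters, and the random "embeddings" are all assumptions made for illustration; in practice the GMM would be fitted on learned word embeddings.

```python
import numpy as np


def fisher_vector(X, weights, means, variances):
    """Mean-gradient Fisher vector of a variable-size set of embeddings.

    X         : (T, d) word embeddings of one question (T varies per question)
    weights   : (K,)   GMM mixture weights (assumed already fitted)
    means     : (K, d) GMM component means
    variances : (K, d) GMM diagonal variances
    Returns a fixed-length vector of size K * d, L2-normalised.
    """
    X = np.atleast_2d(X)
    T = X.shape[0]
    diff = X[:, None, :] - means[None, :, :]                  # (T, K, d)

    # Log-density of each embedding under each diagonal Gaussian.
    log_prob = -0.5 * (
        np.sum(diff ** 2 / variances[None], axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :]
    )                                                         # (T, K)

    # Posterior responsibilities, computed stably in log space.
    log_post = np.log(weights)[None, :] + log_prob
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)                   # (T, K)

    # Gradient with respect to the component means, with the usual scaling.
    grad = np.einsum("tk,tkd->kd", post, diff / np.sqrt(variances)[None])
    grad /= T * np.sqrt(weights)[:, None]

    fv = grad.ravel()
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv


# Toy usage: two "questions" of different lengths map to equal-length vectors.
rng = np.random.default_rng(0)
K, d = 3, 4                                  # hypothetical GMM size and embedding dim
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, d))
variances = np.ones((K, d))

fv_short = fisher_vector(rng.normal(size=(2, d)), weights, means, variances)
fv_long = fisher_vector(rng.normal(size=(7, d)), weights, means, variances)
print(fv_short.shape, fv_long.shape)
```

Because both questions end up as vectors of length K * d, they can be compared directly, for example with a dot product, regardless of how many words each contains.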

Copyright & License

Copyright © 2026. The authors retain the copyright of this article. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{145876,
        author = {C HARI and G.V. RAMESH BABU},
        title = {Modeling and Learning Continuous Word Embedding with Metadata for Question Retrieval},
        journal = {International Journal of Innovative Research in Technology},
        year = {},
        volume = {4},
        number = {11},
        pages = {1091-1102},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=145876},
        abstract = {Community question answering (cQA) has become an important research topic owing to the popularity of cQA archives on the Web. This paper addresses the lexical gap problem in question retrieval. Question retrieval in cQA archives aims to find existing questions that are semantically equivalent or relevant to a queried question. However, the lexical gap between a query and the archived questions makes this retrieval challenging. We propose to model and learn continuous word embeddings with category metadata from cQA pages, using two novel category-powered models: a basic model called MB-NET and an enhanced model called ME-NET, which can better learn the word embeddings and alleviate the lexical gap problem. Because questions contain different numbers of words and therefore yield variable-size sets of word embedding vectors, we employ the Fisher kernel framework to aggregate them into fixed-length vectors. Experimental results on large-scale English and Chinese cQA data sets show that the proposed approaches significantly outperform state-of-the-art translation models and topic-based models for question retrieval in cQA. We further evaluate our approaches in large-scale automatic evaluation experiments, and the results show promising and significant performance improvements.},
        }

Cite This Article

HARI, C., & BABU, G. R. (). Modeling and Learning Continuous Word Embedding with Metadata for Question Retrieval. International Journal of Innovative Research in Technology (IJIRT), 4(11), 1091–1102.