Reviewing various techniques for Text and Image based Summarization

  • Unique Paper ID: 190065
  • Volume: 12
  • Issue: 8
  • PageNo: 736-742
  • Abstract: Automated information retrieval and text summarization are difficult tasks in natural language processing because of the irregular structure and high complexity of documents. The text summarization process creates a summary by paraphrasing a long text. Image captioning automatically describes an image with a sentence, a topic that connects computer vision and natural language processing. Research on image captioning can help visually impaired people understand their surroundings, and it has potential benefits for sentence-level photo organization. Modern methods are mainly based on a combination of Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN). However, generating accurate and descriptive captions remains a challenging task. Accurate captions are sentences consistent with the visual content, and descriptive captions are those with diverse descriptions rather than plain, common sentences. Generally, the vision model is required to encode the context comprehensively, and the language model is required to express the visual representation consistently as a readable sentence. The training strategy also affects performance. Additionally, the results show that summaries generated using these semi-supervised approaches do lead to higher ROUGE scores than the n-gram language models reported in previous work. We propose a multi-modal approach based on the skip-gram word2vec mechanism that attends to original sentences, images, and captions when decoding. Earlier models for information retrieval and summarization relied on massive labeled datasets and handcrafted features, leveraged knowledge of a particular domain, and concentrated on a narrow sub-domain to improve efficiency.

Copyright & License

Copyright © 2026. Authors retain the copyright of this article. This article is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{190065,
        author = {Manisha Rashinkar},
        title = {Reviewing various techniques for Text and Image based Summarization},
        journal = {International Journal of Innovative Research in Technology},
        year = {2025},
        volume = {12},
        number = {8},
        pages = {736--742},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=190065},
        abstract = {Automated information retrieval and text summarization are difficult tasks in natural language processing because of the irregular structure and high complexity of documents. The text summarization process creates a summary by paraphrasing a long text. Image captioning automatically describes an image with a sentence, a topic that connects computer vision and natural language processing. Research on image captioning can help visually impaired people understand their surroundings, and it has potential benefits for sentence-level photo organization. Modern methods are mainly based on a combination of Convolutional Neural Networks (CNN) and Recurrent Neural Networks (RNN). However, generating accurate and descriptive captions remains a challenging task. Accurate captions are sentences consistent with the visual content, and descriptive captions are those with diverse descriptions rather than plain, common sentences. Generally, the vision model is required to encode the context comprehensively, and the language model is required to express the visual representation consistently as a readable sentence. The training strategy also affects performance. Additionally, the results show that summaries generated using these semi-supervised approaches do lead to higher ROUGE scores than the n-gram language models reported in previous work. We propose a multi-modal approach based on the skip-gram word2vec mechanism that attends to original sentences, images, and captions when decoding. Earlier models for information retrieval and summarization relied on massive labeled datasets and handcrafted features, leveraged knowledge of a particular domain, and concentrated on a narrow sub-domain to improve efficiency.},
        keywords = {Information retrieval, text summarization, deep learning, word2vec, dense captioning, Stanford, NLP},
        month = {December},
        }

