RESEARCH PAPER SUMMARIZER USING NLP

  • Unique Paper ID: 164748
  • Volume: 10
  • Issue: 12
  • PageNo: 2076-2080
  • Abstract: The abundance of digital text makes efficient comprehension and analysis difficult, particularly for lengthy documents such as PDF files. To address this, a Python script has been developed that uses the PyMuPDF library for PDF text extraction and the Hugging Face Transformers library, specifically the T5 model, for text summarization. The script runs from the command line and takes the path to a PDF file as input. It first extracts the document text with PyMuPDF. The extracted text is then preprocessed by removing extraneous spaces and newlines and, optionally, the "References" section. The preprocessed text is passed to a pre-trained T5 model, loaded via the Transformers library, which condenses it into a concise summary. The generation settings are tuned to control summary length, with the aim of keeping the summary comprehensible while limiting information loss. The script handles exceptions raised during PDF processing or model use and reports them instead of crashing. The output consists of a snippet of the original text together with the generated summary, helping users quickly grasp the essence of the document.
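
The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' script: the function names, the `t5-small` checkpoint, the length limits, the beam-search settings, and the heuristic that cuts the text at the last standalone "References" heading are all assumptions made for the example.

```python
import re
import sys


def extract_text(pdf_path: str) -> str:
    """Concatenate the plain text of every page using PyMuPDF."""
    import fitz  # PyMuPDF; imported lazily so the other helpers work without it

    with fitz.open(pdf_path) as doc:
        return "\n".join(page.get_text() for page in doc)


def preprocess(text: str, drop_references: bool = True) -> str:
    """Optionally drop the reference list, then collapse all whitespace."""
    if drop_references:
        # Assumed heuristic: cut at the last line that contains only "References".
        headings = list(
            re.finditer(r"^[ \t]*references[ \t]*$", text, re.IGNORECASE | re.MULTILINE)
        )
        if headings:
            text = text[: headings[-1].start()]
    return re.sub(r"\s+", " ", text).strip()


def summarize(text: str, model_name: str = "t5-small",
              max_length: int = 150, min_length: int = 40) -> str:
    """Summarize text with a pre-trained T5 checkpoint (checkpoint name assumed)."""
    from transformers import AutoTokenizer, T5ForConditionalGeneration

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = T5ForConditionalGeneration.from_pretrained(model_name)
    # T5 selects its task from a text prefix; the input is truncated to 512 tokens.
    inputs = tokenizer("summarize: " + text, return_tensors="pt",
                       max_length=512, truncation=True)
    output_ids = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_length=max_length,   # assumed upper bound on summary length (tokens)
        min_length=min_length,   # assumed lower bound on summary length (tokens)
        num_beams=4,
        early_stopping=True,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


def main(argv: list) -> int:
    """Command-line entry point: returns a process exit code."""
    if len(argv) != 2:
        print("usage: summarize_pdf.py <path-to-pdf>", file=sys.stderr)
        return 2
    try:
        text = preprocess(extract_text(argv[1]))
        summary = summarize(text)
    except Exception as exc:  # report PDF or model failures instead of a traceback
        print(f"error: {exc}", file=sys.stderr)
        return 1
    print("Original text (first 500 characters):")
    print(text[:500])
    print("\nSummary:")
    print(summary)
    return 0
```

To use this as a script, one could call `sys.exit(main(sys.argv))` under an `if __name__ == "__main__":` guard and run it with a PDF path as the only argument. Because the input is truncated to 512 tokens here, a long paper would be summarized from its opening pages only; splitting the text into chunks and summarizing each is one possible extension.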

  • ISSN: 2349-6002