PDF Malware Detection: Toward Machine Learning Modeling With Explainability Analysis

  • Unique Paper ID: 174790
  • Pages: 1016–1021
  • Abstract: In the digital age, PDF files are widely used for document sharing, but their popularity also makes them a target for malware attacks. This project aims to develop and evaluate machine learning models for detecting malware in PDF files. Using a Kaggle dataset of labeled malicious and benign PDFs, it will apply several algorithms: Random Forest, C5.0, J48, Support Vector Machine (SVM), AdaBoost, Deep Neural Network (DNN), Gradient Boosting Machine (GBM), and K-Nearest Neighbors (KNN). The primary focus is on achieving high detection accuracy while also providing explainability, so that the models' decisions can be understood. By leveraging machine learning, the project seeks to strengthen cybersecurity with a robust means of identifying and mitigating threats embedded in PDF documents.
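The abstract lists K-Nearest Neighbors among the models and stresses explainability. The Python sketch below illustrates one way those two ideas can fit together; it is not the authors' pipeline. Everything specific here is an assumption for illustration: the keyword-count features (loosely inspired by keyword-count tools such as PDFiD), the toy byte strings standing in for PDFs, and the hypothetical helpers `extract_features`, `knn_predict`, `accuracy`, and `permutation_importance`. Explainability is shown with permutation feature importance, a model-agnostic technique swapped in here; the abstract does not say which explanation method the paper uses.

```python
import math
import random
from collections import Counter
from functools import partial

# HYPOTHETICAL feature set: counts of PDF name tokens often linked to active
# content, loosely inspired by keyword-count tools such as PDFiD. This is not
# the paper's feature set (the abstract only says a Kaggle dataset was used).
KEYWORDS = [b"/JavaScript", b"/JS", b"/OpenAction", b"/Launch", b"/EmbeddedFile"]

# HYPOTHETICAL toy byte strings standing in for PDF files (not real samples).
BENIGN = [
    b"%PDF-1.7 /Type /Catalog /Pages 2 0 R",
    b"%PDF-1.5 /Type /Page /Contents 4 0 R",
    b"%PDF-1.6 /Type /Font /BaseFont /Helvetica",
]
MALICIOUS = [
    b"%PDF-1.4 /OpenAction 5 0 R /JavaScript 6 0 R /JS (app.alert(1))",
    b"%PDF-1.3 /JavaScript 7 0 R /JS (this.exportDataObject()) /EmbeddedFile 8 0 R",
    b"%PDF-1.4 /OpenAction 2 0 R /JavaScript 3 0 R /JS (x) /Launch /F (cmd.exe)",
]


def extract_features(raw: bytes) -> list[int]:
    """Count non-overlapping occurrences of each keyword in the raw bytes."""
    return [raw.count(kw) for kw in KEYWORDS]


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote of its k nearest training rows (Euclidean)."""
    # Sorting (distance, label) pairs breaks distance ties by label.
    ranked = sorted((math.dist(row, x), label) for row, label in zip(train_X, train_y))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(predict, X, y):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean accuracy drop when each feature column is shuffled on its own.

    A large drop means the model relies on that feature; a constant feature,
    or one the model ignores, scores zero.
    """
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild each row with only feature j replaced by its shuffled value.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


X = [extract_features(doc) for doc in BENIGN + MALICIOUS]
y = [0] * len(BENIGN) + [1] * len(MALICIOUS)  # 0 = benign, 1 = malicious

if __name__ == "__main__":
    predict = partial(knn_predict, X, y)
    query = b"%PDF-1.4 /JavaScript 4 0 R /JS (eval(x))"
    print("query label:", predict(extract_features(query)))
    # For brevity this scores the training rows; real use needs a held-out split.
    for kw, imp in zip(KEYWORDS, permutation_importance(predict, X, y)):
        print(kw.decode(), round(imp, 3))
```

Run as a script, the toy query (which carries /JavaScript and /JS but no /OpenAction) is assigned label 1, because its two nearest neighbors are malicious samples. KNN is used only because it fits in a few standard-library lines; `permutation_importance` accepts any `predict` callable, so the same idea applies to tree ensembles such as Random Forest.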

Copyright & License

Copyright © 2026. The authors retain the copyright of this article. This article is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{174790,
        author = {Kilaru Pradeepthi and Ravi Sri Lakshmi and Boddu Vaishnavi and Vadugu Narendra Babu and Tatineni Vijayasree},
        title = {PDF Malware Detection: Toward Machine Learning Modeling With Explainability Analysis},
        journal = {International Journal of Innovative Research in Technology},
        year = {2025},
        volume = {11},
        number = {11},
        pages = {1016--1021},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=174790},
        abstract = {In the digital age, PDF files are widely used for document sharing, but their popularity also makes them a target for malware attacks. This project aims to develop and evaluate machine learning models for detecting malware in PDF files. Using a Kaggle dataset of labeled malicious and benign PDFs, it will apply several algorithms: Random Forest, C5.0, J48, Support Vector Machine (SVM), AdaBoost, Deep Neural Network (DNN), Gradient Boosting Machine (GBM), and K-Nearest Neighbors (KNN). The primary focus is on achieving high detection accuracy while also providing explainability, so that the models' decisions can be understood. By leveraging machine learning, the project seeks to strengthen cybersecurity with a robust means of identifying and mitigating threats embedded in PDF documents.},
        keywords = {PDF malware detection, machine learning, Random Forest, SVM, DNN, explainability, cybersecurity, malicious PDF, classification algorithms, Kaggle dataset},
        month = {April},
        }

Cite This Article

Pradeepthi, K., Lakshmi, R. S., Vaishnavi, B., Babu, V. N., & Vijayasree, T. (2025). PDF Malware Detection: Toward Machine Learning Modeling With Explainability Analysis. International Journal of Innovative Research in Technology (IJIRT), 11(11), 1016–1021.
