Enhancing Email Classification Accuracy with Long Short-Term Memory (LSTM) Networks: A Comparative Analysis

  • Unique Paper ID: 176167
  • PageNo: 7041-7046
  • Abstract: Email classification is critical for cybersecurity (e.g., phishing detection) and organizational efficiency (e.g., spam filtering). Traditional methods like Support Vector Machines (SVM) and Random Forests (RF) often fail to capture the sequential and contextual nuances of email text. This paper proposes a bidirectional LSTM (BiLSTM) model enhanced with BERT embeddings and structural features (headers, URLs) for multi-category email classification. We curate a dataset combining Enron, UCI Spambase, and PhishTank sources, balancing classes for phishing, spam, promotional, personal, and urgent emails. The hybrid BiLSTM-BERT architecture achieves 98.72% accuracy and 98.65% F1-score, outperforming standalone BERT (97.89% accuracy) and CNNs (96.34% accuracy). Structural features improve phishing detection recall by 2.7%, while bidirectional LSTMs resolve long-term dependency challenges in email text. Our results demonstrate the viability of sequential deep learning models for real-time email threat mitigation.
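
The abstract mentions structural features drawn from headers and URLs but does not list them. As a rough illustration only, the sketch below shows one way such features might be extracted with the Python standard library. The function name `structural_features` and the four features it returns are assumptions made for this example; they are not taken from the paper.

```python
import re
from email import message_from_string
from email.utils import parseaddr
from urllib.parse import urlparse

# Hypothetical URL matcher: http(s) links up to whitespace or a closing quote/bracket.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')]+", re.IGNORECASE)
IPV4_PATTERN = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def _domain(header_value: str) -> str:
    """Return the lowercased domain of an address header, or '' if absent."""
    _, address = parseaddr(header_value)
    return address.rpartition("@")[2].lower()


def structural_features(raw_email: str) -> dict:
    """Extract a few illustrative header/URL features (assumed, not the paper's set)."""
    msg = message_from_string(raw_email)

    # Collect plain-text body content, handling multipart messages.
    if msg.is_multipart():
        body = "\n".join(
            str(part.get_payload())
            for part in msg.walk()
            if part.get_content_type() == "text/plain"
        )
    else:
        body = str(msg.get_payload())

    urls = URL_PATTERN.findall(body)
    hosts = [urlparse(u).hostname or "" for u in urls]

    from_domain = _domain(msg.get("From", ""))
    reply_domain = _domain(msg.get("Reply-To", ""))

    return {
        "num_urls": len(urls),
        # Links that point at a raw IPv4 address instead of a domain name.
        "num_ip_urls": sum(bool(IPV4_PATTERN.fullmatch(h)) for h in hosts),
        # Reply-To present and pointing at a different domain than From.
        "reply_to_mismatch": bool(reply_domain) and reply_domain != from_domain,
        "subject_length": len(msg.get("Subject", "")),
    }
```

In a pipeline like the one the abstract describes, a feature vector of this kind would presumably be concatenated with the text representation before the classification layer; how the authors actually combine the two is not stated here.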

Copyright & License

Copyright © 2026. The authors retain the copyright of this article. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{176167,
        author = {Arul Gupta and Samarth Agarwal and Mani Deepak Choudhary},
        title = {Enhancing Email Classification Accuracy with Long Short-Term Memory (LSTM) Networks: A Comparative Analysis},
        journal = {International Journal of Innovative Research in Technology},
        year = {2025},
        volume = {11},
        number = {11},
        pages = {7041--7046},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=176167},
        abstract = {Email classification is critical for cybersecurity (e.g., phishing detection) and organizational efficiency (e.g., spam filtering). Traditional methods like Support Vector Machines (SVM) and Random Forests (RF) often fail to capture the sequential and contextual nuances of email text. This paper proposes a bidirectional LSTM (BiLSTM) model enhanced with BERT embeddings and structural features (headers, URLs) for multi-category email classification. We curate a dataset combining Enron, UCI Spambase, and PhishTank sources, balancing classes for phishing, spam, promotional, personal, and urgent emails. The hybrid BiLSTM-BERT architecture achieves 98.72\% accuracy and 98.65\% F1-score, outperforming standalone BERT (97.89\% accuracy) and CNNs (96.34\% accuracy). Structural features improve phishing detection recall by 2.7\%, while bidirectional LSTMs resolve long-term dependency challenges in email text. Our results demonstrate the viability of sequential deep learning models for real-time email threat mitigation.},
        keywords = {Email classification, LSTM, BERT, phishing detection, spam filtering, cybersecurity, deep learning},
        month = {April},
        }

Cite This Article

Gupta, A., Agarwal, S., & Choudhary, M. D. (2025). Enhancing Email Classification Accuracy with Long Short-Term Memory (LSTM) Networks: A Comparative Analysis. International Journal of Innovative Research in Technology (IJIRT), 11(11), 7041–7046.
