AI-Based Phishing Email and SMS Detection Using TF-IDF Vectorization and Logistic Regression

  • Unique Paper ID: 200949
  • PageNo: 121-130
  • Abstract: Phishing attacks have become increasingly sophisticated, routinely bypassing traditional rule-based and keyword-filter-based detection systems. Both email and SMS communication channels are targeted by attackers who exploit social engineering techniques to deceive users into revealing sensitive credentials or clicking malicious links. This paper proposes a multi-layer forensic phishing detection system capable of identifying threats across both email and SMS communication simultaneously. The proposed system leverages TF-IDF (Term Frequency-Inverse Document Frequency) vectorization with N-Gram analysis to extract discriminative linguistic features from message text, and employs a Logistic Regression classifier augmented with a Weighted Scoring mechanism that combines probabilistic model output with heuristic-based threat-action pattern detection. Unlike static blacklist filters, the proposed approach uses explainable AI logic to distinguish between legitimate alerts and phishing threats, enabling detection of previously unseen Zero-Day attack patterns. A hybrid dataset of over 39,000 Email and SMS samples was used for training and evaluation. The system achieves a classification accuracy of 99.05%, substantially outperforming existing keyword-filter and blacklist-based baselines. A real-time Tkinter-based desktop GUI provides visual forensic mapping for end users, making the system a practical, scalable solution for personal and enterprise-level phishing protection.
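
The weighted scoring idea in the abstract, blending a classifier probability with heuristic threat-action pattern matches, might be sketched as follows. This is a minimal illustration and not the authors' implementation: the regex patterns, the 0.7 model weight, and the 0.5 decision threshold are all assumptions, since the abstract does not state them. The `model_prob` argument is assumed to come from a separately trained TF-IDF plus Logistic Regression pipeline (for example, scikit-learn's `TfidfVectorizer` with an `ngram_range` and `LogisticRegression.predict_proba`).

```python
import re

# Hypothetical threat-action patterns; the paper's actual rules are not
# given in the abstract.
THREAT_PATTERNS = [
    r"\bverify (your )?(account|identity)\b",
    r"\b(click|tap) (here|the link|below)\b",
    r"\b(suspended|locked|blocked)\b",
    r"https?://\S+",
]


def heuristic_score(text: str) -> float:
    """Return the fraction of threat-action patterns found in the message."""
    lowered = text.lower()
    hits = sum(1 for pattern in THREAT_PATTERNS if re.search(pattern, lowered))
    return hits / len(THREAT_PATTERNS)


def weighted_score(model_prob: float, text: str, w_model: float = 0.7) -> float:
    """Blend the classifier probability with the heuristic score.

    w_model is an assumed weight; the remaining (1 - w_model) goes to the
    heuristic component.
    """
    if not 0.0 <= model_prob <= 1.0:
        raise ValueError("model_prob must be in [0, 1]")
    return w_model * model_prob + (1.0 - w_model) * heuristic_score(text)


def classify(model_prob: float, text: str, threshold: float = 0.5):
    """Return a (label, score) pair using an assumed decision threshold."""
    score = weighted_score(model_prob, text)
    label = "phishing" if score >= threshold else "legitimate"
    return label, score
```

Under these assumed weights, a message that the classifier alone rates below the threshold (say `model_prob = 0.4`) is still flagged when every heuristic pattern matches, which is one way the heuristic layer could catch messages the statistical model has not seen before. Because each component of the score is visible, the result can also be explained to the user term by term.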

Copyright & License

Copyright © 2026. The authors retain the copyright of this article. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{200949,
        author = {Abinash S and K. Monisha and Dhanushkodi K and Karmegan V},
        title = {AI-Based Phishing Email and SMS Detection Using TF-IDF Vectorization and Logistic Regression},
        journal = {International Journal of Innovative Research in Technology},
        year = {2026},
        volume = {12},
        pages = {121--130},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=200949},
        abstract = {Phishing attacks have become increasingly sophisticated, routinely bypassing traditional rule-based and keyword-filter-based detection systems. Both email and SMS communication channels are targeted by attackers who exploit social engineering techniques to deceive users into revealing sensitive credentials or clicking malicious links. This paper proposes a multi-layer forensic phishing detection system capable of identifying threats across both email and SMS communication simultaneously. The proposed system leverages TF-IDF (Term Frequency-Inverse Document Frequency) vectorization with N-Gram analysis to extract discriminative linguistic features from message text, and employs a Logistic Regression classifier augmented with a Weighted Scoring mechanism that combines probabilistic model output with heuristic-based threat-action pattern detection. Unlike static blacklist filters, the proposed approach uses explainable AI logic to distinguish between legitimate alerts and phishing threats, enabling detection of previously unseen Zero-Day attack patterns. A hybrid dataset of over 39,000 Email and SMS samples was used for training and evaluation. The system achieves a classification accuracy of 99.05%, substantially outperforming existing keyword-filter and blacklist-based baselines. A real-time Tkinter-based desktop GUI provides visual forensic mapping for end users, making the system a practical, scalable solution for personal and enterprise-level phishing protection.},
        keywords = {AI-driven hybrid, rule-based, probability-based classification, instant message verification},
        month = {May},
        }

Cite This Article

S, A., Monisha, K., K, D., & V, K. (2026). AI-Based Phishing Email and SMS Detection Using TF-IDF Vectorization and Logistic Regression. International Journal of Innovative Research in Technology (IJIRT), 12, 121–130.
