Machine Learning Approaches for SMS Spam Detection: A Comparative Analysis

  • Unique Paper ID: 183547
  • PageNo: 2715-2721
  • Abstract: The exponential growth of Short Message Service (SMS) has led to a significant increase in unsolicited commercial advertisements, commonly known as SMS spam, particularly prevalent in regions like Asia. Developing effective SMS spam filtering systems presents challenges due to the limited availability of real SMS spam databases and the short length and informal language of messages, which hinder traditional filtering algorithms. To address these issues, this project leverages a publicly available SMS spam dataset from the UCI machine learning repository. Through rigorous feature extraction and pre-processing techniques, including tokenization, stop word removal, lemmatization, and normalization, the data is prepared for classification. This study employs and compares the performance of various machine learning algorithms, specifically K-Nearest Neighbour (KNN), Logistic Regression (LR), and Random Forest (RF), to classify SMS messages as either spam or legitimate. Our experimental results demonstrate that the Random Forest algorithm achieved the highest accuracy of 97.7%, with a precision of 97.5%. The Logistic Regression model achieved 95.1% accuracy and 92.3% precision, while K-Nearest Neighbour showed 90.3% accuracy and 100% precision. This research contributes to advancing spam filtering techniques by addressing the unique challenges of SMS communication, paving the way for more robust and accurate spam detection systems.
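
The abstract reports both accuracy and precision, and the KNN figures (90.3% accuracy with 100% precision) show that the two can diverge sharply. The sketch below illustrates how that can happen. It uses the standard definitions, accuracy = (TP + TN) / total and precision = TP / (TP + FP). The label lists are hypothetical counts chosen only to reproduce those two percentages; they are not the paper's confusion matrix, and the names `ConfusionCounts` and `count_outcomes` are assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # spam correctly flagged as spam
    fp: int  # legitimate messages wrongly flagged as spam
    tn: int  # legitimate messages correctly let through
    fn: int  # spam that was missed

    @property
    def accuracy(self) -> float:
        # Share of all messages that were classified correctly.
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    @property
    def precision(self) -> float:
        # Share of messages flagged as spam that really were spam.
        flagged = self.tp + self.fp
        return self.tp / flagged if flagged else 0.0

    @property
    def recall(self) -> float:
        # Share of actual spam that was caught.
        actual_spam = self.tp + self.fn
        return self.tp / actual_spam if actual_spam else 0.0


def count_outcomes(y_true, y_pred, positive="spam") -> ConfusionCounts:
    """Tally the four confusion-matrix cells for a binary spam/ham task."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return ConfusionCounts(tp, fp, tn, fn)


# Hypothetical test set (NOT the paper's data): 130 spam, 870 legitimate.
# A very cautious classifier flags only 33 messages, all of them real spam.
y_true = ["spam"] * 130 + ["ham"] * 870
y_pred = ["spam"] * 33 + ["ham"] * 97 + ["ham"] * 870

counts = count_outcomes(y_true, y_pred)
print(
    f"accuracy={counts.accuracy:.3f} "
    f"precision={counts.precision:.3f} "
    f"recall={counts.recall:.3f}"
)
```

With these made-up counts the classifier never mislabels a legitimate message, so precision is 1.0, yet it misses most of the spam, which is what holds accuracy at 0.903. The abstract does not report recall, so whether the paper's KNN model behaves this way is not something this sketch can confirm.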

Copyright & License

Copyright © 2026 Authors retain the copyright of this article. This article is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{183547,
        author = {Priyanshu and Sarthak Mittal and Sanchit Vasdev},
        title = {Machine Learning Approaches for SMS Spam Detection: A Comparative Analysis},
        journal = {International Journal of Innovative Research in Technology},
        year = {2025},
        volume = {12},
        number = {3},
        pages = {2715--2721},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=183547},
        abstract = {The exponential growth of Short Message Service (SMS) has led to a significant increase in unsolicited commercial advertisements, commonly known as SMS spam, particularly prevalent in regions like Asia. Developing effective SMS spam filtering systems presents challenges due to the limited availability of real SMS spam databases and the short length and informal language of messages, which hinder traditional filtering algorithms. To address these issues, this project leverages a publicly available SMS spam dataset from the UCI machine learning repository. Through rigorous feature extraction and pre-processing techniques, including tokenization, stop word removal, lemmatization, and normalization, the data is prepared for classification. This study employs and compares the performance of various machine learning algorithms, specifically K-Nearest Neighbour (KNN), Logistic Regression (LR), and Random Forest (RF), to classify SMS messages as either spam or legitimate. Our experimental results demonstrate that the Random Forest algorithm achieved the highest accuracy of 97.7%, with a precision of 97.5%. The Logistic Regression model achieved 95.1% accuracy and 92.3% precision, while K-Nearest Neighbour showed 90.3% accuracy and 100% precision. This research contributes to advancing spam filtering techniques by addressing the unique challenges of SMS communication, paving the way for more robust and accurate spam detection systems.},
        keywords = {},
        month = {August},
        }

Cite This Article

Priyanshu, Mittal, S., & Vasdev, S. (2025). Machine Learning Approaches for SMS Spam Detection: A Comparative Analysis. International Journal of Innovative Research in Technology (IJIRT), 12(3), 2715–2721.
