A Review on Click Fraud Detection in Online Advertising Using Machine Learning Algorithms

  • Unique Paper ID: 186269
  • Volume: 12
  • Issue: 6
  • PageNo: 562-574
  • Abstract: Click fraud is a significant and escalating problem in the digital advertising ecosystem, costing advertisers both revenue and trust. It consists of generating fake clicks on online advertisements, either to artificially inflate click metrics or to exhaust a competitor’s budget. Conventional rule-based detection methods cannot keep up with the complexity and scale of today's advertising data. Machine learning (ML) and deep learning (DL) algorithms have recently emerged as promising tools for detecting click fraud, as they can learn behavioural patterns and distinguish valid from fraudulent traffic. This review assesses machine-learning-based methods, primarily decision trees (DT), random forests (RF), and other ensemble methods such as gradient-boosted decision trees (GBDT), XGBoost, and LightGBM. It summarizes the model architectures, feature engineering methods, datasets, and key performance results reported in the literature. The reviewed experiments indicate that tree-based ensemble models are more effective than traditional classifiers because they better handle the data imbalance, temporal dependencies, and non-linear relationships present in clickstream data. Recent hybrid architectures that combine CNN, BiLSTM, and RF report accuracy of up to 99%, which makes them promising for practical applications. However, challenges remain in feature generalization, interpretability, adversarial robustness, and real-time scalability.
In this paper, we identify gaps in existing research and propose future research directions, including Explainable AI (XAI), online learning, and privacy-preserving analytics, as means to enhance the transparency and trustworthiness of advertising fraud detection systems. The paper is intended as a stepping stone towards more intelligent, adaptive, and interpretable machine learning models for identifying online advertising fraud, which in turn would provide more robust protection for the digital advertising ecosystem.

Copyright & License

Copyright © 2025. The authors retain the copyright of this article. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{186269,
        author = {Ganesh Gorakhnath Taware and Vaishali Balasaheb Pawar},
        title = {A Review on Click Fraud Detection in Online Advertising Using Machine Learning Algorithms},
        journal = {International Journal of Innovative Research in Technology},
        year = {2025},
        volume = {12},
        number = {6},
        pages = {562--574},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=186269},
        keywords = {Click Fraud Detection; Machine Learning; Decision Tree; Random Forest; Gradient Boosting; XGBoost; LightGBM; Ensemble Learning; Deep Learning; Online Advertising; Explainable AI; Fraud Analytics},
        month = {November},
        }

Cite This Article

  • ISSN: 2349-6002
