Analytical evaluation of machine learning models for detecting insurance fraud

  • Unique Paper ID: 190866
  • Volume: 12
  • Issue: 8
  • PageNo: 5479-5484
  • Abstract: Insurance fraud is a global challenge that imposes heavy financial burdens on the industry and the economy. This research provides an analytical evaluation of four machine learning models for identifying fraudulent claims: Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), and XGBoost. Addressing the common issue of class imbalance in insurance data, the study uses a quantitative methodology involving data preprocessing and the evaluation of key performance metrics: Accuracy, Precision, Recall, and F1-score. The experimental results demonstrate that ensemble methods significantly outperform individual classifiers. While Logistic Regression served as a baseline with lower predictive power, XGBoost emerged as the superior model, achieving a Precision of 75% and an F1-score of 71%. These findings suggest that implementing gradient-boosted models can drastically improve fraud detection rates, helping insurers minimize financial leakage from undetected fraudulent activities.
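
The Precision, Recall, and F1-score named in the abstract have standard definitions, and the two reported XGBoost figures together imply a third. The following is a minimal sketch, not the authors' code: it assumes the reported Precision (75%) and F1-score (71%) both refer to the fraud class and that F1 is the usual harmonic mean of precision and recall. The function names are illustrative and do not come from the paper.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard metrics from confusion-matrix counts for the positive (fraud) class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def recall_from_precision_f1(precision: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for R, giving R = F1 * P / (2P - F1)."""
    denom = 2 * precision - f1
    if denom <= 0:
        raise ValueError("No valid recall exists for this precision/F1 pair")
    return f1 * precision / denom


# Figures reported in the abstract for XGBoost (assumed to be fraud-class metrics).
implied_recall = recall_from_precision_f1(precision=0.75, f1=0.71)
print(f"Implied recall: {implied_recall:.3f}")  # about 0.674
```

Under these assumptions, the reported figures correspond to a recall of roughly 67%, meaning about one fraudulent claim in three would still go undetected. On imbalanced data this is more informative than accuracy alone, since a model that labels every claim as legitimate can score high accuracy while detecting no fraud.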

Copyright & License

Copyright © 2026. The authors retain the copyright of this article. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{190866,
        author = {Anish Arvind Karne and Shubham Kailas Badhe and Sriniwas Narayanan Vengarai},
        title = {Analytical evaluation of machine learning models for detecting insurance fraud},
        journal = {International Journal of Innovative Research in Technology},
        year = {2026},
        volume = {12},
        number = {8},
        pages = {5479-5484},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=190866},
        abstract = {Insurance fraud is a global challenge that imposes heavy financial burdens on the industry and the economy. This research provides an analytical evaluation of four machine learning models for identifying fraudulent claims: Logistic Regression (LR), Decision Tree (DT), Random Forest (RF), and XGBoost. Addressing the common issue of class imbalance in insurance data, the study uses a quantitative methodology involving data preprocessing and the evaluation of key performance metrics: Accuracy, Precision, Recall, and F1-score. The experimental results demonstrate that ensemble methods significantly outperform individual classifiers. While Logistic Regression served as a baseline with lower predictive power, XGBoost emerged as the superior model, achieving a Precision of 75% and an F1-score of 71%. These findings suggest that implementing gradient-boosted models can drastically improve fraud detection rates, helping insurers minimize financial leakage from undetected fraudulent activities.},
        keywords = {Insurance Fraud Detection, Machine Learning, Fraudulent Claims, Supervised Learning, Classification Algorithms, Data Preprocessing, Feature Selection, Imbalanced Data, Predictive Analytics, Risk Management},
        month = {January},
        }

Cite This Article

Karne, A. A., Badhe, S. K., & Vengarai, S. N. (2026). Analytical evaluation of machine learning models for detecting insurance fraud. International Journal of Innovative Research in Technology (IJIRT), 12(8), 5479–5484.
