Improving Medical Insurance Cost Prediction Accuracy with Explainable Supervised Machine Learning based Classification Techniques

Appurva Sharma; Alok Bansal; Subhash Chandra Jat

Unique Paper ID: 182714
Volume: 12
Issue: 2
Pages: 3181-3195

Keywords: Healthcare, Medical Insurance Costs, Machine Learning, LightGBM, CatBoost, Decision Tree, Class Imbalance, Explainable AI, GridSearchCV

Abstract:
Health insurance plans help people financially by covering medical bills and reducing the financial burden of illness. Healthcare costs and health insurance premiums are influenced by many variables. Early predictions of health insurance costs can help identify the right level of coverage and its potential benefits. In the insurance sector, machine learning (ML) has the potential to make policies more efficient, and ML algorithms have performed well at predicting high healthcare costs. Traditional actuarial methods often fail to capture complex relationships in the data. ML models, particularly tree-based methods such as the boosting ensembles LightGBM and CatBoost and single Decision Trees, offer improved accuracy and interpretability. The primary objective of this study is to build supervised ML models that accurately predict the cost of health insurance. The dataset, Medicalpremium.csv from Kaggle, was preprocessed through data cleaning, feature scaling with StandardScaler, and class balancing with RandomOverSampler. Three regression models (LightGBM, CatBoost, and Decision Tree) were developed and compared against baseline models such as XGBoost and Random Forest. Model performance was assessed using R-squared, mean absolute error (MAE), root mean squared error (RMSE), and mean absolute percentage error (MAPE), and hyperparameters were tuned with GridSearchCV. LightGBM emerged as the best model with an R-squared of 98.67%, outperforming CatBoost (97.62%) and Decision Tree (96.18%), as well as the baseline XGBoost (82.78%) and Random Forest (82.25%) models. Visual explainability was provided through learning curves, actual-vs-predicted plots, residual plots, Q-Q plots, prediction error plots, and individual conditional expectation (ICE) plots. The study concludes that boosting ensembles, especially LightGBM, offer superior accuracy and generalization in predicting medical insurance costs, establishing a reliable methodology for real-world healthcare applications.
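
The abstract reports R-squared, MAE, RMSE, and MAPE but does not state how they are computed. Below is a minimal sketch using the usual textbook definitions, written in plain Python; this is an assumption, since the paper's own evaluation code is not shown here. The function name `regression_metrics` and the sample premium values are hypothetical and are not taken from Medicalpremium.csv.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Return R-squared, MAE, RMSE and MAPE (in percent) for paired values.

    Hypothetical helper illustrating the standard metric definitions;
    not the authors' code.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    if any(t == 0 for t in y_true):
        # MAPE divides by each true value, so zero targets are rejected.
        raise ValueError("MAPE is undefined when a true value is zero")

    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n

    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all true values are equal")

    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(r) for r in residuals) / n,
        "rmse": math.sqrt(ss_res / n),
        "mape": 100.0 * sum(abs(r / t) for r, t in zip(residuals, y_true)) / n,
    }


# Hypothetical premiums, used only to exercise the function.
actual = [20000.0, 25000.0, 30000.0]
predicted = [21000.0, 24000.0, 30000.0]
print({k: round(v, 4) for k, v in regression_metrics(actual, predicted).items()})
```

An R-squared quoted as 98.67% corresponds to a value of 0.9867 from a function like this. If scikit-learn is used instead, `r2_score` and `mean_absolute_error` compute the same quantities, while `mean_absolute_percentage_error` returns a fraction rather than a percentage.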

Copyright & License

Copyright © 2025 The Authors. The authors retain the copyright of this article. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{182714,
        author = {Appurva Sharma and Alok Bansal and Subhash Chandra Jat},
        title = {Improving Medical Insurance Cost Prediction Accuracy with Explainable Supervised Machine Learning based Classification Techniques},
        journal = {International Journal of Innovative Research in Technology},
        year = {2025},
        volume = {12},
        number = {2},
        pages = {3181--3195},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=182714},
        abstract = {Health insurance plans help people financially by covering medical bills and reducing the financial burden of illness. Healthcare costs and health insurance premiums are influenced by many variables. Early predictions of health insurance costs can help identify the right level of coverage and its potential benefits. In the insurance sector, machine learning (ML) has the potential to make policies more efficient, and ML algorithms have performed well at predicting high healthcare costs. Traditional actuarial methods often fail to capture complex relationships in the data. ML models, particularly tree-based methods such as the boosting ensembles LightGBM and CatBoost and single Decision Trees, offer improved accuracy and interpretability. The primary objective of this study is to build supervised ML models that accurately predict the cost of health insurance. The dataset, Medicalpremium.csv from Kaggle, was preprocessed through data cleaning, feature scaling with StandardScaler, and class balancing with RandomOverSampler. Three regression models (LightGBM, CatBoost, and Decision Tree) were developed and compared against baseline models such as XGBoost and Random Forest. Model performance was assessed using R-squared, mean absolute error (MAE), root mean squared error (RMSE), and mean absolute percentage error (MAPE), and hyperparameters were tuned with GridSearchCV. LightGBM emerged as the best model with an R-squared of 98.67\%, outperforming CatBoost (97.62\%) and Decision Tree (96.18\%), as well as the baseline XGBoost (82.78\%) and Random Forest (82.25\%) models. Visual explainability was provided through learning curves, actual-vs-predicted plots, residual plots, Q-Q plots, prediction error plots, and individual conditional expectation (ICE) plots. The study concludes that boosting ensembles, especially LightGBM, offer superior accuracy and generalization in predicting medical insurance costs, establishing a reliable methodology for real-world healthcare applications.},
        keywords = {Healthcare, Medical Insurance Costs, Machine Learning, LightGBM, CatBoost, Decision Tree, Class Imbalance, Explainable AI, GridSearchCV},
        month = jul,
        }
