Decoding Diabetes: A Journey Through Random Forest and SHAP Interpretability

  • Unique Paper ID: 179066
  • Volume: 11
  • Issue: 12
  • PageNo: 7974-7982
  • Abstract: Diabetes is a chronic illness that impairs how the body processes sugar. It affects millions of people worldwide and places a significant burden on healthcare systems through its long-term complications. Early diagnosis and timely intervention are essential for managing the disease and preventing its progression. This study presents a machine-learning-based system that predicts the likelihood of diabetes in individuals from commonly available health parameters. Using the Pima Indians Diabetes Dataset, the system trains and evaluates several classification algorithms, including Logistic Regression, Decision Trees, Random Forests, and Artificial Neural Networks (ANN), on features such as age, BMI, glucose level, blood pressure, insulin level, and family history. Among these, the Random Forest model performed best, with an accuracy of over 85%, a precision of 0.90, a recall of 0.86, and an F1-score of 0.87. The system also integrates SHAP (SHapley Additive exPlanations) to explain individual predictions, making it suitable for clinical decision support. This approach offers a scalable, cost-effective, and user-friendly solution for early diabetes detection, which is particularly valuable in resource-constrained healthcare settings.
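To make the reported evaluation metrics concrete, here is a minimal sketch of how accuracy, precision, recall, and F1 are derived from a binary confusion matrix, treating "diabetic" as the positive class. The `ConfusionCounts` type and `classification_metrics` function are hypothetical names, and the counts in the usage example are invented for illustration; they are not the paper's results. The abstract does not state how its scores were averaged (for example, positive class only or macro-averaged over both classes), so this shows only the standard positive-class definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts, with 'diabetic' as the positive class."""
    tp: int  # diabetic patients correctly flagged
    fp: int  # non-diabetic patients wrongly flagged
    fn: int  # diabetic patients missed
    tn: int  # non-diabetic patients correctly cleared


def classification_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Standard positive-class metrics; returns 0.0 where a ratio is undefined."""
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total
    # Precision: of the patients flagged as diabetic, how many really are.
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    # Recall (sensitivity): of the truly diabetic patients, how many were flagged.
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not the paper's results.
    counts = ConfusionCounts(tp=45, fp=5, fn=10, tn=94)
    for name, value in classification_metrics(counts).items():
        print(f"{name:>9}: {value:.3f}")
```

Because F1 is a harmonic mean, it is pulled toward the weaker of precision and recall, which is why it is often preferred over accuracy on imbalanced medical datasets where missed positive cases are costly.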
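SHAP attributes a single prediction to its input features using Shapley values from cooperative game theory: each feature i receives the weighted average of its marginal contribution v(S ∪ {i}) - v(S) over all coalitions S of the other features, with weight |S|!(n - |S| - 1)!/n!. The sketch below computes these values exactly by brute-force enumeration for a tiny model. It is an illustration under stated assumptions, not the paper's code or the `shap` library: the three-feature `toy_risk` function, its coefficients, the patient values, and the single baseline ("reference patient") are all hypothetical. Practical SHAP explainers (for example, TreeSHAP for tree ensembles such as Random Forests) compute or approximate these values far more efficiently and typically average over a background dataset rather than one baseline.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence

Model = Callable[[Sequence[float]], float]


def exact_shapley(model: Model, x: Sequence[float], baseline: Sequence[float]) -> list[float]:
    """Exact Shapley values of model(x) relative to a single baseline input.

    Features outside a coalition S take their baseline values, so
    v(S) = model(x on S, baseline elsewhere). This needs O(2^n) model
    calls, which is fine for a handful of features but not in general.
    """
    n = len(x)

    def value(coalition: frozenset[int]) -> float:
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(z: Sequence[float]) -> float:
    """Hypothetical risk score over (glucose, BMI, age) with one interaction term."""
    glucose, bmi, age = z
    return 0.02 * glucose + 0.05 * bmi + 0.01 * age + 0.0004 * glucose * bmi


if __name__ == "__main__":
    patient = [160.0, 35.0, 50.0]    # hypothetical glucose, BMI, age
    reference = [100.0, 25.0, 30.0]  # hypothetical baseline patient
    phi = exact_shapley(toy_risk, patient, reference)
    for name, contribution in zip(["glucose", "BMI", "age"], phi):
        print(f"{name:>7}: {contribution:+.3f}")
    print("sum of attributions:", round(sum(phi), 6))
    print("model(x) - model(baseline):", round(toy_risk(patient) - toy_risk(reference), 6))
```

The two printed totals agree because Shapley values satisfy the efficiency property: the attributions always sum to model(x) - model(baseline), which is what makes SHAP explanations additive. Age, which enters the toy model additively, receives exactly 0.01 × (50 - 30) = 0.2, while the glucose-BMI interaction is split evenly between those two features.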

Copyright & License

Copyright © 2025. The authors retain the copyright of this article. This is an open-access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{179066,
        author = {Mithun Kamashetty and Chetan C and Vishnu Anand T and P Sushmita Singh},
        title = {Decoding Diabetes: A Journey Through Random Forest and SHAP Interpretability},
        journal = {International Journal of Innovative Research in Technology},
        year = {2025},
        volume = {11},
        number = {12},
        pages = {7974--7982},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=179066},
        abstract = {Diabetes is a chronic illness that impairs
how the body processes sugar. It affects millions of
people worldwide and places a significant burden on
healthcare systems through its long-term complications.
Early diagnosis and timely intervention are essential for
managing the disease and preventing its progression.
This study presents a machine-learning-based system that
predicts the likelihood of diabetes in individuals from
commonly available health parameters. Using the Pima
Indians Diabetes Dataset, the system trains and evaluates
several classification algorithms, including Logistic
Regression, Decision Trees, Random Forests, and
Artificial Neural Networks (ANN), on features such as
age, BMI, glucose level, blood pressure, insulin level,
and family history. Among these, the Random Forest model
performed best, with an accuracy of over 85\%, a
precision of 0.90, a recall of 0.86, and an F1-score of
0.87. The system also integrates SHAP (SHapley Additive
exPlanations) to explain individual predictions, making
it suitable for clinical decision support. This approach
offers a scalable, cost-effective, and user-friendly
solution for early diabetes detection, which is
particularly valuable in resource-constrained healthcare
settings.},
        keywords = {Diabetes Prediction, Machine Learning, Random Forest, Artificial Neural Networks, Pima Indians Dataset, SHAP, Early Diagnosis, Clinical Decision Support},
        month = {May},
        }

Cite This Article

  • ISSN: 2349-6002