AI-Driven Ensemble Learning for Heart Disease Prediction with SMOTE

  • Unique Paper ID: 202217
  • Volume: 12
  • Issue: 12
  • PageNo: 6423-6434
  • Abstract: Cardiovascular disease (CVD) remains the leading cause of mortality worldwide, responsible for approximately 17.9 million deaths annually. The integration of artificial intelligence (AI) and machine learning (ML) into clinical diagnostics has opened new avenues for early, accurate, and automated detection of heart disease. This paper presents a heart disease prediction model built on a soft Voting Classifier that combines three complementary base learners: Logistic Regression (LR), Random Forest (RF), and Extreme Gradient Boosting (XGBoost). The system is trained on the publicly available Kaggle Heart Disease dataset, which is derived from the UCI Cleveland Heart Disease repository. Class imbalance, a common challenge in medical datasets, is addressed with the Synthetic Minority Over-sampling Technique (SMOTE). Features are standardized with StandardScaler so that all inputs share a uniform scale before model training. The ensemble achieves an accuracy of 98.54%, precision of 100%, recall of 97.09%, and F1-score of 98.52%, outperforming each individual base classifier. The confusion matrix shows only three misclassifications across 205 test instances, with zero false positives. The proposed model shows superior performance in comparative tests, and Random Forest feature importance identifies key predictive features such as chest pain type, maximum heart rate, and major vessel fluoroscopy results. The pipeline and model are serialized with Python's pickle module for direct deployment into clinical decision support systems (CDSS). This research advances cardiovascular risk assessment by providing a practical, reproducible, and deployable tool for clinical use.
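
The reported metrics can be cross-checked against the stated confusion matrix facts. The sketch below is not the authors' code. It assumes a confusion matrix with 100 true positives and 102 true negatives, a split that is inferred here from the reported recall and the 205 test instances rather than stated in the abstract. Only the totals (205 instances, three misclassifications, zero false positives) come from the source.

```python
# Counts below are partly inferred (an assumption, not given in the abstract):
# 205 test instances, 3 misclassifications, 0 false positives -> 3 false negatives.
# A recall of 97.09% with 3 false negatives implies 100 true positives (100 / 103).
tp, fn, fp = 100, 3, 0
tn = 205 - tp - fn - fp  # the remaining instances are true negatives

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

metrics = {
    "accuracy": accuracy,
    "precision": precision,
    "recall": recall,
    "f1": f1,
}

# Print each metric as a percentage with two decimals.
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Under these assumed counts the four values round to the figures quoted in the abstract, so the reported accuracy, precision, recall, and F1-score are mutually consistent.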

Copyright & License

Copyright © 2026. The authors retain the copyright of this article. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{202217,
        author = {Abhishek Singh and Anil Mishra},
        title = {AI-Driven Ensemble Learning for Heart Disease Prediction with SMOTE},
        journal = {International Journal of Innovative Research in Technology},
        year = {2026},
        volume = {12},
        number = {12},
        pages = {6423--6434},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=202217},
        abstract = {Cardiovascular disease (CVD) remains the leading cause of mortality worldwide, responsible for approximately 17.9 million deaths annually. The integration of artificial intelligence (AI) and machine learning (ML) into clinical diagnostics has opened new avenues for early, accurate, and automated detection of heart disease. This paper presents a heart disease prediction model built on a soft Voting Classifier that combines three complementary base learners: Logistic Regression (LR), Random Forest (RF), and Extreme Gradient Boosting (XGBoost). The system is trained on the publicly available Kaggle Heart Disease dataset, which is derived from the UCI Cleveland Heart Disease repository. Class imbalance, a common challenge in medical datasets, is addressed with the Synthetic Minority Over-sampling Technique (SMOTE). Features are standardized with StandardScaler so that all inputs share a uniform scale before model training. The ensemble achieves an accuracy of 98.54%, precision of 100%, recall of 97.09%, and F1-score of 98.52%, outperforming each individual base classifier. The confusion matrix shows only three misclassifications across 205 test instances, with zero false positives. The proposed model shows superior performance in comparative tests, and Random Forest feature importance identifies key predictive features such as chest pain type, maximum heart rate, and major vessel fluoroscopy results. The pipeline and model are serialized with Python's pickle module for direct deployment into clinical decision support systems (CDSS). This research advances cardiovascular risk assessment by providing a practical, reproducible, and deployable tool for clinical use.},
        keywords = {Heart Disease Prediction, Ensemble Learning, Soft Voting Classifier, Random Forest, XGBoost, Logistic Regression, SMOTE, Class Imbalance, Machine Learning, Healthcare Analytics, Clinical Decision Support, Cardiovascular Risk Assessment, Feature Importance, Standard Scaler},
        month = {May},
        }

Cite This Article

Singh, A., & Mishra, A. (2026). AI-Driven Ensemble Learning for Heart Disease Prediction with SMOTE. International Journal of Innovative Research in Technology (IJIRT), 12(12), 6423–6434.
