Optimizing Breast Cancer Prediction and Identifying a Machine Learning Model using a Data – Driven Approach

  • Unique Paper ID: 171280
  • PageNo: 4070-4077
  • Abstract: Breast cancer is among the most common cancers worldwide, affecting both women and men, and prompt identification is essential to improving patient outcomes. The project begins by preprocessing a dataset of tumor characteristics: cleaning the data, encoding categorical variables, and standardizing features. It then tests a variety of machine learning algorithms, from basic models such as Gradient Boosting, Neural Networks, K-Nearest Neighbors (KNN), Decision Trees, Random Forests, Support Vector Machine (SVM), and Logistic Regression to advanced models like XGBoost and LightGBM. Each model is evaluated on accuracy, precision, recall, F1-score, and ROC-AUC, and hyperparameter tuning is applied to selected models such as Random Forest to enhance performance. The results are compared and documented to identify the best-performing model and to support recommendations for future work, including potential dataset enhancements and model refinements.
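The evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names `classification_metrics` and `roc_auc` and the toy labels and scores are hypothetical, chosen only to show how accuracy, precision, recall, F1-score, and ROC-AUC are computed for a binary tumor classifier (1 = malignant is assumed as the positive class).

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    # Guard against empty denominators (e.g. no predicted positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative sample")
    correct = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return correct / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data, not results from the paper.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

    print(classification_metrics(y_true, y_pred))
    print("roc_auc:", roc_auc(y_true, scores))
```

Recall is usually the metric to watch in a screening setting, since a false negative means a missed malignancy, while ROC-AUC summarizes ranking quality independently of the chosen threshold.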

Copyright & License

Copyright © 2026 Authors retain the copyright of this article. This article is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{171280,
        author = {Ishita Roy and K. Vaishnavi and J. Mahesh Babu and B. Mojesh},
        title = {Optimizing Breast Cancer Prediction and Identifying a Machine Learning Model using a Data – Driven Approach},
        journal = {International Journal of Innovative Research in Technology},
        year = {2025},
        volume = {11},
        number = {7},
        pages = {4070--4077},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=171280},
        abstract = {Breast cancer is among the most common cancers worldwide, affecting both women and men, and prompt identification is essential to improving patient outcomes. The project begins by preprocessing a dataset of tumor characteristics: cleaning the data, encoding categorical variables, and standardizing features. It then tests a variety of machine learning algorithms, from basic models such as Gradient Boosting, Neural Networks, K-Nearest Neighbors (KNN), Decision Trees, Random Forests, Support Vector Machine (SVM), and Logistic Regression to advanced models like XGBoost and LightGBM. Each model is evaluated on accuracy, precision, recall, F1-score, and ROC-AUC, and hyperparameter tuning is applied to selected models such as Random Forest to enhance performance. The results are compared and documented to identify the best-performing model and to support recommendations for future work, including potential dataset enhancements and model refinements.},
        keywords = {Data-driven approach, Model Evaluation, Gradient Boosting, Neural Networks, K-Nearest Neighbors, Decision Trees, Random Forests, Support Vector Machine (SVM), Logistic Regression, LightGBM, XGBoost},
        month = {January},
        }

Cite This Article

Roy, I., Vaishnavi, K., Babu, J. M., & Mojesh, B. (2025). Optimizing Breast Cancer Prediction and Identifying a Machine Learning Model using a Data – Driven Approach. International Journal of Innovative Research in Technology (IJIRT), 11(7), 4070–4077.
