A Deep Learning Enhanced Xgboost Model for Automated Data Correction

  • Unique Paper ID: 175712
  • PageNo: 4200-4203
  • Abstract:
  • This research investigated machine learning algorithms for predicting and filling in missing values in categorical datasets. The emphasis was on ensemble models built with the Error-Correcting Output Codes (ECOC) framework, including SVM-based and KNN-based models as well as a hybrid classifier that combines SVM, KNN, and MLP models. Three diverse datasets (CPU, Hypothyroid, and Breast Cancer) were used to validate these algorithms. Results indicated that these techniques performed well at predicting and completing missing data, with effectiveness varying by dataset and missing-data pattern. Compared with single models, ensemble models built on the ECOC framework significantly improved prediction accuracy and robustness. Despite these encouraging results, deep learning for missing-data imputation faces obstacles, including the need for large amounts of labeled data and the risk of overfitting. Future research should evaluate the feasibility and efficacy of deep learning algorithms for missing-data imputation.
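
The ECOC idea summarized in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the codebook, the toy data, and the function names `knn_bit` and `ecoc_impute` are all hypothetical, and a plain k-NN vote stands in for the SVM, KNN, and MLP base learners the paper mentions. Each class gets a binary codeword, one binary learner predicts each bit, and a missing label is filled with the class whose codeword is nearest in Hamming distance.

```python
from collections import Counter

# Hypothetical codebook: one codeword per class, one column per binary learner.
# Any two codewords differ in at least 3 bits, so a single wrong bit is tolerated.
CODEBOOK = {
    "low":    (0, 0, 1, 1, 0),
    "medium": (0, 1, 0, 1, 1),
    "high":   (1, 0, 0, 0, 1),
}


def knn_bit(train_rows, train_bits, query, k=3):
    """Predict one code bit by majority vote among the k nearest training rows."""
    ranked = sorted(
        zip(train_rows, train_bits),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], query)),
    )
    votes = Counter(bit for _, bit in ranked[:k])
    return votes.most_common(1)[0][0]


def ecoc_impute(rows, labels, k=3):
    """Return a copy of labels in which every None is filled by ECOC decoding."""
    known = [(r, y) for r, y in zip(rows, labels) if y is not None]
    train_rows = [r for r, _ in known]
    n_bits = len(next(iter(CODEBOOK.values())))
    filled = list(labels)
    for i, (row, label) in enumerate(zip(rows, labels)):
        if label is not None:
            continue
        # One binary prediction per codebook column.
        predicted = tuple(
            knn_bit(train_rows, [CODEBOOK[y][j] for _, y in known], row, k)
            for j in range(n_bits)
        )
        # Decode: choose the class whose codeword is closest in Hamming distance.
        filled[i] = min(
            CODEBOOK,
            key=lambda c: sum(p != q for p, q in zip(predicted, CODEBOOK[c])),
        )
    return filled


# Toy data (assumed for illustration): two numeric features and one categorical
# column in which the last three entries are missing.
rows = [
    (1.0, 1.0), (1.2, 0.9), (0.8, 1.1),
    (5.0, 5.0), (5.1, 4.8), (4.9, 5.2),
    (9.0, 1.0), (9.2, 1.1), (8.8, 0.9),
    (1.1, 1.0), (5.0, 5.1), (9.1, 1.0),
]
labels = ["low"] * 3 + ["medium"] * 3 + ["high"] * 3 + [None, None, None]

filled = ecoc_impute(rows, labels)
print(filled[-3:])
```

In this sketch every bit uses the same k-NN learner, so the redundancy of the codebook adds little; the error-correcting property matters more when the per-bit learners are trained separately and can make independent mistakes.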

Copyright & License

Copyright © 2026. Authors retain the copyright of this article. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{175712,
        author = {G. Haritha and R. Yamuna},
        title = {A Deep Learning Enhanced Xgboost Model for Automated Data Correction},
        journal = {International Journal of Innovative Research in Technology},
        year = {2025},
        volume = {11},
        number = {11},
        pages = {4200--4203},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=175712},
        abstract = {This research investigated machine learning algorithms for predicting and filling in missing values in categorical datasets. The emphasis was on ensemble models built with the Error-Correcting Output Codes (ECOC) framework, including SVM-based and KNN-based models as well as a hybrid classifier that combines SVM, KNN, and MLP models. Three diverse datasets (CPU, Hypothyroid, and Breast Cancer) were used to validate these algorithms. Results indicated that these techniques performed well at predicting and completing missing data, with effectiveness varying by dataset and missing-data pattern. Compared with single models, ensemble models built on the ECOC framework significantly improved prediction accuracy and robustness. Despite these encouraging results, deep learning for missing-data imputation faces obstacles, including the need for large amounts of labeled data and the risk of overfitting. Future research should evaluate the feasibility and efficacy of deep learning algorithms for missing-data imputation.},
        keywords = {Data cleansing, missing data imputation, classification, regression, categorical datasets},
        month = {April},
        }

Cite This Article

Haritha, G., & Yamuna, R. (2025). A Deep Learning Enhanced Xgboost Model for Automated Data Correction. International Journal of Innovative Research in Technology (IJIRT), 11(11), 4200–4203.
