Multiple Dataset Problems while Detecting Early Stage Lung Cancer

  • Unique Paper ID: 172781
  • Volume: 11
  • Issue: 9
  • PageNo: 820-826
  • Abstract:
    Lung cancer is one of the leading causes of cancer-related deaths worldwide, and early detection significantly improves patient survival rates. Deep learning models such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), autoencoders, and Transformer-based models, together with classical methods such as Linear Discriminant Analysis (LDA), can be used to automate lung cancer detection from medical images. However, a major obstacle to building a robust deep learning model is variability in the imaging data, which arises from differences between X-ray (CT) scanners and scanning protocols. This research examines how this dataset variability affects lung cancer detection. We use the LIDC-IDRI dataset from The Cancer Imaging Archive (TCIA), which contains lung CT scans acquired from multiple imaging sources. Differences in image quality, contrast, and resolution across machines introduce inconsistencies that hinder model training and generalization. This study analyzes these challenges and discusses potential solutions, such as dataset standardization and domain adaptation, to improve the reliability of deep learning-based lung cancer detection.
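The abstract names dataset standardization as one remedy for scanner-to-scanner differences in contrast and resolution. The sketch below shows one common way such standardization could be done for CT volumes; it is not the authors' pipeline. It assumes each scan has already been loaded (loading LIDC-IDRI DICOM files is not shown) into a NumPy array ordered (z, y, x), together with its DICOM RescaleSlope/RescaleIntercept values and voxel spacing in millimetres. The lung window of -1000 to 400 HU, the 1 mm isotropic target spacing, nearest-neighbour resampling, and all function names are illustrative assumptions.

```python
import numpy as np

# Illustrative assumptions (not taken from the paper): a lung window of
# -1000 to 400 HU and a 1 mm isotropic target voxel spacing.
LUNG_WINDOW_HU = (-1000.0, 400.0)
TARGET_SPACING_MM = (1.0, 1.0, 1.0)


def to_hounsfield(raw: np.ndarray, slope: float, intercept: float) -> np.ndarray:
    """Map stored pixel values to Hounsfield units via DICOM RescaleSlope/RescaleIntercept."""
    return raw.astype(np.float32) * slope + intercept


def resample_nearest(volume: np.ndarray, spacing, target_spacing=TARGET_SPACING_MM) -> np.ndarray:
    """Nearest-neighbour resampling of a (z, y, x) volume to a common voxel spacing in mm."""
    spacing = np.asarray(spacing, dtype=float)
    target = np.asarray(target_spacing, dtype=float)
    # Keep the physical extent (approximately) fixed; only the voxel count changes.
    new_shape = np.maximum(1, np.round(np.array(volume.shape) * spacing / target).astype(int))
    # Output index i sits at i * target mm, i.e. at input index i * target / spacing.
    indices = [
        np.minimum(np.round(np.arange(n) * t / s).astype(int), size - 1)
        for n, t, s, size in zip(new_shape, target, spacing, volume.shape)
    ]
    return volume[np.ix_(*indices)]


def window_and_scale(hu: np.ndarray, window=LUNG_WINDOW_HU) -> np.ndarray:
    """Clip to a fixed HU window and rescale to [0, 1] so contrast is comparable across scanners."""
    low, high = window
    return (np.clip(hu, low, high) - low) / (high - low)


def standardize_scan(raw: np.ndarray, slope: float, intercept: float, spacing) -> np.ndarray:
    """Hypothetical end-to-end step: HU conversion, resampling, then windowing."""
    hu = to_hounsfield(raw, slope, intercept)
    return window_and_scale(resample_nearest(hu, spacing))


if __name__ == "__main__":
    # Two synthetic "scanners": different stored values, intercepts and voxel
    # spacings, but the same underlying tissue density (0 HU).
    scan_a = np.full((4, 8, 8), 1024, dtype=np.int16)
    scan_b = np.zeros((5, 10, 10), dtype=np.int16)
    a = standardize_scan(scan_a, 1.0, -1024.0, (2.5, 0.7, 0.7))
    b = standardize_scan(scan_b, 1.0, 0.0, (2.0, 0.56, 0.56))
    print(a.shape, b.shape, np.allclose(a, b))  # prints: (10, 6, 6) (10, 6, 6) True
```

Nearest-neighbour resampling keeps the sketch dependent on NumPy alone; a production pipeline would more likely use linear or spline interpolation (for example `scipy.ndimage.zoom`), and domain adaptation would act at the model level rather than on the raw intensities shown here.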

Copyright & License

Copyright © 2025 the authors. The authors retain the copyright of this article. This is an open-access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{172781,
        author = {Rahul Sharma and Lakshay Singhal and Gursharan Singh},
        title = {Multiple Dataset Problems while Detecting Early Stage Lung Cancer},
        journal = {International Journal of Innovative Research in Technology},
        year = {2025},
        volume = {11},
        number = {9},
        pages = {820--826},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=172781},
        abstract = {Lung cancer is one of the leading causes of cancer-related deaths worldwide, and early detection significantly improves patient survival rates. Deep learning models such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), autoencoders, and Transformer-based models, together with classical methods such as Linear Discriminant Analysis (LDA), can be used to automate lung cancer detection from medical images. However, a major obstacle to building a robust deep learning model is variability in the imaging data, which arises from differences between X-ray (CT) scanners and scanning protocols. This research examines how this dataset variability affects lung cancer detection. We use the LIDC-IDRI dataset from The Cancer Imaging Archive (TCIA), which contains lung CT scans acquired from multiple imaging sources. Differences in image quality, contrast, and resolution across machines introduce inconsistencies that hinder model training and generalization. This study analyzes these challenges and discusses potential solutions, such as dataset standardization and domain adaptation, to improve the reliability of deep learning-based lung cancer detection.},
        keywords = {Lung Cancer Detection, Deep Learning, Convolutional Neural Networks (CNN), Linear Discriminant Analysis (LDA), Recurrent Neural Networks (RNN), Autoencoders, Transformer-based Models, Medical Imaging, Dataset Variability, LIDC-IDRI Dataset, X-ray Machine Variability, Image Preprocessing, Domain Adaptation, Early Stage Lung Cancer},
        month = {February},
        }
