Comparative Evaluation of CNN and Transformer-Based Models for Rice Leaf Disease Detection using Digital Image Processing Techniques

  • Unique Paper ID: 192243
  • Volume: 12
  • Issue: 9
  • PageNo: 715-720
  • Abstract: Recent advances in deep learning (DL) have significantly improved the performance of digital image processing (DIP) systems across numerous applications. Convolutional Neural Networks (CNNs) have long dominated the field because of their highly effective local feature extraction, while Transformer-based models have recently become strong competitors because self-attention mechanisms let them capture long-range dependencies. This paper presents an in-depth comparative analysis of CNN and Transformer-based architectures for complex digital image processing tasks. State-of-the-art (SOTA) CNN models and Vision Transformer variants are compared within a single framework. To ensure a fair comparison, benchmark image datasets are analysed with standardised preprocessing, training protocols, and hyperparameter settings. Performance is evaluated with several quantitative metrics: accuracy, precision, recall, F1-score, computational complexity, and inference time. The experimental results indicate that CNN-based models are more effective and robust at learning local spatial features, whereas Transformer-based models learn global visual context better and therefore perform better on tasks that require analysis of complex images. The study also highlights trade-offs between accuracy and computational cost, which offer insight into model selection for resource-constrained and high-performance applications. The findings offer practical suggestions to researchers and practitioners for choosing appropriate DL architectures to optimise digital image processing applications.
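
The abstract lists accuracy, precision, recall, and F1-score among the evaluation metrics. As a minimal sketch of how such metrics are commonly computed for a multi-class classifier, the snippet below derives them from true and predicted labels. The function name `classification_metrics`, the macro-averaging choice, and the example class labels are assumptions for illustration; the paper's own averaging scheme and class set are not stated in the abstract.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro averaging (an unweighted mean over classes) is an assumption;
    the abstract does not say which averaging the authors used.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for c in labels:
        p = tp[c] / (tp[c] + fp[c]) if (tp[c] + fp[c]) else 0.0
        r = tp[c] / (tp[c] + fn[c]) if (tp[c] + fn[c]) else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical labels, not taken from the paper's dataset.
truth = ["blast", "blast", "healthy", "healthy"]
preds = ["blast", "healthy", "healthy", "healthy"]
print(classification_metrics(truth, preds))
```

Macro averaging weights every class equally, which tends to expose weak performance on rare classes; a support-weighted average would be the usual alternative when class frequencies should count.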

Copyright & License

Copyright © 2026. The authors retain the copyright of this article. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{192243,
        author = {R SATHIYAPRIYA and HANNAH INBARANI H},
        title = {Comparative Evaluation of CNN and Transformer-Based Models for Rice Leaf Disease Detection using Digital Image Processing Techniques},
        journal = {International Journal of Innovative Research in Technology},
        year = {2026},
        volume = {12},
        number = {9},
        pages = {715--720},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=192243},
        abstract = {Recent advances in deep learning (DL) have significantly improved the performance of digital image processing (DIP) systems across numerous applications. Convolutional Neural Networks (CNNs) have long dominated the field because of their highly effective local feature extraction, while Transformer-based models have recently become strong competitors because self-attention mechanisms let them capture long-range dependencies. This paper presents an in-depth comparative analysis of CNN and Transformer-based architectures for complex digital image processing tasks. State-of-the-art (SOTA) CNN models and Vision Transformer variants are compared within a single framework. To ensure a fair comparison, benchmark image datasets are analysed with standardised preprocessing, training protocols, and hyperparameter settings. Performance is evaluated with several quantitative metrics: accuracy, precision, recall, F1-score, computational complexity, and inference time. The experimental results indicate that CNN-based models are more effective and robust at learning local spatial features, whereas Transformer-based models learn global visual context better and therefore perform better on tasks that require analysis of complex images. The study also highlights trade-offs between accuracy and computational cost, which offer insight into model selection for resource-constrained and high-performance applications. The findings offer practical suggestions to researchers and practitioners for choosing appropriate DL architectures to optimise digital image processing applications.},
        keywords = {},
        month = {February},
        }
