YOLO_ViT_CNN: A Hybrid YOLOv8-Based Vision Transformer-CNN Model for Thermal Fault Detection in Electrical Switchgear

  • Unique Paper ID: 181855
  • PageNo: 338-343
  • Abstract:
  • Thermal faults in electrical switchgear systems pose significant operational and safety risks in industrial settings. Traditional inspection methods using infrared thermography are often manual, time-consuming, and heavily reliant on expert interpretation. To address these limitations, this paper presents YOLO_ViT_CNN, a hybrid deep learning model that integrates the real-time object detection power of YOLOv8, the local feature extraction capabilities of Convolutional Neural Networks (CNNs), and the global contextual understanding of Vision Transformers (ViT). The proposed model is designed to detect six classes in thermal images of electrical switchgear: five fault types (loose connections, insulation degradation, circuit breaker faults, overloads, and phase imbalances) and normal operation. A custom infrared dataset was developed with labeled bounding boxes and YOLO-format annotations for training and evaluation. YOLO_ViT_CNN utilizes a CNN backbone to extract spatial features, followed by ViT-based encoder blocks to capture long-range dependencies. The YOLOv8 detection head is retained to enable high-speed inference. Experimental training over five epochs demonstrated strong performance, achieving a validation accuracy of 94.62%, a mean average precision (mAP@0.5) of 95.6%, and a real-time processing speed of 58 FPS. The results confirm the model’s ability to enhance fault detection precision in thermal imaging scenarios. Future work will explore hardware deployment, broader dataset generalization, and integration with smart grid monitoring systems.
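
The abstract mentions YOLO-format annotations and an mAP@0.5 score. As a minimal sketch of what those terms mean in general (not the authors' code), the Python below parses one YOLO-format label line, converts the normalized box to pixel corners, and checks whether a predicted box matches a ground-truth box at the 0.5 intersection-over-union (IoU) threshold that mAP@0.5 relies on. The function names, the 640x480 image size, and the sample label values are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only. All names, the image size, and the sample
# labels are hypothetical; none of them come from the paper.


def parse_yolo_label(line):
    """Parse a YOLO-format label line: 'class cx cy w h' (box values in [0, 1])."""
    parts = line.split()
    class_id = int(parts[0])
    box = tuple(float(value) for value in parts[1:5])
    return class_id, box


def yolo_to_corners(box, img_w, img_h):
    """Convert a normalized (cx, cy, w, h) box to pixel (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    x1 = (cx - w / 2) * img_w
    y1 = (cy - h / 2) * img_h
    x2 = (cx + w / 2) * img_w
    y2 = (cy + h / 2) * img_h
    return x1, y1, x2, y2


def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matches(pred_class, pred_box, gt_class, gt_box, threshold=0.5):
    """A prediction matches when the class agrees and IoU >= threshold."""
    return pred_class == gt_class and iou(pred_box, gt_box) >= threshold


if __name__ == "__main__":
    # Hypothetical ground-truth and predicted labels for one 640x480 image.
    gt_class, gt_norm = parse_yolo_label("0 0.5 0.5 0.4 0.4")
    pred_class, pred_norm = parse_yolo_label("0 0.52 0.5 0.4 0.4")
    gt_box = yolo_to_corners(gt_norm, 640, 480)
    pred_box = yolo_to_corners(pred_norm, 640, 480)
    print("IoU:", iou(pred_box, gt_box))
    print("match at 0.5:", matches(pred_class, pred_box, gt_class, gt_box))
```

A full mAP@0.5 computation additionally ranks predictions by confidence and averages precision over recall levels and classes; this sketch covers only the per-box matching step.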

Copyright & License

Copyright © 2026. Authors retain the copyright of this article. This article is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{181855,
        author = {Nikhil Sanjay Dongare and Shubhangi P Tidke and Prashant Kulkarni},
        title = {YOLO\_ViT\_CNN: A Hybrid YOLOv8-Based Vision Transformer-CNN Model for Thermal Fault Detection in Electrical Switchgear},
        journal = {International Journal of Innovative Research in Technology},
        year = {2025},
        volume = {12},
        number = {2},
        pages = {338--343},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=181855},
        abstract = {Thermal faults in electrical switchgear systems pose significant operational and safety risks in industrial settings. Traditional inspection methods using infrared thermography are often manual, time-consuming, and heavily reliant on expert interpretation. To address these limitations, this paper presents YOLO\_ViT\_CNN, a hybrid deep learning model that integrates the real-time object detection power of YOLOv8, the local feature extraction capabilities of Convolutional Neural Networks (CNNs), and the global contextual understanding of Vision Transformers (ViT). The proposed model is designed to detect six classes in thermal images of electrical switchgear: five fault types (loose connections, insulation degradation, circuit breaker faults, overloads, and phase imbalances) and normal operation. A custom infrared dataset was developed with labeled bounding boxes and YOLO-format annotations for training and evaluation. YOLO\_ViT\_CNN utilizes a CNN backbone to extract spatial features, followed by ViT-based encoder blocks to capture long-range dependencies. The YOLOv8 detection head is retained to enable high-speed inference. Experimental training over five epochs demonstrated strong performance, achieving a validation accuracy of 94.62\%, a mean average precision (mAP@0.5) of 95.6\%, and a real-time processing speed of 58 FPS. The results confirm the model’s ability to enhance fault detection precision in thermal imaging scenarios. Future work will explore hardware deployment, broader dataset generalization, and integration with smart grid monitoring systems.},
        keywords = {YOLOv8, Thermal Imaging, Infrared Fault Detection, Vision Transformer, CNN, Electrical Switchgear, Real-Time Detection, Deep Learning, mAP@0.5},
        month = {July},
        }

Cite This Article

Dongare, N. S., Tidke, S. P., & Kulkarni, P. (2025). YOLO_ViT_CNN: A Hybrid YOLOv8-Based Vision Transformer-CNN Model for Thermal Fault Detection in Electrical Switchgear. International Journal of Innovative Research in Technology (IJIRT), 12(2), 338–343.
