Advances In Real-Time Object Detection

  • Unique Paper ID: 194729
  • Volume: 12
  • Issue: 10
  • PageNo: 7481-7495
  • Abstract: Object detection has emerged as one of the most transformative subfields of computer vision, underpinned by breakthroughs in deep learning, convolutional neural networks (CNNs), and, most recently, vision transformer architectures. Its commercial and scientific impact spans autonomous vehicles, medical diagnostics, industrial automation, smart surveillance, retail intelligence, and human–computer interaction. The rapid pace of innovation, from classical handcrafted feature methods to fully data-driven, attention-based pipelines, has continuously redefined the boundaries of what machines can perceive and interpret in real time. This research manuscript presents a unified, in-depth examination of real-time object detection along two complementary axes. The first dimension covers the practical design, implementation, and experimental evaluation of a CPU-friendly real-time object detection system built on SSD MobileNet V3 and the OpenCV Deep Neural Network (DNN) module, featuring an interactive Tkinter GUI supporting image, video, and webcam inputs (a minimal pipeline sketch follows this abstract). The second dimension provides an extensive theoretical survey of the full spectrum of deep learning–based object detection paradigms, from region-based detectors (R-CNN, Fast R-CNN, Faster R-CNN) and one-stage detectors (YOLO, SSD, RetinaNet) to anchor-free methods (FCOS, CenterNet), transformer-based architectures (DETR, Deformable DETR, RT-DETR), and neural architecture search (NAS)-optimized models (YOLO-NAS). The manuscript integrates the latest innovations from 2023 to 2025, including YOLOv8 through YOLOv11, hybrid CNN–transformer architectures, multi-modal sensor fusion, and edge-AI deployment strategies such as quantization, pruning, and knowledge distillation. Experimental results show that SSD MobileNet V3 achieves 19–23 frames per second on standard CPU hardware with approximately 22–25 mAP on the COCO benchmark, an effective balance between accessibility, efficiency, and practical accuracy for resource-constrained deployments. A comparative analysis against contemporary models, including YOLOv11-L at 58+ mAP and RT-DETR at 46–50 mAP, contextualizes these results within the current state of the art. The manuscript concludes with a forward-looking discussion of emerging research frontiers, including self-supervised learning, tiny-object detection, multi-modal fusion, and explainable AI in detection systems.
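
Illustrative Code Sketch

The CPU-friendly pipeline described in the abstract (SSD MobileNet V3 loaded through the OpenCV DNN module) can be outlined in a few lines of Python. The sketch below is a minimal reconstruction under stated assumptions, not the paper's actual implementation: the file names frozen_inference_graph.pb and ssd_mobilenet_v3_large_coco_2020_01_14.pbtxt, the 320x320 input size, and the 0.5 confidence threshold follow the standard TensorFlow release of SSD MobileNet V3 Large (COCO) commonly used with OpenCV, and the Tkinter GUI is omitted in favor of the plain webcam path.

import cv2

# Load the detector via OpenCV's DNN module. The file names are
# assumptions based on the standard SSD MobileNet V3 Large (COCO)
# release, not details taken from the manuscript.
model = cv2.dnn_DetectionModel(
    "frozen_inference_graph.pb",
    "ssd_mobilenet_v3_large_coco_2020_01_14.pbtxt",
)
model.setInputSize(320, 320)          # network input resolution
model.setInputScale(1.0 / 127.5)      # scale pixel values toward [-1, 1]
model.setInputMean((127.5, 127.5, 127.5))
model.setInputSwapRB(True)            # OpenCV frames are BGR; the model expects RGB

cap = cv2.VideoCapture(0)             # webcam input; a file path works for video
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    class_ids, confidences, boxes = model.detect(frame, confThreshold=0.5)
    for cid, conf, box in zip(class_ids, confidences, boxes):
        x, y, w, h = box
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, "%d: %.2f" % (int(cid), float(conf)),
                    (x, max(y - 5, 15)), cv2.FONT_HERSHEY_SIMPLEX,
                    0.5, (0, 255, 0), 2)
    cv2.imshow("SSD MobileNet V3 (OpenCV DNN)", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):   # press q to quit
        break
cap.release()
cv2.destroyAllWindows()

A loop of this shape is what yields the CPU frame rates quoted in the abstract; the image and video input modes described in the paper differ only in how frames are sourced (a single cv2.imread for images, a file path passed to cv2.VideoCapture for video), while the detection path stays the same.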

Copyright & License

Copyright © 2026. The authors retain the copyright of this article. This article is an open-access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{194729,
        author = {Rupali Premraj Kale},
        title = {Advances In Real-Time Object Detection},
        journal = {International Journal of Innovative Research in Technology},
        year = {2026},
        volume = {12},
        number = {10},
        pages = {7481-7495},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=194729},
        abstract = {Object detection has emerged as one of the most transformative subfields of computer vision, underpinned by breakthroughs in deep learning, convolutional neural networks (CNNs), and, most recently, vision transformer architectures. Its commercial and scientific impact spans autonomous vehicles, medical diagnostics, industrial automation, smart surveillance, retail intelligence, and human–computer interaction. The rapid pace of innovation, from classical handcrafted feature methods to fully data-driven, attention-based pipelines, has continuously redefined the boundaries of what machines can perceive and interpret in real time.
This research manuscript presents a unified, in-depth examination of real-time object detection along two complementary axes. The first dimension covers the practical design, implementation, and experimental evaluation of a CPU-friendly real-time object detection system built on SSD MobileNet V3 and the OpenCV Deep Neural Network (DNN) module, featuring an interactive Tkinter GUI supporting image, video, and webcam inputs. The second dimension provides an extensive theoretical survey of the full spectrum of deep learning–based object detection paradigms, from region-based detectors (R-CNN, Fast R-CNN, Faster R-CNN) and one-stage detectors (YOLO, SSD, RetinaNet) to anchor-free methods (FCOS, CenterNet), transformer-based architectures (DETR, Deformable DETR, RT-DETR), and neural architecture search (NAS)-optimized models (YOLO-NAS).
The manuscript integrates the latest innovations from 2023 to 2025, including YOLOv8 through YOLOv11, hybrid CNN–transformer architectures, multi-modal sensor fusion, and edge-AI deployment strategies such as quantization, pruning, and knowledge distillation. Experimental results show that SSD MobileNet V3 achieves 19–23 frames per second on standard CPU hardware with approximately 22–25 mAP on the COCO benchmark, an effective balance between accessibility, efficiency, and practical accuracy for resource-constrained deployments. A comparative analysis against contemporary models, including YOLOv11-L at 58+ mAP and RT-DETR at 46–50 mAP, contextualizes these results within the current state of the art. The manuscript concludes with a forward-looking discussion of emerging research frontiers, including self-supervised learning, tiny-object detection, multi-modal fusion, and explainable AI in detection systems.},
        keywords = {Real-time object detection, SSD MobileNet V3, YOLO, DETR, OpenCV-DNN, transformer architectures, edge AI, COCO benchmark, deep learning, convolutional neural networks, multi-modal detection, neural architecture search},
        month = {March},
}

Cite This Article

Kale, R. P. (2026). Advances In Real-Time Object Detection. International Journal of Innovative Research in Technology (IJIRT), 12(10), 7481–7495.
