Deep Learning-Based Underwater Object Detection and Tracking with an Intelligent Chatbot Interface

  • Unique Paper ID: 200939
  • PageNo: 68-77
  • Abstract:
  • Underwater environments introduce severe imaging degradations including color distortion, light scattering, and turbidity-induced blur, which fundamentally impair the performance of conventional computer vision systems. This paper presents a comprehensive deep learning-based framework for underwater object detection and tracking that integrates image enhancement, transformer-augmented convolutional feature extraction, multi-modal sensor fusion, and a real-time chatbot interface for intuitive user interaction. The proposed detection backbone employs a YOLOv8-based architecture enhanced with attention gating and depthwise separable convolutions to achieve efficient inference on resource-constrained platforms. Object tracking is accomplished through a hybrid approach combining Deep SORT with Siamese feature embedding networks, sustaining trajectory continuity under occlusion and scale variation. Multi-modal fusion of optical RGB imagery and sonar depth data improves scene understanding in zero-visibility conditions. All detected objects, associated confidence scores, class labels, and temporal metadata are persistently stored in a structured relational database. A Retrieval-Augmented Generation (RAG) chatbot interface, built on a large language model backend with Model Context Protocol (MCP) integration, enables natural language querying of stored detection results. Evaluated on the UIEB and RUOD benchmark datasets, the proposed system achieves a mean Average Precision (mAP@0.5) of 84.7%, a real-time inference speed of 38 FPS, and an Identity F1-Score (IDF1) of 82.3% for multi-object tracking, outperforming comparable state-of-the-art methods.
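The abstract describes persisting detections (class labels, confidence scores, temporal metadata) in a relational database that a RAG chatbot can query. As a minimal sketch of that storage-and-retrieval step, assuming a hypothetical SQLite schema (the table and column names below are illustrative, not the authors' actual design):

```python
# Hedged sketch: one plausible way detection records could be stored in a
# relational database and retrieved by a structured query, such as the
# kind a RAG chatbot backend might issue after parsing a user's question.
import sqlite3

def init_db(conn):
    # Illustrative schema: one row per detected object per frame.
    conn.execute("""
        CREATE TABLE IF NOT EXISTS detections (
            id INTEGER PRIMARY KEY,
            frame INTEGER,          -- video frame index
            label TEXT,             -- predicted class label
            confidence REAL,        -- detector confidence score
            timestamp TEXT          -- ISO-8601 capture time
        )""")

def store_detection(conn, frame, label, confidence, timestamp):
    # Parameterized insert avoids SQL injection from upstream text.
    conn.execute(
        "INSERT INTO detections (frame, label, confidence, timestamp) "
        "VALUES (?, ?, ?, ?)",
        (frame, label, confidence, timestamp),
    )

def query_by_label(conn, label, min_conf=0.5):
    # Structured lookup answering e.g. "when was a fish last seen?"
    cur = conn.execute(
        "SELECT frame, confidence FROM detections "
        "WHERE label = ? AND confidence >= ? ORDER BY frame",
        (label, min_conf),
    )
    return cur.fetchall()

conn = sqlite3.connect(":memory:")
init_db(conn)
store_detection(conn, 10, "fish", 0.91, "2026-05-01T12:00:00")
store_detection(conn, 11, "fish", 0.42, "2026-05-01T12:00:01")
store_detection(conn, 12, "diver", 0.88, "2026-05-01T12:00:02")
rows = query_by_label(conn, "fish")  # low-confidence row filtered out
```

In the full system described by the paper, the natural-language-to-query translation would be handled by the LLM backend via MCP; this sketch only shows the relational layer beneath it.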

Copyright & License

Copyright © 2026 Authors retain the copyright of this article. This article is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{200939,
        author = {Dr. A. Jagan and Arunkumar S and Ragunath D and Ragul J and Sudharsanan K},
        title = {Deep Learning-Based Underwater Object Detection and Tracking with an Intelligent Chatbot Interface},
        journal = {International Journal of Innovative Research in Technology},
        year = {2026},
        volume = {12},
        pages = {68-77},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=200939},
        abstract = {Underwater environments introduce severe imaging degradations including color distortion, light scattering, and turbidity-induced blur, which fundamentally impair the performance of conventional computer vision systems. This paper presents a comprehensive deep learning-based framework for underwater object detection and tracking that integrates image enhancement, transformer-augmented convolutional feature extraction, multi-modal sensor fusion, and a real-time chatbot interface for intuitive user interaction. The proposed detection backbone employs a YOLOv8-based architecture enhanced with attention gating and depthwise separable convolutions to achieve efficient inference on resource-constrained platforms. Object tracking is accomplished through a hybrid approach combining Deep SORT with Siamese feature embedding networks, sustaining trajectory continuity under occlusion and scale variation. Multi-modal fusion of optical RGB imagery and sonar depth data improves scene understanding in zero-visibility conditions. All detected objects, associated confidence scores, class labels, and temporal metadata are persistently stored in a structured relational database. A Retrieval-Augmented Generation (RAG) chatbot interface, built on a large language model backend with Model Context Protocol (MCP) integration, enables natural language querying of stored detection results. Evaluated on the UIEB and RUOD benchmark datasets, the proposed system achieves a mean Average Precision (mAP@0.5) of 84.7%, a real-time inference speed of 38 FPS, and an Identity F1-Score (IDF1) of 82.3% for multi-object tracking, outperforming comparable state-of-the-art methods.},
        keywords = {Underwater computer vision, deep learning, YOLOv8, object detection, multi-object tracking, image enhancement, multi-modal fusion, chatbot interface, retrieval-augmented generation, real-time inference.},
        month = {May},
        }

Cite This Article

Jagan, A., S, A., D, R., J, R., & K, S. (2026). Deep Learning-Based Underwater Object Detection and Tracking with an Intelligent Chatbot Interface. International Journal of Innovative Research in Technology (IJIRT), 12, 68–77.
