Vision to Voice Object Detection with Real-Time Audio Assistance

  • Unique Paper ID: 175924
  • Volume: 11
  • Issue: 11
  • PageNo: 5039-5046
  • Abstract: Vision to Voice uses the YOLOv8 object detection algorithm to provide real-time auditory assistance to blind users by describing their surroundings in spoken form. The system improves accessibility and inclusion by vocalizing environmental cues, supporting smart navigation, and processing images in real time, so that visually impaired users can rely on audio guidance and speech feedback in their daily lives. By combining deep learning with image localization, it improves the user's contextual awareness and confidence when navigating complex environments. This paper discusses the architecture and features of YOLOv8 and its improvements over previous versions. YOLOv8 combines a next-generation backbone for effective feature extraction, a refined neck for better object localization, and anchor-free detection for better performance and flexibility. State-of-the-art augmentations such as mosaic augmentation, together with adaptive training strategies, greatly improve robustness and generalization across various datasets. YOLOv8 is available through the PyTorch framework, which increases portability and allows the code to be customized for deployment on other platforms such as edge devices. Experimental results demonstrate the model's efficacy in real-world applications such as assistive technologies, autonomous navigation, video surveillance, industrial automation, and healthcare.
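
The detection-to-speech step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Detection` record, the `horizontal_zone` and `describe` helpers, the confidence threshold, and the left/ahead/right split are all hypothetical names and values chosen for illustration. The detector and the speech engine are deliberately left out, so the sketch only shows how detections that a model such as YOLOv8 might return could be turned into a short phrase for a text-to-speech engine.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical record for one detected object (not from the paper)."""
    label: str         # class name reported by the detector
    confidence: float  # detection score in [0, 1]
    x_center: float    # box centre, normalized to [0, 1] across the frame width


def horizontal_zone(x_center: float) -> str:
    """Map a normalized horizontal position to a coarse spoken direction."""
    if x_center < 1 / 3:
        return "on your left"
    if x_center > 2 / 3:
        return "on your right"
    return "ahead"


def describe(detections, min_confidence=0.5, max_items=3):
    """Build one short sentence from the most confident detections.

    The threshold and item limit are assumed values; a real system would
    tune them so the spoken output stays brief enough to follow.
    """
    kept = [d for d in detections if d.confidence >= min_confidence]
    kept.sort(key=lambda d: d.confidence, reverse=True)
    phrases = [f"{d.label} {horizontal_zone(d.x_center)}" for d in kept[:max_items]]
    if not phrases:
        return "No objects detected."
    return "Detected: " + ", ".join(phrases) + "."


if __name__ == "__main__":
    frame_detections = [
        Detection("chair", 0.9, 0.1),
        Detection("person", 0.8, 0.5),
        Detection("dog", 0.3, 0.9),  # below the threshold, so it is skipped
    ]
    # In a full system this string would be passed to a speech engine.
    print(describe(frame_detections))
```

Limiting the sentence to a few high-confidence objects is one possible design choice for keeping the audio feedback short; the paper may handle prioritization differently.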

Copyright & License

Copyright © 2025 Authors retain the copyright of this article. This article is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{175924,
        author = {Bhumireddi Harish and T. Anusha and Gandreti Sneha and Botta Vasavi Anusha and Kolipaka Keerthana and Gollamala Saiteja},
        title = {Vision to Voice Object Detection with Real-Time Audio Assistance},
        journal = {International Journal of Innovative Research in Technology},
        year = {2025},
        volume = {11},
        number = {11},
        pages = {5039--5046},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=175924},
        abstract = {Vision to Voice uses the YOLOv8 object detection algorithm to provide real-time auditory assistance to blind users by describing their surroundings in spoken form. The system improves accessibility and inclusion by vocalizing environmental cues, supporting smart navigation, and processing images in real time, so that visually impaired users can rely on audio guidance and speech feedback in their daily lives.
By combining deep learning with image localization, it improves the user's contextual awareness and confidence when navigating complex environments. This paper discusses the architecture and features of YOLOv8 and its improvements over previous versions. YOLOv8 combines a next-generation backbone for effective feature extraction, a refined neck for better object localization, and anchor-free detection for better performance and flexibility.
State-of-the-art augmentations such as mosaic augmentation, together with adaptive training strategies, greatly improve robustness and generalization across various datasets. YOLOv8 is available through the PyTorch framework, which increases portability and allows the code to be customized for deployment on other platforms such as edge devices. Experimental results demonstrate the model's efficacy in real-world applications such as assistive technologies, autonomous navigation, video surveillance, industrial automation, and healthcare.},
        keywords = {Vision to Voice, YOLOv8, Real-time Voice Assistance, Blinded Assistance, AI Navigation, Accessibility, Image Localization, Data Augmentation, PyTorch, Edge Devices, Auditory Feedback, Smart Navigation, Personalized Audio Guidance},
        month = {April},
        }
