Vision to Voice Object Detection with Real-Time Audio Assistance

Authors: Bhumireddi Harish, T. Anusha, Gandreti Sneha, Botta Vasavi Anusha, Kolipaka Keerthana, Gollamala Saiteja

Unique Paper ID: 175924
Volume: 11
Issue: 11
PageNo: 5039-5046

Keywords: Vision to Voice, YOLOv8, Real-time Voice Assistance, Blinded Assistance, AI Navigation, Accessibility, Image Localization, Data Augmentation, PyTorch, Edge Devices, Auditory Feedback, Smart Navigation, Personalized Audio Guidance

Abstract:
Vision to Voice uses the YOLOv8 object detection algorithm to provide real-time auditory assistance to blind users by describing their surroundings in spoken form. The system improves accessibility and inclusion by vocalizing environmental cues, supporting smart navigation, and processing images in real time, giving visually impaired people audio guidance and speech feedback in their daily lives. By using deep learning and image localization to raise the user's contextual awareness of the environment, it helps users navigate complex environments with confidence. This paper discusses the architecture and features of YOLOv8 and its improvements over previous versions. YOLOv8 combines a next-generation backbone for effective feature extraction, a refined neck for better object localization, and anchor-free detection for better performance and flexibility. State-of-the-art augmentations such as mosaic augmentation, together with adaptive training strategies, greatly improve the model's robustness and generalization across various datasets. YOLOv8 is built on PyTorch, which increases portability and allows the code to be customized for deployment on other platforms such as edge devices. Experimental results demonstrate the model's efficacy in real-world applications such as assistive technologies, autonomous navigation, video surveillance, industrial automation, and healthcare.
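
The abstract does not show how detections become spoken guidance, so the following is only a minimal sketch of that step under stated assumptions. The `Detection` class, the three-zone left/ahead/right split, and the 0.5 confidence threshold are all hypothetical and are not taken from the paper. In a real system the detections would come from a YOLOv8 model and the resulting string would be passed to a text-to-speech engine; neither is invoked here.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical container for one detected object (not from the paper)."""
    label: str         # class name, e.g. "person"
    confidence: float  # detector score in [0, 1]
    x_center: float    # horizontal center of the bounding box, in pixels


def describe_position(x_center: float, frame_width: int) -> str:
    """Map a horizontal pixel position to a coarse spoken direction."""
    third = frame_width / 3
    if x_center < third:
        return "on your left"
    if x_center < 2 * third:
        return "ahead"
    return "on your right"


def detections_to_speech(detections, frame_width, min_confidence=0.5):
    """Build one sentence for a text-to-speech engine from a frame's detections.

    Low-confidence detections are dropped, and the rest are announced
    from most to least confident.
    """
    kept = [d for d in detections if d.confidence >= min_confidence]
    kept.sort(key=lambda d: d.confidence, reverse=True)
    phrases = [
        f"{d.label} {describe_position(d.x_center, frame_width)}" for d in kept
    ]
    if not phrases:
        return "No objects detected"
    return ", ".join(phrases)
```

Splitting the frame into thirds is one simple design choice for giving direction cues; a deployed system might instead use distance estimates or finer angular bins.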

Copyright & License

Copyright © 2025 Authors retain the copyright of this article. This article is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{175924,
        author = {Bhumireddi Harish and T. Anusha and Gandreti Sneha and Botta Vasavi Anusha and Kolipaka Keerthana and Gollamala Saiteja},
        title = {Vision to Voice Object Detection with Real-Time Audio Assistance},
        journal = {International Journal of Innovative Research in Technology},
        year = {2025},
        volume = {11},
        number = {11},
        pages = {5039--5046},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=175924},
        abstract = {Vision to Voice uses the YOLOv8 object detection algorithm to provide real-time auditory assistance to blind users by describing their surroundings in spoken form. The system improves accessibility and inclusion by vocalizing environmental cues, supporting smart navigation, and processing images in real time, giving visually impaired people audio guidance and speech feedback in their daily lives.
By using deep learning and image localization to raise the user's contextual awareness of the environment, it helps users navigate complex environments with confidence. This paper discusses the architecture and features of YOLOv8 and its improvements over previous versions. YOLOv8 combines a next-generation backbone for effective feature extraction, a refined neck for better object localization, and anchor-free detection for better performance and flexibility.
State-of-the-art augmentations such as mosaic augmentation, together with adaptive training strategies, greatly improve the model's robustness and generalization across various datasets. YOLOv8 is built on PyTorch, which increases portability and allows the code to be customized for deployment on other platforms such as edge devices. Experimental results demonstrate the model's efficacy in real-world applications such as assistive technologies, autonomous navigation, video surveillance, industrial automation, and healthcare.},
        keywords = {Vision to Voice, YOLOv8, Real-time Voice Assistance, Blinded Assistance, AI Navigation, Accessibility, Image Localization, Data Augmentation, PyTorch, Edge Devices, Auditory Feedback, Smart Navigation, Personalized Audio Guidance},
        month = {April},
        }

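Cite This Article

Bhumireddi Harish, T. Anusha, Gandreti Sneha, Botta Vasavi Anusha, Kolipaka Keerthana, Gollamala Saiteja, "Vision to Voice Object Detection with Real-Time Audio Assistance", International Journal of Innovative Research in Technology (IJIRT), Volume 11, Issue 11, Pages 5039-5046, April 2025. ISSN: 2349-6002. Available: https://ijirt.org/article?manuscript=175924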
