IMAGE-TO-SPEECH CONVERSION USING OCR, TTS AND CNN

  • Unique Paper ID: 176766
  • Volume: 11
  • Issue: 11
  • PageNo: 6302-6306
  • Abstract: This paper presents a system that converts text in images into audible speech by combining Optical Character Recognition (OCR), Convolutional Neural Networks (CNNs), and Text-to-Speech (TTS) technologies. The goal is to help visually impaired individuals understand visual text through audio output. The system first applies CNN-based models during image preprocessing to reduce noise and localize text accurately. OCR then extracts the text from the processed images. Finally, a TTS engine converts the recognized text into natural-sounding speech. Integrating these technologies yields a robust and efficient pipeline that handles a variety of image inputs, including printed documents, signage, and handwritten notes. Experimental results demonstrate the system’s effectiveness in real-world scenarios, offering a practical tool for assistive technology and human-computer interaction.
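
The three-stage flow described in the abstract (preprocess, recognize, synthesize) can be sketched as a small pipeline with pluggable stages. This is a minimal illustration only, not the authors' implementation: the class name `ImageToSpeechPipeline`, the `Image` type alias, and the three stand-in stage functions are all hypothetical. In particular, plain thresholding is swapped in where the paper uses a CNN-based preprocessor, and the OCR and TTS stages are stubs so the sketch runs with the standard library alone.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for an image: rows of grayscale pixels (0-255).
Image = List[List[int]]


@dataclass
class ImageToSpeechPipeline:
    """Chains the three stages named in the abstract (names are assumptions)."""
    preprocess: Callable[[Image], Image]   # paper: CNN-based denoising/localization
    recognize: Callable[[Image], str]      # paper: OCR
    synthesize: Callable[[str], bytes]     # paper: TTS engine

    def run(self, image: Image) -> bytes:
        cleaned = self.preprocess(image)
        text = self.recognize(cleaned).strip()
        if not text:
            # Nothing to speak; let the caller decide how to report it.
            raise ValueError("no text recognized in image")
        return self.synthesize(text)


def binarize(image: Image, threshold: int = 128) -> Image:
    # Simple thresholding used here in place of the CNN preprocessor.
    return [[255 if px >= threshold else 0 for px in row] for row in image]


def stub_ocr(image: Image) -> str:
    # Placeholder: a real system would call an OCR engine here.
    return " HELLO "


def stub_tts(text: str) -> bytes:
    # Placeholder: a real system would return synthesized audio.
    return text.encode("utf-8")


pipeline = ImageToSpeechPipeline(binarize, stub_ocr, stub_tts)
audio = pipeline.run([[10, 200], [130, 90]])
```

Passing the stages in as callables keeps the order of operations explicit and lets each stub be replaced by a real model or engine without changing `run`.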

  • ISSN: 2349-6002
