Text-to-speech synthesis, Natural Language Processing, Digital Signal Processing.
Abstract
Recent progress in deep learning has shown impressive results in the area of text-to-speech. To this end, a deep neural network is usually trained on a single speaker, using a corpus of several hours of professionally recorded speech. Giving such a model a new voice is highly expensive, as it requires collecting a new dataset and retraining the model. Recent research has developed a three-stage pipeline that can clone a voice unseen during training from just a few seconds of reference speech, without retraining the model. The researchers report strikingly natural-sounding results. A text-to-speech synthesizer is an application that converts text into spoken words: it analyzes and processes the text using Natural Language Processing (NLP), then uses Digital Signal Processing (DSP) to convert the processed text into a synthesized speech representation. Here, we developed a text-to-speech synthesizer in the form of a simple application that converts input text into synthesized speech, reads it out to the user, and can save the result as an MP3 file. Such a synthesizer will be of great help to people with visual impairment and will make getting through large volumes of text easier.
Article Details
Unique Paper ID: 151003
Publication Volume & Issue: Volume 7, Issue 11
Page(s): 297 - 302