REAL TIME VOICE CLONING

  • Unique Paper ID: 151003
  • Volume: 7
  • Issue: 11
  • PageNo: 297-302
  • Abstract:
  • Recent progress in deep learning has shown impressive results in the area of speech-to-text. For this reason, a deep neural network is usually trained from a single speaker using a corpus of several hours of voice recorded professionally. Giving such a model a new voice is highly expensive, as it needs a new dataset to be collected and the model retrained. A recent research has developed a three-stage pipeline that allows you to clone an unseen voice from just a few seconds of reference speech during practice and without retraining the template. The researchers share strikingly natural-sounding findings. A Text-to-speech synthesizer is an application that converts text into spoken word, by analyzing and processing the text using Natural Language Processing (NLP) and then using Digital Signal Processing (DSP) technology to convert this processed text into synthesized speech representation of the text. Here, we developed a useful text-to-speech synthesizer in the form of a simple application that converts inputted text into synthesized speech and reads out to the user which can then be saved as an mp3. file. The development of a text to speech synthesizer will be of great help to people with visual impairment and make making through large volume of text easier.

Cite This Article

  • ISSN: 2349-6002
  • Volume: 7
  • Issue: 11
  • PageNo: 297-302

REAL TIME VOICE CLONING

Related Articles