GenVox: A Voice-Activated Multi-Modal Generative AI Companion

  • Unique Paper ID: 175182
  • Volume: 11
  • Issue: 11
  • PageNo: 2051-2057
  • Abstract: This paper describes GenVox, a multimodal, voice-driven generative AI assistant that integrates recent advances in natural language processing (NLP), image generation, and sound generation. GenVox lets users interact through voice commands and receive responses as text, images, or sound. Designed as a personal assistant, a creative companion, and a learning aid, GenVox uses generative AI models to tailor its responses to user needs. Major features include voice-guided text generation, AI-based story creation, content generation, question answering, summarization, voice-activated image generation, and interactive cross-modal content generation. The project uses large language models (LLMs), Google gTTS and pyttsx3 for text-to-speech synthesis, Python 3.0 for implementation, Kivy for building the Android app, and APIs from Together AI and Hugging Face for access to existing generative models.
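
The cross-modal flow summarized in the abstract (a spoken command is transcribed, routed to a text, image, or sound generator, and the result is returned) might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the keyword table, `route_command`, `handle`, and the stub backends are all hypothetical names, and a real system would replace the stubs with calls to an LLM, an image model, and a speech or sound backend.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical keyword table: the abstract does not say how GenVox
# decides which modality a command needs, so simple keyword matching
# stands in for that step here.
MODALITY_KEYWORDS = {
    "image": ("draw", "image", "picture", "paint"),
    "audio": ("sound", "music", "sing"),
}


def route_command(transcript: str) -> str:
    """Map a transcribed voice command to 'image', 'audio', or 'text'."""
    words = transcript.lower().split()
    for modality, keywords in MODALITY_KEYWORDS.items():
        if any(keyword in words for keyword in keywords):
            return modality
    return "text"  # default: answer with generated text


@dataclass
class Response:
    modality: str
    payload: str


def handle(transcript: str, backends: Dict[str, Callable[[str], str]]) -> Response:
    """Route the command and call the matching (injected) generator."""
    modality = route_command(transcript)
    return Response(modality, backends[modality](transcript))


# Stub backends (assumptions) so the sketch runs without any network access.
stub_backends = {
    "text": lambda prompt: f"[text reply to: {prompt}]",
    "image": lambda prompt: f"[image for: {prompt}]",
    "audio": lambda prompt: f"[audio for: {prompt}]",
}

reply = handle("Draw a picture of a cat", stub_backends)
print(reply.modality, reply.payload)
```

Passing the generators in as a dictionary keeps the routing logic independent of any particular model provider, so each backend can be swapped without touching the dispatcher.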

Cite This Article

  • ISSN: 2349-6002