AI-Based Multimodal Virtual Assistant for Desktop Automation

  • Unique Paper ID: 194812
  • Volume: 12
  • Issue: 10
  • PageNo: 6223-6228
  • Abstract: Human–computer interaction has evolved significantly with the integration of artificial intelligence, enabling more natural and intuitive communication between users and computing systems. This paper presents the design and implementation of an AI-powered multimodal virtual assistant capable of performing desktop automation tasks through voice commands and hand gesture interactions. The system uses hybrid speech recognition: the Google Speech API for online recognition and the Vosk speech recognition model for offline processing. This hybrid approach keeps the assistant functional even without internet connectivity. The assistant is implemented in Python and incorporates several libraries, including SpeechRecognition, PyAudio, pyttsx3, OpenCV, MediaPipe, pandas, Tkinter, and Matplotlib. In addition to voice interaction, the system supports gesture-based mouse control through real-time hand tracking. The assistant also includes data analysis capabilities, allowing users to visualize datasets using voice commands. Experimental evaluation demonstrates that the proposed system achieves reliable speech recognition, smooth gesture-based cursor control, and response times suitable for real-time applications. The proposed multimodal assistant provides an effective solution for intelligent desktop automation and enhances accessibility in human–computer interaction.
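
The abstract does not spell out how the assistant switches between online and offline recognition. The following is a minimal sketch of one plausible online-first fallback, not the authors' implementation. The names `transcribe_hybrid`, `online`, and `offline` are hypothetical; the two recognizers are passed in as plain callables so the sketch does not depend on any particular speech library, and it assumes the online recognizer signals an unreachable service by raising `ConnectionError` or `TimeoutError`.

```python
from typing import Callable, Tuple

# Hypothetical type: any function that turns raw audio bytes into text.
Recognizer = Callable[[bytes], str]


def transcribe_hybrid(
    audio: bytes, online: Recognizer, offline: Recognizer
) -> Tuple[str, str]:
    """Return (text, engine_used), preferring the online recognizer.

    Assumption: the online recognizer raises ConnectionError or
    TimeoutError when the network or remote service is unavailable.
    """
    try:
        return online(audio), "online"
    except (ConnectionError, TimeoutError):
        # No connectivity: fall back to the local model.
        return offline(audio), "offline"
```

In a real system, `online` and `offline` would wrap the cloud and local engines respectively, translating each library's own error types into the exceptions assumed here.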

Copyright & License

Copyright © 2026 Authors retain the copyright of this article. This article is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{194812,
        author = {Mikkili Koushik and Uppala Akash and Uppala Abhilash and Merlyne Sandra Christina},
        title = {AI-Based Multimodal Virtual Assistant for Desktop Automation},
        journal = {International Journal of Innovative Research in Technology},
        year = {2026},
        volume = {12},
        number = {10},
        pages = {6223--6228},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=194812},
        abstract = {Human–computer interaction has evolved significantly with the integration of artificial intelligence, enabling more natural and intuitive communication between users and computing systems. This paper presents the design and implementation of an AI-powered multimodal virtual assistant capable of performing desktop automation tasks through voice commands and hand gesture interactions. The system integrates hybrid speech recognition techniques using Google Speech API for online recognition and the Vosk speech recognition model for offline processing. This hybrid approach ensures continuous functionality even in the absence of internet connectivity.
The assistant is implemented in Python and incorporates several libraries, including SpeechRecognition, PyAudio, pyttsx3, OpenCV, MediaPipe, pandas, Tkinter, and Matplotlib. In addition to voice interaction, the system supports gesture-based mouse control through real-time hand tracking. The assistant also includes data analysis capabilities, allowing users to visualize datasets using voice commands.
Experimental evaluation demonstrates that the proposed system achieves reliable speech recognition, smooth gesture-based cursor control, and efficient response time suitable for real-time applications. The proposed multimodal assistant provides an effective solution for intelligent desktop automation and enhances accessibility in human–computer interaction.},
        keywords = {Hybrid voice assistant, Offline speech recognition, Gesture control, MediaPipe, Human–computer interaction},
        month = {March},
        }

Cite This Article

Koushik, M., Akash, U., Abhilash, U., & Christina, M. S. (2026). AI-Based Multimodal Virtual Assistant for Desktop Automation. International Journal of Innovative Research in Technology (IJIRT), 12(10), 6223–6228.
