AI-Enhanced Telemedicine: Real-Time Speech Emotion Recognition and Contextual Recommendations Using LSTM and Generative AI

  • Unique Paper ID: 178909
  • Volume: 11
  • Issue: 12
  • PageNo: 4902-4909
  • Abstract: This paper presents a context-aware speech emotion recognition system tailored for telemedicine, leveraging multi-input deep learning models and real-time voice analysis to detect patient emotions. The proposed system integrates audio feature extraction using Mel-frequency cepstral coefficients (MFCCs), LSTM-based emotion classification, and personalized recommendation generation via the Gemini Pro API. A Streamlit-based interface facilitates seamless interaction, while real-time audio input and session tracking enable clinicians to monitor patient emotional trends. Experimental results demonstrate the model’s effectiveness in identifying seven distinct emotions, offering a novel approach to enhancing empathetic care in remote medical consultations.
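The abstract describes a pipeline of MFCC feature extraction followed by an LSTM classifier over seven emotion classes. As the paper does not publish its code here, the following is a minimal sketch of that recognition stage, assuming librosa for MFCC extraction and Keras for the model; the layer sizes, sequence length, emotion label set, and input file name are illustrative assumptions, not the authors' implementation.

    # Minimal sketch: MFCC extraction -> LSTM classifier over seven emotions.
    # Library choices (librosa, Keras) and all hyperparameters are assumptions.
    import numpy as np
    import librosa
    from tensorflow.keras import layers, models

    N_MFCC = 40        # MFCC coefficients per frame (assumed)
    MAX_FRAMES = 200   # fixed sequence length after padding/truncation (assumed)
    EMOTIONS = ["angry", "disgust", "fear", "happy", "neutral", "sad", "surprise"]  # typical 7-class set

    def extract_mfcc(path: str, sr: int = 16000) -> np.ndarray:
        """Load an audio file and return a (MAX_FRAMES, N_MFCC) MFCC matrix."""
        audio, _ = librosa.load(path, sr=sr)
        mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=N_MFCC).T  # (frames, n_mfcc)
        # Pad or truncate so every clip yields a uniform-length sequence.
        if mfcc.shape[0] < MAX_FRAMES:
            mfcc = np.pad(mfcc, ((0, MAX_FRAMES - mfcc.shape[0]), (0, 0)))
        return mfcc[:MAX_FRAMES]

    def build_model() -> models.Model:
        """Stacked-LSTM classifier mapping an MFCC sequence to emotion probabilities."""
        model = models.Sequential([
            layers.Input(shape=(MAX_FRAMES, N_MFCC)),
            layers.LSTM(128, return_sequences=True),
            layers.LSTM(64),
            layers.Dropout(0.3),
            layers.Dense(64, activation="relu"),
            layers.Dense(len(EMOTIONS), activation="softmax"),
        ])
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        return model

    if __name__ == "__main__":
        model = build_model()
        features = extract_mfcc("patient_clip.wav")          # hypothetical input file
        probs = model.predict(features[np.newaxis, ...])[0]  # untrained weights; shape illustration only
        print(EMOTIONS[int(np.argmax(probs))])

In the system described, the predicted label would then be passed as context to the Gemini Pro API (for example via a generative-AI client) to produce a personalized recommendation, with a Streamlit front end handling real-time audio input and session tracking; those stages are omitted from this sketch.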

Copyright & License

Copyright © 2025. The authors retain the copyright of this article. This is an open-access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{178909,
        author = {Veerendar S and Dr. Karthick Raghunath and Vasanth Raj and Robel B},
        title = {AI-Enhanced Telemedicine: Real-Time Speech Emotion Recognition and Contextual Recommendations Using LSTM and Generative AI},
        journal = {International Journal of Innovative Research in Technology},
        year = {2025},
        volume = {11},
        number = {12},
        pages = {4902-4909},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=178909},
        abstract = {This paper presents a context-aware speech emotion recognition system tailored for telemedicine, leveraging multi-input deep learning models and real-time voice analysis to detect patient emotions. The proposed system integrates audio feature extraction using Mel-frequency cepstral coefficients (MFCCs), LSTM-based emotion classification, and personalized recommendation generation via the Gemini Pro API. A Streamlit-based interface facilitates seamless interaction, while real-time audio input and session tracking enable clinicians to monitor patient emotional trends. Experimental results demonstrate the model’s effectiveness in identifying seven distinct emotions, offering a novel approach to enhancing empathetic care in remote medical consultations.},
        keywords = {Speech Emotion Recognition, Telemedicine, Deep Learning, Context-Aware Systems, MFCC, LSTM, Streamlit, Gemini Pro API, Real-Time Emotion Detection, Personalized Recommendations},
        month = {May},
        }
