NeuroCalm: A Multimodal Affective Computing Framework for Mental Health Monitoring

  • Unique Paper ID: 195871
  • Volume: 12
  • Issue: 11
  • Pages: 1996–2001
Abstract

Mental health disorders afflict over one billion individuals globally, yet timely, affordable, and stigma-free care remains critically inaccessible for the majority. Conventional diagnostic pathways depend on in-person clinical consultations constrained by clinician availability, geographic reach, and cost. This paper presents NeuroCalm, a full-stack multimodal artificial intelligence platform bridging this gap through three tightly coupled affective computing subsystems: (1) a real-time facial micro-expression classifier built on a fine-tuned YOLOv11 convolutional neural network trained on the MEFC dataset, recognising seven discrete emotional states with 5-frame temporal smoothing; (2) an acoustic emotion recognition engine that distils speech into an 18-dimensional paralinguistic feature vector—encompassing Mel-Frequency Cepstral Coefficients (MFCCs), fundamental pitch, jitter, shimmer, and spectral bandwidth—fed into a trained Multi-Layer Perceptron (MLP) classifier; and (3) a context-aware natural language processing pipeline employing the VADER lexicon-based sentiment analyser coupled with Google Gemini 2.x large language models acting as an empathetic AI therapy companion. These channels are fused in a tri-modal inference pipeline exposed via a FastAPI backend and WebSocket interface, consumed by a Next.js 15 web application. The platform further integrates PIN-authenticated private journaling, guided breathing exercises, mood logging, and an analytics dashboard. Empirical results demonstrate 15–25 FPS real-time visual inference on CPU-only hardware and end-to-end audio analysis latency below 250 ms. NeuroCalm represents a meaningful advance toward democratising continuous, non-intrusive mental health support through multimodal affective computing.
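
The abstract's acoustic channel is concrete enough to illustrate. The paper does not enumerate the 18 feature dimensions, so the sketch below assumes one plausible split (13 MFCC means, pitch mean and standard deviation, jitter, shimmer, and mean spectral bandwidth) and uses common librosa-based approximations for jitter and shimmer; NeuroCalm's exact definitions may differ.

    import numpy as np
    import librosa

    def extract_features(path: str) -> np.ndarray:
        """Illustrative 18-D paralinguistic vector (assumed decomposition)."""
        y, sr = librosa.load(path, sr=16000, mono=True)

        # 13 Mel-Frequency Cepstral Coefficients, averaged over time.
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)

        # Fundamental pitch track via YIN; keep finite, positive estimates.
        f0 = librosa.yin(y, fmin=65.0, fmax=500.0, sr=sr)
        f0 = f0[np.isfinite(f0) & (f0 > 0)]

        # Jitter: mean cycle-to-cycle period variation over the mean period.
        periods = 1.0 / f0
        jitter = np.abs(np.diff(periods)).mean() / periods.mean()

        # Shimmer: frame-to-frame RMS amplitude variation, normalised.
        rms = librosa.feature.rms(y=y)[0]
        shimmer = np.abs(np.diff(rms)).mean() / rms.mean()

        # Spectral bandwidth, averaged over time.
        bandwidth = librosa.feature.spectral_bandwidth(y=y, sr=sr).mean()

        vec = np.concatenate([mfcc, [f0.mean(), f0.std(), jitter, shimmer, bandwidth]])
        assert vec.shape == (18,)
        return vec

A vector of this shape would then be passed to the trained MLP classifier, for instance via model.predict(vec.reshape(1, -1)) with a scikit-learn MLPClassifier; the paper does not state which MLP implementation is used.

The 5-frame temporal smoothing on the visual channel can likewise be sketched as a sliding majority vote over the classifier's per-frame emotion labels; the abstract does not specify the smoothing rule, so this is an assumption rather than NeuroCalm's documented behaviour.

    from collections import Counter, deque

    window = deque(maxlen=5)  # keeps only the five most recent per-frame labels

    def smoothed_label(raw_label: str) -> str:
        """Majority vote over the last five per-frame emotion labels."""
        window.append(raw_label)
        return Counter(window).most_common(1)[0][0]

    # A transient single-frame misclassification is voted away:
    for raw in ["happy", "happy", "fear", "happy", "happy"]:
        print(smoothed_label(raw))  # prints "happy" for every frame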

Copyright & License

Copyright © 2026. The authors retain the copyright of this article. This article is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{195871,
  author   = {Hajrah Saleha Imran Kazi and Suryavanshi Shreya Shahaji and Hiral Pareshkumar Adesara},
  title    = {NeuroCalm: A Multimodal Affective Computing Framework for Mental Health Monitoring},
  journal  = {International Journal of Innovative Research in Technology},
  year     = {2026},
  volume   = {12},
  number   = {11},
  pages    = {1996--2001},
  issn     = {2349-6002},
  url      = {https://ijirt.org/article?manuscript=195871},
  abstract = {Mental health disorders afflict over one billion individuals globally, yet timely, affordable, and stigma-free care remains critically inaccessible for the majority. Conventional diagnostic pathways depend on in-person clinical consultations constrained by clinician availability, geographic reach, and cost. This paper presents NeuroCalm, a full-stack multimodal artificial intelligence platform bridging this gap through three tightly coupled affective computing subsystems: (1) a real-time facial micro-expression classifier built on a fine-tuned YOLOv11 convolutional neural network trained on the MEFC dataset, recognising seven discrete emotional states with 5-frame temporal smoothing; (2) an acoustic emotion recognition engine that distils speech into an 18-dimensional paralinguistic feature vector—encompassing Mel-Frequency Cepstral Coefficients (MFCCs), fundamental pitch, jitter, shimmer, and spectral bandwidth—fed into a trained Multi-Layer Perceptron (MLP) classifier; and (3) a context-aware natural language processing pipeline employing the VADER lexicon-based sentiment analyser coupled with Google Gemini 2.x large language models acting as an empathetic AI therapy companion. These channels are fused in a tri-modal inference pipeline exposed via a FastAPI backend and WebSocket interface, consumed by a Next.js 15 web application. The platform further integrates PIN-authenticated private journaling, guided breathing exercises, mood logging, and an analytics dashboard. Empirical results demonstrate 15–25 FPS real-time visual inference on CPU-only hardware and end-to-end audio analysis latency below 250 ms. NeuroCalm represents a meaningful advance toward democratising continuous, non-intrusive mental health support through multimodal affective computing.},
  keywords = {Affective Computing, Acoustic Emotion Recognition, Facial Micro-Expression Recognition, Large Language Models, Mental Health AI, Multimodal Emotion Fusion, VADER Sentiment Analysis, YOLOv11},
  month    = {April}
}

Cite This Article

Kazi, H. S. I., Shahaji, S. S., & Adesara, H. P. (2026). NeuroCalm: A Multimodal Affective Computing Framework for Mental Health Monitoring. International Journal of Innovative Research in Technology (IJIRT), 12(11), 1996–2001.
