Facial and Speech Emotion Detection for Stress Monitoring

  • Unique Paper ID: 177475
  • PageNo: 756-759
  • Abstract:
  • Stress is a common problem in modern life, often going unnoticed until it affects one's physical and mental health. This study presents a multimodal stress detection framework that combines facial emotion analysis and speech emotion recognition to predict stress levels more accurately. Speech input is first transcribed to text, and a fine-tuned DistilBERT model then classifies the emotions and maps them to corresponding stress levels. In parallel, facial expressions are detected using MTCNN for face detection and alignment, and a Convolutional Neural Network (CNN) trained on the FER dataset identifies the emotional states. The proposed method shows good classification performance in both modalities and has the potential to be integrated into real-time stress monitoring applications.
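As a rough illustration of the "mapping emotions to stress levels" step described in the abstract, the sketch below combines a facial and a speech emotion prediction into a single stress level. The emotion labels, the emotion-to-stress mapping, and the max-level fusion rule are all illustrative assumptions, not details taken from the paper.

```python
# Hypothetical emotion-to-stress mapping; the labels and levels below
# are illustrative assumptions, not the paper's actual mapping.
EMOTION_TO_STRESS = {
    "happy": "low",
    "neutral": "low",
    "surprise": "medium",
    "sad": "medium",
    "fear": "high",
    "angry": "high",
    "disgust": "high",
}

STRESS_ORDER = ["low", "medium", "high"]


def fuse_stress(facial_emotion: str, speech_emotion: str) -> str:
    """Combine the two modality predictions by taking the higher stress level."""
    levels = [EMOTION_TO_STRESS[facial_emotion], EMOTION_TO_STRESS[speech_emotion]]
    return max(levels, key=STRESS_ORDER.index)


print(fuse_stress("neutral", "angry"))  # -> high
```

A conservative max-level fusion flags stress when either modality signals it; a real system might instead weight the two modalities by their classifier confidences.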

Copyright & License

Copyright © 2026. Authors retain the copyright of this article. This article is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{177475,
        author = {Mahesh Veeraiah R and Jansi Rani S and Arul Moneesh E and Dharaneesh R},
        title = {Facial and Speech Emotion Detection for Stress Monitoring},
        journal = {International Journal of Innovative Research in Technology},
        year = {2025},
        volume = {11},
        number = {12},
        pages = {756--759},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=177475},
        abstract = {Stress is a common problem in modern life, often going unnoticed until it affects one's physical and mental health. This study presents a multimodal stress detection framework that combines facial emotion analysis and speech emotion recognition to predict stress levels more accurately. Speech input is first transcribed to text, and a fine-tuned DistilBERT model then classifies the emotions and maps them to corresponding stress levels. In parallel, facial expressions are detected using MTCNN for face detection and alignment, and a Convolutional Neural Network (CNN) trained on the FER dataset identifies the emotional states. The proposed method shows good classification performance in both modalities and has the potential to be integrated into real-time stress monitoring applications.},
        keywords = {BERT, CNN, Emotion Classification, Facial Expression Recognition, FER Dataset, MTCNN, Multimodal Analysis, Speech Emotion Recognition, Stress Detection, Transformer Models.},
        month = {May},
        }

Cite This Article

R, M. V., S, J. R., E, A. M., & R, D. (2025). Facial and Speech Emotion Detection for Stress Monitoring. International Journal of Innovative Research in Technology (IJIRT), 11(12), 756–759.
