Multimodal Emotion Recognition using Transformers and Cross-Modal Attention

  • Unique Paper ID: 182424
  • Volume: 12
  • Issue: 2
  • PageNo: 1910-1915
  • ISSN: 2349-6002
  • Abstract: Affective computing systems increasingly rely on multimodal emotion recognition, detecting emotional states from combined audio-visual information at the system interface. This paper presents a new architecture for multimodal emotion recognition based on transformers and cross-modal attention, which links semantic and temporal patterns between facial and vocal signals. The system is trained and evaluated on the RAVDESS and FER+ datasets to assess emotional states under varied conditions. Transformer encoders capture long-term dependencies within each modality stream, while cross-modal attention connects salient features across the audio and visual branches. A fusion module then merges the modality-specific representations into a unified representation for emotion classification. Under this design, attention-based multimodal learning outperforms single-modality baselines. Attention-guided refinement within a jointly optimized pipeline mitigates temporal misalignment and inconsistent attributes across modalities. The resulting real-time recognition system offers practical value for human-machine interaction as well as healthcare monitoring. A minimal sketch of the described pipeline follows below.
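The abstract describes three stages: per-modality transformer encoding, bidirectional cross-modal attention, and fusion for classification. The sketch below illustrates that pipeline in PyTorch; the module names, feature dimensions, layer counts, and the eight-class label set (matching the RAVDESS emotion categories) are illustrative assumptions, not the authors' published implementation.

```python
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """One modality queries the other through multi-head attention."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, query_seq, context_seq):
        # Queries come from one modality; keys/values from the other.
        attended, _ = self.attn(query_seq, context_seq, context_seq)
        return self.norm(query_seq + attended)  # residual connection + norm

class MultimodalEmotionModel(nn.Module):
    def __init__(self, audio_dim=40, visual_dim=512, dim=256, num_classes=8):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, dim)
        self.visual_proj = nn.Linear(visual_dim, dim)
        # Per-modality transformer encoders model long-term temporal dependencies.
        self.audio_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=2)
        self.visual_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=2)
        # Cross-modal attention in both directions aligns audio and visual features.
        self.audio_from_visual = CrossModalAttention(dim)
        self.visual_from_audio = CrossModalAttention(dim)
        # Fusion of pooled modality representations, then emotion classification.
        self.classifier = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, num_classes))

    def forward(self, audio, visual):
        # audio: (B, T_a, audio_dim), e.g. MFCC frames; visual: (B, T_v, visual_dim)
        a = self.audio_enc(self.audio_proj(audio))
        v = self.visual_enc(self.visual_proj(visual))
        a_ref = self.audio_from_visual(a, v)  # audio refined by visual context
        v_ref = self.visual_from_audio(v, a)  # visual refined by audio context
        # Temporal mean-pooling, concatenation fusion, and classification.
        fused = torch.cat([a_ref.mean(dim=1), v_ref.mean(dim=1)], dim=-1)
        return self.classifier(fused)

# Example usage with random stand-in features (batch of 2 clips).
model = MultimodalEmotionModel()
logits = model(torch.randn(2, 100, 40), torch.randn(2, 30, 512))
print(logits.shape)  # torch.Size([2, 8])
```

Concatenation after mean-pooling is one simple fusion choice; the attention-refined sequences could equally be fused with a further transformer layer or gated sum before classification.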
