Unmasking the Unreal: Multi-Modal Deepfake Video Detection

  • Unique Paper ID: 190320
  • Volume: 12
  • Issue: 8
  • PageNo: 3365-3369
  • Abstract: Deepfake technology poses a significant threat to digital media authenticity and trust. This paper presents a multi-modal deep learning framework for detecting deepfake videos by integrating three complementary detection mechanisms: Convolutional Neural Networks (CNN) for spatial artifact detection, Bidirectional Long Short-Term Memory (BiLSTM) networks for temporal pattern analysis, and eye-blink frequency analysis for physiological authenticity verification. The proposed system processes video frames through a CNN architecture to identify facial inconsistencies, employs BiLSTM to capture temporal anomalies across frame sequences, and leverages OpenCV-based eye-blink detection to assess natural human behavior patterns. The final classification is performed through weighted majority voting, combining predictions from all three modalities. Experiments conducted on the CelebDF dataset demonstrate the effectiveness of the multi-modal approach, achieving improved detection accuracy compared to single-modality methods. The system utilizes 500 real and 800 fake video samples for training, with MTCNN for face detection and Haar cascades for eye detection. Results indicate that the ensemble approach provides robust deepfake detection capabilities, addressing limitations inherent in individual detection techniques.
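The weighted-majority-voting fusion described in the abstract can be sketched as follows; the specific weights, the threshold, and the function name are illustrative assumptions, not values reported in the paper.

```python
# Sketch of weighted majority voting over the three modality predictions
# (1 = fake, 0 = real). Weights and threshold are assumed for illustration.

def weighted_vote(cnn_pred, bilstm_pred, blink_pred,
                  weights=(0.4, 0.4, 0.2), threshold=0.5):
    """Fuse the CNN, BiLSTM, and eye-blink predictions into one label."""
    score = (weights[0] * cnn_pred
             + weights[1] * bilstm_pred
             + weights[2] * blink_pred)
    # The clip is labeled fake when the weighted score reaches the threshold.
    return 1 if score >= threshold else 0

# Example: CNN and BiLSTM flag the clip as fake, blink analysis does not.
label = weighted_vote(1, 1, 0)  # score = 0.8, so the vote returns 1 (fake)
```

In practice each modality would emit a probability rather than a hard label, and the weights would be tuned on a validation split; the hard-label version above only shows the voting structure.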

Copyright & License

Copyright © 2026. The authors retain the copyright of this article. This article is an open-access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{190320,
        author = {Nita J. Mahale and Tejas Abhay Kulkarni and Paras Vithal Yadav},
        title = {Unmasking the Unreal: Multi-Modal Deepfake Video Detection},
        journal = {International Journal of Innovative Research in Technology},
        year = {2026},
        volume = {12},
        number = {8},
        pages = {3365-3369},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=190320},
        abstract = {Deepfake technology poses a significant threat to digital media authenticity and trust. This paper presents a multi-modal deep learning framework for detecting deepfake videos by integrating three complementary detection mechanisms: Convolutional Neural Networks (CNN) for spatial artifact detection, Bidirectional Long Short-Term Memory (BiLSTM) networks for temporal pattern analysis, and eye-blink frequency analysis for physiological authenticity verification. The proposed system processes video frames through a CNN architecture to identify facial inconsistencies, employs BiLSTM to capture temporal anomalies across frame sequences, and leverages OpenCV-based eye-blink detection to assess natural human behavior patterns. The final classification is performed through weighted majority voting, combining predictions from all three modalities. Experiments conducted on the CelebDF dataset demonstrate the effectiveness of the multi-modal approach, achieving improved detection accuracy compared to single-modality methods. The system utilizes 500 real and 800 fake video samples for training, with MTCNN for face detection and Haar cascades for eye detection. Results indicate that the ensemble approach provides robust deepfake detection capabilities, addressing limitations inherent in individual detection techniques.},
        keywords = {Deepfake Detection, Convolutional Neural Networks, BiLSTM, Eye-Blink Analysis, Multi-Modal Learning, Video Forensics},
        month = {January},
        }

Cite This Article

Mahale, N. J., Kulkarni, T. A., & Yadav, P. V. (2026). Unmasking the Unreal: Multi-Modal Deepfake Video Detection. International Journal of Innovative Research in Technology (IJIRT), 12(8), 3365–3369.
