Artifact-Aware Cross-Modal Deepfake Detection Framework for Robust and Explainable Media Authentication

  • Unique Paper ID: 188143
  • Pages: 1336–1340
  • Abstract: Deepfake technologies have evolved to generate highly realistic synthetic audio–visual media, posing severe threats to public trust, digital security, and forensic verification. This work presents a unified Artifact-Aware Cross-Modal Deepfake Detection Framework that jointly analyzes visual, auditory, and semantic inconsistencies. The system learns manipulation-invariant artifact signatures, aligns audio–video temporal coherence, and incorporates adversarial robustness to resist gradient-based attacks. Through extensive evaluation on benchmark datasets—including FaceForensics++, DFDC, Celeb-DF, and ASVspoof—the model delivers state-of-the-art accuracy (98.1% on FF++) and exhibits strong cross-dataset generalization. Explainability tools such as SHAP and Grad-CAM++ provide transparent insights into the model’s decisions. The findings demonstrate a robust and interpretable detection framework suitable for real-world forensic and security applications.

Copyright & License

Copyright © 2026. The authors retain the copyright of this article. This article is an open-access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{188143,
        author = {Bharath G P and Chethan H and Rakesh R Hebbar and Anoopa Kumar P and Nagendra R},
        title = {Artifact-Aware Cross-Modal Deepfake Detection Framework for Robust and Explainable Media Authentication},
        journal = {International Journal of Innovative Research in Technology},
        year = {2025},
        volume = {12},
        number = {7},
        pages = {1336-1340},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=188143},
        abstract = {Deepfake technologies have evolved to generate highly realistic synthetic audio–visual media, posing severe threats to public trust, digital security, and forensic verification. This work presents a unified Artifact-Aware Cross-Modal Deepfake Detection Framework that jointly analyzes visual, auditory, and semantic inconsistencies. The system learns manipulation-invariant artifact signatures, aligns audio–video temporal coherence, and incorporates adversarial robustness to resist gradient-based attacks. Through extensive evaluation on benchmark datasets—including FaceForensics++, DFDC, Celeb-DF, and ASVspoof—the model delivers state-of-the-art accuracy (98.1% on FF++) and exhibits strong cross-dataset generalization. Explainability tools such as SHAP and Grad-CAM++ provide transparent insights into the model’s decisions. The findings demonstrate a robust and interpretable detection framework suitable for real-world forensic and security applications.},
        keywords = {Deepfake Detection, Multimodal Fusion, Cross-Modal Learning, Adversarial Robustness, Explainable AI, Forensic Artifacts},
        month = {December},
        }

Cite This Article

P, B. G., H, C., Hebbar, R. R., P, A. K., & R, N. (2025). Artifact-Aware Cross-Modal Deepfake Detection Framework for Robust and Explainable Media Authentication. International Journal of Innovative Research in Technology (IJIRT), 12(7), 1336–1340.
