CrossFuse: Robust IR–Visible Fusion via Self-Supervision with Top-k Alignment

  • Unique Paper ID: 182043
  • Volume: 12
  • Issue: 2
  • PageNo: 664-671
  • Abstract: In multimodal image fusion, robust generalization across diverse environments remains a significant challenge, especially under label-scarce conditions and out-of-distribution (OOD) shifts. We propose CrossFuse, a novel self-supervised learning (SSL) framework for infrared (IR) and visible image fusion that combines multi-view augmentations with a Top-k Selective Vision Alignment (SVA) mechanism. CrossFuse leverages weakly aggressive augmentations to maintain modality integrity while encouraging robust feature interactions. At its core, CrossFuse introduces a cross-modal contrastive loss with Top-k mining, enabling adaptive feature selection and improved cross-sensor alignment. Through extensive experiments on challenging benchmarks such as FLIR ADAS and MFNet, CrossFuse consistently outperforms existing fusion techniques in both in-distribution and OOD scenarios. Our approach is fully label-free, enabling scalable and generalizable multimodal training. This work paves the way toward more resilient sensor fusion systems, with potential applications in autonomous navigation, remote sensing, and surveillance.
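The abstract's core idea, a cross-modal contrastive loss with Top-k mining, can be illustrated with a minimal sketch. The paper does not publish its formulation here, so everything below is an assumption for illustration: an InfoNCE-style objective where each IR embedding is pulled toward its paired visible embedding and pushed away from only its k most similar (hardest) non-matching visible embeddings, rather than from all negatives. The function name, NumPy implementation, and temperature value are hypothetical, not the authors' code.

```python
import numpy as np

def topk_cross_modal_contrastive_loss(ir_feats, vis_feats, k=2, temperature=0.1):
    """Hypothetical sketch of a Top-k mined cross-modal contrastive loss.

    ir_feats, vis_feats: (N, D) arrays of paired embeddings (row i of each
    modality comes from the same scene). For each IR row, the positive is its
    paired visible row; only the top-k most similar non-matching visible rows
    are kept as negatives (hard-negative mining).
    """
    # L2-normalize so dot products are cosine similarities
    ir = ir_feats / np.linalg.norm(ir_feats, axis=1, keepdims=True)
    vis = vis_feats / np.linalg.norm(vis_feats, axis=1, keepdims=True)
    sim = ir @ vis.T / temperature          # (N, N) scaled similarity matrix
    n = sim.shape[0]
    losses = []
    for i in range(n):
        pos = sim[i, i]                     # paired (positive) similarity
        negs = np.delete(sim[i], i)         # all non-matching similarities
        hard = np.sort(negs)[-k:]           # top-k hardest negatives only
        # InfoNCE: softmax cross-entropy over {positive, k hard negatives}
        logits = np.concatenate(([pos], hard))
        losses.append(-pos + np.log(np.exp(logits).sum()))
    return float(np.mean(losses))
```

With perfectly aligned features (identical paired embeddings) the loss is near zero, while misaligned pairings drive it up; restricting negatives to the top-k is what the abstract calls adaptive feature selection, since easy negatives contribute no gradient.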

Cite This Article

  • ISSN: 2349-6002
