Multi-Scale Small Object Detection in Satellite Images using Vision Transformers

  • Unique Paper ID: 175770
  • PageNo: 4013-4019
  • Abstract: The Object-Centric Masked Image Modeling (OCMIM)-based Self-Supervised Pre-training (SSP) method has revolutionized remote sensing object detection. Traditional SSP models struggle to detect small-scale objects due to their reliance on scene-level representations. OCMIM introduces an object-centric data generator and an attention-guided mask generator to enhance object-level representation learning. The proposed work extends this model by integrating advanced pre-trained architectures such as VGG16, improving detection accuracy. By reconstructing masked object regions using attention-based techniques, the system enhances remote sensing imagery analysis. Our results show that the extended approach significantly outperforms previous methodologies in precision, recall, and overall detection performance.

Copyright & License

Copyright © 2026. The authors retain the copyright of this article. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{175770,
        author = {Banduku Ramesh and A N Dinesh Kumar},
        title = {Multi-Scale Small Object Detection in Satellite Images using Vision Transformers},
        journal = {International Journal of Innovative Research in Technology},
        year = {2025},
        volume = {11},
        number = {11},
        pages = {4013--4019},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=175770},
        abstract = {The Object-Centric Masked Image Modeling (OCMIM)-based Self-Supervised Pre-training (SSP) method has revolutionized remote sensing object detection. Traditional SSP models struggle to detect small-scale objects due to their reliance on scene-level representations. OCMIM introduces an object-centric data generator and an attention-guided mask generator to enhance object-level representation learning. The proposed work extends this model by integrating advanced pre-trained architectures such as VGG16, improving detection accuracy. By reconstructing masked object regions using attention-based techniques, the system enhances remote sensing imagery analysis. Our results show that the extended approach significantly outperforms previous methodologies in precision, recall, and overall detection performance.},
        month = apr
}

Cite This Article

Ramesh, B., & Kumar, A. N. D. (2025). Multi-scale small object detection in satellite images using vision transformers. International Journal of Innovative Research in Technology (IJIRT), 11(11), 4013–4019. https://ijirt.org/article?manuscript=175770
