Automated Resilience Engineering: Self-Healing Cloud Infrastructure

  • Unique Paper ID: 183846
  • Volume: 12
  • Issue: 1
  • PageNo: 3284-3293
  • Abstract:
  • As cloud computing ecosystems continue to expand in complexity and scale, ensuring continuous system availability and reliability becomes a central challenge. Automated resilience engineering, particularly through self-healing cloud infrastructure, addresses this challenge by embedding intelligent, autonomous recovery mechanisms into system operations. These systems use machine learning, observability tools, and real-time orchestration to detect, diagnose, and recover from faults without human intervention. This review paper explores the evolution and current landscape of self-healing cloud architectures, with emphasis on AI-driven solutions, theoretical models, and practical implementations. By analyzing recent experimental studies, architectural frameworks, and industry applications, this paper identifies key research trends and proposes a cognitive model to enhance system resilience. The discussion culminates in recommendations for future work, particularly around explainability, ethical AI, and resilience across hybrid environments.

Copyright & License

Copyright © 2025 Authors retain the copyright of this article. This article is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{183846,
        author = {Adity Dokania},
        title = {Automated Resilience Engineering: Self-Healing Cloud Infrastructure},
        journal = {International Journal of Innovative Research in Technology},
        year = {2025},
        volume = {12},
        number = {1},
        pages = {3284-3293},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=183846},
        abstract = {As cloud computing ecosystems continue to expand in complexity and scale, ensuring continuous system availability and reliability becomes a central challenge. Automated resilience engineering, particularly through self-healing cloud infrastructure, addresses this challenge by embedding intelligent, autonomous recovery mechanisms into system operations. These systems use machine learning, observability tools, and real-time orchestration to detect, diagnose, and recover from faults without human intervention. This review paper explores the evolution and current landscape of self-healing cloud architectures, with emphasis on AI-driven solutions, theoretical models, and practical implementations. By analyzing recent experimental studies, architectural frameworks, and industry applications, this paper identifies key research trends and proposes a cognitive model to enhance system resilience. The discussion culminates in recommendations for future work, particularly around explainability, ethical AI, and resilience across hybrid environments.},
        keywords = {Cloud resilience, self-healing systems, automated recovery, machine learning, fault detection, AI in cloud computing, system availability, cognitive architecture, DevOps, AIOps.},
        month = {August},
        }

Cite This Article

  • ISSN: 2349-6002
  • Volume: 12
  • Issue: 1
  • PageNo: 3284-3293

Automated Resilience Engineering: Self-Healing Cloud Infrastructure

Related Articles