Real-Time Multi-Agent Framework for Detecting Cascading Failures in Machine Learning Pipelines

  • Unique Paper ID: 193845
  • Volume: 12
  • Issue: 10
  • PageNo: 2849-2853
  • Abstract:
  • Modern machine learning systems are deployed as multi-stage pipelines where the output of one stage becomes the input of the next. In such interconnected systems, small failures such as poor data quality, feature corruption, model performance degradation, or service latency can propagate across stages and lead to cascading failures that affect the reliability of the entire system. Traditional monitoring techniques typically evaluate each pipeline component independently and operate in offline environments, which limits their ability to detect real-time failure propagation. This paper proposes a Real-Time Multi-Agent Framework designed to detect cascading failures in machine learning pipelines. The framework assigns intelligent monitoring agents to different pipeline stages, including data processing, model evaluation, and service execution. These agents continuously monitor system metrics and communicate with a centralized risk analysis module that evaluates cascading failure propagation across the pipeline. When the system detects abnormal propagation patterns, an alert module generates early warnings to prevent large-scale system degradation. The proposed architecture improves system observability, enhances reliability, and provides proactive monitoring capabilities for modern machine learning operations. Experimental analysis demonstrates that the framework effectively identifies early-stage anomalies and reduces the risk of system-wide failures.

Copyright & License

Copyright © 2026 Authors retain the copyright of this article. This article is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

BibTeX

@article{193845,
        author = {P.Govardhan and K.Bhanu Tej and Sk.Hasan Basha and M.jayasri},
        title = {Real-Time Multi-Agent Framework for Detecting Cascading Failures in Machine Learning Pipelines},
        journal = {International Journal of Innovative Research in Technology},
        year = {2026},
        volume = {12},
        number = {10},
        pages = {2849-2853},
        issn = {2349-6002},
        url = {https://ijirt.org/article?manuscript=193845},
        abstract = {Modern machine learning systems are deployed as multi-stage pipelines where the output of one stage becomes the input of the next. In such interconnected systems, small failures such as poor data quality, feature corruption, model performance degradation, or service latency can propagate across stages and lead to cascading failures that affect the reliability of the entire system. Traditional monitoring techniques typically evaluate each pipeline component independently and operate in offline environments, which limits their ability to detect real-time failure propagation.
This paper proposes a Real-Time Multi-Agent Framework designed to detect cascading failures in machine learning pipelines. The framework assigns intelligent monitoring agents to different pipeline stages, including data processing, model evaluation, and service execution. These agents continuously monitor system metrics and communicate with a centralized risk analysis module that evaluates cascading failure propagation across the pipeline. When the system detects abnormal propagation patterns, an alert module generates early warnings to prevent large-scale system degradation.
The proposed architecture improves system observability, enhances reliability, and provides proactive monitoring capabilities for modern machine learning operations. Experimental analysis demonstrates that the framework effectively identifies early-stage anomalies and reduces the risk of system-wide failures.},
        keywords = {},
        month = {March},
        }

Cite This Article

P.Govardhan, , & Tej, K., & Basha, S., & M.jayasri, (2026). Real-Time Multi-Agent Framework for Detecting Cascading Failures in Machine Learning Pipelines. International Journal of Innovative Research in Technology (IJIRT), 12(10), 2849–2853.

Related Articles