Decoding Failures in LLM Multi-Agent Systems: A New Automated Attribution Approach
Researchers introduce Automated Failure Attribution for LLM multi-agent systems, along with the Who&When benchmark and open-source methods for identifying which agent caused a failure and when.
Debugging multi-agent systems powered by large language models (LLMs) has long been a pain point for developers. When a complex task fails after a flurry of agent interactions, pinpointing which agent caused the breakdown, and at what moment, is a major challenge. Manual log reviews are tedious and error-prone, especially as agents become more autonomous and information chains grow longer. To tackle this, researchers from Penn State University, Duke University, Google DeepMind, and other institutions have introduced Automated Failure Attribution, a new research direction aimed at identifying the root cause of failures automatically. They also built the first benchmark dataset for the task, Who&When, and evaluated several automated attribution methods on it. This work, accepted as a Spotlight at ICML 2025, aims to streamline debugging and improve system reliability. Below, we explore the key aspects in a Q&A format.
1. What is automated failure attribution and why is it needed?
Automated failure attribution is the process of automatically determining which agent in a multi-agent system caused a failure and at which step the failure occurred. In LLM-driven multi-agent setups, agents collaborate to solve problems, but an error by a single agent, such as misunderstanding a goal or miscommunicating data, can cascade into complete task failure. Today, developers must manually comb through extensive interaction logs, a process often described as "finding a needle in a haystack." This is time-consuming and relies heavily on deep domain expertise. Automated failure attribution aims to replace this manual labor with systematic, algorithmic techniques, enabling faster iteration and more reliable systems.
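To make the task concrete, the sketch below models its input (an ordered log of agent steps) and its output (a blamed agent plus a step index) in Python. The `Step` and `Attribution` classes are a hypothetical schema chosen for illustration, not the Who&When data format, and `random_baseline` is an illustrative reference point rather than a method from the paper: it picks a step uniformly at random and blames whichever agent produced it.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One entry in a multi-agent interaction log (hypothetical schema)."""
    agent: str    # name of the agent that produced this step
    content: str  # message or action text


@dataclass(frozen=True)
class Attribution:
    """What an attribution method must output: who failed, and when."""
    agent: str
    step: int  # zero-based index into the log


def random_baseline(log: list[Step], seed: int = 0) -> Attribution:
    """Guess a step uniformly at random and blame the agent that produced it.

    Illustrative reference point only; any useful method should beat it.
    """
    if not log:
        raise ValueError("cannot attribute a failure in an empty log")
    rng = random.Random(seed)  # seeded so the guess is reproducible
    step = rng.randrange(len(log))
    return Attribution(agent=log[step].agent, step=step)


# A made-up three-step failure log.
log = [
    Step("Orchestrator", "Plan: search the web, then summarize the result."),
    Step("WebSurfer", "Found a page, but it covers the wrong year."),
    Step("Assistant", "Summary based on the wrong page."),
]
print(random_baseline(log, seed=42))
```

Seeding the generator keeps the guess reproducible, which matters when several methods are compared against the same reference point.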

2. Who led this research and which institutions participated?
The study is a collaborative effort co-led by Shaokun Zhang (Penn State University) and Ming Yin (Duke University). The team includes researchers from Google DeepMind, University of Washington, Meta, Nanyang Technological University, and Oregon State University. This diverse group combines expertise in artificial intelligence, multi-agent systems, and large language models to address a pressing challenge in AI development. The work was accepted as a Spotlight presentation at ICML 2025, a top-tier machine learning conference, highlighting its novelty and impact.
3. What is the Who&When benchmark dataset?
The Who&When dataset is the first benchmark specifically designed for automated failure attribution in LLM multi-agent systems. It contains failure logs from 127 LLM multi-agent systems, each manually annotated with the responsible agent and the decisive step at which the failure occurred. These annotations serve as ground truth for developing and evaluating automated attribution methods. The dataset is publicly available on Hugging Face, giving the community a shared resource for building and comparing new techniques. By providing a standardized testbed, Who&When aims to accelerate progress toward multi-agent systems that are more transparent and easier to debug.
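Because each Who&When failure is annotated with both an agent and a step, a method can be scored at two granularities. The sketch below computes agent-level accuracy (the predicted agent matches the annotation) and step-level accuracy (the predicted step index matches the annotation) over paired examples. The `Label` schema and the toy labels are hypothetical, and the function illustrates these two metrics rather than reproducing the project's evaluation script.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    """An annotated or predicted attribution for one failure log (hypothetical schema)."""
    agent: str
    step: int


def attribution_accuracy(preds: list[Label], golds: list[Label]) -> dict[str, float]:
    """Return agent-level and step-level accuracy over paired predictions."""
    if len(preds) != len(golds):
        raise ValueError("predictions and annotations must be paired one-to-one")
    if not golds:
        raise ValueError("need at least one annotated example")
    n = len(golds)
    # Each comparison is scored independently: right agent and right step are separate hits.
    agent_hits = sum(p.agent == g.agent for p, g in zip(preds, golds))
    step_hits = sum(p.step == g.step for p, g in zip(preds, golds))
    return {"agent_accuracy": agent_hits / n, "step_accuracy": step_hits / n}


# Toy annotations and predictions, invented for illustration.
golds = [Label("WebSurfer", 1), Label("Orchestrator", 0), Label("Coder", 4)]
preds = [Label("WebSurfer", 2), Label("Orchestrator", 0), Label("Assistant", 3)]
print(attribution_accuracy(preds, golds))
```

On the toy data, agent accuracy is 2/3 and step accuracy is 1/3. The first prediction blames the right agent but points at the wrong step, which is why the two numbers are reported separately.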

4. What automated attribution methods did the researchers develop?
The team developed and evaluated three LLM-based attribution methods that differ in how much of the failure log the model examines at once:
- All-at-Once: The model sees the complete failure log and is asked, in a single pass, to name the responsible agent and the decisive error step.
- Step-by-Step: Mimicking manual debugging, the model reviews the interaction log sequentially and judges each step until it finds the error.
- Binary Search: The model is asked which half of the log contains the error, and the search recurses into that half until a single step remains.
These methods were tested on the Who&When dataset, and the results show that the task is far from solved. The best method identified the failure-responsible agent with 53.5% accuracy but pinpointed the decisive error step only 14.2% of the time, some methods performed below random, and even state-of-the-art reasoning models such as OpenAI o1 and DeepSeek R1 did not reach practical usability. The code is open-source, allowing other developers to reproduce these results and build stronger techniques.
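One way to make the search problem concrete is to compare three strategies for walking a failure log with an LLM judge: reading it all at once, step by step, or by binary search. The sketch below captures only their control flow. The log schema, the judge signatures, and the `simulated_judges` helper are hypothetical; the simulated judges already know the answer and stand in for real LLM prompts so the search logic can be checked. As a simplification, each strategy blames the agent that produced the selected step, whereas a real prompt could ask for the agent directly.

```python
from typing import Callable, Optional

# Hypothetical log schema: each entry is (agent_name, message).
Log = list[tuple[str, str]]
Result = Optional[tuple[str, int]]  # (blamed agent, decisive step) or None

# Hypothetical judge interfaces; in a real system each call would be an LLM prompt.
WholeLogJudge = Callable[[Log], int]          # "Which step is the decisive error?"
StepJudge = Callable[[Log, int], bool]        # "Is step i (the last one shown) the error?"
RangeJudge = Callable[[Log, int, int], bool]  # "Is the error within steps lo..hi?"


def all_at_once(log: Log, judge: WholeLogJudge) -> Result:
    """One judge call over the complete log."""
    if not log:
        return None
    i = judge(log)
    return (log[i][0], i) if 0 <= i < len(log) else None


def step_by_step(log: Log, judge: StepJudge) -> Result:
    """Read the log in order and stop at the first step the judge flags."""
    for i in range(len(log)):
        if judge(log[: i + 1], i):  # only the prefix seen so far is shown
            return log[i][0], i
    return None


def binary_search(log: Log, judge: RangeJudge) -> Result:
    """Repeatedly ask which half holds the error until one step remains."""
    if not log:
        return None
    lo, hi = 0, len(log) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if judge(log, lo, mid):  # error in the lower half?
            hi = mid
        else:
            lo = mid + 1
    return log[lo][0], lo


def simulated_judges(true_step: int, calls: list[str]):
    """Stand-in judges that know the answer; each call is recorded in `calls`."""
    def whole(log: Log) -> int:
        calls.append("whole")
        return true_step

    def step(prefix: Log, i: int) -> bool:
        calls.append("step")
        return i == true_step

    def in_range(log: Log, lo: int, hi: int) -> bool:
        calls.append("range")
        return lo <= true_step <= hi

    return whole, step, in_range


agents = ["Orchestrator", "WebSurfer", "Coder", "Assistant"]
log: Log = [(agents[i % 4], f"message {i}") for i in range(8)]
calls: list[str] = []
whole, step, in_range = simulated_judges(true_step=5, calls=calls)

print(all_at_once(log, whole))       # ('WebSurfer', 5)
print(step_by_step(log, step))       # ('WebSurfer', 5)
print(binary_search(log, in_range))  # ('WebSurfer', 5)
print(calls.count("whole"), calls.count("step"), calls.count("range"))  # 1 6 3
```

With 8 steps, step-by-step needed 6 judge calls here, binary search log2(8) = 3, and all-at-once a single call. The trade-off is what each call must handle: all-at-once has to reason over the entire log in one pass, and binary search assumes every range judgment is correct, since one wrong answer discards the half that holds the error.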
5. How does this research change the way developers debug multi-agent systems?
Previously, debugging a failing multi-agent system required developers to read through lengthy log files and rely on intuition about which agent might have erred. Automated failure attribution aims to let developers run a tool that outputs the likely culprit agent and the failure moment, which could shorten iteration cycles and lower the barrier for less experienced developers by reducing the need for deep system knowledge. For now, such output is best treated as a lead to verify rather than a final verdict. Ultimately, this research lays groundwork for more reliable LLM-based collaborative AI applications in areas like autonomous software engineering, multi-robot coordination, and advanced dialogue systems.
6. Where can I access the code, dataset, and paper?
All resources are fully open-source to promote reproducibility and further research. You can find:
- The paper on arXiv at arxiv.org/pdf/2505.00212
- The code on GitHub at github.com/mingyin1/Agents_Failure_Attribution
- The Who&When dataset on Hugging Face at huggingface.co/datasets/Kevin355/Who_and_When
Feel free to explore and contribute to this emerging field!