A multi-agent system is a setup where several AI agents, each running on a large language model, tackle different parts of a task at the same time and hand their findings to a coordinating agent that pulls the answer together. Anthropic's own research system used this pattern to beat a single agent by 90.2% on its internal evaluation, but it burned roughly 15 times the tokens of an ordinary chat. So it is powerful, and it is not cheap.
What is a multi-agent system in simple terms?
Picture a lead researcher who reads your question, breaks it into three or four strands, and hands each strand to a specialist. Each specialist works alone, then reports back. The lead stitches the reports into one answer. That is a multi-agent system: one orchestrator, several worker agents, one shared goal.
The key word is parallel. A single agent works through a problem step by step, one thread at a time. A multi-agent system runs several threads at once, each with its own memory and its own tools. For a broad question with many independent parts, that saves wall-clock time and covers more ground.
In Anthropic's engineering write-up from June 2025, the lead agent typically spins up 3 to 5 subagents in parallel for a complex query. Each subagent gets a self-contained brief, its own fresh context window, and no knowledge that the others exist. They cannot chat mid-task. They just report.
How does a multi-agent system actually work?
Most production systems use what engineers call an orchestrator-worker pattern. One agent leads; the rest do the legwork. The flow is consistent across the major frameworks, even where the plumbing differs.
The coordinator plans
The lead agent reads the request, decides how many workers it needs, and writes a brief for each one. Simple fact-finding might need a single worker making a handful of tool calls. A sprawling research question might justify four or five workers running at once.
Workers run in parallel
Each worker operates in isolation with its own context window and its own set of tools, such as web search or a database query. Because they run at the same time rather than in sequence, the system covers far more ground per minute than a single agent could.
The coordinator synthesises
When the workers report back, the lead agent combines their findings, resolves any contradictions, and runs a separate pass to attach citations. This division of labour is why the pattern scales: no single context window has to hold everything at once.
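The plan, fan-out and synthesis steps above can be sketched in a few lines of Python. This is a minimal illustration, not any framework's actual API: the `Brief` and `Report` types, the function names and the stubbed `run_worker` are all hypothetical, and a real worker would call a model and its tools where the stub simply sleeps.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Brief:
    """Self-contained instructions for one worker (hypothetical structure)."""
    worker_id: int
    question: str


@dataclass
class Report:
    worker_id: int
    findings: str


def plan(request: str, strands: list[str]) -> list[Brief]:
    # Step 1: one brief per strand. A real lead agent would ask a model to
    # choose the strands; here they are passed in by hand.
    return [Brief(i, f"{request} (focus: {s})") for i, s in enumerate(strands)]


async def run_worker(brief: Brief) -> Report:
    # Stand-in for a model call plus tool use. Each worker sees only its own
    # brief, mirroring the isolated context windows described above.
    await asyncio.sleep(0.01)
    return Report(brief.worker_id, f"findings for [{brief.question}]")


def synthesise(reports: list[Report]) -> str:
    # Step 3: merge the reports in a stable order.
    ordered = sorted(reports, key=lambda r: r.worker_id)
    return "\n".join(r.findings for r in ordered)


async def orchestrate(request: str, strands: list[str]) -> str:
    briefs = plan(request, strands)
    # Step 2: workers run concurrently rather than one after another.
    reports = await asyncio.gather(*(run_worker(b) for b in briefs))
    return synthesise(list(reports))


if __name__ == "__main__":
    print(asyncio.run(orchestrate("UK heat pump market", ["pricing", "policy", "installers"])))
```

Because the workers are awaited together with `asyncio.gather`, total wall-clock time is roughly that of the slowest worker, not the sum of all of them.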
What is the difference between a single-agent and a multi-agent system?
The honest answer is that most tasks do not need more than one agent. A single agent is cheaper, easier to debug, and behaves predictably. Multi-agent systems earn their keep on breadth: open-ended research, tasks with many independent sub-questions, work that benefits from several viewpoints at once. The table below sets the two side by side.
| Dimension | Single-agent system | Multi-agent system |
|---|---|---|
| Structure | One agent, one context window, works sequentially | Coordinator plus 3 to 5 workers, each with its own context, running in parallel |
| Best for | Focused tasks, clear single objective, step-by-step reasoning | Broad research, many independent sub-questions, parallel exploration |
| Token cost (per Anthropic) | About 4x an ordinary chat | About 15x an ordinary chat |
| Approx. cost per complex query (Opus 4.8 pricing) | Around £0.39 | Around £1.48 |
| Debugging | Straightforward, one trajectory to follow | Harder, several parallel trajectories and non-deterministic behaviour |
| Failure mode | Gets stuck or loops on one thread | Duplicated work, contradictory findings, coordinator overload |
The cost figures above are a Tom & Co estimate applying Anthropic's published token multipliers to Claude Opus 4.8 list pricing of $5 per million input tokens and $25 per million output tokens (July 2026, converted at $1.27 to the pound). Treat them as directional, not a quote.
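For readers who want to see the arithmetic, here is one way the estimate could be reproduced. The prices, exchange rate and multipliers come from this article. The baseline of 10,000 input and 3,000 output tokens per ordinary chat is an assumption chosen for illustration, as is applying each multiplier equally to input and output tokens.

```python
# List prices and exchange rate quoted in the article.
INPUT_USD_PER_MTOK = 5.0
OUTPUT_USD_PER_MTOK = 25.0
USD_PER_GBP = 1.27

# ASSUMPTION (not from the article): token counts for one ordinary chat turn.
CHAT_INPUT_TOKENS = 10_000
CHAT_OUTPUT_TOKENS = 3_000

# Token multipliers relative to an ordinary chat, as cited in the table.
MULTIPLIERS = {"chat": 1, "single_agent": 4, "multi_agent": 15}


def chat_cost_gbp(input_tokens: int, output_tokens: int) -> float:
    """Cost of one chat turn in pounds at the list prices above."""
    usd = (input_tokens * INPUT_USD_PER_MTOK
           + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000
    return usd / USD_PER_GBP


def estimate_costs() -> dict[str, float]:
    # ASSUMPTION: the multiplier scales input and output tokens equally.
    base = chat_cost_gbp(CHAT_INPUT_TOKENS, CHAT_OUTPUT_TOKENS)
    return {name: round(base * m, 2) for name, m in MULTIPLIERS.items()}


if __name__ == "__main__":
    print(estimate_costs())
    # {'chat': 0.1, 'single_agent': 0.39, 'multi_agent': 1.48}
```

Swapping in your own typical token counts is the quickest way to turn the directional figure into one that reflects your workload.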
How much do multi-agent systems cost to run?
This is where UK leaders should slow down. Anthropic found that token usage alone explained about 80% of the variance in performance on the BrowseComp evaluation it analysed. More agents and more tool calls meant better answers, but the bill scaled with them.
A multi-agent research run used about 15 times the tokens of an ordinary chat. On Claude Opus 4.8 pricing, a single complex query works out at roughly £1.48, against about 10p for the same question handled as a plain chat turn (Tom & Co estimate, July 2026).
That gap is the whole decision. If a task genuinely benefits from parallel breadth, and the output value clears the cost, a multi-agent system pays off. If you are reaching for it to answer a question a single agent could handle, you are paying close to four times the single-agent token bill (15x an ordinary chat against 4x) for nothing. Start with one agent. Add more only when a single one demonstrably falls short.
Where do UK businesses actually use multi-agent systems in 2026?
Adoption is early. The British Chambers of Commerce "Powering Productivity" report, published in March 2026 with Atos and the University of Essex, found agentic AI was the least-adopted category of AI, with just 7% of firms using it. Text generation, by contrast, sat at 85%.
So most UK firms are still on generative basics, not orchestrated agents. The same BCC research put overall AI use at 54% of firms, up from 35% in 2025 and 23% in 2023. Adoption of AI is racing ahead. Multi-agent orchestration is the part still finding its feet.
The realistic 2026 use cases are narrow and high-value. Deep research and competitive analysis, where a coordinator fans out across many sources. Complex document review across large portfolios. Coding tasks where one agent plans and others implement in parallel. These are exactly the jobs where breadth beats a single thread.
What are the main risks of multi-agent systems?
The failure modes differ from those of a single agent, and are worth knowing before you build. The biggest is simple: cost that scales faster than value. Gartner predicts over 40% of agentic AI projects will be cancelled by the end of 2027, citing escalating costs, unclear business value or inadequate risk controls.
Coordination is the other hard part. Workers can duplicate each other's effort, return contradictory findings, or hand the coordinator more than it can reconcile. Because the agents run in parallel and non-deterministically, the same input can produce different behaviour on different runs, which makes debugging genuinely harder than with a single agent.
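A coordinator can catch the first two problems with a simple check before it synthesises. The sketch below is illustrative only: the `Finding` type and `review_findings` function are hypothetical, and it assumes findings have already been normalised to a shared `topic` key and compared by exact string match, which real systems would need to replace with something more forgiving.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    worker_id: int
    topic: str   # what the finding is about (hypothetical normalised key)
    value: str   # what the worker concluded


def review_findings(findings: list[Finding]) -> dict[str, list[str]]:
    """Group findings by topic and flag overlap between workers."""
    by_topic: dict[str, list[Finding]] = defaultdict(list)
    for f in findings:
        by_topic[f.topic].append(f)

    duplicated, contradictory = [], []
    for topic, group in by_topic.items():
        if len(group) < 2:
            continue  # only one worker covered this topic
        if len({f.value for f in group}) == 1:
            duplicated.append(topic)      # several workers, same answer
        else:
            contradictory.append(topic)   # several workers, different answers
    return {"duplicated": sorted(duplicated), "contradictory": sorted(contradictory)}


if __name__ == "__main__":
    sample = [
        Finding(0, "market_size", "£2bn"),
        Finding(1, "market_size", "£3bn"),
        Finding(1, "top_vendor", "Acme"),
        Finding(2, "top_vendor", "Acme"),
        Finding(2, "growth_rate", "8%"),
    ]
    print(review_findings(sample))
    # {'duplicated': ['top_vendor'], 'contradictory': ['market_size']}
```

A high count of duplicated topics suggests the briefs overlap too much. Contradictory topics are the ones the coordinator has to resolve or flag in its answer.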
There is a governance angle too. UK GDPR requires a Data Protection Impact Assessment for processing likely to result in a high risk to individuals, and the ICO's guidance on AI and data protection says that in the vast majority of cases AI use involving personal data will meet that bar. Several agents each touching data at once makes it harder to trace exactly what was accessed and why, so your audit trail needs to be built in from the start, not bolted on.
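One possible shape for a built-in audit trail is to wrap every tool so that each call records which agent made it, with what arguments, and for what stated purpose. The `AuditLog` class and `audited` helper below are hypothetical names for illustration. The sketch keeps entries in memory and assumes tool arguments are JSON-serialisable. A production system would write to durable, access-controlled storage.

```python
import json
from datetime import datetime, timezone
from typing import Any, Callable


class AuditLog:
    """Append-only in-memory record of tool calls (hypothetical helper)."""

    def __init__(self) -> None:
        self.entries: list[dict[str, Any]] = []

    def record(self, agent_id: str, tool: str, purpose: str, args: dict[str, Any]) -> None:
        self.entries.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "agent_id": agent_id,
            "tool": tool,
            "purpose": purpose,
            "args": args,
        })

    def to_jsonl(self) -> str:
        # One JSON object per line, easy to ship to a log store.
        return "\n".join(json.dumps(e, sort_keys=True) for e in self.entries)


def audited(log: AuditLog, agent_id: str, tool_name: str,
            tool: Callable[..., Any]) -> Callable[..., Any]:
    # Wrap a tool so every call is logged before it runs.
    def wrapper(purpose: str, **kwargs: Any) -> Any:
        log.record(agent_id, tool_name, purpose, kwargs)
        return tool(**kwargs)
    return wrapper


if __name__ == "__main__":
    def lookup_customer(customer_id: str) -> dict[str, str]:
        return {"customer_id": customer_id}  # stand-in for a real data source

    log = AuditLog()
    lookup = audited(log, "worker-1", "lookup_customer", lookup_customer)
    lookup("verify billing address", customer_id="C123")
    print(log.to_jsonl())
```

Making `purpose` a required argument forces each agent to state why it is touching the data at the moment it does so, which is the part that is hardest to reconstruct afterwards.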
Should a UK business build a multi-agent system?
Use this rule of thumb. Reach for multiple agents only when the task splits cleanly into independent parts that can run at once, and when the answer is valuable enough to justify roughly 15 times the token cost of an ordinary chat, or close to four times that of a single agent. Research, wide-ranging analysis, and large-scale document work fit. Most day-to-day tasks do not.
The practical path is to start with a single well-scoped agent. Measure what it produces and where it falls short. Only add a second and third agent once you can point to a specific limit a single agent cannot clear. That keeps the cost honest and the system debuggable, and it is the sequence the frontier labs themselves followed.



