Tags: multi-agent context, sub-agent architecture, context management, AI agent design, persistent context

The Missing Layer in Every Multi-Agent Architecture

Most multi-agent designs split work by role. The smarter approach splits by context boundaries. But both assume agents have no memory. Add a shared context layer and the entire calculus changes — you can design by expertise again, and sub-agents stop being a band-aid for context window limits.

Kevin · Mar 15, 2026 · 6 min read

Akshay Pachaar wrote a sharp breakdown of Claude's multi-agent architecture on X recently. His core argument: most people split multi-agent systems by role — planner, implementer, tester — and that's almost always wrong. The better approach is to split by context boundaries.

He's right. But there's a layer missing from the analysis, and it changes everything about how you should think about agent architecture.

The Context Boundary Argument

Here's the problem with role-based agent design.

When you create a "planner" agent, an "implementer" agent, and a "tester" agent, you've created a telephone game. The implementer doesn't know what the planner decided, only what made it into the handoff. The tester doesn't know what the implementer chose, only the final output. Every boundary between agents is a place where information gets lost.

The fix, Pachaar argues, is context-centric decomposition. Instead of asking "what role does this agent play," ask "what context does this subtask actually need?" If two subtasks need deeply overlapping information, keep them in the same agent. Only split where context genuinely isolates.

A practical example: an agent that implements a feature should also write the tests for that feature. It already has the context — what it built, what edge cases it handled, what assumptions it made. Splitting the tests into a separate agent creates a handoff problem that costs more than the parallelism saves.
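
The "how much context do these subtasks share?" question can be made concrete. Here's a minimal sketch of that decision as a Jaccard-overlap heuristic; the subtask names, context labels, and 0.5 threshold are all illustrative assumptions, not anything from a real framework.

```python
# Hypothetical heuristic for context-centric decomposition: keep two
# subtasks in the same agent when the context they need overlaps heavily.
# Context labels and the 0.5 threshold are illustrative.

def context_overlap(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of the context each subtask needs."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def should_colocate(a: set[str], b: set[str], threshold: float = 0.5) -> bool:
    """Co-locate subtasks in one agent when their context overlap is high."""
    return context_overlap(a, b) >= threshold

implement = {"feature_spec", "codebase_conventions", "edge_cases", "api_surface"}
write_tests = {"feature_spec", "edge_cases", "api_surface", "test_framework"}
docs_site = {"marketing_copy", "brand_guidelines"}

print(should_colocate(implement, write_tests))  # heavy overlap: same agent
print(should_colocate(implement, docs_site))    # no overlap: safe to split
```

The implement/test pair clears the threshold easily, which matches the intuition above: the expensive context is already loaded, so splitting buys nothing.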

This is good advice. And for most teams building with agents today, it's the right starting point.

But it's built on an assumption that doesn't have to be true.

The Assumption: Context Dies at the Boundary

The entire argument for splitting by context boundaries rests on one thing: when an agent's session ends or its output gets compressed into a summary, the nuance is gone. The intermediate findings, the dead ends, the implicit decisions — all of it vanishes. The next agent downstream only gets the compressed version.

That's why the telephone game happens. That's why role-based splitting fails. That's why you're told to keep overlapping context in the same agent.

But what if context didn't die at the boundary?

What if every decision an agent made, every edge case it discovered, every constraint it identified got captured into a persistent layer that any other agent — or the same agent in a future session — could query?

The telephone game disappears. Not because you eliminated the boundaries, but because you made them permeable.
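
What "permeable boundaries" means in code is roughly this: agents write findings into a store that outlives them, and other agents read from it. A minimal in-memory sketch, assuming a hypothetical `ContextLayer` with `capture`/`query` methods (a real system would persist to a database):

```python
# A minimal sketch of a shared context layer. The ContextLayer class and
# its capture/query methods are hypothetical names, not a real API, and
# the in-memory list stands in for durable storage.
from dataclasses import dataclass, field

@dataclass
class Finding:
    agent: str
    topic: str
    detail: str

@dataclass
class ContextLayer:
    findings: list[Finding] = field(default_factory=list)

    def capture(self, agent: str, topic: str, detail: str) -> None:
        """Record a decision or discovery before the agent's session ends."""
        self.findings.append(Finding(agent, topic, detail))

    def query(self, topic: str) -> list[Finding]:
        """Let any agent -- or a future session -- retrieve prior findings."""
        return [f for f in self.findings if f.topic == topic]

layer = ContextLayer()
layer.capture("implementer", "auth", "tokens expire after 15 minutes")
layer.capture("implementer", "auth", "refresh endpoint has no rate limit")

# A downstream agent queries instead of playing telephone:
for f in layer.query("auth"):
    print(f"{f.agent}: {f.detail}")
```

The tester no longer depends on what survived the handoff summary; it asks the layer directly.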

With a Shared Context Layer, You Can Design by Expertise Again

When information loss is no longer the constraint, specialization starts working.

A security-focused agent with a security-specific system prompt will catch vulnerabilities that a generalist won't — even if the generalist has the same context. A performance agent tuned for bottleneck detection finds issues a full-stack agent glosses over. That's not hypothetical. Specialization has measurable value.

The reason developers are moving away from the natural inclination toward role-based splitting isn't that specialization is bad. It's that specialization without shared memory creates handoff failures. The information loss outweighs the expertise gain.

Fix the memory problem and the equation flips. You can split by expertise because the context layer handles the coordination that used to require co-location in the same context window.

| Without shared context | With shared context |
| --- | --- |
| Split by context boundaries to prevent information loss | Split by expertise because information persists |
| Generalist agents that do multiple jobs to keep context together | Specialist agents that query the shared layer for what they need |
| Sub-agents as a context management strategy | Sub-agents as a parallelism strategy |

Sub-Agents Without Persistence Are a Band-Aid

This is the part that doesn't get said enough.

Sub-agents — isolated Claude instances that do focused work and return compressed results — are often framed as a solution to context window pressure. Offload exploration to a sub-agent so the parent stays clean. The sub-agent does its work, returns a summary, and dies.

Sounds elegant. In practice, it's a memory leak disguised as architecture.

Everything the sub-agent explored — intermediate findings, dead ends it ruled out, nuance that didn't fit the summary — gone. The parent gets the compressed output. The context window stays clean.

But next time you need that information? You spawn another sub-agent to re-discover it. And another. You're paying tokens to re-learn things you already learned. The context window looks manageable, but your token bill and your latency tell a different story.

Sub-agents without context persistence manage pressure in the moment. They don't manage context over time. And the difference compounds every session.

With a persistent layer underneath, the dynamic changes:

  • Sub-agent findings get captured before the agent dies
  • The next sub-agent queries those findings instead of re-exploring
  • The parent retrieves structured answers instead of re-generating raw exploration
  • Context window pressure drops because retrieval is cheaper than discovery
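
That lifecycle is small enough to sketch. This is a hypothetical query-before-explore pattern, with a dict standing in for the persistent store and `explore()` standing in for an actual sub-agent run:

```python
# Sketch of the query-before-explore sub-agent pattern. The store, the
# explore() stand-in, and the question strings are all illustrative.

store: dict[str, str] = {}   # persistent layer shared across sub-agents
exploration_runs = 0         # counts expensive discovery work

def explore(question: str) -> str:
    """Stand-in for a sub-agent doing expensive exploration."""
    global exploration_runs
    exploration_runs += 1
    return f"answer to {question!r}"

def sub_agent(question: str) -> str:
    # 1. Query prior findings instead of re-exploring.
    if question in store:
        return store[question]
    # 2. Explore only on a miss, and capture before the agent "dies".
    answer = explore(question)
    store[question] = answer
    return answer

sub_agent("which modules touch billing?")  # first call: explores
sub_agent("which modules touch billing?")  # second call: cheap retrieval
print(exploration_runs)  # 1 -- the second sub-agent reused the finding
```

One exploration, many retrievals. That's the token-bill difference between a memory pattern and a compute pattern.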

The sub-agent becomes what it should have been all along: a compute pattern for parallelism. Not a memory pattern pretending to be architecture.

Two Things That Still Matter

A shared context layer doesn't eliminate orchestration decisions. Two constraints remain.

Write conflicts. Two agents editing the same file will make incompatible assumptions even if they can both see each other's decisions. If they're writing simultaneously, those choices collide. You still need sequencing or clear file ownership. Shared context reduces this but doesn't eliminate it — this is a coordination problem, not an information problem.
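
The sequencing/ownership rule can be enforced mechanically. A minimal sketch, assuming a hypothetical ownership map where the first agent to claim a file becomes its only writer:

```python
# Sketch of the file-ownership rule: at most one writer per file, so
# parallel agents can't make colliding edits. Agent and path names are
# illustrative.

ownership: dict[str, str] = {}  # file path -> owning agent

def claim(agent: str, path: str) -> bool:
    """Grant write access only if no other agent owns the file."""
    owner = ownership.setdefault(path, agent)
    return owner == agent

print(claim("security-agent", "src/auth.py"))  # first claim wins
print(claim("perf-agent", "src/auth.py"))      # denied: wait or reroute
print(claim("perf-agent", "src/cache.py"))     # disjoint file: granted
```

A denied claim is the orchestrator's cue to sequence the work, exactly because this is coordination, not information.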

Context volume. A shared layer doesn't mean every agent should consume all of it. A security reviewer doesn't need the full UI component tree. The layer needs to be queryable — intent-aware retrieval that surfaces what's relevant, not a dump of everything. Context overload degrades performance just as badly as context loss.
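
"Queryable, not a dump" can be as simple as tag filtering. A sketch of intent-aware retrieval, with illustrative tags and findings (a real layer would score relevance, not just intersect sets):

```python
# Sketch of intent-aware retrieval: return only findings whose tags
# intersect the querying agent's intent, instead of dumping everything.
# Tags and findings are illustrative.

findings = [
    ({"security", "auth"}, "session tokens are logged in plaintext"),
    ({"performance", "db"}, "N+1 query in the orders endpoint"),
    ({"ui"}, "component tree for the settings page"),
]

def retrieve(intent: set[str]) -> list[str]:
    """Surface findings relevant to the agent's intent; drop the rest."""
    return [detail for tags, detail in findings if tags & intent]

print(retrieve({"security"}))  # the security reviewer never sees the UI tree
```

The security reviewer gets one relevant finding and zero component trees, which is the point: retrieval that filters is what keeps a shared layer from becoming shared overload.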

When you have a shared context layer, you can design by expertise. But you still have to carefully consider the orchestration patterns — parallelism, sequencing, routing, evaluation loops. Those decisions don't go away. They just get made with better information.

The Architecture Decision Most People Skip

The standard progression looks like this:

  1. Start with a single agent
  2. Push it until it breaks
  3. Split into multiple agents by context boundaries

There's a step missing between 1 and 2:

Give the single agent persistent memory.

Most developers reach for multi-agent systems too early because their single agent hits context limits too fast. It forgets decisions from earlier in the session. It loses track of constraints. It re-discovers things it already found. So they split the work across agents to manage the load.

But if the single agent had durable memory — decisions that persist, lessons that fire when relevant, constraints that load dynamically instead of burning tokens on every prompt — it wouldn't hit those limits nearly as soon. The threshold for needing multi-agent complexity moves significantly higher.
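
Durable single-agent memory doesn't require multi-agent machinery at all. A minimal sketch, assuming a JSON file on disk as the persistence layer; the file name and record shape are made up for illustration:

```python
# Sketch of durable single-agent memory: decisions written to disk in one
# "session" survive into the next, instead of dying with the context
# window. The file path and record shape are illustrative.
import json
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "agent_memory.json")
if os.path.exists(path):
    os.remove(path)  # start the demo from a clean slate

def load_memory() -> dict:
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"decisions": []}

def save_decision(text: str) -> None:
    memory = load_memory()
    memory["decisions"].append(text)
    with open(path, "w") as f:
        json.dump(memory, f)

# Session 1: the agent records a decision before its context resets.
save_decision("use optimistic locking for inventory updates")

# Session 2 (a fresh context window): the decision is still there.
print(load_memory()["decisions"])
```

Every decision that persists this way is a decision the agent doesn't burn tokens re-deriving, which is what moves the multi-agent threshold higher.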

And when you do cross that threshold, the architecture is cleaner. Each specialized agent queries the shared layer for what it needs. Findings get captured back. The next agent or the next session starts with everything that came before.

One Question Before You Architect

Before you choose sub-agents or agent teams. Before you pick an orchestration pattern. Before you split anything.

Ask: where does context go when this agent's session ends?

If the answer is "nowhere," every decision downstream will be shaped by that limitation. You'll design around context loss instead of designing for the actual problem.

If the answer is "into a persistent layer that any agent can query," your options open up. Split by expertise. Run sub-agents that build on each other's work. Switch tools mid-workflow without starting over.

The architecture question isn't "sub-agents or teams?" It's "do my agents have memory?"

Everything else follows from that.

Ready to build with persistent context?

ContextStream keeps your team decisions, code intelligence, and memory connected from first prompt to production.