
A New Study Says Delete Your CLAUDE.md / AGENTS.md. Here's What Actually Works Instead


Kevin · Feb 24, 2026 · 7 min read

A recent paper just measured what CLAUDE.md and AGENTS.md files actually do to coding agents. The results are bad. Developer-written context files improved task success by about 4% on average, while LLM-generated context files decreased success by roughly 3%, and both increased cost and latency by over 20%. Theo T3.GG's video "Delete your CLAUDE.md (and your AGENT.md too)" walks through the findings live: a task with a fresh CLAUDE.md takes ~1:29 versus ~1:11 without it. And this isn't a fringe result: it matches what most developers already suspect.

Static "brain dump" context files don't scale.

Now, here's the part where you should know who's writing this.

We built ContextStream, a persistent, queryable memory layer for AI coding tools, so we're not neutral on this topic. But the study's findings aren't ours, and neither is Theo's conclusion. We're going to walk through why static context files fail, what the alternatives are, and how a dynamic architecture addresses the specific pathologies the research identified. You can read the paper and watch the video yourself and form your own opinion.

This is an honest breakdown of the problem and one approach to solving it.


Why your context files are making agents worse

The study and Theo's video surface five specific failure modes. If you've ever written a long CLAUDE.md and wondered why your agent still does weird things, you'll recognize these.

Your instructions go stale and nobody notices. Your repo architecture changes. Your context file doesn't. The agent places files in wrong locations, follows dead conventions, and uses outdated patterns. Why? Because you told it to.

Every token in that file costs you on every single call. The entire CLAUDE.md / AGENTS.md sits in the high-priority "developer message" slot whether or not it's relevant to the current task. That's why the study found a 20%+ cost increase. You're paying for context the agent doesn't need.
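To make that overhead concrete, here's a back-of-the-envelope calculation. The numbers are illustrative, not figures from the study:

```python
# Illustrative numbers only (not from the paper):
context_file_tokens = 1_500      # a typical hand-written CLAUDE.md
avg_prompt_tokens   = 6_000      # code, tools, history, user message
calls_per_task      = 12         # agent loop iterations per task

# The static file rides along on every single call, relevant or not.
overhead = context_file_tokens * calls_per_task
baseline = avg_prompt_tokens * calls_per_task
print(f"{overhead / baseline:.0%} extra input tokens per task")  # 25%
```

Even a modest file adds a fixed tax to every iteration of the agent loop, which is exactly the shape of cost increase the study reports.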

You're priming the model toward whatever you mention most. If your context file says "tRPC" prominently but you've mostly moved to Convex, the agent keeps reaching for tRPC. You created a bias by writing it down.

Debugging becomes nearly impossible. When behavior goes wrong, you now have to untangle: provider instructions, system prompt, dev message (your massive context file), user prompt, code, tools, and history. Good luck figuring out which layer caused the problem.

You're optimizing the wrong thing. Devs spend hours hand-crafting complex rule files instead of improving tests, architecture, and feedback loops: the structural improvements that actually give agents more leverage.

The net result: tiny or no gains in success, noticeable cost and latency increases, and increasingly brittle behavior as your project evolves.


The alternatives that might be enough for you

Before we get into ContextStream, here's an honest look at what else works.

Keep your context file under 20 lines. The simplest fix. Focus only on recurring mistakes and sharp edges. For smaller or stable codebases, this is the right answer and it costs you nothing.

Trust the model to explore. This is Theo's core recommendation. Modern agents are genuinely good at reading repos, running tests, and inferring structure. If your architecture is clean and your tests are solid, you may not need external context at all.

Use your editor's built-in context features. Cursor, Windsurf, and other AI-native editors already have file indexing, @ references, and project-wide search. Low friction, improving fast.

Invest in tests, types, and architecture. Theo makes this point well, and it's the right first investment regardless of what else you use. Better tests and clearer module boundaries give agents more leverage than any context file or memory system.

Each of these is valid. Where they break down is when you need persistent, cross-session memory that scales across contributors and projects. That's a different class of problem.


What changes when context is dynamic instead of static

ContextStream replaces the static monolith with a persistent, queryable memory layer that agents pull from on demand. Here's what that means for you in practice.

You stop paying for context you don't need

With AGENTS.md, the entire file is traditionally loaded into every prompt whether the agent needed it or not. While newer models and harnesses have found ways to optimize this, many tools still exhibit this behavior.

With a dynamic system, your decisions, code understanding, and lessons learned live in an external store. Only the relevant slices get injected into a given call. Everything else stays out of the prompt. You get closer to Theo's ideal of minimal context for the task, while still having effectively infinite project memory available when an agent needs it.

You ask questions instead of writing essays

Traditional context files try to pre-explain everything: architecture, commands, patterns, data models. You're guessing what the agent will need and writing it all down in advance.

Semantic retrieval flips this. Your codebase, docs, and prior conversations are indexed with embeddings. An agent can ask "what's the video processing flow?" or "what did we decide about auth?" and get back only the specific functions, docs, or decisions that matter. That directly matches the paper's recommendation: include only minimal, task-relevant requirements rather than exhaustive context.
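As a toy sketch of that retrieval pattern, here's the shape of the idea with a bag-of-words stand-in for a real embedding model and invented memory entries; a production system would use learned embeddings and a vector index:

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: bag-of-words token counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Indexed "memory": code chunks, docs, and prior decisions (invented examples).
memory = [
    "video processing flow: upload -> transcode -> thumbnail -> publish",
    "auth decision: org admins are validated in middleware/org.py",
    "deploy runbook: main branch auto-deploys via CI",
]
index = [(chunk, embed(chunk)) for chunk in memory]

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return only the top-k chunks relevant to the query, nothing else."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

print(retrieve("what's the video processing flow?"))
```

The point is the interface: the agent asks a question and gets back only the matching slice of memory, instead of carrying the whole corpus in every prompt.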

Your decisions survive across sessions and teammates

This is where a text file fundamentally can't keep up. AGENTS.md encodes rules as unstructured text with no notion of how decisions relate to specific endpoints, tables, or modules.

A knowledge graph links decisions, code, and dependencies. Queries like "if we change this schema, what breaks?" or "where do we validate org membership?" return structured, code-linked answers. When someone on your team makes a decision in one session, it's immediately available to every agent and every teammate in the next session without anyone manually updating a file.
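A minimal sketch of that kind of impact query, using a hypothetical dependency graph over invented file names (a real knowledge graph would also link decisions and schema objects, not just files):

```python
from collections import defaultdict

# Hypothetical edges: "A depends_on B" means changing B can break A.
depends_on = {
    "api/videos.py":     ["schema/videos.sql"],
    "jobs/transcode.py": ["api/videos.py", "schema/videos.sql"],
    "web/player.tsx":    ["api/videos.py"],
}

# Invert the edges so we can walk from a changed node to everything downstream.
dependents = defaultdict(set)
for node, deps in depends_on.items():
    for dep in deps:
        dependents[dep].add(node)

def impact(changed: str) -> set[str]:
    """Answer 'if we change this, what breaks?' via transitive dependents."""
    seen, stack = set(), [changed]
    while stack:
        node = stack.pop()
        for dep in dependents[node]:
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen

print(sorted(impact("schema/videos.sql")))
```

Because the answer is computed from structured edges rather than prose, it stays correct as the graph is updated, which is exactly what a static rules file can't do.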

You never run /init again

Theo shows how /init-generated context files mostly restate information the agent can already discover, then tax every future request with that redundancy.

Auto-session init precomputes the current relevant context for each conversation (recent decisions, touched files, impacted modules) and feeds just that slice to your agent. No manual /init. No long-lived rule files going stale.
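A rough sketch of what that precomputation could look like, with an invented event log and a simple recency window standing in for the real relevance logic:

```python
from datetime import datetime, timedelta

# Hypothetical per-project event log a memory layer might keep.
events = [
    {"ts": datetime(2026, 2, 20), "kind": "decision",   "text": "moved auth to middleware"},
    {"ts": datetime(2026, 2, 23), "kind": "file_touch", "text": "jobs/transcode.py"},
    {"ts": datetime(2026, 1, 5),  "kind": "decision",   "text": "adopted Convex"},
]

def session_init(now: datetime, window_days: int = 7) -> str:
    """Precompute a small context slice: only recent, relevant events."""
    cutoff = now - timedelta(days=window_days)
    recent = [e for e in events if e["ts"] >= cutoff]
    lines = [f"- [{e['kind']}] {e['text']}" for e in recent]
    return "Session context:\n" + "\n".join(lines)

print(session_init(datetime(2026, 2, 24)))
```

The old decision from January never enters the prompt; it stays queryable in the store but doesn't tax every call the way a static file would.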


The honest tradeoffs

A text file is simpler than a networked service. That's true in the same way a .txt file is simpler than a word processor (this is an analogy, not a direct equivalence). A .txt file opens instantly, works offline, requires no installation, and costs nothing. None of that is evidence it solves the formatting and collaboration problem better than a word processor.

The honest question isn't "is AGENTS.md simpler?" It obviously is. The question is whether simplicity is what you need, or whether your team is losing decisions, repeating mistakes, and fighting context drift across sessions and contributors.

ContextStream requires account creation, MCP server installation, and editor configuration. For teams, each member needs to set up and authenticate. That's a genuine cost. But it works with every MCP-compatible tool you're already using: Claude Code, Cursor, Windsurf, Codex CLI, Cline, and others. It indexes your codebase on setup, so there's no cold-start waiting period. And for teams, the value compounds: shared decisions, cross-session memory, and synchronized context across contributors mean the investment pays back as the team and project grow.


How to wire this into your agent stack

If the tradeoffs make sense for your project, here's the practical setup:

  1. Strip your AGENTS.md / CLAUDE.md down to almost nothing. Keep only high-level guardrails and a pointer to ContextStream for richer context.
  2. Install ContextStream as an MCP tool that supports semantic search over code and docs, "what did we decide about X?" queries, dependency and impact analysis, and "lessons learned" retrieval. This syncs across your whole team.
  3. In your agent's planning loop, have it ask ContextStream for targeted context (e.g., "video pipeline implementation," "auth rules for org admins"), use that plus the current prompt to act, and optionally write back "this change broke Y, we fixed it by Z" as a new memory node.
  4. Keep investing in tests, types, and architecture as Theo suggests. ContextStream keeps those decisions and patterns sticky across tools and sessions without stuffing them into every prompt.
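The loop in step 3 can be sketched like this. The `ContextStream` class here is a hypothetical stand-in client, not the real API; the query and write-back methods are invented for illustration:

```python
class ContextStream:
    """Hypothetical stand-in for a memory-layer client (not the real API)."""

    def __init__(self):
        self._memory: list[str] = []

    def query(self, question: str) -> list[str]:
        # Real system: semantic search + graph traversal. Stub: word overlap.
        words = question.lower().split()
        return [m for m in self._memory if any(w in m.lower() for w in words)]

    def remember(self, lesson: str) -> None:
        # Write-back: lessons learned become retrievable in future sessions.
        self._memory.append(lesson)

def agent_step(cs: ContextStream, task: str) -> str:
    context = cs.query(task)                 # 1. pull targeted context
    result = f"plan for {task!r} using {len(context)} memory items"  # 2. act
    cs.remember(f"completed: {task}")        # 3. write back what happened
    return result

cs = ContextStream()
cs.remember("auth rules: org admins validated in middleware")
print(agent_step(cs, "auth rules for org admins"))
```

The key design choice is the write-back in step 3: each session leaves the store slightly richer, so the next session (or the next teammate) starts from it instead of from zero.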

You get what people hoped CLAUDE.md would deliver: project-aware, history-aware agents that remember decisions, without the static, stale files the research shows are empirically ineffective.


The bottom line

If your project is small and stable, keep a short AGENTS.md and move on. That's the right tool for that job.

But if you're on a team, shipping fast, accumulating decisions that matter, and tired of watching agents forget everything between sessions, then the problem isn't your context file. The problem is that a context file was never the right abstraction for persistent, shared memory.

The study proved the ceiling is low. Theo showed it live. The question is whether you keep bumping into that ceiling or build on something designed to scale past it.

Try ContextStream free



Ready to build with persistent context?

ContextStream keeps your team decisions, code intelligence, and memory connected from first prompt to production.