AI-Agent Design · Agentic AI · LLMs · Enterprise AI · Claude Code · Coding Agent · AI Engineering

Production AI Agent Architecture: Lessons from Claude Code's 512,000-Line Source Code

Vijay Singh·April 4, 2026·23 min read

A bug in Claude Code's autocompaction system was silently retrying failed compression calls in an infinite loop. By the time anyone noticed, it had burned through roughly 250,000 API calls per day — globally, across every user — until an engineer patched it with three lines of code.

We only know this because on March 31, 2026, Anthropic accidentally shipped version 2.1.88 of @anthropic-ai/claude-code with a 59.8 MB source map file attached to their npm package. A missing .npmignore entry. That's all it took to expose 512,000 lines of unobfuscated TypeScript across 1,900 files — the most detailed blueprint of a production AI agent ever made public. Anthropic confirmed the incident immediately: "This was a release packaging issue caused by human error, not a security breach."

What followed was predictable: GitHub mirrors, DMCA takedowns, a community Rust rewrite called Claw Code that hit 50,000 GitHub stars in under two hours (reportedly the fastest any repository had ever reached that milestone). Thousands of developers spent the next 48 hours picking apart what Anthropic had spent years building. What matters to anyone building agents isn't the drama or the leak's legal ambiguity. It's the engineering that was exposed.

Before March 31, 2026, building a reliable production AI agent required months of reverse-engineering patterns that the major labs had figured out and weren't sharing. That constraint is gone. And if you build agentic systems for a living — or you're responsible for a team that does — the architecture inside those 512,000 lines is the most valuable free education the industry has ever accidentally received.

This article is not a summary of the leak. It's a builder's guide, using Claude Code as the primary reference for what production AI agent architecture actually looks like when every component is taken seriously.


What Claude Code Actually Is — And Why the Numbers Should Surprise You

Most people's mental model of Claude Code is a chat interface that can also run terminal commands. That's wrong in a way that matters.

Claude Code is what Anthropic's engineering team calls an "agentic harness" — the orchestration layer that wraps the language model and gives it the ability to use tools, manage files, execute bash commands, coordinate sub-agents, and maintain coherent state across long, multi-step sessions. The Claude model is one component. The harness is the product. When you're using Claude Code, you're interacting with roughly 1,900 TypeScript files that exist specifically to manage what the model sees, what it's allowed to do, and what happens when things break.

The numbers in the leaked source are the first thing that reframes your expectations. QueryEngine.ts — the single module that handles all LLM API calls, streaming, prompt caching, token counting, cost tracking, and retry logic — is 46,000 lines. The base tool definition is 29,000 lines. That's not bloat. That's what rigorous schema validation, permission enforcement, and production-grade error handling actually look like at scale.

By early 2026, Claude Code had reached an estimated $2.5 billion annualized revenue run-rate, according to getpanto.ai's Claude AI statistics report — from a general availability launch in May 2025. For context: Anthropic's overall annualized revenue jumped from roughly $1 billion at the start of 2025 to $9 billion by year-end. Claude Code drove a significant portion of that growth. That run-rate isn't a measure of how good Claude's underlying model is. It's a measure of how much the harness matters commercially.


The Architecture, Explained Simply

A production AI agent is an autonomous software system that combines a large language model with a permission-gated tool registry, multi-layer context management, and a structured orchestration loop — enabling it to plan, execute, verify, and iterate across complex tasks without constant human input. It works by running the model in a loop over tool calls: the model generates a response, optionally invokes tools, receives results, then generates the next step. Unlike a chatbot that processes one prompt and returns one answer, a production agent manages state across dozens or hundreds of tool calls in a single session, often spawning sub-agents to handle work in parallel.

Claude Code organizes around three layers: an orchestration loop that manages planning and execution, a permission-gated tool system, and a multi-layer memory architecture. Each layer is doing things that most open-source agent frameworks either skip or treat as an afterthought.

The Orchestration Loop. At its core, the Claude Code execution loop is a while loop over tool calls. The model generates a response. If it contains tool calls, those execute. Results come back. The model generates the next response. This sounds almost insultingly simple — and it is. The sophistication isn't in the loop design. It's in what the loop is protected by: context management, permission gates, retry logic, circuit breakers, and sub-agent coordination. Getting the loop simple and stable first, then layering protection around it, is the right order of operations. Most teams do it backwards.
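Stripped of the protection layers, that loop fits in a few lines. Everything below is a hypothetical stand-in, not Claude Code's actual interface: `Model`, `ToolRunner`, and the turn shape are illustrative names for the real API client and tool layer.

```typescript
// A model turn either returns a final answer or a list of tool calls.
type ToolCall = { name: string; input: unknown };
type ModelTurn = { text?: string; toolCalls: ToolCall[] };

// Stand-ins for the real model API and tool executor (hypothetical).
type Model = (history: string[]) => ModelTurn;
type ToolRunner = (call: ToolCall) => string;

// The core agent loop: generate, execute tools, feed results back, repeat.
function runAgent(model: Model, runTool: ToolRunner, task: string, maxSteps = 20): string {
  const history: string[] = [`user: ${task}`];
  for (let step = 0; step < maxSteps; step++) {
    const turn = model(history);
    if (turn.toolCalls.length === 0) return turn.text ?? ""; // no tools requested: done
    for (const call of turn.toolCalls) {
      history.push(`tool:${call.name} -> ${runTool(call)}`);
    }
  }
  throw new Error("step budget exhausted"); // the protection lives around the loop, not in it
}
```

Note where the complexity isn't: the loop itself stays dumb, and the step budget is the first of the guardrails that get layered around it.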

The Tool System. Every capability in Claude Code is a discrete, self-contained tool module. BashTool, FileReadTool, GrepTool, WebFetchTool, and roughly 40 others — each defines its own input schema, permission level, and execution logic independently. There's no shared mutable state between tools. The model decides what to attempt. The tool system decides what is permitted. Those two things are architecturally separate, and that separation is what makes it safe to give an AI agent shell execution. AgentTool is the clever one: it lets the system spawn sub-agents as just another tool call, with no special orchestration layer required. Sub-agents are first-class citizens of the same tool registry.
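A minimal sketch of that separation, with a hypothetical `Tool` interface and registry (the real modules are vastly larger, and these names are illustrative): the model proposes a call, but the gate between proposal and execution belongs to the registry.

```typescript
type Risk = "LOW" | "MEDIUM" | "HIGH";

// Each tool is a self-contained module: schema, risk level, execution logic.
interface Tool {
  name: string;
  description: string; // constraints live here, where the model reads them at call time
  risk: Risk;          // drives the permission gate; the model never sees this field
  validate(input: unknown): boolean;
  execute(input: unknown): string;
}

const registry = new Map<string, Tool>();

function registerTool(tool: Tool): void {
  registry.set(tool.name, tool);
}

// The permission gate sits between "model decided" and "tool executed".
function invoke(name: string, input: unknown, approved: boolean): string {
  const tool = registry.get(name);
  if (!tool) throw new Error(`unknown tool: ${name}`);
  if (!tool.validate(input)) throw new Error(`invalid input for ${name}`);
  if (tool.risk === "HIGH" && !approved) throw new Error(`approval required for ${name}`);
  return tool.execute(input);
}
```

A sub-agent spawner slots into this exact interface as just another registered tool, which is the AgentTool insight: no special orchestration layer required.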

The Memory Architecture. Claude Code implements what the analysis community identified as a three-tier memory system. Short-term conversation state lives in the active context window. Mid-term session state is managed through CLAUDE.md files — markdown documents that persist architectural constraints, project rules, and "things to never do" across sessions. Long-term memory is handled through the autocompact system and structured summaries. The system actively distrusts its own memory: Claude Code only commits updated memory after verified success. If a file write fails or a test breaks, nothing gets stored. This prevents the agent from learning incorrect patterns from its own failures.
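The commit-after-verified-success pattern is simple to sketch. This is an illustrative reconstruction of the idea, not Anthropic's implementation: observations stay staged until some ground truth (a passing test, a successful write) confirms the step.

```typescript
// Staged memory writes: nothing persists until verification passes.
class SessionMemory {
  private committed: string[] = [];
  private staged: string[] = [];

  observe(fact: string): void {
    this.staged.push(fact); // provisional: the agent believes this, but hasn't proven it
  }

  // "verified" is whatever ground truth you have: tests passed, file write succeeded.
  finishStep(verified: boolean): void {
    if (verified) {
      this.committed.push(...this.staged); // success: this step is safe to learn from
    }
    this.staged = []; // on failure, discard: the agent never learns from its own mistakes
  }

  recall(): readonly string[] {
    return this.committed;
  }
}
```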


Lesson 1: Tool Design Is Your Agent's Character

Claude Code's tool system contains roughly 40 discrete capabilities. File reads, bash execution, web fetches, LSP integration, sub-agent spawning — each permission-gated independently. The architecture has one central design rule: never create a general-purpose tool when a specific one will do.

The Bash tool's own description explicitly warns against using it for find, grep, cat, head, tail, sed, awk, or echo commands. Not because those commands are dangerous, but because dedicated tools for each operation produce structured, auditable logs. A Grep call with typed parameters is trivially observable. A bash -c "grep -rn 'pattern' ." call requires parsing. Each dedicated tool has its own permission level and validation logic. The Edit tool requires a prior Read — preventing blind overwrites. When something breaks in a session with well-defined tools, you know exactly which tool call caused it.
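The Read-before-Edit invariant is small enough to express as a guard. This is a sketch with hypothetical names, not the real enforcement code:

```typescript
// Read-before-Edit guard: an Edit on a path is only allowed after a Read of
// that same path earlier in the session, preventing blind overwrites.
class FileGuard {
  private readPaths = new Set<string>();

  recordRead(path: string): void {
    this.readPaths.add(path);
  }

  canEdit(path: string): boolean {
    return this.readPaths.has(path);
  }
}
```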

Every tool carries explicit permission requirements checked before execution. The approval sequence runs in a strict order: trust establishment at project load time, a permission check before each individual tool runs, and explicit user confirmation for high-risk operations like file writes and bash commands. That order is not arbitrary — more on why it matters in the security section below.

For teams building custom agents, the practical translation is this: define your tools as narrowly as possible, document their constraints inside the tool definition where the model will read them at call time, and classify every tool by risk level before you build anything else. The tool registry is your agent's skeleton. Everything else depends on it being right.

When we built the AI Sales Intelligence Platform — which coordinates multiple specialized agents across field data, CRM sync, and territory analysis — the first architectural decision was defining exactly which capabilities each sub-agent could invoke. Not because we were following Claude Code's design, but because every time we gave an agent a broad tool, debugging became exponentially harder. The same lesson, arrived at independently.


Lesson 2: Context Management Is the Actual Engineering

Here's the problem every developer hits. You build an agent that works beautifully on the first 10 tool calls. By call 40, it's confused, repeating work it already did, or hallucinating based on stale outputs from three steps ago. The instinct is to blame the model. The correct diagnosis is almost always context rot.

Anthropic's engineering team calls this "context entropy" — the degradation of effective reasoning as the context window fills with accumulated tool outputs, conversation history, and intermediate states that are no longer relevant. Most agent demos work fine in notebooks because they never run long enough to hit this problem. Production agents always do.

Claude Code uses three distinct compression strategies, each triggered at a different point in a session's lifecycle. MicroCompact edits cached content locally with zero API calls — old tool outputs get trimmed directly in the conversation state. Fast, cheap, transparent to the user. AutoCompact fires when the conversation approaches the context window ceiling, reserving a 13,000-token buffer, then generating up to a 20,000-token structured summary of the session. There's a built-in circuit breaker — after three consecutive compression failures, it stops retrying. No infinite loops. That three-retry limit is the exact fix that ended the 250,000-API-calls-per-day waste we opened with. Full Compact compresses the entire conversation, then re-injects recently accessed files (capped at 5,000 tokens per file), active plans, and relevant skill schemas. Post-compression, the working token budget resets to 50,000 tokens.
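The three-failure circuit breaker is a pattern worth stealing verbatim. A minimal sketch, assuming a compaction attempt that can fail (the class name and shape are illustrative):

```typescript
// Circuit breaker around compaction: after three consecutive failures,
// stop retrying instead of looping forever.
class CompactionBreaker {
  private consecutiveFailures = 0;
  constructor(private readonly maxFailures = 3) {}

  // compact() is any compression attempt that can fail, e.g. a summary API call.
  tryCompact(compact: () => boolean): "ok" | "failed" | "open" {
    if (this.consecutiveFailures >= this.maxFailures) return "open"; // breaker tripped: no call made
    if (compact()) {
      this.consecutiveFailures = 0; // any success resets the breaker
      return "ok";
    }
    this.consecutiveFailures++;
    return "failed";
  }
}
```

The key property: once the breaker is open, the expensive call isn't even attempted, which is exactly what the original retry loop lacked.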

What's notable about this design is the layering. Each strategy handles a different failure mode at a different cost. MicroCompact is free. AutoCompact is cheap. Full Compact is expensive and used sparingly. The system spends as little as possible to keep the context healthy, escalating only when necessary. This is context engineering done at infrastructure level — not prompt-level cleanup, but architectural management of what the model sees on every inference call.

The practical lessons for your own agent: build compression logic before you think you need it. Retrofitting context management into a running production system is painful. Instrument your context window from day one — track token usage per component, measure which context elements correlate with successful task completion, and know your compression triggers before your sessions grow long enough to hit them.

Philipp Schmid, Technical Lead at Google DeepMind, put the broader principle bluntly: most agent failures are not model failures anymore — they are context failures. The Claude Code source validates that framing structurally. The codebase isn't dense because of complex model interactions. It's dense because of everything that protects the model from itself.


Lesson 3: Multi-Agent Orchestration Is Prompts, Not Code

This is the finding that surprised practitioners most when they dug into the Claude Code source.

coordinatorMode.ts — the module that handles multi-agent coordination — is implemented entirely as system prompt instructions. Not as code-level orchestration logic, not as a separate scheduling service, not as a message bus. The orchestrator receives a prompt describing how to delegate work, what to aggregate from sub-agents, and how to synthesize results. Sub-agents aren't special processes. They're Claude instances with different system prompts.

One directive embedded in the coordinator's prompt stands out: "Never write 'based on your findings' — these phrases delegate understanding to workers instead of doing it yourself." A quality gate in a text file. That instruction is what keeps the orchestrator from becoming a passive relay, rubber-stamping whatever sub-agents return without actually synthesizing the results. Because it's prompt-driven, Anthropic can update orchestration behavior without a redeploy. The tradeoff is that it's harder to test formally — but the iteration speed is real.
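Prompt-driven orchestration means the coordinator's behavior is literally a string you can version, diff, and hot-update. A hypothetical template in that spirit (the rules below are illustrative, echoing the directive above, not the actual coordinatorMode.ts content):

```typescript
// Orchestration as a prompt, not code: the coordinator's behavior is a string.
function coordinatorPrompt(workers: string[]): string {
  return [
    "You are the coordinator. You delegate, then you synthesize.",
    `Available workers: ${workers.join(", ")}.`,
    "Rules:",
    "- Give each worker one narrow, self-contained task.",
    "- Workers return condensed summaries; integrate them yourself.",
    '- Never write "based on your findings": do the understanding yourself.',
  ].join("\n");
}
```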

The sub-agent execution model also reveals something worth noting. When Claude Code spawns a sub-agent through AgentTool, each sub-agent gets a fresh context window. It can explore extensively — using tens of thousands of tokens — and then returns a condensed, distilled summary of its findings to the main orchestrator. This is deliberate: sub-agents compress their own work before handing it back, preventing the orchestrator's context from filling with raw detail it doesn't need. The isolation matters too. Sub-agents work in their own space, preventing a failed or confused sub-agent from corrupting the orchestrator's reasoning.

For teams building multi-agent systems: start with prompts. Define your orchestrator's behavior in a well-structured system prompt before you write any coordination code. Add code-level orchestration only when you hit a ceiling that prompts genuinely can't solve — conditional branching with external state, durable execution across failures, typed state machines. In our work on the Medical Claims AI platform, coordinating agents across eligibility verification, clinical review, and billing validation, the most reliable orchestration logic started as explicit prompt instructions defining handoff criteria. We added code-level coordination only where regulatory audit requirements demanded provable state traces.


Lesson 4: Use Cheap Models for Cheap Decisions

Not every decision in an agentic system needs your most capable model. Claude Code figured this out and baked it into the architecture.

The safety classifier — yoloClassifier.ts, which runs a permission check before every tool call to decide whether that call requires explicit user approval — uses Claude Haiku. Not Sonnet, not Opus. Haiku. The reasoning is straightforward: classifying whether a bash command is high-risk or routine is not a hard reasoning problem. It's a classification problem. Running it through a smaller, faster, cheaper model means the permission gate adds minimal latency and negligible cost. ULTRAPLAN, the deep-planning mode that works through complex multi-step tasks, uses Opus 4.6 running remotely for up to 30 minutes with a browser-based cost approval workflow before it starts. Standard session work runs on Sonnet 4.6.

The practical translation: route aggressively based on task complexity. Routing, classification, summarization, and safety checks don't need your frontier model. Deep reasoning, novel problem-solving, and multi-step planning do. According to production practitioners analyzing the Claude Code patterns, roughly 80% of an agent's calls don't need the most expensive model — they just need a capable one. The cost difference between Haiku and Opus on that 80% is significant at scale.
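A routing table can be this simple. Tier names below are placeholders for Haiku-, Sonnet-, and Opus-class models; the call types and assignments are illustrative, not Claude Code's actual routing:

```typescript
// Route each call type to the cheapest model tier that can reliably handle it.
type Tier = "small" | "mid" | "frontier";
type CallType =
  | "safety-check" | "classification" | "summarization"
  | "routing" | "codegen" | "deep-planning";

const ROUTES: Record<CallType, Tier> = {
  "safety-check": "small",     // a classification problem, not a reasoning problem
  "classification": "small",
  "summarization": "small",
  "routing": "small",
  "codegen": "mid",            // the default workhorse
  "deep-planning": "frontier", // expensive, used sparingly, behind approval
};

function route(call: CallType): Tier {
  return ROUTES[call];
}
```

The exhaustive `Record` type is deliberate: adding a new call type without assigning it a tier becomes a compile error, which forces the routing decision to happen at design time.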

Build model routing into your architecture before you're in production, not after. The refactor cost of adding routing to a system that assumes one model throughout is painful. A flat single-model architecture will either massively overspend on simple calls or underperform on hard ones.


Lesson 5: Security Must Live in the Tool, Not the Config File

The failure pattern we see most consistently in production agent builds follows the same script. A team implements a global permissions config — an allow-list, a guardrails file, a content filter sitting upstream — and at some point the agent does something it shouldn't. A destructive file operation. A force-push. Something subtler that only shows up in production.

Claude Code's approach to this is architecturally different and the leaked source makes the reason explicit.

Safety constraints in Claude Code are embedded directly inside tool descriptions, exactly where the model encounters them at call time. The git safety protocol — instructions to never run push --force, reset --hard, checkout ., restore ., or clean -f — lives inside the Bash tool's own description. This is not documentation style. It's an architectural decision. A language model attends to instructions more reliably when those instructions appear in the immediate context of the action they govern. A constraint buried in a system prompt section written 10,000 tokens ago is easier for the model to effectively ignore during a long session. The constraint next to the tool it governs is always present.
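Here is what "the constraint lives in the tool" looks like in miniature, using the denied git patterns listed above. The same list feeds both the description the model reads and the validation that enforces it; the code itself is a sketch, not the real logic:

```typescript
// One source of truth: the denied patterns appear in the Bash tool's
// description (where the model reads them) AND in its validation.
const GIT_DENY_PATTERNS = ["push --force", "reset --hard", "checkout .", "restore .", "clean -f"];

const BASH_DESCRIPTION = [
  "Executes a shell command.",
  "NEVER run: " + GIT_DENY_PATTERNS.join(", "),
].join("\n");

function violatesGitSafety(command: string): boolean {
  return GIT_DENY_PATTERNS.some((p) => command.includes(p));
}
```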

The trust sequencing model runs in a strict three-stage order: trust establishment at project load, a permission check before each individual tool runs, and explicit user confirmation for high-risk operations. CVE-2025-59828, patched in 2025, was a vulnerability where Yarn-related code could execute before directory trust had been established — exactly the kind of pre-trust initialization issue this sequencing is designed to prevent. The source makes clear that trust sequencing is treated as a correctness concern, not a UX concern.
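The three-stage order can be enforced as a small state machine, so a pre-trust execution path of the kind CVE-2025-59828 exploited becomes a thrown error rather than a vulnerability. A sketch with hypothetical names:

```typescript
// Trust sequencing as an explicit state machine: no permission check before
// trust is established, no execution before the permission check.
type Stage = "untrusted" | "trusted" | "permitted";

class TrustSequence {
  private stage: Stage = "untrusted";

  establishTrust(): void {
    this.stage = "trusted"; // project load time, before anything else runs
  }

  checkPermission(): void {
    if (this.stage !== "trusted") throw new Error("permission check before trust established");
    this.stage = "permitted";
  }

  execute(highRisk: boolean, userConfirmed: boolean): string {
    if (this.stage !== "permitted") throw new Error("execution before permission check");
    if (highRisk && !userConfirmed) throw new Error("high-risk operation needs user confirmation");
    this.stage = "trusted"; // permission is per tool call, not per session
    return "ok";
  }
}
```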

The bashSecurity.ts module is 2,592 lines with 23 numbered security checks. The depth is remarkable — and every numbered check implies a real incident behind it. Zsh-specific defenses appear throughout, which most security tooling misses because it targets Bash. Claude Code runs in Zsh on macOS (default since Catalina), and Anthropic apparently discovered attack vectors unique to Zsh's expansion semantics. The =cmd expansion, for instance, is a Zsh feature that replaces =curl with the full path to curl — a substitution that can bypass naive command blocklists. 23 checks. 23 incidents.
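The =cmd expansion can be screened for before any blocklist match runs. A minimal sketch of that one check, refusing rather than guessing when the expansion is present (the real module has 22 more checks):

```typescript
// Zsh's =cmd expansion replaces "=curl" with the full path to curl, which can
// slip past a blocklist that matches on the literal word "curl".
function hasZshEqualsExpansion(command: string): boolean {
  // "=name" at the start of a word is a candidate for Zsh filename expansion.
  return /(^|\s)=\w+/.test(command);
}

function isBlocked(command: string, blocklist: string[]): boolean {
  if (hasZshEqualsExpansion(command)) return true; // refuse rather than guess what it expands to
  return blocklist.some((b) => command.split(/\s+/).includes(b));
}
```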


The Interesting Facts — What the Leak Revealed That Nobody Expected

The architecture is the important part. But the unreleased features and internal details that surfaced in the source are worth their own section, because they reveal where the entire category of AI agents is heading — and they reveal it from the world's most commercially validated agentic system.

KAIROS is the most consequential finding. Referenced over 150 times in the source, it's an autonomous daemon mode that transforms Claude Code from a request-response tool into a persistent background process. KAIROS maintains append-only daily log files, receives periodic <tick> prompts that let it decide whether to act proactively or stay quiet, and enforces a 15-second blocking budget so its actions never interrupt the developer's workflow for longer than a brief pause. Think of it as a background service that watches, plans, and occasionally acts — without being explicitly asked. It's fully built. It's sitting behind a feature flag.

autoDream runs as a forked sub-agent companion to KAIROS, in the services/autoDream/ directory. During idle time, it consolidates what the agent observed during active sessions, removes contradictions from memory, and converts raw observations into structured long-term facts. Nightly memory distillation — the system reviews its own day and writes coherent notes to itself before the next session begins. No major open-source agent framework has shipped anything comparable.

Buddy is something else entirely. It's a Tamagotchi-style ASCII virtual pet with 18 possible species, rarity tiers, and dynamic stats including "Debugging" and "Snark." Internal source comments — unverified, from unannounced plans — suggested a teaser window of April 1–7 and a full launch target in May 2026. Anthropic has confirmed nothing publicly. Whether it ever ships as described is genuinely unclear. What is clear is that a meaningful amount of engineering went into it, including deterministic species generation so users could share their buddy identity. It's a small feature. It's also a signal that someone inside Anthropic was thinking seriously about what "working alongside Claude" feels like over time.

Undercover Mode is undercover.ts — a module that strips Anthropic-internal codenames and references when Claude Code operates in public or open-source repositories. The commit messages, PR titles, and PR bodies are cleaned of internal terminology. The source's exact directive: "You are operating UNDERCOVER in a PUBLIC/OPEN-SOURCE repository. Do not blow your cover." The purpose is preventing internal project names and codenames from leaking into public commit histories. Reasonable. But worth knowing if you're building on top of Claude Code and wondering why certain internal references never appear in your git log.

Anti-distillation is the one that generated the most genuine concern among AI researchers. The ANTI_DISTILLATION_CC flag controls a mechanism that injects fake tool definitions into API prompts — decoy tools that don't exist, included specifically to poison training data harvested by competitors scraping API traffic to build competing models. Dario Amodei has publicly spoken about the threat of foreign distillation of American AI models. The leaked source shows Anthropic addressing that threat at the infrastructure level, not just rhetorically. It's bypassable with relatively straightforward techniques, as security researchers noted, but the intent and implementation are clearly deliberate.

The internal codename for Claude Code is Tengu. Every telemetry event in the codebase carries a tengu_ prefix. The feature flags discovered include codenames like tengu_amber_flint, tengu_cobalt_frost, and tengu_miraculo_the_bard — internal experiment names that reveal A/B test structure and product strategy without explaining it. The unreleased model codenames found in the source — Capybara (a Claude 4.6 variant), Fennec, and Numbat — point to a next-generation model lineup already in development. Capybara's internal benchmarks showed it making false or exaggerated claims roughly 30% of the time, versus 16.7% for an earlier version, a tradeoff toward more assertive, less cautious responses.

This was also Anthropic's second source map leak. An almost identical packaging error occurred with an earlier Claude Code version in February 2025 — confirmed by The Register. That one was patched. The March 2026 recurrence is the more interesting signal for engineering teams than the leak content itself. The same class of error, 13 months apart, at a company whose entire brand is predicated on careful, methodical safety research.


The Future — Where Claude Code and AI Agents Are Going

The unreleased features point in a direction that's clearer now than any product announcement could have made it.

KAIROS, Daemon Mode, and Bridge (a 31-file implementation in src/bridge/ enabling remote control of Claude Code from a phone or browser via WebSocket permission sync and JWT authentication) form an architectural trio. Together they describe an agent that starts on your desktop, persists when you close the terminal, receives async work while you're away, and can be supervised and approved from your phone on your commute. Combined with Claude Dispatch — the iOS/Android app that already exists for remote control — the picture is of an always-on, ambient coding agent operating on your codebase continuously rather than in discrete sessions you initiate.

ULTRAPLAN, the 30-minute remote planning mode running on an Opus-class model in a cloud container, points toward a bifurcation: fast local execution for routine work, expensive remote compute for genuinely hard planning problems. The cost/time approval workflow built into ULTRAPLAN suggests Anthropic already knows this feature will generate sticker shock — so they built explicit human checkpoints into the cost model.

The gap between what Anthropic has built behind feature flags and what the open-source ecosystem currently offers is widest in the background autonomy category. According to The New Stack's analysis of the leaked source, none of the major open-source agent frameworks — CrewAI, LangGraph, Google ADK, AWS Strands — has shipped a comparable background autonomy feature. The closest equivalent is Nous Research's Hermes Agent with persistent multi-agent profiles, but without KAIROS's proactive observation and consolidation loop.

The shift the roadmap describes is from "you use the tool" to "the tool acts on its own." That's the transition the category is moving toward. Claude Code's leaked source shows what it looks like when an organization with real resources decides to build for that future seriously.


How to Start Building Your Own Production Agent Today

Understanding the architecture intellectually is different from knowing where to start. Here's a concrete sequence based on what the Claude Code source actually teaches.

Define your tool registry before you write any agent logic. List every capability your agent needs. For each one, define the input schema, classify it as LOW, MEDIUM, or HIGH risk, and write the constraints directly in the tool description. Not in a config file, not in the system prompt header — in the tool definition itself. This takes a few hours upfront and saves you weeks of debugging downstream. If you can't define the tool's schema precisely, you don't understand the tool's scope well enough to give it to an agent.

Build your context budget before you need it. Before your first production session, decide: what is your working token budget? What triggers a compression pass? What gets preserved across compression (active plans, recently modified files) and what gets trimmed? Write the compression logic before sessions grow long enough to require it. Even a simple two-stage system — trim old tool outputs above a token threshold, generate a structured summary when the total context exceeds 70% of the window — is dramatically better than no system. Instrument every call to track what's in the window and why.
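That two-stage system is small enough to write down directly. The thresholds here are illustrative, assuming token counts you already instrument:

```typescript
// Two-stage context management: trim old tool outputs past a per-item
// threshold, summarize once total context passes 70% of the window.
type Action = "none" | "trim" | "summarize";

function contextAction(
  totalTokens: number,
  windowSize: number,
  oldestToolOutputTokens: number,
): Action {
  if (totalTokens > windowSize * 0.7) return "summarize"; // stage two: structured summary
  if (oldestToolOutputTokens > 2_000) return "trim";      // stage one: cheap local trim
  return "none";
}
```

Run the check before every model call, not on a timer: the decision depends on what the next inference will actually see.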

Start your orchestration in prompts. Write your coordinator's system prompt as if it's a job description for a senior engineer who manages junior engineers. Define delegation criteria, aggregation rules, and synthesis requirements explicitly. Run multi-agent sessions with this prompt before you write any coordination code. Only add code-level orchestration when you find specific, concrete things the prompt can't express reliably — not because the code feels more "real" or testable.

Route your model calls immediately. Map every call type your agent makes (safety checks, summarization, classification, routing decisions, deep planning) and assign each to the cheapest model that can reliably handle it. You can always upgrade a call type later if quality degrades. Downgrading is harder once your production system assumes Sonnet on every call.

Implement your security sequencing before you touch permissions. Trust establishment first, per-tool permission check second, explicit confirmation for high-risk operations third. In that order, always. If you're working with agentic AI for enterprise systems, this sequencing isn't optional — it's the difference between an agent that's safe to give elevated permissions and one that isn't.

For teams that want structured guidance on implementing production-grade agent architecture — from tool design through multi-agent orchestration and security hardening — our AI engineering service covers system design all the way through production deployment.


The Floor Just Changed

Before March 31, 2026, building a reliable production AI agent required months of accumulated trial-and-error that the major labs weren't sharing. You worked out the context management patterns yourself, hit the same permission architecture failures everyone else hit, and rebuilt orchestration logic from scratch. That constraint is now gone.

What the leak demonstrated is that the hard part of building a production AI agent was never the model. It was always the harness — the context management, the permission architecture, the tool registry, the orchestration logic, the circuit breakers on things that will fail at scale. Anthropic spent years and significant engineering effort on that harness. The blueprint is now public knowledge, analyzed by thousands of developers over a single weekend.

The teams that ship reliable agents in the next 12 months won't be the ones with access to better models. They'll be the ones who took the architecture seriously before the model quality became a commodity. The floor moved. Most teams haven't noticed yet.

Talk to our AI engineering team about building production-grade agents →