This article analyzes publicly available source code from instructkr/claude-code. All code snippets are used for commentary and educational purposes. Claude Code is a product of Anthropic. This article is not affiliated with or endorsed by Anthropic.
# The codebase nobody was supposed to see
A snapshot of Claude Code's source code leaked. I spent a day reading it (instructkr/claude-code). Not to copy it, but to understand the engineering decisions behind a tool that millions of developers use daily.
What I found was not magic. It was plumbing. The kind that reveals what actually matters when you build an AI tool that runs on someone's machine, touches their files, and burns their money.
The codebase is TypeScript, built on Bun, using React Ink for terminal UI. The entry point is src/entrypoints/cli.tsx. The main loop lives in src/query.ts. There are 40+ tool implementations in src/tools/, a permission system in src/utils/permissions/, and a compaction engine in src/services/compact/.
Here is what the code teaches.
## 1. Your biggest cost lever is the prompt cache
The system prompt is not a static string. It is a composable pipeline of sections, split into two halves by a boundary marker.
In src/constants/prompts.ts:114-115:
```typescript
export const SYSTEM_PROMPT_DYNAMIC_BOUNDARY =
  '__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__'
```
Everything before this marker (identity, coding rules, tool descriptions, tone) is static across users and sessions. Everything after (memory, MCP instructions, language, output style) is per-session. The boundary is injected at prompts.ts:573:
```typescript
// --- Static content (cacheable) ---
getSimpleIntroSection(outputStyleConfig),
getSimpleSystemSection(),
getSimpleDoingTasksSection(),
getActionsSection(),
getUsingYourToolsSection(enabledTools),
getSimpleToneAndStyleSection(),
getOutputEfficiencySection(),
// === BOUNDARY MARKER - DO NOT MOVE OR REMOVE ===
...(shouldUseGlobalCacheScope() ? [SYSTEM_PROMPT_DYNAMIC_BOUNDARY] : []),
// --- Dynamic content (registry-managed) ---
...resolvedDynamicSections,
```
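The mechanics are worth sketching. Assuming hypothetical section names, a minimal model of the split shows why the boundary matters: everything before it serializes identically across sessions, which is exactly the byte-stable prefix the prompt cache keys on.

```typescript
// Sketch of a static/dynamic prompt split (hypothetical names, not the real API).
const BOUNDARY = '__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__'

type Section = () => string

function buildSystemPrompt(staticSections: Section[], dynamicSections: Section[]): string {
  return [
    ...staticSections.map(s => s()),   // identical every session → cacheable prefix
    BOUNDARY,
    ...dynamicSections.map(s => s()),  // per-session content lives after the boundary
  ].join('\n\n')
}

// The cacheable prefix is everything up to and including the boundary.
function cacheablePrefix(prompt: string): string {
  return prompt.slice(0, prompt.indexOf(BOUNDARY) + BOUNDARY.length)
}

const staticSections = [() => 'You are a coding assistant.', () => 'Tool rules: ...']
const sessionA = buildSystemPrompt(staticSections, [() => 'Memory: prefers tabs'])
const sessionB = buildSystemPrompt(staticSections, [() => 'Memory: prefers spaces'])
// Different sessions, same cacheable prefix.
```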
The section system lives in src/constants/systemPromptSections.ts. Two constructors:
- `systemPromptSection()` (line 20): memoized, computed once, cached until `/clear` or `/compact`
- `DANGEROUS_uncachedSystemPromptSection()` (line 32): recomputes every turn, explicitly breaks the prompt cache. Requires a reason string.
The naming is intentional. If you want to add a volatile section, you have to type DANGEROUS_ and explain yourself.
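Here is a rough sketch of what that contract could look like. The names mirror the source; the bodies are my guess at the minimal implementation.

```typescript
// Sketch of the two section constructors. The memoized variant computes once;
// the DANGEROUS_ variant recomputes every call and forces you to state why.
function systemPromptSection(compute: () => string): () => string {
  let cached: string | undefined
  return () => (cached ??= compute())
}

function DANGEROUS_uncachedSystemPromptSection(
  reason: string,            // required: why this section is allowed to break the cache
  compute: () => string,
): () => string {
  if (!reason.trim()) throw new Error('DANGEROUS_ sections must document a reason')
  return () => compute()     // recomputed every turn — breaks the cacheable prefix
}

let calls = 0
const stable = systemPromptSection(() => `computed ${++calls}`)
stable(); stable()           // compute() ran once; both calls returned the same string
```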
### Cache break detection
They built observability for cache breaks in src/services/api/promptCacheBreakDetection.ts. It hashes every component of the API request between turns: system blocks, tool schemas, beta headers, model ID, effort level, cache control settings, per-tool schema hashes, and AFK mode state. When any hash changes, it logs a diff showing exactly which field broke the cache.
The comment at line 37 explains why per-tool hashes exist:
> Diffed to name which tool's description changed when toolSchemasChanged but added=removed=0 (77% of tool breaks per BQ 2026-03-22). AgentTool/SkillTool embed dynamic agent/command lists.
77% of their tool-related cache breaks came from tool descriptions changing, not tools being added or removed. The AgentTool's description includes a dynamic list of available sub-agents, which changes when MCP servers connect or plugins load.
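The detection technique itself is easy to reproduce: hash each component of the request separately, keep the previous turn's hashes, and diff. A toy version (the field names here are mine, not the source's):

```typescript
import { createHash } from 'node:crypto'

// Toy cache-break detector: hash each request component, diff against last turn.
type RequestParts = Record<string, unknown>

const hash = (v: unknown) =>
  createHash('sha256').update(JSON.stringify(v)).digest('hex')

function diffParts(prev: RequestParts | null, next: RequestParts): string[] {
  if (!prev) return []
  return Object.keys(next).filter(k => hash(prev[k]) !== hash(next[k]))
}

let lastTurn: RequestParts | null = null
function checkCacheBreak(parts: RequestParts): string[] {
  const broken = diffParts(lastTurn, parts)
  lastTurn = parts
  return broken   // e.g. ['toolSchemas'] → log exactly which field broke the cache
}

checkCacheBreak({ system: 'v1', toolSchemas: ['Read', 'Bash'], model: 'x' })
const broken = checkCacheBreak({ system: 'v1', toolSchemas: ['Read', 'Bash', 'Agent'], model: 'x' })
```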
Takeaway: Prompt caching gives a 90% cost reduction on cache hits. Treat cache breaks like production incidents. The static/dynamic split is table stakes, but the real wins come from tracking why the cache breaks and fixing the sources.
## 2. "Unlimited" context is a structured summary with circuit breakers
Auto-compaction triggers when token usage crosses a threshold. The constants live in src/services/compact/autoCompact.ts:62-70:
```typescript
export const AUTOCOMPACT_BUFFER_TOKENS = 13_000
// BQ 2026-03-10: 1,279 sessions had 50+ consecutive failures (up to 3,272)
// in a single session, wasting ~250K API calls/day globally.
const MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3
```
That comment is gold. Before the circuit breaker, 1,279 sessions were hammering the API with 50+ doomed compaction attempts each. One session hit 3,272 consecutive failures. At scale, that was 250K wasted API calls per day.
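A circuit breaker for this failure mode fits in a few lines. A minimal sketch, assuming a synchronous compaction callback for brevity:

```typescript
// Sketch of the compaction circuit breaker: after N consecutive failures,
// stop issuing doomed requests instead of retrying forever.
const MAX_CONSECUTIVE_FAILURES = 3

let consecutiveFailures = 0

function tryAutoCompact(compact: () => void): 'ok' | 'skipped' | 'failed' {
  if (consecutiveFailures >= MAX_CONSECUTIVE_FAILURES) return 'skipped' // breaker open: no API call
  try {
    compact()
    consecutiveFailures = 0      // success closes the breaker
    return 'ok'
  } catch {
    consecutiveFailures++
    return 'failed'
  }
}
```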
### The compaction prompt
The summary prompt in src/services/compact/prompt.ts is not "summarize this conversation." It is a rigid 9-section template (line 61, BASE_COMPACT_PROMPT):
- Primary request and intent - all explicit user requests
- Key technical concepts - technologies, frameworks discussed
- Files and code sections - with full code snippets and why each matters
- Errors and fixes - every error, how it was fixed, user feedback
- Problem solving - solved problems and ongoing troubleshooting
- All user messages - verbatim, non-tool-result messages
- Pending tasks - explicitly requested work
- Current work - what was happening right before compaction
- Optional next step - with direct quotes from recent conversation
Section 6 is the critical one. All user messages preserved verbatim. Not paraphrased. User intent drifts when you rephrase it.
### The analysis scratchpad
Before summarizing, the model writes an `<analysis>` block (line 31, DETAILED_ANALYSIS_INSTRUCTION_BASE). This is chain-of-thought for the compaction step itself. The formatCompactSummary() function at line 311 strips it:
```typescript
export function formatCompactSummary(summary: string): string {
  let formattedSummary = summary
  // Strip analysis section
  formattedSummary = formattedSummary.replace(
    /<analysis>[\s\S]*?<\/analysis>/, ''
  )
  // ...
}
```
The analysis improves summary quality but never enters the conversation context. Free quality boost.
### No tools allowed
The compaction agent inherits the parent's tool definitions (for cache sharing) but must not use them. The preamble at line 19 (NO_TOOLS_PREAMBLE) says it three times:
```
CRITICAL: Respond with TEXT ONLY. Do NOT call any tools.
- Do NOT use Read, Bash, Grep, Glob, Edit, Write, or ANY other tool.
- Tool calls will be REJECTED and will waste your only turn — you will fail the task.
```
A trailer at line 271 repeats it again. Models are stubborn about using tools when they see tool schemas in the request.
Takeaway: Structured summarization with a rigid template beats naive truncation. Preserve user messages word-for-word. Use a scratchpad that gets stripped. Build circuit breakers for any automated process that can fail in a loop.
## 3. 2,592 lines of bash security
src/tools/BashTool/bashSecurity.ts is the most paranoid file in the codebase. Five layers of validation.
### Layer 1: Pattern matching (line 16)
COMMAND_SUBSTITUTION_PATTERNS blocks 13 categories of dangerous shell syntax:
```typescript
const COMMAND_SUBSTITUTION_PATTERNS = [
  { pattern: /<\(/, message: 'process substitution <()' },
  { pattern: />\(/, message: 'process substitution >()' },
  { pattern: /=\(/, message: 'Zsh process substitution =()' },
  { pattern: /(?:^|[\s;&|])=[a-zA-Z_]/, message: 'Zsh equals expansion (=cmd)' },
  { pattern: /\$\(/, message: '$() command substitution' },
  { pattern: /\$\{/, message: '${} parameter substitution' },
  { pattern: /\$\[/, message: '$[] legacy arithmetic expansion' },
  { pattern: /~\[/, message: 'Zsh-style parameter expansion' },
  { pattern: /\(e:/, message: 'Zsh-style glob qualifiers' },
  { pattern: /\(\+/, message: 'Zsh glob qualifier with command execution' },
  { pattern: /\}\s*always\s*\{/, message: 'Zsh always block (try/always construct)' },
  { pattern: /<#/, message: 'PowerShell comment syntax' },
]
```
The Zsh equals expansion is subtle: =curl evil.com expands to /usr/bin/curl evil.com, bypassing deny rules that check the base command name.
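Wired together, the pattern table becomes the first validation layer. A minimal sketch reusing two of the patterns above:

```typescript
// First-layer validator sketch: reject commands matching known-dangerous syntax.
const PATTERNS: { pattern: RegExp; message: string }[] = [
  { pattern: /\$\(/, message: '$() command substitution' },
  { pattern: /(?:^|[\s;&|])=[a-zA-Z_]/, message: 'Zsh equals expansion (=cmd)' },
]

function validateCommand(cmd: string): { ok: true } | { ok: false; reason: string } {
  for (const { pattern, message } of PATTERNS) {
    if (pattern.test(cmd)) return { ok: false, reason: message }
  }
  return { ok: true }
}
```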
### Layer 2: Zsh dangerous commands (line 45)
ZSH_DANGEROUS_COMMANDS blocks 20+ Zsh builtins:
```typescript
const ZSH_DANGEROUS_COMMANDS = new Set([
  'zmodload', // gateway to zsh/mapfile, zsh/system, zsh/zpty, zsh/net/tcp
  'emulate',  // emulate with -c flag is an eval-equivalent
  'sysopen',  // Opens files with fine-grained control (zsh/system)
  'syswrite', // Writes to file descriptors (zsh/system)
  'zpty',     // Executes commands on pseudo-terminals (zsh/zpty)
  'ztcp',     // Creates TCP connections for exfiltration (zsh/net/tcp)
  'zf_rm',    // Builtin rm from zsh/files — bypasses binary checks
  // ... 13 more
])
```
zmodload zsh/mapfile enables invisible file I/O through array assignment. No binary is executed. No command appears in ps. The file contents just appear in a variable.
### Layer 3: Tree-sitter AST parsing
The ValidationContext type at line 103 includes an optional treeSitter?: TreeSitterAnalysis field. When available, validators use syntax tree analysis instead of regex. This catches cases where regex is ambiguous: nested quotes, escaped characters, multi-line commands.
### Layer 4: Sandbox runtime
src/utils/sandbox/sandbox-adapter.ts wraps @anthropic-ai/sandbox-runtime with filesystem and network restrictions at the OS level. The convertToSandboxRuntimeConfig() function at line 172 translates Claude Code's permission rules into sandbox policies.
### Layer 5: Permission rules
The full permission system in src/utils/permissions/permissions.ts. The hasPermissionsToUseTool() function at line 473 is the gate every tool call passes through. Rules load from enterprise managed settings (highest precedence), project config, user config, and defaults.
### Command semantics
A separate concern in src/tools/BashTool/commandSemantics.ts. grep returning exit code 1 means "no matches," not "error." diff returning 1 means "files differ." The COMMAND_SEMANTICS map at line 31 defines per-command exit code interpretation. Without this, the model would see false errors constantly and waste turns trying to "fix" them.
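A stripped-down version of the idea (the map entries here are illustrative, not the full table):

```typescript
// Sketch of per-command exit-code semantics: some nonzero codes are not errors.
const COMMAND_SEMANTICS: Record<string, Record<number, string>> = {
  grep: { 0: 'matches found', 1: 'no matches' },   // 1 is informational, not a failure
  diff: { 0: 'files identical', 1: 'files differ' },
}

function interpretExit(command: string, code: number): { isError: boolean; meaning: string } {
  const meaning = COMMAND_SEMANTICS[command]?.[code]
  if (meaning !== undefined) return { isError: false, meaning }
  return { isError: code !== 0, meaning: code === 0 ? 'success' : `exit code ${code}` }
}
```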
Takeaway: Shell execution has more attack surface than you expect. Zsh alone has a dozen ways to execute arbitrary code that bypass naive command parsing. Budget serious engineering time for security, or use a sandbox and accept the tradeoffs.
## 4. Tool results go to disk, not context
src/constants/toolLimits.ts defines the budgets:
```typescript
export const DEFAULT_MAX_RESULT_SIZE_CHARS = 50_000      // per-tool cap
export const MAX_TOOL_RESULT_TOKENS = 100_000            // ~400KB
export const MAX_TOOL_RESULTS_PER_MESSAGE_CHARS = 200_000 // per-turn aggregate
```
When a tool result exceeds the threshold, src/utils/toolResultStorage.ts persists it to disk. The model receives a preview wrapped in a `<persisted-output>` tag (line 30) with the file path. It can then use the Read tool to access the full content if needed.
The per-message aggregate budget (200K) prevents five parallel grep calls each returning 40K characters from dumping 200K into one message. The comment at line 49:
> This prevents N parallel tools from each hitting the per-tool max and collectively producing e.g. 10 x 40K = 400K in one turn's user message.
Thresholds are configurable per-tool via GrowthBook feature flags. The getPersistenceThreshold() function at line 55 checks for runtime overrides before falling back to the default. Some tools opt out entirely: the Read tool sets its threshold to Infinity because persisting its output to a file the model reads back with Read is circular.
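The persist-with-preview shape is simple to sketch. This version writes oversized output to a temp file and wraps a preview in the tag the source uses; the thresholds and formatting details are my simplification:

```typescript
// Sketch of persist-to-disk with preview: large tool output goes to a file,
// the model sees a preview plus the path it can Read later.
import { writeFileSync } from 'node:fs'
import { tmpdir } from 'node:os'
import { join } from 'node:path'

const THRESHOLD = 50_000
const PREVIEW_CHARS = 2_000

function packageToolResult(output: string): string {
  if (output.length <= THRESHOLD) return output
  const path = join(tmpdir(), `tool-output-${Date.now()}.txt`)
  writeFileSync(path, output)
  return [
    `<persisted-output path="${path}">`,
    output.slice(0, PREVIEW_CHARS),
    `… (${output.length} chars total; Read the path above for the rest)`,
    '</persisted-output>',
  ].join('\n')
}
```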
Takeaway: Large tool outputs kill the context window. Persist to disk, send a preview, let the model read the full file if it needs to. This beats truncation because truncation loses information permanently. Cap the aggregate per turn, not just per tool.
## 5. Fork sub-agents share the parent's cache
src/tools/AgentTool/forkSubagent.ts implements the fork path. The isForkSubagentEnabled() gate at line 32 checks three conditions: the feature flag is on, coordinator mode is off, and the session is interactive.
The FORK_AGENT definition at line 60:
```typescript
export const FORK_AGENT = {
  agentType: FORK_SUBAGENT_TYPE,
  tools: ['*'],               // inherits parent's exact tool pool
  maxTurns: 200,
  model: 'inherit',           // same model = same cache
  permissionMode: 'bubble',   // surfaces permission prompts to parent
  getSystemPrompt: () => '',  // unused — parent's rendered bytes are threaded through
}
```
model: 'inherit' is critical. A different model cannot reuse the parent's prompt cache. The comment above the definition: "Reconstructing by re-calling getSystemPrompt() can diverge (GrowthBook cold to warm) and bust the prompt cache; threading the rendered bytes is byte-exact."
### Byte-identical prefixes
The buildForkedMessages() function at line 107 constructs the child's conversation. All tool_result blocks use an identical placeholder (line 93):
```typescript
const FORK_PLACEHOLDER_RESULT = 'Fork started — processing in background'
```
Every fork child from the same parent turn gets the same placeholder for every tool result. Only the final text block (the directive) differs. This maximizes cache hits across parallel forks.
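The construction can be sketched in a few lines. Assuming a simplified message model, the key property is that every sibling fork shares an identical prefix and differs only in the final directive:

```typescript
// Sketch of fork message construction: replace every tool_result body with the
// same placeholder so parallel forks share a byte-identical (cacheable) prefix.
const FORK_PLACEHOLDER_RESULT = 'Fork started — processing in background'

type Block = { type: 'text'; text: string } | { type: 'tool_result'; content: string }

function buildForkedMessages(parentBlocks: Block[], directive: string): Block[] {
  const shared = parentBlocks.map(b =>
    b.type === 'tool_result' ? { ...b, content: FORK_PLACEHOLDER_RESULT } : b,
  )
  // Only the trailing directive differs between sibling forks.
  return [...shared, { type: 'text', text: directive }]
}

const parent: Block[] = [
  { type: 'text', text: 'run tests' },
  { type: 'tool_result', content: '3,000 lines of test output…' },
]
const forkA = buildForkedMessages(parent, 'Scope: fix the failing unit test')
const forkB = buildForkedMessages(parent, 'Scope: fix the lint errors')
```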
### The fork preamble
buildChildMessage() at line 171 injects a strict 10-rule preamble:
```
You are a forked worker process. You are NOT the main agent.
RULES (non-negotiable):
1. Your system prompt says "default to forking." IGNORE IT — that's for the parent.
   You ARE the fork. Do NOT spawn sub-agents; execute directly.
2. Do NOT converse, ask questions, or suggest next steps
...
8. Keep your report under 500 words unless the directive specifies otherwise.
9. Your response MUST begin with "Scope:". No preamble, no thinking-out-loud.
10. REPORT structured facts, then stop
```
Anti-recursion at line 82: fork children detect the `<fork-boilerplate>` tag in their history and refuse to fork again.
Takeaway: If your architecture spawns child agents, design for cache sharing from the start. Byte-identical prefixes are the difference between child agents costing 10% of a fresh request and 100%. Thread the parent's rendered system prompt bytes directly. Do not re-render them.
## 6. Build-time dead code elimination for feature flags
Throughout the codebase, feature('FLAG_NAME') from bun:bundle controls build-time DCE. In src/entrypoints/cli.tsx:
```typescript
if (feature('ABLATION_BASELINE') && process.env.CLAUDE_CODE_ABLATION_BASELINE) {
  for (const k of [
    'CLAUDE_CODE_SIMPLE', 'CLAUDE_CODE_DISABLE_THINKING',
    'DISABLE_INTERLEAVED_THINKING', 'DISABLE_COMPACT',
    'DISABLE_AUTO_COMPACT', 'CLAUDE_CODE_DISABLE_AUTO_MEMORY',
    'CLAUDE_CODE_DISABLE_BACKGROUND_TASKS'
  ]) {
    process.env[k] ??= '1'
  }
}
```
In external builds, the entire block disappears. The string 'ABLATION_BASELINE' is gone. The env var names inside are gone. No runtime check, no dead branch.
The same pattern in src/constants/prompts.ts:
```typescript
const proactiveModule =
  feature('PROACTIVE') || feature('KAIROS')
    ? require('../proactive/index.js')
    : null
```
If PROACTIVE and KAIROS are both false at build time, the entire require() and the module it loads are eliminated. The proactive module's code never ships.
Feature flags in the codebase include: TRANSCRIPT_CLASSIFIER, FORK_SUBAGENT, CONTEXT_COLLAPSE, ULTRATHINK, REACTIVE_COMPACT, NATIVE_CLIENT_ATTESTATION, VERIFICATION_AGENT, KAIROS, PROACTIVE, EXPERIMENTAL_SKILL_SEARCH, ABLATION_BASELINE, DAEMON, BRIDGE_MODE, BG_SESSIONS, TEMPLATES, and more.
Runtime flags use GrowthBook (getFeatureValue_CACHED_MAY_BE_STALE) for A/B testing and gradual rollout. The two layers serve different purposes: build-time controls code inclusion, runtime controls behavior.
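The mechanism behind build-time elimination generalizes beyond Bun. When the flag resolves to a compile-time constant (e.g. injected via a bundler's `define` option), the condition folds and the dead branch — strings included — never reaches the output. A generic sketch, not Bun's actual `feature()` implementation:

```typescript
// Stand-in for a value a bundler would inject at build time via `define`;
// in an external build ABLATION_BASELINE is a literal `false`.
const BUILD_FLAGS = { ABLATION_BASELINE: false } as const

function feature(name: keyof typeof BUILD_FLAGS): boolean {
  return BUILD_FLAGS[name]
}

// After constant folding, bundlers drop this whole branch from external builds,
// so the string below never appears in the shipped binary.
if (feature('ABLATION_BASELINE')) {
  console.log('internal-only ablation setup')
}
```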
Takeaway: If you ship a binary that talks to your API, assume someone will run strings on it. Build-time DCE is not optional for keeping internal details internal. Separate build-time flags (code inclusion) from runtime flags (behavior rollout).
## 7. Three tiers of memory
src/memdir/memdir.ts implements the memory system. The loadMemoryPrompt() function at line 419 assembles all applicable memories into the system prompt.
Three tiers:
- Project memory: `CLAUDE.md` files in the repo (read by the system prompt builder)
- User memory: `~/.claude/CLAUDE.md` (global preferences)
- Auto-memory: automatically extracted memories in `~/.claude/memory/` with relevance-based retrieval via `src/memdir/findRelevantMemories.ts`
Team memory sync exists for shared knowledge across team members, gated behind the TEAMMEM feature flag.
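I did not dig into the actual ranking inside findRelevantMemories.ts, but the shape of relevance-based retrieval is easy to illustrate. A toy scorer using keyword overlap as a stand-in for whatever the real implementation does:

```typescript
// Toy relevance scorer for auto-memory retrieval. Keyword overlap is my
// stand-in; the real ranking logic is not shown here.
type Memory = { path: string; text: string }

const tokenize = (s: string) =>
  new Set(s.toLowerCase().split(/\W+/).filter(w => w.length > 2))

function findRelevantMemories(query: string, memories: Memory[], topK = 3): Memory[] {
  const q = tokenize(query)
  return memories
    .map(m => ({ m, score: Array.from(tokenize(m.text)).filter(w => q.has(w)).length }))
    .filter(x => x.score > 0)          // drop memories with no overlap at all
    .sort((a, b) => b.score - a.score) // highest overlap first
    .slice(0, topK)
    .map(x => x.m)
}
```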
## 8. Client attestation: proving you are real
src/constants/system.ts contains getAttributionHeader() at line 73. The comment at line 64:
> When NATIVE_CLIENT_ATTESTATION is enabled, includes a `cch=00000` placeholder. Before the request is sent, Bun's native HTTP stack finds this placeholder in the request body and overwrites the zeros with a computed hash. The server verifies this token to confirm the request came from a real Claude Code client.
```typescript
const cch = feature('NATIVE_CLIENT_ATTESTATION') ? ' cch=00000;' : ''
```
The placeholder is a fixed-length string. Bun's HTTP layer overwrites it in-place in the serialized request body. No Content-Length change. No buffer reallocation. The server checks the hash to distinguish real Claude Code clients from API wrappers.
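The fixed-width trick is the whole point: the stamped token has exactly the placeholder's length, so the body can be patched after serialization without touching Content-Length. A sketch with a hypothetical 5-character token:

```typescript
// Sketch of the fixed-length overwrite: the hash replaces the zeros in-place,
// so the serialized body's length (and Content-Length) never changes.
const PLACEHOLDER = 'cch=00000'

function stampAttestation(body: string, computeHash: () => string): string {
  const token = computeHash()
  if (token.length !== 5) throw new Error('token must match placeholder width')
  return body.replace(PLACEHOLDER, `cch=${token}`)
}

const before = 'x-app: client/1.0; cch=00000; os=darwin'
const after = stampAttestation(before, () => 'a1b2c')  // hypothetical hash value
```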
## 9. The permission system is the product
Five permission modes in src/utils/permissions/PermissionMode.ts:
| Mode | Behavior |
|---|---|
| `default` | Ask for each action |
| `plan` | Read-only, no mutations |
| `acceptEdits` | Auto-approve file edits |
| `bypassPermissions` | Auto-approve everything |
| `auto` | ML classifier decides (internal only, gated behind TRANSCRIPT_CLASSIFIER) |
The auto mode is the most interesting. In src/utils/permissions/permissions.ts:59-63:
```typescript
const classifierDecisionModule = feature('TRANSCRIPT_CLASSIFIER')
  ? (require('./classifierDecision.js') as typeof import('./classifierDecision.js'))
  : null
const autoModeStateModule = feature('TRANSCRIPT_CLASSIFIER')
  ? (require('./autoModeState.js') as typeof import('./autoModeState.js'))
  : null
```
The classifier reads the conversation transcript and decides whether a tool call is safe to auto-approve. When it denies, the denial is recorded with the tool name, description, and reason. The user sees a notification and can check /permissions to review.
Enterprise managed settings take precedence over everything. An admin can lock the tool to read-only, restrict writable directories, or block commands. The user cannot override it. This is what makes the tool deployable in corporate environments.
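Precedence resolution reduces to "first layer with an opinion wins." A sketch with hypothetical rule sets, not the real rule format:

```typescript
// Sketch of rule precedence: enterprise managed settings win, then project,
// then user config, then defaults. The first source with an opinion decides.
type Decision = 'allow' | 'deny' | 'ask'
type RuleSet = Partial<Record<string, Decision>>

function resolvePermission(
  tool: string,
  sources: { enterprise: RuleSet; project: RuleSet; user: RuleSet; defaults: RuleSet },
): Decision {
  for (const layer of [sources.enterprise, sources.project, sources.user, sources.defaults]) {
    const d = layer[tool]
    if (d !== undefined) return d
  }
  return 'ask'   // no rule anywhere → fall back to asking the user
}

const sources = {
  enterprise: { Write: 'deny' as const },   // admin lock: the user cannot override this
  project: {},
  user: { Write: 'allow' as const, Bash: 'allow' as const },
  defaults: {},
}
```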
## 10. Other patterns worth noting
### Context window awareness
src/utils/context.ts defines model-specific context windows. Default is 200K tokens (line 9). Sonnet 4.6 and Opus 4.6 support 1M context. Output token defaults are capped at 8K (line 24, CAPPED_DEFAULT_MAX_TOKENS) with escalation to 64K on retry, because their data showed p99 output is only 4,911 tokens. Over-reserving 32K-64K wastes slot capacity.
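The escalation policy is cheap to model: reserve the small slot by default and retry with the large one only when a response is actually truncated. The constant names mirror the source; the logic is my sketch:

```typescript
// Sketch of capped output tokens with retry escalation. p99 output is ~4,911
// tokens, so almost every request fits the small slot; only truncated
// responses pay for the 64K reservation.
const CAPPED_DEFAULT_MAX_TOKENS = 8_000
const ESCALATED_MAX_TOKENS = 64_000

function maxTokensForAttempt(attempt: number): number {
  return attempt === 0 ? CAPPED_DEFAULT_MAX_TOKENS : ESCALATED_MAX_TOKENS
}

// Only a response cut off at the cap warrants a retry with the bigger slot.
function shouldEscalate(stopReason: string): boolean {
  return stopReason === 'max_tokens'
}
```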
### Model cost tracking
src/utils/modelCost.ts has hardcoded pricing tiers. Opus 4.6 fast mode costs $30/$150 per Mtok (line 63, COST_TIER_30_150). Standard Sonnet is $3/$15 (COST_TIER_3_15). Real-time cost accumulation in src/cost-tracker.ts.
### File history snapshots
src/utils/fileHistory.ts creates a backup before every file edit. Up to 100 snapshots per session (MAX_SNAPSHOTS at line 55). Supports /rewind to any point. Each snapshot is tied to a message ID for precise rollback.
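A minimal model of the snapshot store (the rollback semantics here are my simplification):

```typescript
// Sketch of the snapshot history: cap at 100 entries, tie each snapshot to a
// message ID so a /rewind-style command can restore state at that point.
const MAX_SNAPSHOTS = 100

type Snapshot = { messageId: string; path: string; content: string }

const snapshots: Snapshot[] = []

function recordSnapshot(s: Snapshot): void {
  snapshots.push(s)
  if (snapshots.length > MAX_SNAPSHOTS) snapshots.shift()  // evict the oldest
}

function rewindTo(messageId: string): Snapshot | undefined {
  // Latest snapshot recorded for the given message, assuming insertion order.
  return [...snapshots].reverse().find(s => s.messageId === messageId)
}
```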
### Ultrathink
src/utils/thinking.ts detects the keyword "ultrathink" in user input (line 29, hasUltrathinkKeyword) and triggers extended thinking mode. The UI renders a rainbow color animation (line 60, RAINBOW_COLORS). Gated behind the ULTRATHINK feature flag and a GrowthBook killswitch (tengu_turtle_carbon).
### The cyber risk instruction
src/constants/cyberRiskInstruction.ts is a single paragraph owned by the Safeguards team. The file header warns: "DO NOT MODIFY THIS INSTRUCTION WITHOUT SAFEGUARDS TEAM REVIEW." It defines the boundary between defensive security assistance and harmful activities. Injected into every system prompt.
### The tool type system
src/Tool.ts defines the tool interface. Each tool declares: name, description (can be dynamic), inputSchema (Zod), call(), isEnabled(), hasPermissionsToUseTool(), renderToolUseMessage() / renderToolResultMessage() (React Ink components), getToolUseSummary(), and maxResultSizeChars. The buildTool() factory provides defaults. 40+ tools implement this interface across src/tools/.
## What this means for builders
The Claude Code codebase is an engineering response to five hard problems:
- Cost: Static/dynamic prompt split, cache break detection, fork-based cache sharing (`prompts.ts`, `promptCacheBreakDetection.ts`, `forkSubagent.ts`)
- Context limits: Structured auto-compaction with circuit breakers (`autoCompact.ts`, `compact/prompt.ts`)
- Security: Multi-layer bash validation, sandboxing, enterprise permissions (`bashSecurity.ts`, `sandbox-adapter.ts`)
- Large outputs: Disk persistence with previews instead of truncation (`toolResultStorage.ts`, `toolLimits.ts`)
- Feature management: Build-time DCE for internal features, runtime flags for rollout (`cli.tsx`)
None of these are novel ideas. Caching is old. Summarization is old. Sandboxing is old. What is new is how they compose in an AI coding tool where the model runs on someone else's machine, burns someone else's money, and touches someone else's code.
If you are building in this space, these are the problems you will hit. Not "how do I make the model smarter," but how do I make it cheaper, safer, and trustworthy enough that someone lets it run unsupervised.
The model is the easy part. The plumbing is the product.