
Why context engineering matters more than prompt engineering

Prompt engineering optimizes the question. Context engineering optimizes the information the model sees before it answers. Here's why the latter wins.

Most teams spend weeks tuning prompts. They A/B test phrasing, add few-shot examples, tweak system messages. This works — until the context window fills up with redundant, stale, or irrelevant information. Then no prompt can save you.

The problem with long contexts

LLMs often fail not because the prompt is wrong, but because the context is noisy. A 128k-token window filled with duplicate documentation, outdated API specs, and irrelevant code snippets can produce worse results than a 4k-token window holding exactly the right information.

This is the core insight behind Distill: deterministic context deduplication. No LLM calls, no embeddings, no probabilistic heuristics. Pure algorithms that clean your context in ~12ms.
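As a rough illustration of what deterministic deduplication can look like, the sketch below drops exact repeats by hashing a normalized copy of each chunk. It is not Distill's actual algorithm, which this post does not describe. The function names `normalize` and `dedupe` and the normalization rules (collapse whitespace, lowercase) are assumptions made for the example.

```python
import hashlib
import re


def normalize(chunk: str) -> str:
    """Collapse whitespace and lowercase so trivially different copies match.

    These normalization rules are an assumption for this sketch.
    """
    return re.sub(r"\s+", " ", chunk).strip().lower()


def dedupe(chunks: list[str]) -> list[str]:
    """Keep the first occurrence of each chunk, preserving input order."""
    seen: set[str] = set()
    kept: list[str] = []
    for chunk in chunks:
        # Hash the normalized text so the comparison is cheap and deterministic.
        digest = hashlib.sha256(normalize(chunk).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(chunk)
    return kept


docs = [
    "GET /users returns a list.",
    "get /users   returns a list.",  # same content, different spacing and case
    "POST /users creates a user.",
]
unique_docs = dedupe(docs)
```

The same input always yields the same output, which makes a step like this easy to test and cache. Near-duplicates such as reworded paragraphs would need more than exact hashing.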

What context engineering actually means

Context engineering is the discipline of controlling what information reaches the model:

  1. Deduplication — Remove redundant content before it enters the window
  2. Relevance filtering — Score and rank context chunks by task relevance
  3. Freshness — Prefer recent information over stale data
  4. Compression — Reduce token count without losing semantic content

Each of these is a systems problem, not a prompt problem.
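To make the "systems problem" framing concrete, here is a minimal sketch of freshness and relevance filtering under a size budget. Everything in it is a simplifying assumption: the `Chunk` fields, the word-overlap score, and the use of word counts as a stand-in for tokens are hypothetical choices for illustration, and deduplication and compression are left out.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str      # hypothetical identifier, e.g. a file path or URL
    text: str
    updated_at: int  # hypothetical version stamp; larger means newer


def relevance(task: str, chunk: Chunk) -> float:
    """Fraction of task words found in the chunk (a crude lexical score)."""
    task_words = set(task.lower().split())
    chunk_words = set(chunk.text.lower().split())
    if not task_words:
        return 0.0
    return len(task_words & chunk_words) / len(task_words)


def build_context(task: str, chunks: list[Chunk], budget: int) -> list[Chunk]:
    # Freshness: keep only the newest chunk per source.
    newest: dict[str, Chunk] = {}
    for c in chunks:
        if c.source not in newest or c.updated_at > newest[c.source].updated_at:
            newest[c.source] = c

    # Relevance: rank surviving chunks by score, highest first.
    ranked = sorted(newest.values(), key=lambda c: relevance(task, c), reverse=True)

    # Budget: greedily add relevant chunks until the word budget is used up.
    selected: list[Chunk] = []
    used = 0
    for c in ranked:
        cost = len(c.text.split())  # word count as a rough proxy for tokens
        if relevance(task, c) > 0 and used + cost <= budget:
            selected.append(c)
            used += cost
    return selected


chunks = [
    Chunk("api.md", "users endpoint returns xml", 1),
    Chunk("api.md", "users endpoint returns json", 2),
    Chunk("notes.md", "lunch menu for friday", 3),
]
context = build_context("users endpoint format", chunks, budget=10)
```

None of this touches the prompt. The stale `api.md` version and the unrelated note are dropped before the model sees anything.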

Why this matters for agents

Autonomous agents make this worse. An agent that calls 10 tools accumulates context from each response. By the 5th tool call, tool outputs from earlier steps can take up 60% of the context window, and most of them are no longer relevant.

Without context engineering, agents degrade predictably: accuracy drops, latency increases, and the cost of each call grows linearly with conversation length.
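A back-of-the-envelope model shows how the cost adds up. If every step resends the full history, the input for each call grows with the number of earlier steps, so the total across a run grows faster than that. The numbers below (a 1,000-token base prompt, 2,000 tokens per tool output) and the `keep_last` pruning rule are assumptions chosen for illustration, not measurements.

```python
from typing import Optional


def tokens_sent(
    base: int,
    per_tool_output: int,
    steps: int,
    keep_last: Optional[int] = None,
) -> int:
    """Total input tokens across a run in which each step resends its history.

    keep_last: if set, only the most recent N tool outputs stay in the window.
    """
    total = 0
    for step in range(steps):
        # Number of earlier tool outputs still in the window at this step.
        carried = step if keep_last is None else min(step, keep_last)
        total += base + carried * per_tool_output
    return total


unpruned = tokens_sent(base=1000, per_tool_output=2000, steps=10)
pruned = tokens_sent(base=1000, per_tool_output=2000, steps=10, keep_last=2)
```

Under these assumptions the unpruned run sends 100,000 input tokens over 10 steps, while keeping only the last two tool outputs sends 44,000. Dropping by recency is the bluntest possible policy; a real pipeline would more likely decide what to keep by relevance.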

The path forward

Context engineering should be a first-class concern in any LLM application. Not an afterthought. Not a "we'll optimize later" item. The quality of your context determines the ceiling of your model's performance.

Build the pipeline right, and a smaller model with clean context will outperform a larger model drowning in noise.
