## What Is Context Engineering?
Context engineering is the discipline of deliberately designing and managing what goes into an LLM's context window at runtime. It goes beyond writing a good system prompt — it is about deciding what information the model sees, in what form, and in what order, for every single request.
A model's output quality is directly bounded by its input quality. Context engineering is how you control that input.
```mermaid
flowchart LR
    CE["Context Engineering"]
    CE --> SP["What to put in\nthe system prompt"]
    CE --> HS["How much history\nto include"]
    CE --> RD["Which external data\nto retrieve & inject"]
    CE --> TR["Which tool results\nto surface"]
    CE --> FMT["How to format\nand order it all"]
```
## Why It Matters
LLMs have no persistent memory. Every call is a fresh start — the model knows only what you put in front of it. Poor context leads to:
- Hallucinations caused by missing facts
- Ignored instructions buried under irrelevant text
- Wasted tokens on content that doesn't help the model answer
- Inconsistent behavior across turns
Context engineering is how you prevent all of these systematically rather than patching each symptom individually.
## The Context Budget
Every model has a context window — a hard token limit for a single call. Everything inside that window costs tokens: system prompt, history, retrieved documents, tool outputs, and the user message.
```mermaid
pie title Context Budget Allocation (example)
    "System prompt" : 10
    "Conversation history" : 25
    "Injected / retrieved data" : 40
    "Tool results" : 15
    "Current user message" : 10
```
Context engineering means treating tokens as a budget and deciding consciously how to spend them. The key principle: only include content that changes what the model outputs.
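The budgeting idea can be sketched as a simple allocator. This is a minimal sketch: the 4-characters-per-token estimate is a rough heuristic (real tokenizers give exact counts), and the `ContextPiece` shape and priority scheme are illustrative, not a standard API.

```typescript
// Rough token estimate: ~4 characters per token. A real system would use
// the model's actual tokenizer for exact counts.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// A candidate piece of context, with a priority (lower = more important).
interface ContextPiece {
  name: string;
  text: string;
  priority: number;
}

// Keep the highest-priority pieces that fit inside the token budget;
// anything that would overflow the budget is simply dropped.
function fitToBudget(pieces: ContextPiece[], budget: number): ContextPiece[] {
  const kept: ContextPiece[] = [];
  let used = 0;
  for (const piece of [...pieces].sort((a, b) => a.priority - b.priority)) {
    const cost = estimateTokens(piece.text);
    if (used + cost <= budget) {
      kept.push(piece);
      used += cost;
    }
  }
  return kept;
}
```

With a 70-token budget, a low-priority piece that would overflow the budget gets dropped while higher-priority pieces survive — exactly the "spend tokens consciously" principle above.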
## Core Techniques
### Retrieval-Augmented Generation (RAG)

Instead of relying on training knowledge alone, retrieve relevant documents at query time and inject them into the context. The model gets exactly the facts it needs for this specific question.
```mermaid
flowchart LR
    Q["User query"] --> R["Retrieve relevant\ndocuments"]
    R --> CTX["Inject into context"]
    CTX --> LLM["LLM generates\ngrounded answer"]
```
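The retrieve-then-inject step can be sketched as follows. This is a toy keyword-overlap retriever — production systems typically rank by embedding similarity — and the `Doc` shape and prompt wording are illustrative:

```typescript
interface Doc {
  id: string;
  text: string;
}

// Score each document by how many query words it shares, keep the top K.
// A real retriever would use vector similarity instead of word overlap.
function retrieve(query: string, docs: Doc[], topK: number): Doc[] {
  const queryWords = new Set(
    query.toLowerCase().split(/\W+/).filter(Boolean),
  );
  return docs
    .map((doc) => ({
      doc,
      score: doc.text
        .toLowerCase()
        .split(/\W+/)
        .filter((w) => queryWords.has(w)).length,
    }))
    .filter((s) => s.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((s) => s.doc);
}

// Place the retrieved documents above the question so the answer is grounded.
function buildRagPrompt(query: string, docs: Doc[]): string {
  const context = docs.map((d) => `[${d.id}] ${d.text}`).join("\n");
  return `Answer using only the documents below.\n\n${context}\n\nQuestion: ${query}`;
}
```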
### History summarization

Rather than replaying every past turn verbatim, compress older turns into a short summary. Only recent turns stay in full.
```mermaid
flowchart LR
    OLD["Old turns (verbatim)"] --> SUM["Summarize → compact paragraph"]
    RECENT["Recent turns (verbatim)"] --> CTX["Context"]
    SUM --> CTX
```
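A minimal sketch of the compression step. In practice the summary would be written by an LLM call; here it is faked by truncating each old turn, which is enough to show the shape of the technique:

```typescript
interface Turn {
  role: "user" | "assistant";
  text: string;
}

// Keep the last `keepRecent` turns verbatim and collapse everything older
// into a single summary turn. The truncation below is a stand-in for a
// real LLM-generated summary.
function compressHistory(turns: Turn[], keepRecent: number): Turn[] {
  if (turns.length <= keepRecent) return turns;
  const old = turns.slice(0, turns.length - keepRecent);
  const recent = turns.slice(turns.length - keepRecent);
  const summary = old
    .map((t) => `${t.role}: ${t.text.slice(0, 40)}`)
    .join("; ");
  return [
    { role: "assistant", text: `Summary of earlier turns: ${summary}` },
    ...recent,
  ];
}
```

Ten verbatim turns become one summary turn plus the recent tail, which keeps the history's token cost roughly constant as the conversation grows.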
### Structured system prompt

Organize the system prompt into clear sections (role, rules, output format, examples) so the model can parse it predictably. Avoid walls of prose.
```markdown
## Role
You are a senior TypeScript developer assistant.

## Rules
- Answer only about TypeScript and Node.js.
- Always include type annotations in code examples.

## Output format
Respond in plain text. Use fenced code blocks for all code.
```
### Dynamic injection

Build the context programmatically for each request: inject user profile, feature flags, current date, or live API results only when they are relevant to the query.
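One way to structure this is a list of injectors, each pairing a relevance check with a renderer; the context is whatever the relevant injectors produce. The `Injector` shape, the billing regex, and the account string are all illustrative:

```typescript
// Each injector decides whether it applies to this query and, if so,
// what text to contribute.
interface Injector {
  relevant: (query: string) => boolean;
  render: () => string;
}

// Assemble the context from only the injectors relevant to this request.
function buildContext(query: string, injectors: Injector[]): string {
  return injectors
    .filter((i) => i.relevant(query))
    .map((i) => i.render())
    .join("\n");
}

// Illustrative injectors: the date is always included; the (hypothetical)
// account summary is injected only for billing-related questions.
const injectors: Injector[] = [
  {
    relevant: () => true,
    render: () => `Date: ${new Date().toISOString().slice(0, 10)}`,
  },
  {
    relevant: (q) => /billing|invoice|refund/i.test(q),
    render: () => "Account: pro plan, billed monthly",
  },
];
```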
### Context pruning

Actively remove stale, redundant, or off-topic content before each call. A smaller, tighter context outperforms a large, diluted one.
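A minimal pruning pass might deduplicate pieces and drop anything a relevance check rejects. The relevance predicate is a stand-in — in practice it could be a keyword filter, a recency cutoff, or a similarity score:

```typescript
// Remove exact duplicates (case-insensitive) and off-topic pieces before
// the context is assembled. `isRelevant` is a caller-supplied predicate.
function prune(
  pieces: string[],
  isRelevant: (piece: string) => boolean,
): string[] {
  const seen = new Set<string>();
  const kept: string[] = [];
  for (const piece of pieces) {
    const key = piece.trim().toLowerCase();
    if (seen.has(key)) continue; // redundant: already included
    if (!isRelevant(piece)) continue; // off-topic: drop it
    seen.add(key);
    kept.push(piece);
  }
  return kept;
}
```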
## Context Engineering vs Prompt Engineering
```mermaid
flowchart TD
    PE["Prompt Engineering\nCrafting the wording of a single instruction"]
    CE["Context Engineering\nDesigning the full information environment\nfor every model call"]
    PE -->|"is one part of"| CE
```
Prompt engineering is about how you phrase something. Context engineering is about what the model can see at all — which makes it the broader, more impactful discipline for production systems.
## A Practical Example
Imagine a customer support bot. A naive implementation passes the full chat history and a generic system prompt. A context-engineered implementation does this instead:
```mermaid
flowchart TD
    UM["User message"] --> INT["Classify intent"]
    INT --> RET["Retrieve relevant\nFAQ / policy docs"]
    INT --> PROF["Fetch user account\nsummary"]
    RET --> BUILD["Assemble context"]
    PROF --> BUILD
    SYS["Focused system prompt\n(role + rules only)"] --> BUILD
    HIST["Last 3 turns\n(not full history)"] --> BUILD
    BUILD --> LLM["LLM call"]
    LLM --> ANS["Accurate, grounded answer"]
```
Each piece of context is chosen for this specific query. Nothing is included by default.
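The pipeline above can be sketched end to end. This is a toy version: `classifyIntent` is a regex stand-in for a real classifier, and `retrieveDocs` and `fetchAccountSummary` are hypothetical stand-ins for the FAQ store and account API:

```typescript
type Intent = "billing" | "shipping" | "other";

// Stand-in classifier; a real system would use a model or trained classifier.
function classifyIntent(message: string): Intent {
  if (/invoice|refund|charge/i.test(message)) return "billing";
  if (/deliver|shipping|track/i.test(message)) return "shipping";
  return "other";
}

// Hypothetical FAQ lookup keyed by intent.
function retrieveDocs(intent: Intent): string[] {
  const faq: Record<Intent, string[]> = {
    billing: ["Refunds are issued within 5 business days."],
    shipping: ["Orders ship within 24 hours."],
    other: [],
  };
  return faq[intent];
}

// Hypothetical account API.
function fetchAccountSummary(): string {
  return "plan=pro, member since 2022";
}

// Assemble only what this query needs: focused prompt, relevant docs,
// account summary, and just the last 3 turns of history.
function assembleContext(message: string, history: string[]): string {
  const intent = classifyIntent(message);
  const docs = retrieveDocs(intent);
  const profile = fetchAccountSummary();
  const recent = history.slice(-3);
  return [
    "You are a support agent. Answer only from the documents provided.",
    `Relevant docs:\n${docs.join("\n")}`,
    `Account: ${profile}`,
    `Recent turns:\n${recent.join("\n")}`,
    `User: ${message}`,
  ].join("\n\n");
}
```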
## Common Pitfalls
**Injecting everything by default.** More context is not always better. Irrelevant content dilutes the useful signal and increases cost.
**Never trimming history.** Unbounded history eventually consumes the entire budget, leaving no room for retrieved data.
**Burying key instructions in the middle.** LLMs tend to underweight instructions in the middle of a long context. Put critical rules at the start or end of the system prompt.
**Unsanitized user input in the system prompt.** Placing raw user text in a privileged position opens prompt-injection attacks.
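A common mitigation is to keep user text out of the privileged system role entirely and pass it as a separate user-role message. The message shape below follows the common chat-API convention but is illustrative:

```typescript
interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Never splice raw user text into the system prompt; keep it in its own
// user-role message so instructions embedded in it carry less authority.
function buildMessages(
  systemPrompt: string,
  userInput: string,
): ChatMessage[] {
  return [
    { role: "system", content: systemPrompt },
    { role: "user", content: userInput },
  ];
}
```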
**Static context for a dynamic world.** Hardcoding documents or facts into the system prompt means the model works with stale information. Retrieve and inject dynamically instead.
