What Is an AI Harness?
An AI harness is the execution environment that wraps an agent loop. Where an agent handles the ReAct cycle — reason, act, observe — the harness manages everything around that cycle: starting and stopping sessions, enforcing permissions, firing lifecycle hooks, and injecting configuration.
Think of it the same way you would a test harness in software engineering: it is not the code under test, it is the scaffolding that controls how that code runs.
Where It Fits in the Stack
The harness sits between the user-facing interface and the agent loop. The LLM and tools are internal to the agent; the harness wraps the agent and exposes it outward.
```mermaid
graph TD
    User(["👤 User\n(CLI / IDE / API)"])

    subgraph Harness["🏗️ AI Harness"]
        direction TB
        Config["⚙️ Config & Env"]
        Session["📂 Session Manager"]
        Hooks["🪝 Hook System"]
        Permissions["🔒 Permission Layer"]
    end

    Agent["🤖 Agent Loop\n(ReAct / Plan-Execute)"]
    LLM["🧠 LLM\n(Claude / GPT-4)"]
    Tools["🔧 Tools\n(FS / Web / Shell / APIs)"]

    User --> Harness
    Harness --> Agent
    Agent --> LLM
    Agent --> Tools
```
The key insight: the agent loop does not know about sessions, users, or permissions. That is intentional — the harness absorbs all cross-cutting complexity so the agent stays focused on reasoning.
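One way to picture that boundary is as an interface: the loop consumes messages and tool schemas and returns a model response, nothing more. The Protocol below is purely illustrative and not taken from any particular framework.

```python
from typing import Protocol


class AgentLoop(Protocol):
    """Illustrative boundary: the loop sees messages and tools, never sessions or users."""

    async def step(self, messages: list[dict], tools: list[dict]) -> dict:
        """Run one reason-act-observe turn and return the raw model response."""
        ...
```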
Core Responsibilities
A harness has four well-defined responsibilities. Each maps to a distinct subsystem.
```mermaid
mindmap
  root((AI Harness))
    Session Management
      Start and resume session
      Persist conversation history
      Cleanup on exit
    Hook System
      PreToolCall
      PostToolCall
      OnError
      OnStop
    Permission Layer
      Allow-list tools
      Approve destructive actions
      Sandboxing
    Config and Env
      Model selection
      API keys
      Tool registration
```
Session Management
A session represents one continuous interaction from start to finish. The harness initialises the session (loads history, injects the system prompt), maintains it across turns (appending messages, managing the context window), and cleans it up on exit (persisting memory, closing resources).
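Reduced to a sketch, those three phases map onto three methods. The Session class below is illustrative rather than any framework's API.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Illustrative session object; class and method names are hypothetical."""
    system_prompt: str
    history: list[dict] = field(default_factory=list)

    def start(self, prior_history: list[dict] | None = None) -> None:
        # Initialise: resume prior history if there is one, otherwise start clean
        self.history = list(prior_history or [])

    def append_turn(self, role: str, content: object) -> None:
        # Maintain: add a turn; a real harness would also trim to fit the context window
        self.history.append({"role": role, "content": content})

    def close(self) -> list[dict]:
        # Clean up: hand back the history so the caller can persist it
        return self.history
```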
Hook System
Hooks are callbacks the harness fires at defined points in the agent lifecycle. They let you observe and intercept agent behaviour without touching the agent loop itself.
Common hook points:
- PreToolCall — inspect or block a tool call before it runs
- PostToolCall — log or transform the tool result
- OnStop — run cleanup, commit state, send notifications
- OnError — catch unhandled exceptions, trigger fallback logic
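For instance, a PreToolCall hook can inspect the pending call and refuse it by raising; in the minimal harness sketched later in this post, raising from a hook aborts the run via the OnError path. The tool and argument names below are illustrative.

```python
class ToolBlocked(Exception):
    """Raised by a hook to veto a tool call (illustrative pattern, not a library API)."""


def block_dangerous_shell(ctx) -> None:
    # ctx is assumed to expose the pending tool name and arguments,
    # like the HookContext in the harness sketch further down.
    if ctx.tool == "run_shell" and "rm -rf" in str(ctx.args):
        raise ToolBlocked(f"refusing to execute: {ctx.args}")
```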
Permission Layer
The permission layer sits in front of every tool call. Before the agent can execute an action, the harness checks whether that action is allowed: is the tool on the allow-list, does a destructive action need explicit user approval, and should the call run inside a sandbox?
This is what prevents a coding agent from accidentally deleting your repository or pushing to the wrong branch.
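A minimal version of that check, assuming a static allow-list plus a confirmation prompt for destructive tools (the tool names are illustrative), looks like this:

```python
from typing import Callable

DESTRUCTIVE_TOOLS = {"write_file", "run_shell", "git_push"}  # illustrative tool names


def is_permitted(tool: str, allowed: set[str],
                 confirm: Callable[[str], str] = input) -> bool:
    """Allow-list check plus a human approval gate for destructive actions."""
    if tool not in allowed:
        return False
    if tool in DESTRUCTIVE_TOOLS:
        # Destructive tools additionally require an explicit yes from the user
        return confirm(f"Allow '{tool}'? [y/N] ").strip().lower() == "y"
    return True
```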
Configuration and Environment
The harness owns loading and propagating configuration: which model to use, which API keys to inject, which tools to register, what the system prompt contains. Centralising this in the harness means the agent code itself has no hard-coded dependencies on environment details.
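As a sketch of that idea, a harness might read everything from the environment once at startup. Apart from ANTHROPIC_API_KEY, which the Anthropic SDK reads by default, the variable names below are made up for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvConfig:
    """Illustrative config loader; variable names are assumptions, not a standard."""
    model: str
    api_key: str
    system_prompt: str


def load_config() -> EnvConfig:
    # The harness reads the environment once; the agent loop never touches os.environ
    return EnvConfig(
        model=os.environ.get("HARNESS_MODEL", "claude-sonnet-4-6"),
        api_key=os.environ["ANTHROPIC_API_KEY"],
        system_prompt=os.environ.get("HARNESS_SYSTEM_PROMPT", "You are a helpful assistant."),
    )
```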
Harness Lifecycle
Every request follows the same path through the harness before it reaches — and after it returns from — the agent loop.
```mermaid
flowchart TD
    A([User Input]) --> B[Load Config and Session]
    B --> C[Inject System Prompt]
    C --> D[Agent Loop]
    D --> E{Tool Call?}
    E -- Yes --> F{Permission Check}
    F -- Denied --> G[Return Denial Message]
    F -- Allowed --> H[Fire PreToolCall Hook]
    H --> I[Execute Tool]
    I --> J[Fire PostToolCall Hook]
    J --> D
    E -- No --> K[Final Answer]
    K --> L[Fire OnStop Hook]
    L --> M[Persist Session]
    M --> N([Return to User])
```
The permission check and hooks wrap every tool call — not just the first one. This gives the harness consistent control throughout the entire agent run, not just at the edges.
Building a Minimal Harness
Below is a minimal Python harness that demonstrates all four responsibilities in one readable unit. It uses the Anthropic SDK with Claude, but the pattern is model-agnostic.
Install the dependency:
```bash
pip install anthropic
```
```python
# harness.py
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Optional

import anthropic

HookFn = Callable[["HookContext"], Awaitable[None] | None]


@dataclass
class HookContext:
    tool: Optional[str] = None
    args: Optional[dict] = None
    result: Optional[str] = None
    error: Optional[Exception] = None


@dataclass
class Hooks:
    pre_tool_call: Optional[HookFn] = None
    post_tool_call: Optional[HookFn] = None
    on_stop: Optional[HookFn] = None
    on_error: Optional[HookFn] = None


@dataclass
class HarnessConfig:
    model: str
    allowed_tools: list[str]
    system_prompt: str
    hooks: Hooks = field(default_factory=Hooks)


class AIHarness:
    def __init__(self, config: HarnessConfig) -> None:
        self._client = anthropic.AsyncAnthropic()  # async client — does not block the event loop
        self._config = config
        self._history: list[dict] = []  # session history

    def _is_allowed(self, tool: str) -> bool:
        return tool in self._config.allowed_tools

    async def _fire(self, hook: Optional[HookFn], ctx: HookContext) -> None:
        if hook is None:
            return
        result = hook(ctx)
        if asyncio.iscoroutine(result):
            await result

    async def run(self, user_input: str) -> str:
        # Session management — append new user turn
        self._history.append({"role": "user", "content": user_input})
        try:
            for _ in range(10):
                response = await self._client.messages.create(
                    model=self._config.model,
                    max_tokens=1024,
                    system=self._config.system_prompt,
                    messages=self._history,
                    tools=[],  # register your tool schemas here
                )
                # Append assistant turn to session
                self._history.append({"role": "assistant", "content": response.content})

                if response.stop_reason == "end_turn":
                    text = "".join(
                        b.text for b in response.content if b.type == "text"
                    )
                    await self._fire(self._config.hooks.on_stop, HookContext())
                    return text

                tool_results = []
                for block in response.content:
                    if block.type != "tool_use":
                        continue
                    ctx = HookContext(tool=block.name, args=dict(block.input))  # type: ignore[arg-type]
                    if not self._is_allowed(block.name):
                        tool_results.append({
                            "type": "tool_result",
                            "tool_use_id": block.id,
                            "content": "Permission denied.",
                        })
                        continue
                    await self._fire(self._config.hooks.pre_tool_call, ctx)
                    result = f"[result of {block.name}]"  # replace with real execution
                    ctx.result = result
                    await self._fire(self._config.hooks.post_tool_call, ctx)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result,
                    })
                if tool_results:
                    self._history.append({"role": "user", "content": tool_results})
                else:
                    # No tool calls and the turn did not end normally (e.g. max_tokens):
                    # stop rather than re-send the same history
                    break
        except Exception as exc:
            await self._fire(self._config.hooks.on_error, HookContext(error=exc))
            raise
        return "Iteration limit reached."
```
Wiring it up with hooks and an allow-list:
```python
# main.py
import asyncio

from harness import AIHarness, HarnessConfig, Hooks, HookContext


def pre_tool(ctx: HookContext) -> None:
    print(f"[pre] {ctx.tool} {ctx.args}")


def post_tool(ctx: HookContext) -> None:
    print(f"[post] {ctx.tool} → {ctx.result}")


def on_stop(ctx: HookContext) -> None:
    print("[stop] session complete")


def on_error(ctx: HookContext) -> None:
    print(f"[error] {ctx.error}")


harness = AIHarness(
    HarnessConfig(
        model="claude-sonnet-4-6",
        allowed_tools=["read_file", "search_web"],
        system_prompt="You are a helpful coding assistant.",
        hooks=Hooks(
            pre_tool_call=pre_tool,
            post_tool_call=post_tool,
            on_stop=on_stop,
            on_error=on_error,
        ),
    )
)

answer = asyncio.run(harness.run("List the files in the current directory."))
print(answer)
```
The four concerns — session, hooks, permissions, config — are each handled in one clear place. None of that logic leaks into the agent loop.
Real-World Examples
| Harness | Wraps | Standout Feature |
|---|---|---|
| Claude Code | Claude API | Hooks, MCP servers, permission allow-lists, session memory |
| LangChain AgentExecutor | Any LLM | Callback system, memory adapters, tool routing |
| AutoGen | Any LLM | Multi-agent conversations, human-in-the-loop |
| LlamaIndex Agent | Any LLM | RAG-first pipelines, retrieval-augmented tool use |
Claude Code is the clearest example of a mature harness: it manages sessions across your file system, fires hooks before and after every tool call, enforces a configurable permission allow-list, integrates MCP servers as first-class tool providers, and persists memory across conversations — all without the agent loop having any awareness of those details.
When to Use a Harness
| Scenario | Raw LLM Call | Agent Loop | Harness |
|---|---|---|---|
| One-shot Q&A | ✓ | — | — |
| Multi-step tool use | — | ✓ | — |
| Persistent sessions across turns | — | — | ✓ |
| Permission gates before tool calls | — | — | ✓ |
| CLI / IDE / production deployment | — | — | ✓ |
| Audit logging and observability | — | — | ✓ |
The decision is straightforward: if your use case requires more than one turn, user-facing permissions, or production reliability, reach for a harness. A plain LLM call is still the right choice for simple, single-shot tasks like summarisation or classification.
Conclusion
An AI harness is not optional complexity — it is the production wrapper that makes an agent safe and maintainable. It absorbs the cross-cutting concerns (sessions, hooks, permissions, config) so the agent loop can stay focused on reasoning.
The minimal implementation above is genuinely enough to start. Add hook types as you discover new lifecycle points, tighten the permission layer as you deploy to real users, and plug in a persistent session store when you need memory across restarts.
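For that last step, a JSON file is often enough to begin with. The sketch below assumes the history being saved is already JSON-serialisable; the SDK content blocks the minimal harness stores would need converting to plain dicts first.

```python
import json
from pathlib import Path


def save_session(path: Path, history: list[dict]) -> None:
    # Persist the conversation so a later process can resume it
    path.write_text(json.dumps(history, default=str), encoding="utf-8")


def load_session(path: Path) -> list[dict]:
    # Resume an earlier session, or start fresh if none exists yet
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return []
```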