What Is an AI Harness?
An AI harness is the execution environment that wraps an agent loop. Where an agent handles the ReAct cycle — reason, act, observe — the harness manages everything around that cycle: starting and stopping sessions, enforcing permissions, firing lifecycle hooks, and injecting configuration.
Think of it the same way you would a test harness in software engineering: it is not the code under test, it is the scaffolding that controls how that code runs.
Where It Fits in the Stack
The harness sits between the user-facing interface and the agent loop. The LLM and tools are internal to the agent; the harness wraps the agent and exposes it outward.
```mermaid
graph TD
    User(["👤 User\n(CLI / IDE / API)"])

    subgraph Harness["🏗️ AI Harness"]
        direction TB
        Config["⚙️ Config & Env"]
        Session["📂 Session Manager"]
        Hooks["🪝 Hook System"]
        Permissions["🔒 Permission Layer"]
    end

    Agent["🤖 Agent Loop\n(ReAct / Plan-Execute)"]
    LLM["🧠 LLM\n(Claude / GPT-4)"]
    Tools["🔧 Tools\n(FS / Web / Shell / APIs)"]

    User --> Harness
    Harness --> Agent
    Agent --> LLM
    Agent --> Tools
```
The key insight: the agent loop does not know about sessions, users, or permissions. That is intentional — the harness absorbs all cross-cutting complexity so the agent stays focused on reasoning.
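One way to picture that boundary is as an interface: the loop consumes messages and tool schemas and returns a model response, nothing more. The Protocol below is purely illustrative and not taken from any particular framework.

```python
from typing import Protocol


class AgentLoop(Protocol):
    """Illustrative boundary: the loop sees messages and tools, never sessions or users."""

    async def step(self, messages: list[dict], tools: list[dict]) -> dict:
        """Run one reason-act-observe turn and return the raw model response."""
        ...
```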
Core Responsibilities
A harness has four well-defined responsibilities. Each maps to a distinct subsystem.
```mermaid
mindmap
  root((AI Harness))
    Session Management
      Start and resume session
      Persist conversation history
      Cleanup on exit
    Hook System
      PreToolCall
      PostToolCall
      OnError
      OnStop
    Permission Layer
      Allow-list tools
      Approve destructive actions
      Sandboxing
    Config and Env
      Model selection
      API keys
      Tool registration
```
Session Management
A session represents one continuous interaction from start to finish. The harness initialises the session (loads history, injects the system prompt), maintains it across turns (appending messages, managing the context window), and cleans it up on exit (persisting memory, closing resources).
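Reduced to a sketch, those three phases map onto three methods. The Session class below is illustrative rather than any framework's API.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Illustrative session object; class and method names are hypothetical."""
    system_prompt: str
    history: list[dict] = field(default_factory=list)

    def start(self, prior_history: list[dict] | None = None) -> None:
        # Initialise: resume prior history if there is one, otherwise start clean
        self.history = list(prior_history or [])

    def append_turn(self, role: str, content: object) -> None:
        # Maintain: add a turn; a real harness would also trim to fit the context window
        self.history.append({"role": role, "content": content})

    def close(self) -> list[dict]:
        # Clean up: hand back the history so the caller can persist it
        return self.history
```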
Hook System
Hooks are callbacks the harness fires at defined points in the agent lifecycle. They let you observe and intercept agent behaviour without touching the agent loop itself.
Common hook points:
- PreToolCall — inspect or block a tool call before it runs
- PostToolCall — log or transform the tool result
- OnStop — run cleanup, commit state, send notifications
- OnError — catch unhandled exceptions, trigger fallback logic
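For instance, a PreToolCall hook can inspect the pending call and refuse it by raising; in the minimal harness sketched later in this post, raising from a hook aborts the run via the OnError path. The tool and argument names below are illustrative.

```python
class ToolBlocked(Exception):
    """Raised by a hook to veto a tool call (illustrative pattern, not a library API)."""


def block_dangerous_shell(ctx) -> None:
    # ctx is assumed to expose the pending tool name and arguments,
    # like the HookContext in the harness sketch further down.
    if ctx.tool == "run_shell" and "rm -rf" in str(ctx.args):
        raise ToolBlocked(f"refusing to execute: {ctx.args}")
```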
Permission Layer
The permission layer sits in front of every tool call. Before the agent can execute an action, the harness checks whether that action is allowed: is the tool on the allow-list, does a destructive action need explicit user approval, and should the call run inside a sandbox?
This is what prevents a coding agent from accidentally deleting your repository or pushing to the wrong branch.
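A minimal version of that check, assuming a static allow-list plus a confirmation prompt for destructive tools (the tool names are illustrative), looks like this:

```python
from typing import Callable

DESTRUCTIVE_TOOLS = {"write_file", "run_shell", "git_push"}  # illustrative tool names


def is_permitted(tool: str, allowed: set[str],
                 confirm: Callable[[str], str] = input) -> bool:
    """Allow-list check plus a human approval gate for destructive actions."""
    if tool not in allowed:
        return False
    if tool in DESTRUCTIVE_TOOLS:
        # Destructive tools additionally require an explicit yes from the user
        return confirm(f"Allow '{tool}'? [y/N] ").strip().lower() == "y"
    return True
```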
Configuration and Environment
The harness owns loading and propagating configuration: which model to use, which API keys to inject, which tools to register, what the system prompt contains. Centralising this in the harness means the agent code itself has no hard-coded dependencies on environment details.
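As a sketch of that idea, a harness might read everything from the environment once at startup. Apart from ANTHROPIC_API_KEY, which the Anthropic SDK reads by default, the variable names below are made up for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvConfig:
    """Illustrative config loader; variable names are assumptions, not a standard."""
    model: str
    api_key: str
    system_prompt: str


def load_config() -> EnvConfig:
    # The harness reads the environment once; the agent loop never touches os.environ
    return EnvConfig(
        model=os.environ.get("HARNESS_MODEL", "claude-sonnet-4-6"),
        api_key=os.environ["ANTHROPIC_API_KEY"],
        system_prompt=os.environ.get("HARNESS_SYSTEM_PROMPT", "You are a helpful assistant."),
    )
```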
Harness Lifecycle
Every request follows the same path through the harness before it reaches — and after it returns from — the agent loop.
```mermaid
flowchart TD
    A([User Input]) --> B[Load Config and Session]
    B --> C[Inject System Prompt]
    C --> D[Agent Loop]
    D --> E{Tool Call?}
    E -- Yes --> F{Permission Check}
    F -- Denied --> G[Return Denial Message]
    F -- Allowed --> H[Fire PreToolCall Hook]
    H --> I[Execute Tool]
    I --> J[Fire PostToolCall Hook]
    J --> D
    E -- No --> K[Final Answer]
    K --> L[Fire OnStop Hook]
    L --> M[Persist Session]
    M --> N([Return to User])
```
The permission check and hooks wrap every tool call — not just the first one. This gives the harness consistent control throughout the entire agent run, not just at the edges.
Building a Minimal Harness
Below is a minimal Python harness that demonstrates all four responsibilities in one readable unit. It uses the Anthropic SDK with Claude, but the pattern is model-agnostic.
Install the dependency:
```bash
pip install anthropic
```
```python
# harness.py
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Optional

import anthropic

HookFn = Callable[["HookContext"], Awaitable[None] | None]


@dataclass
class HookContext:
    tool: Optional[str] = None
    args: Optional[dict] = None
    result: Optional[str] = None
    error: Optional[Exception] = None


@dataclass
class Hooks:
    pre_tool_call: Optional[HookFn] = None
    post_tool_call: Optional[HookFn] = None
    on_stop: Optional[HookFn] = None
    on_error: Optional[HookFn] = None


@dataclass
class HarnessConfig:
    model: str
    allowed_tools: list[str]
    system_prompt: str
    hooks: Hooks = field(default_factory=Hooks)


class AIHarness:
    def __init__(self, config: HarnessConfig) -> None:
        self._client = anthropic.AsyncAnthropic()  # async client — does not block the event loop
        self._config = config
        self._history: list[dict] = []  # session history

    def _is_allowed(self, tool: str) -> bool:
        return tool in self._config.allowed_tools

    async def _fire(self, hook: Optional[HookFn], ctx: HookContext) -> None:
        if hook is None:
            return
        result = hook(ctx)
        if asyncio.iscoroutine(result):
            await result

    async def run(self, user_input: str) -> str:
        # Session management — append new user turn
        self._history.append({"role": "user", "content": user_input})
        try:
            for _ in range(10):
                response = await self._client.messages.create(
                    model=self._config.model,
                    max_tokens=1024,
                    system=self._config.system_prompt,
                    messages=self._history,
                    tools=[],  # register your tool schemas here
                )
                # Append assistant turn to session
                self._history.append({"role": "assistant", "content": response.content})

                if response.stop_reason == "end_turn":
                    text = "".join(
                        b.text for b in response.content if b.type == "text"
                    )
                    await self._fire(self._config.hooks.on_stop, HookContext())
                    return text

                tool_results = []
                for block in response.content:
                    if block.type != "tool_use":
                        continue
                    ctx = HookContext(tool=block.name, args=dict(block.input))  # type: ignore[arg-type]
                    if not self._is_allowed(block.name):
                        tool_results.append({
                            "type": "tool_result",
                            "tool_use_id": block.id,
                            "content": "Permission denied.",
                        })
                        continue
                    await self._fire(self._config.hooks.pre_tool_call, ctx)
                    result = f"[result of {block.name}]"  # replace with real execution
                    ctx.result = result
                    await self._fire(self._config.hooks.post_tool_call, ctx)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result,
                    })
                if tool_results:
                    self._history.append({"role": "user", "content": tool_results})
                else:
                    # No tool calls and the turn did not end normally (e.g. max_tokens):
                    # stop rather than re-send the same history
                    break
        except Exception as exc:
            await self._fire(self._config.hooks.on_error, HookContext(error=exc))
            raise
        return "Iteration limit reached."
```
Wiring it up with hooks and an allow-list:
```python
# main.py
import asyncio

from harness import AIHarness, HarnessConfig, Hooks, HookContext


def pre_tool(ctx: HookContext) -> None:
    print(f"[pre] {ctx.tool} {ctx.args}")


def post_tool(ctx: HookContext) -> None:
    print(f"[post] {ctx.tool} → {ctx.result}")


def on_stop(ctx: HookContext) -> None:
    print("[stop] session complete")


def on_error(ctx: HookContext) -> None:
    print(f"[error] {ctx.error}")


harness = AIHarness(
    HarnessConfig(
        model="claude-sonnet-4-6",
        allowed_tools=["read_file", "search_web"],
        system_prompt="You are a helpful coding assistant.",
        hooks=Hooks(
            pre_tool_call=pre_tool,
            post_tool_call=post_tool,
            on_stop=on_stop,
            on_error=on_error,
        ),
    )
)

answer = asyncio.run(harness.run("List the files in the current directory."))
print(answer)
```
The four concerns — session, hooks, permissions, config — are each handled in one clear place. None of that logic leaks into the agent loop.
Real-World Examples
| Harness | Wraps | Standout Feature |
|---|---|---|
| Claude Code | Claude API | Hooks, MCP servers, permission allow-lists, session memory |
| LangChain AgentExecutor | Any LLM | Callback system, memory adapters, tool routing |
| AutoGen | Any LLM | Multi-agent conversations, human-in-the-loop |
| LlamaIndex Agent | Any LLM | RAG-first pipelines, retrieval-augmented tool use |
Claude Code is the clearest example of a mature harness: it manages sessions across your file system, fires hooks before and after every tool call, enforces a configurable permission allow-list, integrates MCP servers as first-class tool providers, and persists memory across conversations — all without the agent loop having any awareness of those details.
When to Use a Harness
| Scenario | Raw LLM Call | Agent Loop | Harness |
|---|---|---|---|
| One-shot Q&A | ✓ | — | — |
| Multi-step tool use | — | ✓ | — |
| Persistent sessions across turns | — | — | ✓ |
| Permission gates before tool calls | — | — | ✓ |
| CLI / IDE / production deployment | — | — | ✓ |
| Audit logging and observability | — | — | ✓ |
The decision is straightforward: if your use case requires more than one turn, user-facing permissions, or production reliability, reach for a harness. A plain LLM call is still the right choice for simple, single-shot tasks like summarisation or classification.
Conclusion
An AI harness is not optional complexity — it is the production wrapper that makes an agent safe and maintainable. It absorbs the cross-cutting concerns (sessions, hooks, permissions, config) so the agent loop can stay focused on reasoning.
The minimal implementation above is genuinely enough to start. Add hook types as you discover new lifecycle points, tighten the permission layer as you deploy to real users, and plug in a persistent session store when you need memory across restarts.
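For that last step, a JSON file is often enough to begin with. The sketch below assumes the history being saved is already JSON-serialisable; the SDK content blocks the minimal harness stores would need converting to plain dicts first.

```python
import json
from pathlib import Path


def save_session(path: Path, history: list[dict]) -> None:
    # Persist the conversation so a later process can resume it
    path.write_text(json.dumps(history, default=str), encoding="utf-8")


def load_session(path: Path) -> list[dict]:
    # Resume an earlier session, or start fresh if none exists yet
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return []
```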