Table of contents
What Is an AI Agent?
Asking an LLM a question and getting an answer back is powerful — but it is still just a single call. You provide input, the model responds, done. This works well for isolated tasks, yet it breaks down the moment you need the model to do something: search the web, write a file, call an API, or retry a failed step based on what it learned a moment ago.
That is exactly the gap AI agents are designed to fill. An AI agent wraps an LLM with a loop that lets it observe its environment, decide what action to take, execute that action, and then observe the result — repeating until the goal is reached. Instead of answering once, the agent keeps going until the job is done.
This article explains what AI agents are, how they work internally, and how to build one from scratch in TypeScript with the OpenAI SDK — one file per diagram component, no heavy framework required.
What an AI Agent Consists Of
An AI agent is not a single thing — it is a composition of six interconnected components. The LLM sits at the top as the brain; beneath it, three subsystems feed it information (Perception, Memory, Tools); and at the bottom, two components close the loop (Planning and Action, with a Feedback loop returning results back up).
graph TD
LLM["🧠 LLM (Brain)\nReasons, plans, decides"]
LLM --> Perception["👁️ Perception\nText, images, data"]
LLM --> Memory["💾 Memory\nShort & long-term"]
LLM --> Tools["🔧 Tools\nWeb, code, APIs"]
Perception --> Planning["📋 Planning\nGoals, sub-tasks"]
Memory --> Action["⚡ Action\nExecute & respond"]
Tools --> Feedback["🔄 Feedback loop\nObserve & refine"]
Feedback -->|"loop"| LLM
LLM (Brain)
The central reasoning engine. Given everything currently in its context — the goal, conversation history, available tool schemas, and retrieved memories — the LLM decides what to do next: ask a clarifying question, call a tool, create a plan, or produce a final answer. It does not store anything permanently; every decision is made fresh from what is currently in the context window.
Memory
Because the LLM is stateless between calls, memory is managed externally on its behalf. Two types work together:
- Short-term (in-context) — the running conversation history appended to every request. Immediate and free, but bounded by the model's context window.
- Long-term (external) — a vector store, relational database, or key-value cache that survives across sessions. Relevant chunks are retrieved and injected into the prompt only when needed, so the agent can recall past work without bloating the context.
Tools
The bridge between the LLM's text output and the real world. Tools are described to the model as structured schemas (name, description, parameters). When the LLM decides a tool is needed, it emits a structured call; the orchestration layer runs the matching function and returns the result as a new observation. Typical categories:
- Retrieval — web search, vector DB lookup, document reader
- Computation — code interpreter, calculator, data transformer
- Side-effect — email sender, file writer, REST API caller, database writer
Perception
How the agent receives information from the outside world. This is not limited to plain text — modern multimodal models can also perceive images, structured data (JSON, CSV), PDFs, audio transcripts, and more. Perception is the entry point: whatever the agent cannot perceive, it cannot act on.
Planning
The ability to decompose a high-level goal into a sequence of concrete sub-tasks before or during execution. Some agents plan upfront (writing a full task list before starting); others plan dynamically (deciding the next step only after seeing the result of the previous one). Dynamic planning — used in the ReAct pattern — is more common in practice because real goals are rarely predictable end-to-end.
Action & Feedback Loop
Action is where plans meet reality: the agent executes a tool call, writes a file, calls an API, or responds to the user. The feedback loop closes the cycle — the result of each action is observed and appended to the conversation history, becoming the new context for the next reasoning step. This observe-and-refine cycle is what separates an agent from a one-shot LLM call.
Concepts
Before diving into code, it is worth locking down the vocabulary. These terms appear everywhere in the agent space and are often used loosely.
LLM (Large Language Model)
The "brain" of the agent. It reads text in and produces text out. On its own it has no memory between calls and cannot reach outside its context window. Everything else in an agent system is built around compensating for these limitations.
Tool Use
A way of letting the LLM call external functions. You describe tools in structured form (name, description, parameters), send them alongside the prompt, and the model can respond by saying "call this tool with these arguments" instead of answering directly. Your code then runs the tool and sends the result back. This is also called function calling.
Memory
Because LLMs are stateless, memory must be managed externally:
- In-context memory — the conversation history you append to each request. Cheap but bounded by the context window.
- External memory — a database, vector store, or key-value cache that survives across sessions and can be retrieved selectively.
Planning
The ability to decompose a goal into smaller steps before or during execution. Some agents plan upfront (a static plan), others plan dynamically (decide the next action after seeing the previous result). The second approach is far more common in practice because goals are rarely fully predictable.
Orchestration
The code that runs the agent loop — deciding when to call the LLM, when to run a tool, when to stop, and how to handle errors. This can be a few dozen lines of custom code or a dedicated framework like LangChain, LlamaIndex, or AutoGen.
How an AI Agent Works
At its core, every agent follows the same cycle, often called the ReAct loop (Reason → Act → Observe):
flowchart TD
A([User Goal]) --> B[LLM: Reason]
B --> C{Tool call needed?}
C -- Yes --> D[Run Tool]
D --> E[Append Result to History]
E --> B
C -- No --> F([Return Final Answer])
The loop is simple. What makes agents powerful — or fragile — is the quality of the tool descriptions, the system prompt, and how the orchestration layer handles edge cases.
Project Structure
The code is split into one file per diagram component so every layer has a single, clear responsibility.
ai-agent-demo/
├── src/
│ ├── perception.ts # Perception — normalise raw user input into agent messages
│ ├── memory.ts # Memory — short-term in-context history store
│ ├── tools.ts # Tools — schemas + implementations (web, code, APIs)
│ ├── planning.ts # Planning — decompose goal into an ordered sub-task list
│ ├── agent.ts # Action + Feedback loop — drive the LLM, run tools, observe
│ └── index.ts # Entry point — wire everything together and run
├── package.json
└── tsconfig.json
| File | Diagram component | Responsibility |
|---|---|---|
|
|
Perception | Converts raw input (text, images, data) into structured messages |
|
|
Memory | Keeps short-term history; extensible to long-term retrieval |
|
|
Tools | Declares tool schemas and the functions they execute |
|
|
Planning | Breaks the goal into ordered sub-tasks before the loop starts |
|
|
Action + Feedback loop | Calls the LLM, executes tools, observes results, repeats |
|
|
- | Entry point |
Building a Simple AI Agent
The agent below uses the OpenAI SDK with gpt-4o-mini. Each source file maps directly to one component from the diagram, so you can see exactly which layer of the architecture you are looking at.
Getting your key: Sign up at platform.openai.com, go to API Keys, create a new secret key, and save it as OPENAI_API_KEY.
Install the dependencies:
npm init -y
npm install openai
npm install -D typescript tsx @types/node
Step 1: Perception — normalise raw input
Perception is the first layer the user's input passes through. Its job is to accept any supported input type — plain text today, but easily extended to images or structured data — and return a clean UserMessage that the rest of the agent can consume without caring about the original format.
// src/perception.ts
export interface UserMessage {
type: "text";
content: string;
}
/**
* Perception layer.
* Accepts raw input (text, and in future: images, structured data)
* and normalises it into a UserMessage for the agent.
*/
export function perceive(rawInput: string): UserMessage {
// Trim whitespace and normalise line endings
const content = rawInput.trim().replace(/\r\n/g, "\n");
if (!content) {
throw new Error("[perception] Empty input — nothing to perceive.");
}
console.log(`[perception] received input (${content.length} chars)`);
return { type: "text", content };
}
Step 2: Memory — keep conversation history
Memory holds the running conversation so the LLM always has context. Short-term memory is the message array sent with every API call. The class below manages that array and exposes a retrieve hook — a natural extension point for a long-term vector-store retrieval later.
// src/memory.ts
import type OpenAI from "openai";
export type Message = OpenAI.Chat.ChatCompletionMessageParam;
/**
* Memory layer.
* Manages short-term in-context history.
* Extend `retrieve()` to add long-term / semantic memory.
*/
export class Memory {
private history: Message[] = [];
/** Add a message to in-context history. */
add(message: Message): void {
this.history.push(message);
}
/**
* Retrieve relevant context.
* Currently returns full history — swap for semantic search when needed.
*/
retrieve(): Message[] {
return [...this.history];
}
snapshot(): Message[] {
return this.history;
}
}
Step 3: Tools — web, code, APIs
The Tools layer bridges the LLM's text output with real-world actions. Each tool is described to the model as a JSON schema; when the model decides to call one, the orchestrator routes it to the matching function here.
// src/tools.ts
import type OpenAI from "openai";
export type ToolName = "get_weather" | "calculate";
export interface ToolResult {
output: string;
}
/** Tool schemas sent to the LLM so it knows what is available. */
export const toolSchemas: OpenAI.Chat.ChatCompletionTool[] = [
{
type: "function",
function: {
name: "get_weather",
description:
"Returns current weather for a city. " +
"Use when the user asks about weather conditions.",
parameters: {
type: "object",
properties: {
city: { type: "string", description: "City name, e.g. 'Tokyo'" },
},
required: ["city"],
},
},
},
{
type: "function",
function: {
name: "calculate",
description: "Evaluates an arithmetic expression and returns the result.",
parameters: {
type: "object",
properties: {
expression: {
type: "string",
description: "Arithmetic expression, e.g. '(42 + 8) * 2'",
},
},
required: ["expression"],
},
},
},
];
/** Execute a tool by name and return its result. */
export function runTool(name: ToolName, args: Record<string, string>): ToolResult {
if (name === "get_weather") {
const city = args.city ?? "Unknown";
const fakeData: Record<string, string> = {
Berlin: "12°C, partly cloudy",
Tokyo: "28°C, sunny",
London: "9°C, rainy",
};
return { output: `Weather in ${city}: ${fakeData[city] ?? "Data not available"}` };
}
if (name === "calculate") {
try {
// NOTE: replace with a safe math parser (e.g. mathjs) in production
const result = Function(`"use strict"; return (${args.expression})`)();
return { output: String(result) };
} catch {
return { output: "Error: invalid expression" };
}
}
return { output: "Unknown tool" };
}
Step 4: Planning — decompose the goal
Before the main loop starts, the Planning layer asks the LLM to break the goal into an ordered list of sub-tasks. This gives the agent an upfront map rather than improvising every step — especially useful for multi-step goals where later steps depend on earlier ones.
// src/planning.ts
import OpenAI from "openai";
const client = new OpenAI(); // reads OPENAI_API_KEY from env
export interface Plan {
steps: string[];
}
/**
* Planning layer.
* Asks the LLM to decompose a goal into ordered sub-tasks.
* Returns the list so the agent can track progress through the plan.
*/
export async function createPlan(goal: string): Promise<Plan> {
const response = await client.chat.completions.create({
model: "gpt-4o-mini",
max_tokens: 512,
messages: [
{
role: "system",
content:
"You are a planning assistant. " +
"Given a user goal, output ONLY a numbered list of concrete sub-tasks needed to complete it. " +
"No extra text — just the numbered list.",
},
{ role: "user", content: `Goal: ${goal}` },
],
});
const text = response.choices[0]?.message.content ?? "";
// Parse "1. ...\n2. ..." into an array of step strings
const steps = text
.split("\n")
.map((line) => line.replace(/^\d+\.\s*/, "").trim())
.filter(Boolean);
console.log(`[planning] created ${steps.length}-step plan:`);
steps.forEach((s, i) => console.log(` ${i + 1}. ${s}`));
return { steps };
}
Step 5: Action + Feedback Loop
This is the core of the agent. It receives the perceived input, the plan, and access to memory and tools, then drives the LLM through the ReAct loop: call the LLM → execute any tool calls → observe the result → append to memory → repeat. The feedback loop is the for cycle itself — every observation feeds directly back into the next LLM call.
// src/agent.ts
import OpenAI from "openai";
import { Memory } from "./memory";
import { toolSchemas, runTool, ToolName } from "./tools";
import type { Plan } from "./planning";
import type { UserMessage } from "./perception";
const client = new OpenAI(); // reads OPENAI_API_KEY from env
const SYSTEM_PROMPT =
"You are a helpful assistant with access to tools. " +
"A plan has been prepared for you — work through it step by step. " +
"Use tools whenever they give more accurate or up-to-date answers. " +
"Once all steps are done, provide the final answer.";
const MAX_ITERATIONS = 10;
/**
* Action + Feedback loop.
* Drives the LLM through the ReAct cycle:
* perceive → (plan already done) → reason → act → observe → feedback → repeat
*/
export async function runAgent(
input: UserMessage,
plan: Plan,
memory: Memory
): Promise<string> {
// Seed memory: system prompt, user goal, then the prepared plan
memory.add({ role: "system", content: SYSTEM_PROMPT });
memory.add({ role: "user", content: input.content });
memory.add({
role: "user",
content:
"Before answering, follow this plan:\n" +
plan.steps.map((s, i) => `${i + 1}. ${s}`).join("\n"),
});
for (let iteration = 0; iteration < MAX_ITERATIONS; iteration++) {
// ── ACTION: call the LLM with full memory context ──────────────────────
const response = await client.chat.completions.create({
model: "gpt-4o-mini",
messages: memory.retrieve(),
tools: toolSchemas,
tool_choice: "auto",
});
const choice = response.choices[0];
if (!choice) return "No response from model.";
// Append assistant turn to memory (feedback loop — step 1)
memory.add(choice.message);
// ── OBSERVE: final answer or tool calls? ───────────────────────────────
if (choice.finish_reason !== "tool_calls" || !choice.message.tool_calls?.length) {
return choice.message.content ?? "No answer returned.";
}
// ── ACTION: execute each tool call ─────────────────────────────────────
for (const toolCall of choice.message.tool_calls) {
if (toolCall.type !== "function") continue;
const name = toolCall.function.name as ToolName;
const args = JSON.parse(toolCall.function.arguments) as Record<string, string>;
console.log(`[action] tool call → ${name}`, args);
const result = runTool(name, args);
// ── FEEDBACK: append observation — closes the loop ──────────────────
console.log(`[feedback] tool result ← ${result.output}`);
memory.add({
role: "tool",
tool_call_id: toolCall.id,
content: result.output,
});
}
}
return "Agent reached the iteration limit without a final answer.";
}
Step 6: Wire It All Together
index.ts connects every layer in the order the diagram shows: perceive → memory → plan → act + feedback loop.
// src/index.ts
import { perceive } from "./perception";
import { Memory } from "./memory";
import { createPlan } from "./planning";
import { runAgent } from "./agent";
const rawGoal = "What is the weather like in Tokyo, and what is 1234 multiplied by 56?";
(async () => {
console.log("Goal:", rawGoal);
console.log("---");
// 1. Perception — normalise raw input
const input = perceive(rawGoal);
// 2. Memory — initialise empty short-term store
const memory = new Memory();
// 3. Planning — decompose goal into sub-tasks
const plan = await createPlan(input.content);
// 4. Action + Feedback loop — run the agent
const answer = await runAgent(input, plan, memory);
console.log("\nFinal answer:\n", answer);
})();
Run it:
OPENAI_API_KEY=your_key npx tsx src/index.ts
Expected output:
Goal: What is the weather like in Tokyo, and what is 1234 multiplied by 56?
---
[perception] received input (60 chars)
[planning] created 2-step plan:
1. Retrieve the current weather for Tokyo
2. Calculate 1234 × 56
[action] tool call → get_weather { city: 'Tokyo' }
[feedback] tool result ← Weather in Tokyo: 28°C, sunny
[action] tool call → calculate { expression: '1234 * 56' }
[feedback] tool result ← 69104
Final answer:
The weather in Tokyo is currently 28°C and sunny. And 1234 × 56 = 69,104.
Each log prefix maps directly to a diagram component — [perception], [planning], [action], [feedback] — so you can trace exactly which layer is active at any point.
Agent Types
Not all agents work the same way. The architecture you choose should match the complexity of the tasks you expect.
| Agent Type | Decides Next Step? | Uses Memory? | Example Use Case |
|---|---|---|---|
| Simple Reflex | No | No | Trigger-based email responder |
| ReAct (Reason + Act) | Yes | No | Web research assistant |
| Plan-and-Execute | Yes | Partial | Multi-step coding assistant |
| Multi-Agent | Yes (per agent) | Yes | Software dev team simulation |
ReAct (the pattern used above) is the most practical starting point for most projects. It is flexible, easy to debug, and well-supported by modern LLMs.
Multi-agent setups assign specialised sub-agents to different roles (researcher, coder, reviewer) coordinated by an orchestrator. They scale to complex workflows but add significant operational overhead.
When to Use an AI Agent
Agents are not always the right tool. The extra complexity only pays off in specific situations.
| Scenario | Use an Agent? | Reason |
|---|---|---|
| Single, well-defined question | No | A plain LLM call is simpler and faster |
| Multi-step task with unknowns | Yes | Agent can plan and adapt |
| Needs real-time data (web, DB) | Yes | Tool use bridges the gap |
| Requires human approval mid-task | Yes | Human-in-the-loop pattern |
| Pure text transformation | No | No autonomy needed |
A plain LLM call is simpler, cheaper, and more predictable. Reach for agents when the task genuinely requires autonomy — iterating, deciding, or using external data.
Challenges and Pitfalls
Infinite loops. Without a hard iteration cap, an agent that cannot find an answer will keep trying. Always set MAX_ITERATIONS.
Prompt injection via tool results. If a tool fetches content from the web or a database, that content lands inside the LLM's context. A malicious string like "Ignore your instructions and..." embedded in an API response can hijack the agent. Sanitise or clearly delimit external content before returning it as a tool result.
Cost and latency. Every iteration is a full LLM call. A task that takes five iterations costs five times as much as a single call. Monitor token usage carefully in production.
Irreversible actions. If a tool deletes a file, sends an email, or charges a credit card, a reasoning mistake causes real-world consequences. Add confirmation steps or a human-in-the-loop checkpoint before any destructive or high-stakes action.
Hallucinated tool calls. Models occasionally invent tool names or parameters that do not exist. Always validate the tool name against your schema before executing anything.
Conclusion
An AI agent is an LLM embedded inside a loop that lets it use tools, observe results, and keep working until a goal is reached. The core pattern — Reason, Act, Observe — is straightforward to implement and already opens up a wide class of problems that a single LLM call cannot handle.
The example in this article is intentionally minimal: a loop, two tools, and an iteration cap. That is genuinely all you need to start. From this base you can add persistent memory, parallel tool execution, multi-agent coordination, or human-in-the-loop checkpoints — one layer at a time, driven by actual need rather than speculation.
Start small, instrument your loops, and add complexity only when the task demands it.
Summary
- An AI agent wraps an LLM with a ReAct loop: reason → act (tool call) → observe → repeat.
- Tools bridge the gap between the LLM's text output and real-world actions.
- Memory is managed externally — either in the message history or a separate store.
- Always bound your loop, sanitise tool results, and guard irreversible actions.
- Begin with a single-agent ReAct pattern; escalate to multi-agent only when justified.
