What Is an AI Agent and How to Build One

Agent

TypeScript

Author: Oleh Baranovskyi

Published on Mar 16, 2026

17 MIN READ

What Is an AI Agent?
What an AI Agent Consists Of
Concepts
How an AI Agent Works
Project Structure
Building a Simple AI Agent
Agent Types
When to Use an AI Agent
Challenges and Pitfalls
Conclusion

Asking an LLM a question and getting an answer back is powerful — but it is still just a single call. You provide input, the model responds, done. This works well for isolated tasks, yet it breaks down the moment you need the model to do something: search the web, write a file, call an API, or retry a failed step based on what it learned a moment ago.

That is exactly the gap AI agents are designed to fill. An AI agent wraps an LLM with a loop that lets it observe its environment, decide what action to take, execute that action, and then observe the result — repeating until the goal is reached. Instead of answering once, the agent keeps going until the job is done.

This article explains what AI agents are, how they work internally, and how to build one from scratch in TypeScript with the OpenAI SDK — one file per diagram component, no heavy framework required.

An AI agent is not a single thing — it is a composition of six interconnected components. The LLM sits at the top as the brain; beneath it, three subsystems feed it information (Perception, Memory, Tools); and at the bottom, two components close the loop (Planning and Action, with a Feedback loop returning results back up).

graph TD
    LLM["🧠 LLM (Brain)\nReasons, plans, decides"]

    LLM --> Perception["👁️ Perception\nText, images, data"]
    LLM --> Memory["💾 Memory\nShort & long-term"]
    LLM --> Tools["🔧 Tools\nWeb, code, APIs"]

    Perception --> Planning["📋 Planning\nGoals, sub-tasks"]
    Memory --> Action["⚡ Action\nExecute & respond"]
    Tools --> Feedback["🔄 Feedback loop\nObserve & refine"]

    Feedback -->|"loop"| LLM

The central reasoning engine. Given everything currently in its context — the goal, conversation history, available tool schemas, and retrieved memories — the LLM decides what to do next: ask a clarifying question, call a tool, create a plan, or produce a final answer. It does not store anything permanently; every decision is made fresh from what is currently in the context window.

Because the LLM is stateless between calls, memory is managed externally on its behalf. Two types work together:

Short-term (in-context) — the running conversation history appended to every request. Immediate and free, but bounded by the model's context window.
Long-term (external) — a vector store, relational database, or key-value cache that survives across sessions. Relevant chunks are retrieved and injected into the prompt only when needed, so the agent can recall past work without bloating the context.

The bridge between the LLM's text output and the real world. Tools are described to the model as structured schemas (name, description, parameters). When the LLM decides a tool is needed, it emits a structured call; the orchestration layer runs the matching function and returns the result as a new observation. Typical categories:

Retrieval — web search, vector DB lookup, document reader
Computation — code interpreter, calculator, data transformer
Side-effect — email sender, file writer, REST API caller, database writer

How the agent receives information from the outside world. This is not limited to plain text — modern multimodal models can also perceive images, structured data (JSON, CSV), PDFs, audio transcripts, and more. Perception is the entry point: whatever the agent cannot perceive, it cannot act on.

The ability to decompose a high-level goal into a sequence of concrete sub-tasks before or during execution. Some agents plan upfront (writing a full task list before starting); others plan dynamically (deciding the next step only after seeing the result of the previous one). Dynamic planning — used in the ReAct pattern — is more common in practice because real goals are rarely predictable end-to-end.

Action is where plans meet reality: the agent executes a tool call, writes a file, calls an API, or responds to the user. The feedback loop closes the cycle — the result of each action is observed and appended to the conversation history, becoming the new context for the next reasoning step. This observe-and-refine cycle is what separates an agent from a one-shot LLM call.

Before diving into code, it is worth locking down the vocabulary. These terms appear everywhere in the agent space and are often used loosely.

The "brain" of the agent. It reads text in and produces text out. On its own it has no memory between calls and cannot reach outside its context window. Everything else in an agent system is built around compensating for these limitations.

A way of letting the LLM call external functions. You describe tools in structured form (name, description, parameters), send them alongside the prompt, and the model can respond by saying "call this tool with these arguments" instead of answering directly. Your code then runs the tool and sends the result back. This is also called function calling.

Because LLMs are stateless, memory must be managed externally:

In-context memory — the conversation history you append to each request. Cheap but bounded by the context window.
External memory — a database, vector store, or key-value cache that survives across sessions and can be retrieved selectively.

The ability to decompose a goal into smaller steps before or during execution. Some agents plan upfront (a static plan), others plan dynamically (decide the next action after seeing the previous result). The second approach is far more common in practice because goals are rarely fully predictable.

The code that runs the agent loop — deciding when to call the LLM, when to run a tool, when to stop, and how to handle errors. This can be a few dozen lines of custom code or a dedicated framework like LangChain, LlamaIndex, or AutoGen.

At its core, every agent follows the same cycle, often called the ReAct loop (Reason → Act → Observe):

Receive goal — the user provides a task in natural language.

Think — the LLM reasons about what to do next and optionally selects a tool.

Act — if a tool was selected, your code runs it.

Observe — the tool result is appended to the conversation history.

Repeat — steps 2–4 run again until the LLM decides the task is complete.

flowchart TD
    A([User Goal]) --> B[LLM: Reason]
    B --> C{Tool call needed?}
    C -- Yes --> D[Run Tool]
    D --> E[Append Result to History]
    E --> B
    C -- No --> F([Return Final Answer])

The loop is simple. What makes agents powerful — or fragile — is the quality of the tool descriptions, the system prompt, and how the orchestration layer handles edge cases.

The code is split into one file per diagram component so every layer has a single, clear responsibility.

ai-agent-demo/
├── src/
│   ├── perception.ts     # Perception       — normalise raw user input into agent messages
│   ├── memory.ts         # Memory           — short-term in-context history store
│   ├── tools.ts          # Tools            — schemas + implementations (web, code, APIs)
│   ├── planning.ts       # Planning         — decompose goal into an ordered sub-task list
│   ├── agent.ts          # Action + Feedback loop — drive the LLM, run tools, observe
│   └── index.ts          # Entry point      — wire everything together and run
├── package.json
└── tsconfig.json

File	Diagram component	Responsibility
perception.ts	Perception	Converts raw input (text, images, data) into structured messages
memory.ts	Memory	Keeps short-term history; extensible to long-term retrieval
tools.ts	Tools	Declares tool schemas and the functions they execute
planning.ts	Planning	Breaks the goal into ordered sub-tasks before the loop starts
agent.ts	Action + Feedback loop	Calls the LLM, executes tools, observes results, repeats
index.ts	-	Entry point

The agent below uses the OpenAI SDK with gpt-4o-mini. Each source file maps directly to one component from the diagram, so you can see exactly which layer of the architecture you are looking at.

Getting your key: Sign up at platform.openai.com, go to API Keys, create a new secret key, and save it as OPENAI_API_KEY.

Install the dependencies:

npm init -y
npm install openai
npm install -D typescript tsx @types/node

Perception is the first layer the user's input passes through. Its job is to accept any supported input type — plain text today, but easily extended to images or structured data — and return a clean UserMessage that the rest of the agent can consume without caring about the original format.

// src/perception.ts

export interface UserMessage {
  type: "text";
  content: string;
}

/**
 * Perception layer.
 * Accepts raw input (text, and in future: images, structured data)
 * and normalises it into a UserMessage for the agent.
 */
export function perceive(rawInput: string): UserMessage {
  // Trim whitespace and normalise line endings
  const content = rawInput.trim().replace(/\r\n/g, "\n");

  if (!content) {
    throw new Error("[perception] Empty input — nothing to perceive.");
  }

  console.log(`[perception] received input (${content.length} chars)`);
  return { type: "text", content };
}

Memory holds the running conversation so the LLM always has context. Short-term memory is the message array sent with every API call. The class below manages that array and exposes a retrieve hook — a natural extension point for a long-term vector-store retrieval later.

// src/memory.ts
import type OpenAI from "openai";

export type Message = OpenAI.Chat.ChatCompletionMessageParam;

/**
 * Memory layer.
 * Manages short-term in-context history.
 * Extend `retrieve()` to add long-term / semantic memory.
 */
export class Memory {
  private history: Message[] = [];

  /** Add a message to in-context history. */
  add(message: Message): void {
    this.history.push(message);
  }

  /**
   * Retrieve relevant context.
   * Currently returns full history — swap for semantic search when needed.
   */
  retrieve(): Message[] {
    return [...this.history];
  }

  snapshot(): Message[] {
    return this.history;
  }
}

The Tools layer bridges the LLM's text output with real-world actions. Each tool is described to the model as a JSON schema; when the model decides to call one, the orchestrator routes it to the matching function here.

// src/tools.ts
import type OpenAI from "openai";

export type ToolName = "get_weather" | "calculate";

export interface ToolResult {
  output: string;
}

/** Tool schemas sent to the LLM so it knows what is available. */
export const toolSchemas: OpenAI.Chat.ChatCompletionTool[] = [
  {
    type: "function",
    function: {
      name: "get_weather",
      description:
        "Returns current weather for a city. " +
        "Use when the user asks about weather conditions.",
      parameters: {
        type: "object",
        properties: {
          city: { type: "string", description: "City name, e.g. 'Tokyo'" },
        },
        required: ["city"],
      },
    },
  },
  {
    type: "function",
    function: {
      name: "calculate",
      description: "Evaluates an arithmetic expression and returns the result.",
      parameters: {
        type: "object",
        properties: {
          expression: {
            type: "string",
            description: "Arithmetic expression, e.g. '(42 + 8) * 2'",
          },
        },
        required: ["expression"],
      },
    },
  },
];

/** Execute a tool by name and return its result. */
export function runTool(name: ToolName, args: Record<string, string>): ToolResult {
  if (name === "get_weather") {
    const city = args.city ?? "Unknown";
    const fakeData: Record<string, string> = {
      Berlin: "12°C, partly cloudy",
      Tokyo: "28°C, sunny",
      London: "9°C, rainy",
    };
    return { output: `Weather in ${city}: ${fakeData[city] ?? "Data not available"}` };
  }

  if (name === "calculate") {
    try {
      // NOTE: replace with a safe math parser (e.g. mathjs) in production
      const result = Function(`"use strict"; return (${args.expression})`)();
      return { output: String(result) };
    } catch {
      return { output: "Error: invalid expression" };
    }
  }

  return { output: "Unknown tool" };
}

Before the main loop starts, the Planning layer asks the LLM to break the goal into an ordered list of sub-tasks. This gives the agent an upfront map rather than improvising every step — especially useful for multi-step goals where later steps depend on earlier ones.

// src/planning.ts
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from env

export interface Plan {
  steps: string[];
}

/**
 * Planning layer.
 * Asks the LLM to decompose a goal into ordered sub-tasks.
 * Returns the list so the agent can track progress through the plan.
 */
export async function createPlan(goal: string): Promise<Plan> {
  const response = await client.chat.completions.create({
    model: "gpt-4o-mini",
    max_tokens: 512,
    messages: [
      {
        role: "system",
        content:
          "You are a planning assistant. " +
          "Given a user goal, output ONLY a numbered list of concrete sub-tasks needed to complete it. " +
          "No extra text — just the numbered list.",
      },
      { role: "user", content: `Goal: ${goal}` },
    ],
  });

  const text = response.choices[0]?.message.content ?? "";

  // Parse "1. ...\n2. ..." into an array of step strings
  const steps = text
    .split("\n")
    .map((line) => line.replace(/^\d+\.\s*/, "").trim())
    .filter(Boolean);

  console.log(`[planning] created ${steps.length}-step plan:`);
  steps.forEach((s, i) => console.log(`  ${i + 1}. ${s}`));

  return { steps };
}

This is the core of the agent. It receives the perceived input, the plan, and access to memory and tools, then drives the LLM through the ReAct loop: call the LLM → execute any tool calls → observe the result → append to memory → repeat. The feedback loop is the for cycle itself — every observation feeds directly back into the next LLM call.

// src/agent.ts
import OpenAI from "openai";
import { Memory } from "./memory";
import { toolSchemas, runTool, ToolName } from "./tools";
import type { Plan } from "./planning";
import type { UserMessage } from "./perception";

const client = new OpenAI(); // reads OPENAI_API_KEY from env

const SYSTEM_PROMPT =
  "You are a helpful assistant with access to tools. " +
  "A plan has been prepared for you — work through it step by step. " +
  "Use tools whenever they give more accurate or up-to-date answers. " +
  "Once all steps are done, provide the final answer.";

const MAX_ITERATIONS = 10;

/**
 * Action + Feedback loop.
 * Drives the LLM through the ReAct cycle:
 *   perceive → (plan already done) → reason → act → observe → feedback → repeat
 */
export async function runAgent(
  input: UserMessage,
  plan: Plan,
  memory: Memory
): Promise<string> {
  // Seed memory: system prompt, user goal, then the prepared plan
  memory.add({ role: "system", content: SYSTEM_PROMPT });
  memory.add({ role: "user", content: input.content });
  memory.add({
    role: "user",
    content:
      "Before answering, follow this plan:\n" +
      plan.steps.map((s, i) => `${i + 1}. ${s}`).join("\n"),
  });

  for (let iteration = 0; iteration < MAX_ITERATIONS; iteration++) {
    // ── ACTION: call the LLM with full memory context ──────────────────────
    const response = await client.chat.completions.create({
      model: "gpt-4o-mini",
      messages: memory.retrieve(),
      tools: toolSchemas,
      tool_choice: "auto",
    });

    const choice = response.choices[0];
    if (!choice) return "No response from model.";

    // Append assistant turn to memory (feedback loop — step 1)
    memory.add(choice.message);

    // ── OBSERVE: final answer or tool calls? ───────────────────────────────
    if (choice.finish_reason !== "tool_calls" || !choice.message.tool_calls?.length) {
      return choice.message.content ?? "No answer returned.";
    }

    // ── ACTION: execute each tool call ─────────────────────────────────────
    for (const toolCall of choice.message.tool_calls) {
      if (toolCall.type !== "function") continue;

      const name = toolCall.function.name as ToolName;
      const args = JSON.parse(toolCall.function.arguments) as Record<string, string>;

      console.log(`[action]   tool call → ${name}`, args);
      const result = runTool(name, args);

      // ── FEEDBACK: append observation — closes the loop ──────────────────
      console.log(`[feedback] tool result ← ${result.output}`);
      memory.add({
        role: "tool",
        tool_call_id: toolCall.id,
        content: result.output,
      });
    }
  }

  return "Agent reached the iteration limit without a final answer.";
}

index.ts connects every layer in the order the diagram shows: perceive → memory → plan → act + feedback loop.

// src/index.ts
import { perceive } from "./perception";
import { Memory } from "./memory";
import { createPlan } from "./planning";
import { runAgent } from "./agent";

const rawGoal = "What is the weather like in Tokyo, and what is 1234 multiplied by 56?";

(async () => {
  console.log("Goal:", rawGoal);
  console.log("---");

  // 1. Perception — normalise raw input
  const input = perceive(rawGoal);

  // 2. Memory — initialise empty short-term store
  const memory = new Memory();

  // 3. Planning — decompose goal into sub-tasks
  const plan = await createPlan(input.content);

  // 4. Action + Feedback loop — run the agent
  const answer = await runAgent(input, plan, memory);

  console.log("\nFinal answer:\n", answer);
})();

Run it:

OPENAI_API_KEY=your_key npx tsx src/index.ts

Expected output:

Goal: What is the weather like in Tokyo, and what is 1234 multiplied by 56?
---
[perception] received input (60 chars)
[planning] created 2-step plan:
  1. Retrieve the current weather for Tokyo
  2. Calculate 1234 × 56
[action]   tool call → get_weather { city: 'Tokyo' }
[feedback] tool result ← Weather in Tokyo: 28°C, sunny
[action]   tool call → calculate { expression: '1234 * 56' }
[feedback] tool result ← 69104

Final answer:
 The weather in Tokyo is currently 28°C and sunny. And 1234 × 56 = 69,104.

Each log prefix maps directly to a diagram component — [perception], [planning], [action], [feedback] — so you can trace exactly which layer is active at any point.

Not all agents work the same way. The architecture you choose should match the complexity of the tasks you expect.

Agent Type	Decides Next Step?	Uses Memory?	Example Use Case
Simple Reflex	No	No	Trigger-based email responder
ReAct (Reason + Act)	Yes	No	Web research assistant
Plan-and-Execute	Yes	Partial	Multi-step coding assistant
Multi-Agent	Yes (per agent)	Yes	Software dev team simulation

ReAct (the pattern used above) is the most practical starting point for most projects. It is flexible, easy to debug, and well-supported by modern LLMs.

Multi-agent setups assign specialised sub-agents to different roles (researcher, coder, reviewer) coordinated by an orchestrator. They scale to complex workflows but add significant operational overhead.

Agents are not always the right tool. The extra complexity only pays off in specific situations.

Scenario	Use an Agent?	Reason
Single, well-defined question	No	A plain LLM call is simpler and faster
Multi-step task with unknowns	Yes	Agent can plan and adapt
Needs real-time data (web, DB)	Yes	Tool use bridges the gap
Requires human approval mid-task	Yes	Human-in-the-loop pattern
Pure text transformation	No	No autonomy needed

A plain LLM call is simpler, cheaper, and more predictable. Reach for agents when the task genuinely requires autonomy — iterating, deciding, or using external data.

Infinite loops. Without a hard iteration cap, an agent that cannot find an answer will keep trying. Always set MAX_ITERATIONS.

Prompt injection via tool results. If a tool fetches content from the web or a database, that content lands inside the LLM's context. A malicious string like "Ignore your instructions and..." embedded in an API response can hijack the agent. Sanitise or clearly delimit external content before returning it as a tool result.

Cost and latency. Every iteration is a full LLM call. A task that takes five iterations costs five times as much as a single call. Monitor token usage carefully in production.

Irreversible actions. If a tool deletes a file, sends an email, or charges a credit card, a reasoning mistake causes real-world consequences. Add confirmation steps or a human-in-the-loop checkpoint before any destructive or high-stakes action.

Hallucinated tool calls. Models occasionally invent tool names or parameters that do not exist. Always validate the tool name against your schema before executing anything.

An AI agent is an LLM embedded inside a loop that lets it use tools, observe results, and keep working until a goal is reached. The core pattern — Reason, Act, Observe — is straightforward to implement and already opens up a wide class of problems that a single LLM call cannot handle.

The example in this article is intentionally minimal: a loop, two tools, and an iteration cap. That is genuinely all you need to start. From this base you can add persistent memory, parallel tool execution, multi-agent coordination, or human-in-the-loop checkpoints — one layer at a time, driven by actual need rather than speculation.

Start small, instrument your loops, and add complexity only when the task demands it.

Summary

An AI agent wraps an LLM with a ReAct loop: reason → act (tool call) → observe → repeat.
Tools bridge the gap between the LLM's text output and real-world actions.
Memory is managed externally — either in the message history or a separate store.
Always bound your loop, sanitise tool results, and guard irreversible actions.
Begin with a single-agent ReAct pattern; escalate to multi-agent only when justified.

Table of contents

What Is an AI Agent?

What an AI Agent Consists Of

LLM (Brain)

Memory

Tools

Perception

Planning

Action & Feedback Loop

Concepts

LLM (Large Language Model)

Tool Use

Memory

Planning

Orchestration

How an AI Agent Works

Project Structure

Building a Simple AI Agent

Step 1: Perception — normalise raw input

Step 2: Memory — keep conversation history

Step 3: Tools — web, code, APIs

Step 4: Planning — decompose the goal

Step 5: Action + Feedback Loop

Step 6: Wire It All Together

Agent Types

When to Use an AI Agent

Challenges and Pitfalls

Conclusion