Building AI Agents
IC3 · IC4 · IC5

Building a ReAct Agent from Scratch

An agent is a loop: model, tools, memory, stop condition. Build the ReAct loop by hand in ~40 lines, watch it run live, then learn exactly when interviewers expect you to reach for it versus a plan-execute agent or a plain workflow.

19 min read · 11 sections
Prerequisites: function calling / tool use, basic LLM prompting
Runnable: ai-eng-wiki/examples/agents/react_agent.py

1. Quick anchor

An "AI agent" is not a model. It is a loop around a model. The model proposes an action, your code executes it, the result goes back into the context, and the loop repeats until the model decides it is done. Strip away every framework and that is all an agent is — four moving parts: a model, a set of tools, an accumulating memory (the message list), and a stop condition.

ReAct (Reason + Act, Yao et al., 2022) is the simplest useful shape of that loop: the model reasons in natural language about what to do, acts by calling a tool, observes the result, and reasons again. Modern LLMs do this natively through tool-use / function-calling APIs — you no longer parse "Action: search[...]" out of free text the way the original paper did; the model emits a structured tool_use block and the API hands it to you typed.

If you can write this loop from memory, you understand 80% of what production agent frameworks (LangGraph, the OpenAI Agents SDK, the Claude Agent SDK) do for you. The other 20% — durability, sub-agents, context management — is the subject of the rest of this pillar.

2. Why interviewers probe this

Every agent question at a frontier lab bottoms out here. The interviewer is checking whether you see an agent as a loop you control or as magic a library does. The tell at each level:

  • IC3/IC4 — Can you name the four parts and write the loop without a framework? Do you append the assistant's tool_use to history before sending the result back (the #1 beginner bug)?
  • IC5 — Can you articulate the agent-vs-workflow decision, and the ReAct-vs-plan-execute tradeoff, in terms of cost, latency, and failure modes — not vibes?
  • IC6+ — Can you reason about where the loop breaks at scale: context growth, error compounding, non-determinism, and how you'd bound and observe it?

A senior who says "I'd just use LangGraph" without being able to draw the loop underneath is a red flag. The library is fine; not knowing what it does is not.

3. Concept build-up

3.1 The loop, precisely

Beginner explainer. New here? The agent loop in 30 seconds

What is it? An agent loop runs the same four steps over and over: the model reasons about what to do, calls a tool, sees the result, and loops back to reason again.

The four parts:

  • Model — the LLM (Claude) that decides what action to take next
  • Tools — functions the model can call (search, calculator, send email)
  • Memory — the message history that grows with each thought and result
  • Stop condition — when the model says it's done (no more tool calls)

Step by step.

  1. Start with the user's question in memory
  2. Call the model with all the tools available
  3. The model either (a) calls a tool or (b) says it's done with an answer
  4. If it called a tool, run that tool, add the result to memory, and go back to step 2
  5. If it said it's done, return the answer and stop

Remember this: the trick is that you must append the model's decision to memory before you append the tool results—this is what the API expects.

def agent_loop(user_request, model, tools, execute):
    messages = [user_request]
    while True:
        response = model(messages, tools)        # REASON: model picks an action
        messages.append(response)                # remember what it decided
        if not response.tool_calls:
            return response.text                 # STOP: model is done
        # ACT: run every requested tool
        results = [execute(call) for call in response.tool_calls]
        messages.append(results)                 # OBSERVE: feed all results back in one turn

Three invariants make or break this:

  1. Append the assistant turn before the tool results. The API requires the tool_use block and its matching tool_result to both be in history, in order, with matching IDs. Skip the assistant turn and you get a 400.
  2. Return all tool results in a single user turn. If the model requested three tools in parallel, you send back three tool_result blocks together, not one message each.
  3. The stop condition is the model's, not yours — you stop when it stops calling tools (stop_reason == "end_turn"), with a hard iteration cap as a backstop.
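
Invariants 1 and 2 can be checked mechanically before a request is sent. The following is a minimal sketch, not part of any SDK: `check_tool_pairing` is a hypothetical helper, and it assumes the history is stored as plain dicts (SDK response objects would need converting first).

```python
def _ids(msg, block_type, key):
    """Collect the IDs carried by blocks of one type in a message (empty for plain-text content)."""
    if msg is None or isinstance(msg["content"], str):
        return set()
    return {b[key] for b in msg["content"] if b.get("type") == block_type}


def check_tool_pairing(messages):
    """Return a list of pairing problems; an empty list means the history looks well-formed."""
    problems = []
    for i, msg in enumerate(messages):
        prev = messages[i - 1] if i > 0 else None
        nxt = messages[i + 1] if i + 1 < len(messages) else None
        if msg["role"] == "assistant":
            # Invariant 2: every requested tool is answered in the very next user turn.
            requested = _ids(msg, "tool_use", "id")
            answered = _ids(nxt, "tool_result", "tool_use_id") if nxt and nxt["role"] == "user" else set()
            for missing in sorted(requested - answered):
                problems.append(f"turn {i}: tool_use {missing} has no tool_result in the next user turn")
        else:
            # Invariant 1: every result refers to a tool_use in the preceding assistant turn.
            results = _ids(msg, "tool_result", "tool_use_id")
            requested = _ids(prev, "tool_use", "id") if prev and prev["role"] == "assistant" else set()
            for orphan in sorted(results - requested):
                problems.append(f"turn {i}: tool_result {orphan} has no matching tool_use before it")
    return problems


good = [
    {"role": "user", "content": "What's the weather in Tokyo, and what is (18 * 7) + 12?"},
    {"role": "assistant", "content": [
        {"type": "text", "text": "Thought: I need both tools."},
        {"type": "tool_use", "id": "t1", "name": "calculator", "input": {"expression": "(18 * 7) + 12"}},
        {"type": "tool_use", "id": "t2", "name": "get_weather", "input": {"location": "Tokyo"}},
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "t1", "content": "138"},
        {"type": "tool_result", "tool_use_id": "t2", "content": "Tokyo: 21°C, partly cloudy."},
    ]},
]
bad = [good[0], good[2]]  # the assistant turn was never appended

print(check_tool_pairing(good))
print(check_tool_pairing(bad))
```

The first call reports nothing; the second flags both results as orphans, which is the same mistake the API rejects with a 400.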

3.2 ReAct vs Plan-Execute vs a plain workflow

ReAct decides the next action after seeing each result. That adaptivity is its strength and its cost: every step is a fresh model call.

A plan-execute agent (next lesson) makes a plan once, then executes it, calling the model far less often — cheaper and faster, but brittle if the plan was wrong.

And often you need neither. Anthropic's Building Effective Agents draws the line that frontier interviewers expect you to draw:

Workflows are systems where LLMs and tools are orchestrated through predefined code paths. Agents are systems where the LLM dynamically directs its own process and tool usage.

If you can write the control flow as a fixed pipeline (classify → retrieve → answer), do that — it is deterministic, debuggable, and cheap. Reach for an agent loop only when the steps genuinely cannot be known in advance. "The simplest thing that works" is the senior answer almost every time.
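
For contrast with the agent loop, here is a rough sketch of the classify → retrieve → answer pipeline as a fixed workflow. Everything in it is illustrative: `LLM` stands for any prompt-to-text callable, `DOCS` is a made-up two-entry knowledge base, and `fake_llm` is a stand-in so the example runs without an API key.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical: any function that maps a prompt to text

# Hypothetical in-memory knowledge base, keyed by topic.
DOCS = {
    "billing": "Invoices are issued on the 1st of each month.",
    "shipping": "Orders ship within 2 business days.",
}


def classify(llm: LLM, question: str) -> str:
    label = llm(f"Classify into one of {sorted(DOCS)}: {question}").strip().lower()
    if label not in DOCS:
        raise ValueError(f"unexpected label: {label!r}")
    return label


def retrieve(label: str) -> str:
    return DOCS[label]


def answer(llm: LLM, question: str, context: str) -> str:
    return llm(f"Answer using only this context.\nContext: {context}\nQuestion: {question}")


def workflow(llm: LLM, question: str) -> str:
    label = classify(llm, question)        # model call 1
    context = retrieve(label)              # plain code, no model involved
    return answer(llm, question, context)  # model call 2


calls = []

def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model."""
    calls.append(prompt)
    return "shipping" if prompt.startswith("Classify") else "Orders ship within 2 business days."


reply = workflow(fake_llm, "When will my order ship?")
print(reply, len(calls))
```

The control flow lives in `workflow`, not in the model: the number of model calls is fixed at two regardless of what the model says, which is what makes this shape easy to test and to price.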

The same task traced both ways shows the model-call counts diverging:

ReAct vs Plan-Execute

Task: “Book the cheapest flight to Tokyo next Friday and add it to my calendar.” Each Thought (with the Act it emits), Plan, and Final line is one model (LLM) call; Obs and Exec lines are produced by plain code.

ReAct: 4 model calls (think → act → observe, looped)
Thought: I don't know the flights yet, so I'll search.
Act: search_flights(dest=Tokyo, date=Fri)
Obs: 3 options: $612 (09:40), $688 (14:10), $740 (21:30)
Thought: $612 is cheapest. Is my calendar free then?
Act: get_calendar(date=Fri)
Obs: free 08:00–12:00, busy 13:00–15:00
Thought: 09:40 fits. Add it.
Act: add_event('Flight to Tokyo $612', 09:40)
Obs: event created ✓
Final: Booked the $612 09:40 flight and added it to your calendar.

Plan-Execute: 2 model calls (plan once, then run the plan)
Plan (1 model call): ① search_flights ② pick cheapest ③ get_calendar ④ add_event
Exec ①: search_flights(Tokyo, Fri) → 3 options
Exec ②: pick_cheapest → $612 (09:40)
Exec ③: get_calendar(Fri) → free at 09:40
Exec ④: add_event('Flight to Tokyo $612', 09:40) → ✓
Final (1 model call): Booked the $612 09:40 flight and added it.

3.3 Tools are an interface, not an afterthought

The model only knows what your tool descriptions tell it. A vague search(q) gets misused; a precise "Search internal docs. Call this whenever the answer depends on company-specific information you don't already know" gets used correctly. Recent Opus models reach for tools more conservatively, so prescriptive, when-to-call descriptions measurably raise the right-call rate. Tool design is a first-class skill — it gets its own lesson (Tool Use & MCP).
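
To make the contrast concrete, the sketch below puts a vague and a prescriptive definition of the same search tool side by side, in the name / description / input_schema shape used elsewhere in this lesson. The `lint_tool` helper and its 40-character threshold are arbitrary assumptions for illustration, not an established rule.

```python
# Vague: the model has to guess what is searched and when to use it.
vague = {
    "name": "search",
    "description": "Search.",
    "input_schema": {
        "type": "object",
        "properties": {"q": {"type": "string"}},
        "required": ["q"],
    },
}

# Prescriptive: says what is searched and when to call it.
precise = {
    "name": "search_internal_docs",
    "description": "Search internal docs. Call this whenever the answer depends on "
                   "company-specific information you don't already know.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Keywords or a natural-language question."}
        },
        "required": ["query"],
    },
}


def lint_tool(tool: dict, min_description_chars: int = 40) -> list:
    """Hypothetical heuristic check; the threshold is an arbitrary illustration."""
    warnings = []
    if len(tool.get("description", "")) < min_description_chars:
        warnings.append("description is too short to say when to call the tool")
    for name, spec in tool["input_schema"].get("properties", {}).items():
        if not spec.get("description"):
            warnings.append(f"parameter '{name}' has no description")
    return warnings


print(lint_tool(vague))
print(lint_tool(precise))
```

A check like this only catches the crudest omissions; whether a description actually steers the model has to be measured with evals.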

4. Minimal implementation

Here is a complete ReAct agent against the Claude Messages API — no framework, ~40 lines of real logic. The full runnable file (with a CLI and a mock-LLM fallback so it runs with no API key) is in ai-eng-wiki/examples/agents/react_agent.py.

from anthropic import Anthropic
 
client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
 
# 1) TOOLS — schema the model sees, plus the Python that actually runs.
TOOLS = [
    {
        "name": "calculator",
        "description": "Evaluate an arithmetic expression like (3 + 4) * 2. "
                       "Call this for any non-trivial math instead of doing it in your head.",
        "input_schema": {
            "type": "object",
            "properties": {"expression": {"type": "string"}},
            "required": ["expression"],
        },
    },
    {
        "name": "get_weather",
        "description": "Get the current weather for a city. Call when the user asks about weather.",
        "input_schema": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
]
 
def run_tool(name: str, args: dict) -> str:
    if name == "calculator":
        # Whitelist arithmetic characters so eval cannot reach names or function calls.
        expr = args["expression"]
        if not all(c in "0123456789+-*/(). " for c in expr):
            return "Error: only arithmetic is allowed."
        return str(eval(expr))
    if name == "get_weather":
        return f"{args['location']}: 21°C, partly cloudy."
    return f"Unknown tool: {name}"
 
# 2) THE LOOP — reason, act, observe, repeat.
def agent(user_msg: str, max_turns: int = 6) -> str:
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_turns):                       # hard cap = the backstop
        resp = client.messages.create(
            model="claude-opus-4-8",                 # swap to claude-sonnet-4-6 / -haiku-4-5 for cost
            max_tokens=1024,
            system="You solve the task by reasoning and using tools. "
                   "State each step briefly ('Thought: ...') before acting.",
            tools=TOOLS,
            messages=messages,
        )
        messages.append({"role": "assistant", "content": resp.content})  # INVARIANT 1
 
        if resp.stop_reason != "tool_use":           # STOP: model is done
            return "".join(b.text for b in resp.content if b.type == "text")
 
        results = []
        for block in resp.content:                   # ACT on every requested tool
            if block.type == "tool_use":
                out = run_tool(block.name, block.input)
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,          # IDs must match
                    "content": out,
                })
        messages.append({"role": "user", "content": results})  # INVARIANT 2: all results, one turn
    return "Stopped: hit the turn limit."
 
print(agent("What's the weather in Tokyo, and what is (18 * 7) + 12?"))

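Read the loop against the three invariants in §3.1; they are all right there. The eval is gated to arithmetic characters, which keeps names and function calls out but still admits the ** operator (an expression like 9**9**9 can burn CPU and memory) and lets exceptions such as ZeroDivisionError escape. In production you would never eval model output without a sandbox; for a toy calculator the character whitelist is a tolerable guard.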
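
One way to avoid eval entirely is to parse the expression and walk the syntax tree, allowing only the operators you name. The sketch below is an alternative to the lesson's whitelist, not the code in react_agent.py; `safe_calc` is a hypothetical name, and it uses only the standard-library `ast` and `operator` modules.

```python
import ast
import operator

# Only these operators are evaluated; anything else (including **) is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_calc(expression: str) -> str:
    """Evaluate + - * / arithmetic without eval; errors come back as strings."""

    def ev(node):
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](ev(node.operand))
        raise ValueError("only + - * / and parentheses are allowed")

    try:
        tree = ast.parse(expression, mode="eval")
        return str(ev(tree.body))
    except (SyntaxError, ValueError, ZeroDivisionError, RecursionError) as exc:
        return f"Error: {exc}"


print(safe_calc("(18 * 7) + 12"))  # 138
print(safe_calc("2 ** 10"))        # rejected: exponentiation is not in the allow-list
print(safe_calc("1 / 0"))          # error string instead of an exception
```

Because failures are returned as strings, a drop-in replacement for the calculator branch of run_tool would also stop arithmetic errors from crashing the loop.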

The ReAct loop—section by section

Part 1: Tools (the TOOLS list). This is the "menu" the model sees. Each tool is an object with a name (what the model calls), a description (when to call it), and an input_schema (what arguments it expects). The model reads these descriptions and learns that calculator exists and should be called for math, and get_weather exists for weather. The descriptions are the entire interface; vague descriptions produce bad tool calls.

Part 2: run_tool. When the model asks for a tool, this function executes it. For calculator, it rejects any expression containing characters outside the arithmetic whitelist and runs Python's eval on the rest. For get_weather, it returns a canned reading (21°C, partly cloudy). In production, these would hit real services (a math library, a weather API). Notice: a disallowed expression or an unknown tool name comes back as an error string, which the loop forwards to the model as an ordinary tool result. An exception raised inside a tool (say, division by zero) is not caught and would crash this toy loop; §5 covers the fix.

Part 3: The agent loop (the agent function). This is the engine:

  • for _ in range(max_turns): hard cap on iterations (safety valve against infinite loops)
  • client.messages.create(...): send all messages and tools to Claude; it responds with either an answer or one or more tool calls
  • first messages.append: INVARIANT 1. Append the assistant's response (including any tool calls) to history before processing results
  • stop_reason check: anything other than "tool_use" (normally "end_turn") means the model is done; extract and return the text
  • for block in resp.content: extract each tool call, run it, and build a result block with the matching ID
  • final messages.append: INVARIANT 2. Append all results in one user message (not one per tool)

What it does end-to-end: the loop sends the user's question, gets back a decision (one or more tool calls), runs the requested tools, and feeds the results back. The model may ask for calculator and get_weather together in one turn or one after the other; either way, once it has both results it returns a final answer. At each step the model adapts to what it just learned.

5. Production tradeoffs

The toy loop is correct but not production-ready. The gap is what IC5+ interviews live in:

| Concern | Toy loop | What production adds |
| --- | --- | --- |
| Runaway loops | max_turns cap | Cap + budget on tokens and wall-clock; detect repeated identical tool calls and break. |
| Tool failures | crashes | Return the error as a tool_result with is_error: true so the model can recover or report; don't raise. |
| Context growth | unbounded | Old tool results bloat context; use context editing / compaction (see Context Engineering). |
| Cost / latency | one model call per step | Cache the stable system+tools prefix (prompt caching); route easy steps to a cheaper model. |
| Observability | print | Trace every step (tool, input, latency, tokens); you cannot debug what you can't see. |
| Determinism | none | Agents are non-deterministic; gate side-effecting tools (send_email, deploy) behind confirmation. |

Why this matters: a concrete flow

Imagine you ask: "What's the weather in Tokyo, and what is (18 * 7) + 12?" Here's what the loop does:

  1. Message 1 (user): "What's the weather in Tokyo, and what is (18 * 7) + 12?"
  2. Call model with tools. Claude reads the tool descriptions and decides: "I need both tools."
  3. Append assistant turn: Claude returns tool_use blocks for both calculator(expression="(18 * 7) + 12") and get_weather(location="Tokyo"); append this to messages
  4. Execute both tools: calculator evaluates (18 * 7) + 12 to 138, weather returns "Tokyo: 21°C, partly cloudy."
  5. Append results as one user turn: two tool_result blocks (one per tool) go back together
  6. Call model again. It sees both results: 138 and the weather. It reasons: "I have everything I need."
  7. Model stops calling tools (stop_reason == "end_turn"), returns: "The weather in Tokyo is 21°C, partly cloudy. And (18 * 7) + 12 = 138."
  8. Return and stop. The loop exits.

Each step feeds into the next. The model is not locked into a plan; it adapts. That's ReAct's power, and its cost: 2 model calls here instead of 1 (3 if the model had requested the tools one at a time).

Two senior instincts worth stating out loud in an interview:

  • Make hard-to-reverse actions tools, and gate them. A send_email tool can be intercepted and confirmed; bash -c "curl ..." cannot. Reversibility drives the tool surface (this is the heart of agent design).
  • The model is ~20% of a production agent. The other 80% is the harness: orchestration, memory, guardrails, evals, observability. Saying this signals you have shipped one.
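
A confirmation gate can be as small as a wrapper around tool execution. This is a sketch under assumptions: `SIDE_EFFECTING`, `gated_run_tool`, and the `confirm` callback are hypothetical names, and in a real harness `confirm` would prompt a human or consult a policy service rather than a lambda.

```python
from typing import Callable

# Hypothetical registry of tools whose effects are hard to reverse.
SIDE_EFFECTING = {"send_email", "deploy"}


def gated_run_tool(
    name: str,
    args: dict,
    run_tool: Callable[[str, dict], str],
    confirm: Callable[[str, dict], bool],
) -> dict:
    """Run a tool, asking for confirmation first if it has side effects."""
    if name in SIDE_EFFECTING and not confirm(name, args):
        # Report the refusal to the model as an error result instead of raising.
        return {"content": f"Denied: the user declined {name}.", "is_error": True}
    return {"content": run_tool(name, args), "is_error": False}


def demo_tool(name: str, args: dict) -> str:
    return f"{name} ran with {args}"


denied = gated_run_tool("send_email", {"to": "a@example.com"}, demo_tool, confirm=lambda n, a: False)
allowed = gated_run_tool("get_weather", {"location": "Tokyo"}, demo_tool, confirm=lambda n, a: False)
print(denied)
print(allowed)
```

The gate works only because send_email is a named tool the harness can see; an action buried inside a generic shell command offers no such interception point.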

6. How it's asked

[IC3] "What are the parts of an agent loop?" Model, tools, memory (the message list), and a stop condition. The loop: model proposes a tool call → harness executes it → result is appended to memory → repeat until the model stops calling tools. Mention the append-assistant-turn-before-results invariant — it shows you've actually written one.
[IC5] "When does ReAct beat plan-execute, and when does it lose?" ReAct re-decides after every observation, so it wins on open-ended, uncertain tasks where later steps depend on earlier results (debugging, research, anything where you can't enumerate steps up front). It loses on predictable multi-step tasks: it pays a model call per step (latency + tokens) and can thrash. Plan-execute front-loads one planning call, executes cheaply, but blunders if the plan was wrong — so you add an evaluator/re-plan step, which starts to look like ReAct again. The real senior answer: if you can write it as a fixed workflow, do that instead of either.
[IC5] "Your ReAct agent loops forever on a flaky tool. Every guard you'd add?" (1) Hard max_turns cap. (2) Token + wall-clock budget. (3) Detect repeated identical (tool, args) calls and break or inject a "that didn't work, try differently" message. (4) Return tool errors as tool_result is_error: true so the model adapts rather than the harness crashing. (5) Per-tool timeout + retry-with-backoff on the tool, not the loop. (6) An evaluator check on "are we making progress?" for long horizons. (7) Trace everything so you can see where it's stuck.
[IC6] "How does this loop break at 10,000 concurrent agents?" Context growth (compaction/context-editing), cost (prompt caching the shared prefix; cheaper models for easy steps; this is where you'd discuss inference economics), tail latency from slow tools (timeouts, hedging), non-determinism making incidents un-reproducible (trace + replay), and shared-tool rate limits (queueing, per-tenant budgets).
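
Several of the guards named in these answers fit in a few dozen lines. The sketch below covers the turn cap, the wall-clock budget, repeated-call detection, and error results; `LoopGuard` and `execute_safely` are hypothetical names, the default limits are arbitrary, and token budgets, per-tool timeouts, and tracing are left out.

```python
import json
import time
from typing import Callable, Optional


class LoopGuard:
    """Tracks turns, elapsed time, and repeated identical tool calls."""

    def __init__(self, max_turns: int = 6, max_seconds: float = 30.0, max_repeats: int = 2):
        self.max_turns = max_turns
        self.max_seconds = max_seconds
        self.max_repeats = max_repeats
        self.turns = 0
        self.started = time.monotonic()
        self.seen: dict = {}

    def start_turn(self) -> Optional[str]:
        """Call once per loop iteration; returns a stop reason, or None to continue."""
        self.turns += 1
        if self.turns > self.max_turns:
            return "turn limit reached"
        if time.monotonic() - self.started > self.max_seconds:
            return "wall-clock budget exhausted"
        return None

    def record_call(self, name: str, args: dict) -> Optional[str]:
        """Returns a warning once the same (tool, args) pair exceeds max_repeats."""
        key = (name, json.dumps(args, sort_keys=True))
        self.seen[key] = self.seen.get(key, 0) + 1
        if self.seen[key] > self.max_repeats:
            return f"{name} was called {self.seen[key]} times with identical arguments"
        return None


def execute_safely(run_tool: Callable[[str, dict], str], tool_use_id: str, name: str, args: dict) -> dict:
    """Turn a tool exception into an is_error tool_result instead of crashing the loop."""
    try:
        content = run_tool(name, args)
        return {"type": "tool_result", "tool_use_id": tool_use_id, "content": content}
    except Exception as exc:
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": f"{type(exc).__name__}: {exc}",
            "is_error": True,
        }


def flaky_tool(name: str, args: dict) -> str:
    raise TimeoutError("upstream timed out")


guard = LoopGuard(max_turns=2, max_repeats=2)
result = execute_safely(flaky_tool, "t1", "get_weather", {"location": "Tokyo"})
print(result)
```

In the loop, a non-None return from `start_turn` ends the run, while a warning from `record_call` can either end it or be sent back to the model as a nudge to try a different approach.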

7. Pitfalls & flashcards

  • Forgetting to append the assistant turn before the tool results → API 400. The single most common bug.
  • Splitting parallel tool results across turns instead of one user message with all tool_result blocks.
  • Raising on tool failure instead of returning an error result the model can recover from.
  • No iteration cap → a flaky tool becomes an infinite, expensive loop.
  • Treating "use an agent" as the default. Most tasks are workflows. Agents cost more and fail in more ways; justify them.

Flashcard. Agent = model + tools + memory + stop condition, run in a loop. ReAct = reason→act→observe, one model call per step: adaptive but pricey. Use the simplest thing that works — workflow < plan-execute < ReAct in order of how much you should have to justify it.
