An agent is a loop: model, tools, memory, stop condition. Build the ReAct loop by hand in ~40 lines, watch it run live, then learn exactly when interviewers expect you to reach for it versus a plan-execute agent or a plain workflow.
An "AI agent" is not a model. It is a loop around a model. The model proposes an action, your code executes it, the result goes back into the context, and the loop repeats until the model decides it is done. Strip away every framework and that is all an agent is: four moving parts, namely a model, a set of tools, an accumulating memory (the message list), and a stop condition.
ReAct (Reason + Act, Yao et al., 2022) is the simplest useful shape of that loop: the model reasons in natural language about what to do, acts by calling a tool, observes the result, and reasons again. Modern LLMs do this natively through tool-use / function-calling APIs — you no longer parse "Action: search[...]" out of free text the way the original paper did; the model emits a structured tool_use block and the API hands it to you typed.
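For contrast, the free-text parsing that structured tool use replaces might have looked roughly like the sketch below. The `Action: name[argument]` line shape follows the style quoted above; the regex and the `parse_action` helper are illustrative assumptions, not code from the paper.

```python
import re

# Matches a line such as: Action: search[capital of France]
ACTION_RE = re.compile(r"^Action:\s*(\w+)\[(.*)\]\s*$", re.MULTILINE)


def parse_action(completion: str):
    """Return (tool_name, argument) parsed from free text, or None if absent."""
    match = ACTION_RE.search(completion)
    if match is None:
        return None
    return match.group(1), match.group(2)


text = "Thought: I need to look this up.\nAction: search[capital of France]"
print(parse_action(text))                    # ('search', 'capital of France')
print(parse_action("Final answer: Paris"))   # None
```

Every formatting slip by the model breaks a parser like this, which is why typed tool_use blocks are the more robust interface.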
If you can write this loop from memory, you understand 80% of what production agent frameworks (LangGraph, the OpenAI Agents SDK, the Claude Agent SDK) do for you. The other 20% — durability, sub-agents, context management — is the subject of the rest of this pillar.
Every agent question at a frontier lab bottoms out here. The interviewer is checking whether you see an agent as a loop you control or as magic a library does. The tell at each level:
- Do you append the assistant's tool_use turn to history before sending the result back (the #1 beginner bug)?

A senior who says "I'd just use LangGraph" without being able to draw the loop underneath is a red flag. The library is fine; not knowing what it does is not.
What is it? An agent loop runs the same four steps over and over: the model reasons about what to do, calls a tool, sees the result, and loops back to reason again.
The four parts: a model, a set of tools, a memory (the message list), and a stop condition.
Step by step.
Remember this: the trick is that you must append the model's decision to memory before you append the tool results—this is what the API expects.
```
messages = [user_request]
while True:
    response = model(messages, tools)      # REASON: model picks an action
    messages.append(response)              # remember what it decided
    if response has no tool calls:
        return response.text               # STOP: model is done
    results = []
    for call in response.tool_calls:       # ACT: run the tools
        results.append(execute(call))
    messages.append(results)               # OBSERVE: feed all results back together
```

Three invariants make or break this:
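1. Append the assistant turn before the results. The API requires each tool_use block and its matching tool_result to both be in history, in order, with matching IDs. Skip the assistant turn and you get a 400.
2. Return all results in one user turn. When the model requests several tools at once, send the tool_result blocks together, not one message each.
3. Stop on the model's signal. The loop ends when the model stops asking for tools (stop_reason == "end_turn"), with a hard iteration cap as a backstop.

ReAct decides the next action after seeing each result. That adaptivity is its strength and its cost: every step is a fresh model call.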
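The ordering and ID-matching requirements for tool_use and tool_result can be checked mechanically. The sketch below assumes the history is a list of plain dicts shaped like Messages API turns; `check_history` is a hypothetical helper, and it only checks the tool_use to tool_result direction, not every rule the API enforces.

```python
def check_history(messages: list[dict]) -> list[str]:
    """Return a list of pairing violations found in a message history."""
    problems = []
    for i, msg in enumerate(messages):
        if msg["role"] != "assistant" or isinstance(msg["content"], str):
            continue
        wanted = [b["id"] for b in msg["content"] if b.get("type") == "tool_use"]
        if not wanted:
            continue
        nxt = messages[i + 1] if i + 1 < len(messages) else None
        if nxt is None or nxt["role"] != "user" or isinstance(nxt["content"], str):
            problems.append(f"turn {i}: tool_use not followed by a tool_result turn")
            continue
        got = {b["tool_use_id"] for b in nxt["content"] if b.get("type") == "tool_result"}
        missing = sorted(set(wanted) - got)
        if missing:
            problems.append(f"turn {i}: no tool_result for {missing}")
    return problems


good = [
    {"role": "user", "content": "Weather in Tokyo, and (18 * 7) + 12?"},
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "t1", "name": "calculator", "input": {"expression": "(18 * 7) + 12"}},
        {"type": "tool_use", "id": "t2", "name": "get_weather", "input": {"location": "Tokyo"}},
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "t1", "content": "138"},
        {"type": "tool_result", "tool_use_id": "t2", "content": "Tokyo: 21°C, partly cloudy."},
    ]},
]
# Same data, but the two results are split across two user turns (breaks invariant 2).
bad = good[:2] + [{"role": "user", "content": [r]} for r in good[2]["content"]]

print(check_history(good))  # []
print(check_history(bad))   # ["turn 1: no tool_result for ['t2']"]
```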
A plan-execute agent (next lesson) makes a plan once, then executes it, calling the model far less often — cheaper and faster, but brittle if the plan was wrong.
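To make the call-count difference concrete, here is an idealized sketch. `CountingModel` is a stand-in that only counts calls, the step names are hypothetical, and the assumptions are one tool per ReAct turn and no replanning in plan-execute.

```python
class CountingModel:
    """Stand-in for an LLM client that only counts how often it is called."""

    def __init__(self):
        self.calls = 0

    def __call__(self, prompt: str) -> str:
        self.calls += 1
        return f"response to: {prompt}"


# Hypothetical tool steps for a flight-booking task.
STEPS = ["search_flights", "pick_cheapest", "book_flight", "add_to_calendar"]


def react(model, steps):
    # One model call to choose each action, plus one to write the final answer.
    for done in range(len(steps)):
        model(f"{done} steps done; choose the next action")
    model("all observations in; write the final answer")


def plan_execute(model, steps):
    # One call to plan, tools run without the model, one call to summarize.
    model("write the full plan")
    for _ in steps:
        pass  # execute the planned tool directly, no model call
    model("summarize the results")


react_model, plan_model = CountingModel(), CountingModel()
react(react_model, STEPS)
plan_execute(plan_model, STEPS)
print(react_model.calls, plan_model.calls)  # 5 2
```

The gap grows linearly with the number of steps, which is the cost side of ReAct's adaptivity.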
And often you need neither. Anthropic's Building Effective Agents draws the line that frontier interviewers expect you to draw:
Workflows are systems where LLMs and tools are orchestrated through predefined code paths. Agents are systems where the LLM dynamically directs its own process and tool usage.
If you can write the control flow as a fixed pipeline (classify → retrieve → answer), do that — it is deterministic, debuggable, and cheap. Reach for an agent loop only when the steps genuinely cannot be known in advance. "The simplest thing that works" is the senior answer almost every time.
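A fixed classify → retrieve → answer pipeline can be as plain as three function calls. In this sketch every function body is a hypothetical placeholder (a keyword router, an in-memory dict, a string template) standing in for whatever model calls or retrieval a real system would use; the point is only that the control flow lives in code, not in the model.

```python
def classify(question: str) -> str:
    # Hypothetical router; in practice this could be one cheap model call.
    return "billing" if "invoice" in question.lower() else "general"


# Hypothetical document store keyed by topic.
DOCS = {
    "billing": "Invoices are issued on the 1st of each month.",
    "general": "See the handbook.",
}


def retrieve(topic: str) -> str:
    return DOCS[topic]


def answer(question: str, context: str) -> str:
    # Placeholder for a single model call that answers from the given context.
    return f"Q: {question}\nContext: {context}"


def workflow(question: str) -> str:
    # The path is fixed: no loop, no model deciding what happens next.
    topic = classify(question)
    context = retrieve(topic)
    return answer(question, context)


print(workflow("When is my invoice issued?"))
```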
[Interactive comparison: ReAct vs. plan-execute on the task "Book the cheapest flight to Tokyo next Friday and add it to my calendar." Model (LLM) calls are outlined in rust, so you can watch the call counts diverge.]
The model only knows what your tool descriptions tell it. A vague search(q) gets misused; a precise "Search internal docs. Call this whenever the answer depends on company-specific information you don't already know" gets used correctly. Recent Opus models reach for tools more conservatively, so prescriptive, when-to-call descriptions measurably raise the right-call rate. Tool design is a first-class skill — it gets its own lesson (Tool Use & MCP).
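As a rough illustration of the difference, here are the two descriptions above as tool schemas, plus a small lint pass. The tool names, the parameter description, and the `lint_tool` heuristics are all assumptions made for this sketch, not an established standard.

```python
VAGUE_TOOL = {
    "name": "search",
    "description": "Search.",
    "input_schema": {
        "type": "object",
        "properties": {"q": {"type": "string"}},
        "required": ["q"],
    },
}

PRECISE_TOOL = {
    "name": "search_internal_docs",
    "description": (
        "Search internal docs. Call this whenever the answer depends on "
        "company-specific information you don't already know."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Keywords or a short question, e.g. 'parental leave policy'.",
            }
        },
        "required": ["query"],
    },
}


def lint_tool(tool: dict) -> list[str]:
    """Hypothetical heuristics: flag schemas that leave the model guessing."""
    warnings = []
    description = tool.get("description", "")
    if len(description.split()) < 8:
        warnings.append("description is very short")
    if "call this" not in description.lower():
        warnings.append("description does not say when to call the tool")
    for name, prop in tool["input_schema"]["properties"].items():
        if "description" not in prop:
            warnings.append(f"parameter '{name}' has no description")
    return warnings


print(lint_tool(VAGUE_TOOL))    # three warnings
print(lint_tool(PRECISE_TOOL))  # []
```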
Here is a complete ReAct agent against the Claude Messages API — no framework, ~40 lines of real logic. The full runnable file (with a CLI and a mock-LLM fallback so it runs with no API key) is in ai-eng-wiki/examples/agents/react_agent.py.
```python
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# 1) TOOLS: schema the model sees, plus the Python that actually runs.
TOOLS = [
    {
        "name": "calculator",
        "description": "Evaluate an arithmetic expression like (3 + 4) * 2. "
                       "Call this for any non-trivial math instead of doing it in your head.",
        "input_schema": {
            "type": "object",
            "properties": {"expression": {"type": "string"}},
            "required": ["expression"],
        },
    },
    {
        "name": "get_weather",
        "description": "Get the current weather for a city. Call when the user asks about weather.",
        "input_schema": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"],
        },
    },
]


def run_tool(name: str, args: dict) -> str:
    if name == "calculator":
        # Restrict to arithmetic so eval is safe.
        expr = args["expression"]
        if not all(c in "0123456789+-*/(). " for c in expr):
            return "Error: only arithmetic is allowed."
        return str(eval(expr))
    if name == "get_weather":
        return f"{args['location']}: 21°C, partly cloudy."
    return f"Unknown tool: {name}"


# 2) THE LOOP: reason, act, observe, repeat.
def agent(user_msg: str, max_turns: int = 6) -> str:
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_turns):  # hard cap = the backstop
        resp = client.messages.create(
            model="claude-opus-4-8",  # swap to claude-sonnet-4-6 / -haiku-4-5 for cost
            max_tokens=1024,
            system="You solve the task by reasoning and using tools. "
                   "State each step briefly ('Thought: ...') before acting.",
            tools=TOOLS,
            messages=messages,
        )
        messages.append({"role": "assistant", "content": resp.content})  # INVARIANT 1
        if resp.stop_reason != "tool_use":  # STOP: model is done
            return "".join(b.text for b in resp.content if b.type == "text")
        results = []
        for block in resp.content:  # ACT on every requested tool
            if block.type == "tool_use":
                out = run_tool(block.name, block.input)
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,  # IDs must match
                    "content": out,
                })
        messages.append({"role": "user", "content": results})  # INVARIANT 2: all results, one turn
    return "Stopped: hit the turn limit."


print(agent("What's the weather in Tokyo, and what is (18 * 7) + 12?"))
```

Read the loop against the three invariants in §3.1; they are all right there. The eval is gated to arithmetic characters; in production you would never eval model output without a sandbox, but for a calculator tool the character whitelist is a reasonable guard.
Part 1: Tools (lines 99–119). This is the "menu" the model sees. Each tool is an object with a name (what the model calls), a description (when to call it), and an input_schema (what arguments it expects). The model reads these descriptions and learns that calculator exists and should be called for math, and get_weather exists for weather. The descriptions are the entire interface—vague descriptions produce bad tool calls.
Part 2: run_tool (lines 121–130). When the model asks for a tool, this function executes it. For calculator, it rejects any expression containing characters outside the arithmetic whitelist and otherwise runs Python's eval on it. For get_weather, it returns a canned reading (21°C, partly cloudy). In production, these would hit real implementations (a math library, a weather service). Notice: a rejected expression or an unknown tool name comes back as an error string, which the loop forwards to the model as an ordinary tool result. An exception raised inside eval (division by zero, a malformed expression) is not caught, so it would crash this toy loop.
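One way to avoid eval entirely is to walk the parsed expression and allow only arithmetic nodes. The character whitelist in the toy still admits `**`, so an input like 9**9**9 can burn CPU, and any exception propagates. The sketch below is an assumption about how a stricter calculator could look, not the implementation in react_agent.py; `safe_calculator` and its helpers are hypothetical names.

```python
import ast
import operator

# Only these operators are permitted; exponentiation is deliberately left out.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _eval(node):
    """Recursively evaluate an AST made only of numbers and allowed operators."""
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval(node.operand))
    raise ValueError("unsupported expression")


def safe_calculator(expression: str) -> str:
    """Evaluate arithmetic and return the result or an error message as a string."""
    try:
        tree = ast.parse(expression.strip(), mode="eval")
        return str(_eval(tree.body))
    except (SyntaxError, ValueError, ZeroDivisionError) as exc:
        return f"Error: {exc}"


print(safe_calculator("(18 * 7) + 12"))  # 138
print(safe_calculator("2 ** 8"))         # Error: unsupported expression
print(safe_calculator("1 / 0"))          # Error: division by zero
```

Because failures come back as strings, the loop can forward them to the model instead of crashing.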
Part 3: The agent loop (lines 133–159). This is the engine:
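1. Call the model with the full message history and the tool schemas.
2. Append the assistant turn to messages (invariant 1).
3. Check stop_reason: if it is anything other than "tool_use" (normally "end_turn"), the model is done; extract and return the text.
4. Otherwise, run every tool_use block in the response and build a tool_result with the matching tool_use_id.
5. Append all the results as a single user turn (invariant 2) and loop again.
6. If max_turns runs out first, return the "Stopped" message.

What it does end-to-end: the loop sends the user's question, gets back a decision ("use calculator"), runs that tool, feeds the result back, gets back another decision ("use weather"), runs that, feeds it back, then gets back a final answer. The model may also request both tools in a single turn, which saves one round trip. At each step the model adapts based on what it just learned.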
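The control flow can be exercised without an API key by swapping the client for a scripted stand-in. This is a sketch, not the mock shipped in react_agent.py: the `SCRIPT` contents, the injectable `model` parameter, and the canned `run_tool` are assumptions, and `SimpleNamespace` objects merely imitate the attribute access of SDK response blocks.

```python
from types import SimpleNamespace as NS


def run_tool(name, args):
    # Canned tool outputs so the trace is deterministic.
    if name == "calculator":
        return "138"
    return f"{args['location']}: 21°C, partly cloudy."


# What the hypothetical mock model "decides" on each successive call.
SCRIPT = [
    NS(stop_reason="tool_use", content=[
        NS(type="tool_use", id="t1", name="calculator", input={"expression": "(18 * 7) + 12"}),
        NS(type="tool_use", id="t2", name="get_weather", input={"location": "Tokyo"}),
    ]),
    NS(stop_reason="end_turn", content=[
        NS(type="text", text="Tokyo is 21°C and partly cloudy; (18 * 7) + 12 = 138."),
    ]),
]


def agent(user_msg, model, max_turns=6):
    """Same loop shape as above, returning (final_text, number_of_model_calls)."""
    messages = [{"role": "user", "content": user_msg}]
    for turn in range(max_turns):
        resp = model(messages)
        messages.append({"role": "assistant", "content": resp.content})
        if resp.stop_reason != "tool_use":
            text = "".join(b.text for b in resp.content if b.type == "text")
            return text, turn + 1
        results = [
            {"type": "tool_result", "tool_use_id": b.id, "content": run_tool(b.name, b.input)}
            for b in resp.content if b.type == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
    return "Stopped: hit the turn limit.", max_turns


script = iter(SCRIPT)
text, calls = agent("Weather in Tokyo, and (18 * 7) + 12?", lambda messages: next(script))
print(calls, text)
```

With both tools requested in the first turn, the scripted run finishes in two model calls; a script that requests them one at a time would take three.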
The toy loop is correct but not production-ready. The gap is what IC5+ interviews live in:
| Concern | Toy loop | What production adds |
|---|---|---|
| Runaway loops | max_turns cap | Cap + budget on tokens and wall-clock; detect repeated identical tool calls and break. |
| Tool failures | crashes | Return the error as a tool_result with is_error: true so the model can recover or report; don't raise. |
| Context growth | unbounded | Old tool results bloat context; use context editing / compaction (see Context Engineering). |
| Cost / latency | one model call per step | Cache the stable system+tools prefix (prompt caching); route easy steps to a cheaper model. |
| Observability | print | Trace every step (tool, input, latency, tokens); you cannot debug what you can't see. |
| Determinism | none | Agents are non-deterministic; gate side-effecting tools (send_email, deploy) behind confirmation. |

Imagine you ask: "What's the weather in Tokyo, and what is (18 * 7) + 12?" Here's what the loop does:

1. The user message goes in: "What's the weather in Tokyo, and what is (18 * 7) + 12?"
2. The model returns tool_use blocks for both calculator(expression="(18 * 7) + 12") and get_weather(location="Tokyo"); append this to messages.
3. The tools run: the calculator evaluates (18 * 7) + 12 → 138, weather returns "Tokyo: 21°C, partly cloudy."
4. Both tool_result blocks (one per tool) go back together in a single user turn.
5. The model sees 138 and the weather. It reasons: "I have everything I need."
6. The model stops (stop_reason == "end_turn"), returns: "The weather in Tokyo is 21°C, partly cloudy. And (18 * 7) + 12 = 138."

Each step feeds into the next. The model is not locked into a plan; it adapts. That's ReAct's power, and its cost: 2 model calls here instead of 1, or 3 if the model requests the tools one at a time.
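Two of the production concerns, tool failures and repeated identical calls, fit in one small wrapper. The sketch below assumes plain-dict tool_result blocks with the `is_error` flag; `guarded_tool_result`, the `seen` set, and `flaky_tool` are hypothetical names introduced for illustration.

```python
import json


def guarded_tool_result(block_id, name, args, run_tool, seen):
    """Build one tool_result dict; never raises, and flags repeated calls."""
    key = (name, json.dumps(args, sort_keys=True))
    if key in seen:
        return {
            "type": "tool_result", "tool_use_id": block_id, "is_error": True,
            "content": "Same call already made with the same arguments; try something different.",
        }
    seen.add(key)
    try:
        return {"type": "tool_result", "tool_use_id": block_id, "content": run_tool(name, args)}
    except Exception as exc:  # report the failure to the model instead of crashing
        return {
            "type": "tool_result", "tool_use_id": block_id, "is_error": True,
            "content": f"{type(exc).__name__}: {exc}",
        }


def flaky_tool(name, args):
    return str(1 / args["x"])  # hypothetical tool that fails when x == 0


seen = set()
print(guarded_tool_result("t1", "divide", {"x": 2}, flaky_tool, seen))  # normal result
print(guarded_tool_result("t2", "divide", {"x": 0}, flaky_tool, seen))  # is_error, ZeroDivisionError
print(guarded_tool_result("t3", "divide", {"x": 2}, flaky_tool, seen))  # is_error, repeated call
```

Keeping `seen` per agent run means the same call is still allowed in a later, unrelated conversation.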
Two senior instincts worth stating out loud in an interview:
- A dedicated send_email tool can be intercepted and confirmed; bash -c "curl ..." cannot. Reversibility drives the tool surface (this is the heart of agent design).
- Layer the defenses against a stuck or runaway loop: (1) A hard max_turns cap. (2) Token + wall-clock budget. (3) Detect repeated identical (tool, args) calls and break or inject a "that didn't work, try differently" message. (4) Return tool errors as tool_result is_error: true so the model adapts rather than the harness crashing. (5) Per-tool timeout + retry-with-backoff on the tool, not the loop. (6) An evaluator check on "are we making progress?" for long horizons. (7) Trace everything so you can see where it's stuck.

Flashcard. Agent = model + tools + memory + stop condition, run in a loop. ReAct = reason→act→observe, one model call per step: adaptive but pricey. Use the simplest thing that works: workflow < plan-execute < ReAct in order of how much you should have to justify it.