A living map of what's actually moving at the frontier right now, and why each item matters for an interview. Skim it weekly; depth lives in the pillars.
Interview prep rots fast in this field. This page tracks the developments a senior AI engineer is expected to already know about in mid-2026 — not deep dives (those are the pillars), but the "have you heard of this and can you say why it matters" layer that separates current candidates from ones running on 2024 knowledge.
The agent has moved into the product UI. CopilotKit is the leading SDK for in-app copilots and agentic UIs, built on the AG-UI protocol — an open, bidirectional spec for streaming state between an agent backend and a user-facing app. The ideas to be able to name:
Why it's asked: "design an agentic product" questions now expect you to discuss the frontend contract — how partial state, tool calls, and human approvals stream to a UI — not just the backend loop. Covered in Agentic Frontends & Harness Engineering.
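To make the "frontend contract" concrete, here is a minimal sketch of an agent streaming partial state, a tool call, and an approval request to a UI-side reducer. The event names (`state_delta`, `tool_call`, `approval_request`) and the functions `agent_run` and `render` are hypothetical, chosen for illustration; this is not the AG-UI event schema or the CopilotKit API.

```python
from typing import Any, Callable, Dict, Iterator, List

Event = Dict[str, Any]

def agent_run() -> Iterator[Event]:
    """Hypothetical agent backend: yields events as the run progresses."""
    yield {"type": "state_delta", "patch": {"status": "searching"}}
    yield {"type": "tool_call", "name": "search_docs", "args": {"q": "refund policy"}}
    yield {"type": "approval_request", "action": "send_email"}
    yield {"type": "state_delta", "patch": {"status": "done"}}

def render(events: Iterator[Event], approve: Callable[[str], bool]) -> Dict[str, Any]:
    """Hypothetical UI reducer: folds the event stream into a view model."""
    state: Dict[str, Any] = {}
    tool_calls: List[str] = []
    for event in events:
        if event["type"] == "state_delta":
            state.update(event["patch"])          # partial state, applied incrementally
        elif event["type"] == "tool_call":
            tool_calls.append(event["name"])      # surfaced so the user sees what ran
        elif event["type"] == "approval_request":
            if not approve(event["action"]):      # human-in-the-loop gate
                state["status"] = "rejected"
                break                             # stop consuming the run
    return {"state": state, "tool_calls": tool_calls}

view = render(agent_run(), approve=lambda action: True)
```

In a design round, the point to draw out is that the UI never waits for a final answer: it renders each delta as it arrives, and the approval step is a place where the user can halt the run.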
The reframing of 2026: reliability comes from the harness around the model, not the model itself. The Learn Harness Engineering curriculum names the failure modes explicitly — context rot (attention degrades as tool output floods the window), premature "victory," and lost intent — and the scaffolding that fixes them: external memory (an AGENTS.md / todo log as the system of record), verification loops (a capable model checks a cheaper model's work), tight tool routing, and structured error recovery. This is the discipline behind Codex and Claude Code. If you build coding agents and can't talk about context management and verification, that's the gap this fills. Covered in Harness Engineering and threaded through the Agents pillar.
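A verification loop with an external log can be sketched in a few lines. This is an illustration under stated assumptions, not how Codex or Claude Code are implemented: `run_with_verification`, `stub_worker`, and `stub_verifier` are hypothetical names, the stubs stand in for model calls, and the in-memory `log` list stands in for an AGENTS.md or todo file.

```python
from typing import Callable, List, Optional, Tuple

Worker = Callable[[str, str], str]                  # (task, feedback) -> draft
Verifier = Callable[[str, str], Tuple[bool, str]]   # (task, draft) -> (ok, feedback)

def run_with_verification(
    task: str, worker: Worker, verifier: Verifier, max_attempts: int = 3
) -> Tuple[Optional[str], List[str]]:
    log: List[str] = []   # external memory: the record lives outside the model context
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        draft = worker(task, feedback)
        ok, feedback = verifier(task, draft)   # never trust the worker's own "done"
        log.append(f"attempt {attempt}: {'pass' if ok else 'fail: ' + feedback}")
        if ok:
            return draft, log
    log.append("gave up: escalate to a human")  # structured recovery, not a silent failure
    return None, log

def stub_worker(task: str, feedback: str) -> str:
    # Stand-in for a cheap model: wrong at first, corrected once it gets feedback.
    return "def add(a, b): return a + b" if feedback else "def add(a, b): return a - b"

def stub_verifier(task: str, draft: str) -> Tuple[bool, str]:
    # Stand-in for a stronger model or a test suite: actually runs the draft.
    scope: dict = {}
    exec(draft, scope)
    ok = scope["add"](2, 3) == 5
    return ok, "" if ok else "add(2, 3) should be 5"

result, log = run_with_verification("write add(a, b)", stub_worker, stub_verifier)
```

The design choice worth naming is that success is decided by the verifier, which blocks premature "victory", and that the log survives even if the context window is reset.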
Post-training moved from "fine-tune on demonstrations" to "let the agent learn from experience." OpenPipe ART (Agent Reinforcement Trainer) trains agents on real tasks with GRPO: sample a group of trajectories per task, normalize rewards within the group, and update the policy, with no separate critic. Its headline trick is RULER: instead of hand-writing a reward function, an LLM judge scores the trajectories in a group relative to each other on a 0–1 scale, which sidesteps reward engineering, often the hardest part of applied RL. ART-trained small models have matched or beaten frontier models on narrow agent tasks (e.g. email search) at a fraction of the cost. Why it's hot: it's how you make a domain-specific agent get better without retraining a giant base model. Covered in Fine-tuning, Post-training & RL.
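The "normalize rewards within the group" step is the part interviewers tend to probe, so a minimal sketch helps. It assumes the common formulation in which each trajectory's advantage is its reward minus the group mean, divided by the group standard deviation; the function name and the `eps` term are illustrative, and this is not ART's code.

```python
from statistics import mean, pstdev
from typing import List

def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Advantage of each trajectory relative to its own sampling group.

    The group mean acts as the baseline, which is why no learned critic is needed.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)   # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four trajectories for one task, scored 0-1 (e.g. by a relative LLM judge).
advantages = group_relative_advantages([0.2, 0.4, 0.6, 0.8])
```

Trajectories above the group mean get a positive advantage and are reinforced; those below are pushed down. If every trajectory in a group scores the same, all advantages are zero and that group contributes no learning signal.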
System-design interviews now blend classic distributed systems with AI-specific patterns. Resources like system-design-academy track the union: vector DBs and retrieval at scale, multi-agent architectures, LLM evaluation, MCP integration, plus the timeless caching/consistency/load-balancing fundamentals. The questions to be ready for: design a RAG bot over 10M docs, an AI coding agent, an eval platform, a real-time voice assistant. A strong answer covers the data pipeline, retrieval strategy, model routing, inference optimization, guardrails, evals, observability, and cost — the LLM is one box in a large diagram. Covered in AI System Design.
The field's best free signal is on practitioner feeds. Avi Chawla (@_avichawla, Daily Dose of Data Science) posts tight, code-first explainers on RAG, agents, RL, and ML systems — exactly the breadth an interview spans; his recent work includes how top labs build RL agents (GRPO/RULER). @0xcodez posts on agentic coding and the engineering of coding agents. Treat these as a weekly pulse on what's entering the interview canon, then go deep in the pillars.
Know the current tiers and the cost/quality tradeoffs you'd actually reason about in a design round:
| Tier | Examples (2026) | When you'd reach for it |
|---|---|---|
| Flagship | Claude Opus 4.8, GPT-5-class | Hardest reasoning, long-horizon agentic work, the orchestrator's judgment calls. |
| Balanced | Claude Sonnet 4.6 | The default workhorse — strong tool use at a fraction of flagship cost. |
| Fast / cheap | Claude Haiku 4.5 | Bounded sub-tasks: classification, extraction, exploration sub-agents. |
The senior instinct isn't "use the biggest model" — it's route per task: flagship where judgment matters, cheap models for the bounded 80%. That single lever dominates most cost discussions. Covered in Inference, Serving & Scaling.
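The routing lever is easy to show with back-of-the-envelope arithmetic. In the sketch below, the task kinds, the routing table, and the relative costs are all made-up numbers for illustration, not real pricing; only the tier names come from the table above.

```python
from typing import Dict, List

# Hypothetical routing table: which tier handles which kind of task.
ROUTES: Dict[str, str] = {
    "classification": "fast",
    "extraction": "fast",
    "exploration": "fast",
    "tool_use": "balanced",
    "planning": "flagship",
}

# Hypothetical relative cost per call (fast = 1.0). Not real prices.
RELATIVE_COST: Dict[str, float] = {"flagship": 15.0, "balanced": 3.0, "fast": 1.0}

def route(task_kind: str) -> str:
    """Pick a tier for a task; unknown kinds fall back to the balanced default."""
    return ROUTES.get(task_kind, "balanced")

def blended_cost(task_kinds: List[str]) -> float:
    """Average relative cost per call for a workload under per-task routing."""
    return sum(RELATIVE_COST[route(kind)] for kind in task_kinds) / len(task_kinds)

# A workload where 80% of calls are bounded sub-tasks and 20% need judgment.
workload = ["classification"] * 8 + ["planning"] * 2
routed = blended_cost(workload)
all_flagship = RELATIVE_COST["flagship"]
```

Under these assumed numbers, routing brings the average cost per call to roughly a quarter of sending everything to the flagship tier. The exact ratio depends entirely on real prices and the real task mix, which is the estimate to work through out loud in a design round.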
Skim it before any interview. For each item, you should be able to say what it is in one sentence and why it matters in one more. Then earn the depth in the pillars: the goal of this atlas is that you can both build these things and explain them as far as any interviewer cares to probe.