Living page · 2026-06-07

Frontier Watch — June 2026

A living map of what's actually moving at the frontier right now, and why each item matters for an interview. Skim it weekly; depth lives in the pillars.

Interview prep rots fast in this field. This page tracks the developments a senior AI engineer is expected to already know about in mid-2026 — not deep dives (those are the pillars), but the "have you heard of this and can you say why it matters" layer that separates current candidates from ones running on 2024 knowledge.

Agentic frontends: CopilotKit & AG-UI

The agent has moved into the product UI. CopilotKit is the leading SDK for in-app copilots and agentic UIs, built on the AG-UI protocol — an open, bidirectional spec for streaming state between an agent backend and a user-facing app. The ideas to be able to name:

  • Generative UI — the agent renders real UI components at runtime, not just text. (Three control levels: tightly-controlled via AG-UI, shared/declarative, and open-ended via MCP-style apps.)
  • CoAgents — multiple agents cooperating inside one application context.
  • Human-in-the-loop — the agent pauses to ask the user before continuing (approvals, edits).
  • Frontend actions — the agent triggers UI-side operations (fill a form, mutate state).

Why it's asked: "design an agentic product" questions now expect you to discuss the frontend contract — how partial state, tool calls, and human approvals stream to a UI — not just the backend loop. Covered in Agentic Frontends & Harness Engineering.
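To make the frontend contract concrete, here is a minimal sketch of an agent backend streaming events that a UI folds into its own state, pausing when a human approval is requested. The event kinds (text_delta, tool_call, state_patch, approval_request), the Event and UIState classes, and the refund scenario are all hypothetical illustrations; they are not the AG-UI event names or the CopilotKit API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List, Optional


@dataclass
class Event:
    kind: str                 # hypothetical event kind, not an AG-UI spec name
    payload: Dict[str, Any]


@dataclass
class UIState:
    text: str = ""                                        # streamed assistant text
    form: Dict[str, Any] = field(default_factory=dict)    # state shared with the app
    tool_calls: List[str] = field(default_factory=list)   # tools the UI has been told about
    pending_approval: Optional[str] = None                # question awaiting the user


def agent_run() -> Iterator[Event]:
    """Stand-in for an agent backend streaming events to the UI."""
    yield Event("text_delta", {"text": "Drafting your "})
    yield Event("text_delta", {"text": "refund request."})
    yield Event("tool_call", {"name": "lookup_order"})
    yield Event("state_patch", {"form": {"order_id": "A-1001", "amount": 42.0}})
    yield Event("approval_request", {"question": "Submit refund of 42.0?"})
    yield Event("text_delta", {"text": " Submitted."})  # only valid after approval


def apply_event(state: UIState, event: Event) -> UIState:
    """Fold one streamed event into the UI state."""
    if event.kind == "text_delta":
        state.text += event.payload["text"]
    elif event.kind == "tool_call":
        state.tool_calls.append(event.payload["name"])
    elif event.kind == "state_patch":
        state.form.update(event.payload["form"])
    elif event.kind == "approval_request":
        state.pending_approval = event.payload["question"]
    return state


def render_until_pause(events: Iterator[Event]) -> UIState:
    """Consume events until the agent asks for a human decision."""
    state = UIState()
    for event in events:
        apply_event(state, event)
        if state.pending_approval is not None:
            break  # human-in-the-loop: stop consuming until the user answers
    return state


state = render_until_pause(agent_run())
print(state)
```

The point to draw out in an interview is that the UI is a reducer over a typed event stream, and that the approval request is a pause in that stream rather than a separate side channel.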

Harness engineering: the model is 20% of the system

The reframing of 2026: reliability comes from the harness around the model, not the model itself. The Learn Harness Engineering curriculum names the failure modes explicitly: context rot (attention degrades as tool output floods the window), premature "victory" (the agent declares the task done before it has been verified), and lost intent (the original goal drifts out of focus over a long run). It also names the scaffolding that fixes them: external memory (an AGENTS.md / todo log as the system of record), verification loops (a capable model checks a cheaper model's work), tight tool routing, and structured error recovery. This is the discipline behind Codex and Claude Code. If you build coding agents and can't talk about context management and verification, that's the gap this fills. Covered in Harness Engineering and threaded through the Agents pillar.
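A verification loop with an external log can be sketched in a few lines. This is an illustration under stated assumptions, not how any particular harness implements it: the worker and verifier are plain Python callables standing in for a cheaper and a more capable model, and run_with_verification, cheap_worker, and strict_verifier are hypothetical names.

```python
from typing import Callable, List, Optional, Tuple

Worker = Callable[[str, str], str]                  # (task, feedback) -> draft
Verifier = Callable[[str, str], Tuple[bool, str]]   # (task, draft) -> (ok, feedback)


def run_with_verification(
    task: str, worker: Worker, verifier: Verifier, max_attempts: int = 3
) -> Tuple[Optional[str], List[str]]:
    """Retry the worker until the verifier accepts, logging every attempt."""
    log: List[str] = []   # external memory: the record lives outside the model's context
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        draft = worker(task, feedback)
        ok, feedback = verifier(task, draft)
        log.append(f"attempt {attempt}: {'pass' if ok else 'fail'} ({feedback})")
        if ok:
            return draft, log
    return None, log      # structured failure: the caller decides how to recover


# Stub models for the sketch: the worker is wrong until it receives feedback.
def cheap_worker(task: str, feedback: str) -> str:
    return "5" if feedback else "6"


def strict_verifier(task: str, draft: str) -> Tuple[bool, str]:
    expected = str(sum(int(tok) for tok in task.split()))
    return (True, "ok") if draft == expected else (False, f"expected {expected}")


result, log = run_with_verification("2 3", cheap_worker, strict_verifier)
print(result, log)
```

The loop never trusts the worker's own claim of success; only the verifier can end it, which is the guard against premature "victory".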

RL for agents: OpenPipe ART, GRPO & RULER

Post-training moved from "fine-tune on demonstrations" to "let the agent learn from experience." OpenPipe ART (Agent Reinforcement Trainer) trains agents on real tasks with GRPO: sample a group of trajectories per task, normalize each trajectory's reward against the group's mean and spread, and update the policy on those relative advantages, with no separate critic model. Its headline trick is RULER: instead of a hand-written reward function, an LLM judge scores the trajectories in a group relative to each other (0 to 1), which sidesteps reward engineering, often the hardest part of applying RL. ART-trained small models have matched or beaten frontier models on narrow agent tasks (e.g. email search) at a fraction of the cost. Why it's hot: it's how you make a domain-specific agent get better without retraining a giant base model. Covered in Fine-tuning, Post-training & RL.
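The group-normalization step in GRPO is small enough to show directly. The sketch below computes group-relative advantages only; it does not perform the policy update and it is not ART's API. The function name, the eps constant, and the reward values (standing in for 0 to 1 judge scores of four trajectories on one task) are assumptions for illustration.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Score each trajectory against its own group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)   # population std over the sampled group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical judge scores for four trajectories sampled on the same task.
rewards = [0.2, 0.9, 0.5, 0.4]
advantages = group_relative_advantages(rewards)
print(advantages)

# If every trajectory scores the same, there is nothing to learn from the group.
flat = group_relative_advantages([0.7, 0.7, 0.7, 0.7])
print(flat)
```

Because the baseline is the group mean, no learned value function is needed, which is what "no separate critic" refers to. It also shows why relative judging is enough: only the ordering and spread within a group affect the advantages.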

System design for AI: the new bar

System-design interviews now blend classic distributed systems with AI-specific patterns. Resources like system-design-academy track the union: vector DBs and retrieval at scale, multi-agent architectures, LLM evaluation, MCP integration, plus the timeless caching/consistency/load-balancing fundamentals. The questions to be ready for: design a RAG bot over 10M docs, an AI coding agent, an eval platform, a real-time voice assistant. A strong answer covers the data pipeline, retrieval strategy, model routing, inference optimization, guardrails, evals, observability, and cost — the LLM is one box in a large diagram. Covered in AI System Design.
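For the "RAG bot over 10M docs" question, a quick sizing estimate usually comes early in the answer. The sketch below is back-of-envelope arithmetic only; the chunks-per-document count, the embedding dimension, and float32 storage are assumptions picked for illustration, and the result ignores index overhead, metadata, and replication.

```python
def raw_vector_bytes(num_docs: int, chunks_per_doc: int, dim: int,
                     bytes_per_value: int = 4) -> int:
    """Raw embedding storage, before any index structure or compression."""
    return num_docs * chunks_per_doc * dim * bytes_per_value


# Assumed workload: 10M docs, 5 chunks each, 1024-dim float32 embeddings.
size_bytes = raw_vector_bytes(num_docs=10_000_000, chunks_per_doc=5, dim=1024)
print(f"{size_bytes / 1e9:.1f} GB of raw vectors")
```

A number on that scale is what motivates the follow-on discussion of sharding, quantization, and whether the index fits in memory on one node.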

High-signal practitioners to follow

The field's best free signal is on practitioner feeds. Avi Chawla (@_avichawla, Daily Dose of Data Science) posts tight, code-first explainers on RAG, agents, RL, and ML systems — exactly the breadth an interview spans; his recent work includes how top labs build RL agents (GRPO/RULER). @0xcodez posts on agentic coding and the engineering of coding agents. Treat these as a weekly pulse on what's entering the interview canon, then go deep in the pillars.

The model landscape, briefly

Know the current tiers and the cost/quality tradeoffs you'd actually reason about in a design round:

Tier | Examples (2026) | When you'd reach for it
Flagship | Claude Opus 4.8, GPT-5-class | Hardest reasoning, long-horizon agentic work, the orchestrator's judgment calls.
Balanced | Claude Sonnet 4.6 | The default workhorse: strong tool use at a fraction of flagship cost.
Fast / cheap | Claude Haiku 4.5 | Bounded sub-tasks: classification, extraction, exploration sub-agents.

The senior instinct isn't "use the biggest model" — it's route per task: flagship where judgment matters, cheap models for the bounded 80%. That single lever dominates most cost discussions. Covered in Inference, Serving & Scaling.
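The routing lever is easy to show with toy numbers. In the sketch below, the tier names mirror the tiers above, but the relative cost units, the task kinds, and the routing table are invented for illustration; they are not real prices or any provider's API.

```python
from typing import Dict, List, Optional

# Made-up relative cost per call, for illustration only (not real pricing).
RELATIVE_COST: Dict[str, float] = {"fast": 1.0, "balanced": 3.0, "flagship": 15.0}

# Hypothetical routing table: bounded tasks go cheap, judgment calls go flagship.
ROUTES: Dict[str, str] = {
    "classification": "fast",
    "extraction": "fast",
    "tool_use": "balanced",
    "planning": "flagship",
}


def route(task_kind: str) -> str:
    """Pick a tier for a task; unknown kinds fall back to the balanced tier."""
    return ROUTES.get(task_kind, "balanced")


def total_cost(task_kinds: List[str], force_tier: Optional[str] = None) -> float:
    """Sum relative cost, either routed per task or forced onto one tier."""
    return sum(RELATIVE_COST[force_tier or route(kind)] for kind in task_kinds)


# An 80/20 workload: eight bounded tasks, two that need judgment.
workload = ["classification"] * 8 + ["planning"] * 2
routed = total_cost(workload)
all_flagship = total_cost(workload, force_tier="flagship")
print(routed, all_flagship)
```

With these assumed units the routed workload costs about a quarter of the all-flagship one. The exact ratio depends entirely on the made-up numbers; the shape of the argument is what carries over to a design round.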

What to do with this page

Skim it before any interview. For each item, you should be able to say what it is in one sentence and why it matters in one more. Then earn the depth in the pillars: the goal of this atlas is that you can both do the work and explain it as far as any interviewer cares to push.