Pillar · session 6 · in progress

Evaluation & Testing

LLM-as-judge, RAGAS/DeepEval, golden datasets, agent trajectory evals and CI regression suites.

This pillar is on the build list. We're going depth-first: one fully built pillar per session. Start with Building AI Agents.