Pillar · session 5 · in progress

Inference, Serving & Scaling

vLLM, PagedAttention, KV cache, speculative decoding, quantization (AWQ/FP8), FSDP & tensor/pipeline parallelism.

This pillar is on the build list. We're going depth-first: one fully built pillar per session. Start with Building AI Agents.