What the top labs are actively pushing right now — the research themes a senior engineer must be able to discuss.
LLM agents are being trained via RL for 100+ turn multi-step tasks with sparse rewards; credit assignment across long horizons remains the core challenge as agents must reason, adapt strategies, and reflect over extended action sequences.
Frontier models pair frontier-scale knowledge with far cheaper inference via sparse MoE routing, mixed-precision quantization, and IO-aware attention kernels (e.g., FlashAttention-3, which is exact and still quadratic in sequence length); the headline claim is GPT-4-level performance at up to 50x lower inference cost.
Gemini Omni and Nemotron 3 Nano Omni unify video, audio, image, and text in native any-to-any generation; Omni reasons about physics, maintains long-context character consistency, and enables natural-language video editing.
Learned world models enable robots to plan in imagination before acting; DreamerV3 achieves 10-100x data efficiency; foundation models now provide semantic grounding, making embodied AI robotics' hottest 2026 frontier.
Speculative decoding lets a small draft model propose several tokens that the target model verifies in a single parallel pass, with reported speedups of 3.87x at 84% token acceptance; production stacks combine PagedAttention, INT8 quantization, GQA, and prefix caching for 10-50x request throughput per GPU.
Context windows scaled from 4K (2023) to 1M+ tokens (2026); frontier models must address KV-cache limits, 'lost in the middle' phenomena, and memory compression to actually use extended contexts for agents and RAG systems.
AutoResearch agents autonomously run 700+ ML experiments, discover training optimizations, and iterate model architectures; self-improving AI is 2026's frontier where models begin optimizing their own training and inference pipelines.
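The draft-and-verify loop behind speculative decoding, described in the list above, can be sketched in a few lines. This is a minimal greedy-decoding sketch, not any production stack's implementation: `speculative_decode`, the `Model` callable type, and the draft length `k` are hypothetical names introduced for illustration. Real systems verify all drafted positions in one batched forward pass of the target model and use rejection sampling when decoding stochastically.

```python
from typing import Callable, List, Tuple

Token = int
# Hypothetical interface: a model maps a token prefix to its greedy next token.
Model = Callable[[List[Token]], Token]


def speculative_decode(
    target: Model,
    draft: Model,
    prompt: List[Token],
    max_new_tokens: int,
    k: int = 4,
) -> Tuple[List[Token], float]:
    """Greedy speculative decoding; returns (tokens, draft acceptance rate)."""
    out = list(prompt)
    proposed = 0
    accepted = 0
    while len(out) - len(prompt) < max_new_tokens:
        # 1) The cheap draft model proposes k tokens autoregressively.
        ctx = list(out)
        draft_tokens: List[Token] = []
        for _ in range(k):
            tok = draft(ctx)
            draft_tokens.append(tok)
            ctx.append(tok)

        # 2) The target checks each drafted position. Here this is a loop;
        #    a real system scores all positions in one forward pass.
        n_ok = 0
        for i, tok in enumerate(draft_tokens):
            if target(out + draft_tokens[:i]) == tok:
                n_ok += 1
            else:
                break
        proposed += k
        accepted += n_ok
        out.extend(draft_tokens[:n_ok])

        # 3) The target always contributes one token: the correction at the
        #    first mismatch, or a bonus token if every draft was accepted.
        out.append(target(out))

    out = out[: len(prompt) + max_new_tokens]
    rate = accepted / proposed if proposed else 0.0
    return out, rate
```

With greedy decoding the output is token-for-token identical to decoding with the target model alone; the acceptance rate only determines how many target passes are saved.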
The Stanford / CMU / MIT / fast.ai canon plus the best online specializations.
Foundational ML course covering supervised/unsupervised learning, neural networks, and reinforcement learning with rigorous theoretical grounding.
Deep dive into CNNs and vision models with hands-on assignments; essential for understanding visual recognition architectures.
Comprehensive NLP curriculum from word embeddings to Transformers; includes fine-tuning a BERT-style model on the SQuAD dataset.
Modern course on building LLMs from first principles: tokenization, architecture, scaling, training, and inference optimization.
Frontier seminar with top researchers (Hinton, Vaswani, Karpathy); covers latest breakthroughs across vision, language, and multimodal transformers.
Rigorous, fast-paced core course systematically covering MLPs, CNNs, RNNs, Attention, optimization, and generalization with weekly quizzes.
Graduate seminar on cutting-edge NLP research; covers modern methods with PyTorch and Hugging Face, culminating in a paper replication project.
Accessible introduction with TensorFlow labs covering vision, NLP, generative models, and RL; includes industry-judged project competition.
Top-down course teaching practical deep learning with PyTorch; produces deployable models by lesson 2, no advanced math required.
Beginner-friendly specialization covering ML fundamentals and practical AI applications; ideal for engineers new to the field.
Free, comprehensive course on LLMs and NLP using Hugging Face ecosystem; recently expanded with fine-tuning and reasoning model chapters.
Graduate course bridging control, RL, and deep learning; covers policy gradients, value functions, model-based RL, and imitation learning.
Cutting-edge course on agentic AI; covers LLM reasoning, code generation, robotics integration, and scientific discovery applications.
Eight-part video series building neural networks from scratch: backprop, makemore, and a complete GPT implementation from first principles.
Canonical RL course covering MDPs, dynamic programming, temporal difference learning, policy gradients, and integration of learning and planning.
Karpathy's Zero-to-Hero, 3Blue1Brown, and the channels that actually teach.
Seven-video series building neural networks from first principles—micrograd backprop, makemore language models, and nanoGPT—essential foundation for understanding how modern LLMs work
1h56m deep dive building a working Transformer language model from an empty file, following the "Attention Is All You Need" paper and connecting theory to runnable code
Comprehensive 4-hour implementation reproducing GPT-2 from scratch including architecture, optimizations, and training with proper hyperparameters—bridge from theory to production
2h13m walkthrough of tokenization and Byte Pair Encoding—often overlooked but critical stage where many LLM quirks originate
3h31m general-audience overview of full LLM training stack—pretraining, fine-tuning, RLHF, and safety—ideal for building mental model of ChatGPT-class systems
2h11m practical guide covering LLM ecosystem, model selection, tool use, and real-world applications—bridges research to practitioner workflows
Active YouTube channel featuring latest research-level content on neural networks, LLMs, and frontier AI engineering topics with consistent quality
Visually stunning explanation of neural network fundamentals using animation—unique pedagogical strength in building geometric intuition
Clearest visual breakdown of attention mechanism—the core innovation in Transformers—using Grant Sanderson's signature animation style
Channel specializing in clear, intuitive explanations of statistics and machine learning concepts using visual demonstrations—builds strong conceptual foundations
Clear explanation of attention mechanism with runnable PyTorch code, translating visual intuition to production implementation
Complete Transformer architecture walkthrough including all layers, matrix multiplications, and training/inference—comprehensive technical reference
Deep implementation of modern LLaMA architecture covering KV cache, grouped query attention, rotary embeddings, and RMSNorm—production-grade knowledge
Detailed explanation of positional encoding with mathematics and intuition—critical component often glossed over in transformer explanations
Comprehensive playlist of deep learning and ML paper breakdowns with visual explanations—stay current with frontier research
Channel dedicated to rigorous paper summaries covering recent ML research papers with critical analysis and implementation discussions
Curated 2-5 minute summaries of cutting-edge research in AI, graphics, and ML with visual demonstrations—efficient way to track frontier research
Graduate-level deep learning course covering CNNs, RNNs, optimization, and modern architectures—authoritative academic treatment with practical grounding
Comprehensive NLP course covering word vectors, language models, transformers, and LLMs with latest research—essential for LLM-focused engineers
Rigorous computer vision course building intuition for CNNs and architectural principles—foundational for understanding vision in multimodal systems
The references worth owning — most of them free and online.
Canonical comprehensive textbook covering linear algebra through advanced deep generative models with complete free online access.
Interactive textbook adopted at 500 universities with runnable code in PyTorch/TensorFlow covering CNNs, RNNs, NLP, and recommender systems.
MIT Press 2023 text curating essential ideas with modern coverage of transformers and diffusion models, available free for students.
Stanford's definitive reference on NLP and computational linguistics with empirical statistical foundations and modern neural approaches.
Production-ready examples with minimal theory progressing from linear regression through deep neural networks using modern frameworks.
Hands-on implementation of GPT-style transformers without relying on existing LLM libraries, covering pretraining, fine-tuning, and instruction-following.
2025 O'Reilly guide to practical AI system design covering prompt engineering, RAG, fine-tuning, agents, and deployment of foundation models.
Holistic framework for ML system design addressing data engineering, feature selection, retraining cadence, and monitoring in production.
2022 comprehensive unifying treatment of modern ML through probabilistic modeling and Bayesian decision theory with online Python code.
Free Cambridge text bridging theory and practice covering linear algebra, optimization, and probability with focus on core ML methods.
Concise 2023 introduction with dense coverage of essential concepts and landmark models for computer vision and NLP, free under CC-BY-NC-SA.
Practical guide by Hugging Face creators on transformer training, fine-tuning for text classification, NER, and QA with distillation and optimization.
The papers every AI engineer is assumed to have read.
Introduced the Transformer architecture based entirely on attention mechanisms, eliminating recurrence and enabling parallel training—the foundation of all modern LLMs.
Seminal masked language modeling approach establishing the pre-training paradigm that enables transfer learning in NLP.
GPT-3 paper demonstrating that scale alone enables few-shot learning without task-specific fine-tuning, defining the era of large language models.
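Introduces RoPE (Rotary Position Embeddings), which encodes relative position by rotating query and key vectors; now standard in open LLMs such as LLaMA, Mistral, and Qwen, with long-context extensions typically layered on top via frequency scaling.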
Parameter-efficient fine-tuning via low-rank updates, enabling practical adaptation of multi-billion parameter models with minimal compute.
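Combines 4-bit quantization of the frozen base model with LoRA adapters, enabling fine-tuning of a 65B-parameter model on a single 48 GB GPU and democratizing access to large model customization.
IO-aware exact attention algorithm that cuts reads and writes to GPU high-bandwidth memory via tiling and recomputation, giving memory use linear in sequence length; critical for efficient transformer training and inference.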
Chinchilla paper establishing scaling laws showing optimal model-to-data allocation for given compute budgets, guiding efficient LLM training.
InstructGPT paper applying RLHF (Reinforcement Learning from Human Feedback) to instruction following at scale, the alignment recipe underlying ChatGPT and later LLMs.
Self-improving alignment via principle-based AI feedback, scaling RLHF without expensive human annotation while maintaining safety.
DPO simplifies preference alignment by eliminating separate reward model training, matching or exceeding RLHF with simpler methodology.
Foundation for RAG pattern combining neural retrieval with generation, enabling knowledge grounding and reducing hallucination.
Dense passage retrieval with late interaction scoring, enabling efficient semantic search crucial for RAG and information retrieval systems.
vLLM's PagedAttention algorithm enabling memory-efficient batch serving via paged KV cache allocation, standard in production LLM serving.
Sparse mixture-of-experts achieving trillion-parameter scale with efficient routing, demonstrating alternative scaling path via sparsity.
Selective state-space model alternative to transformers that scales linearly in sequence length while maintaining strong long-context performance.
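The DPO entry in the list above replaces the separate reward model with a log-ratio against a frozen reference policy. A minimal sketch of the per-pair loss follows, assuming the four sequence log-probabilities have already been computed elsewhere; the function name `dpo_loss` and the default `beta=0.1` are illustrative choices, not values taken from this list.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen log-ratio - rejected log-ratio)))."""
    # Implicit rewards are scaled log-ratios of policy vs. frozen reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(m)) == softplus(-m); written in a numerically stable form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# When the policy equals the reference, the margin is 0 and the loss is log(2).
baseline = dpo_loss(-5.0, -5.0, -5.0, -5.0)
# Raising the chosen response's likelihood relative to the reference lowers the loss.
improved = dpo_loss(-4.0, -6.0, -5.0, -5.0)
```

In training, this value would be averaged over a batch of preference pairs and minimized with respect to the policy parameters only; the reference log-probabilities stay fixed.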
2024–2026 work defining reasoning, agentic RL, interpretability, and efficiency.
Demonstrates pure RL (GRPO) can induce emergent chain-of-thought reasoning without supervised reasoning examples, achieving o1-parity on math/coding benchmarks.
671B-parameter MoE (37B active per token) with auxiliary-loss-free load balancing and multi-token prediction, reported as competitive with GPT-4o and Claude 3.5 Sonnet at a stated training cost of about $5.6M for the final run.
Technical safety and capability documentation of o1, the first frontier reasoning model at inference-time compute scale achieving state-of-the-art on math/science benchmarks.
First unsupervised 11B world model trained on unlabeled internet videos, generating interactive 2D game environments from single images.
Enables training on sequences of 100M+ tokens via blockwise attention distributed across devices in a ring, overlapping key-value block communication with computation so that context length scales with device count.
H100 attention kernel reaching up to about 75% of peak FP16 throughput (roughly 740 TFLOPs/s) through asynchronous, warp-specialized computation and FP8 low-precision support, speeding up long-context training and inference.
Scales sparse autoencoders to a production model, extracting interpretable features from Claude 3 Sonnet that respond to specific concepts and causally steer model behavior.
405B dense model with 128K context matching GPT-4, establishing open-source frontier baseline with multimodal, coding, and reasoning capabilities.
235B MoE with unified thinking/non-thinking modes, 85.7 on AIME'24 and competitive with o1/o3 on reasoning benchmarks.
Formalizes inference-time compute scaling laws, showing smaller models with test-time compute offer Pareto-optimal cost/performance trade-offs.
Extends sparse autoencoders to VLMs like CLIP, extracting monosemantic visual features for interpretability across modalities.
Theoretical framework showing debate's advantage over RLAIF scales with knowledge divergence via phase transitions, enabling superhuman AI oversight.
7B self-supervised vision transformer achieving SOTA across diverse downstream tasks without fine-tuning, outperforming specialized vision models.
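GRPO, named in the first entry of this list, drops the learned value function and instead scores each sampled response against the other responses drawn for the same prompt. A minimal sketch of that group-relative advantage follows; `group_relative_advantages` and the `eps` guard are hypothetical names, and the use of the population standard deviation is an assumption made here for illustration.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward by the mean and std of its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std over the group
    # eps avoids division by zero when every response gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four responses to one prompt, scored by a sparse pass/fail verifier.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Responses that beat the group average get a positive advantage and the rest a negative one; a group where every response receives the same reward yields all zeros and so contributes no gradient signal.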
The primary sources — Anthropic, OpenAI, DeepMind, Meta, and the great explainers.
Primary source for Anthropic's constitutional AI, RLHF, and Claude scaling research with direct access to published papers and research updates.
Deep operational knowledge on building reliable AI systems at scale, including agents, tool use, evals, and MCP—what production teams actually encounter.
Frontier mechanistic interpretability research on sparse autoencoders, circuit analysis, and reverse-engineering transformer internals—cutting-edge monosemantic feature extraction.
Evidence-based analysis of frontier AI risks for cybersecurity, biosecurity, and autonomous systems—national security implications of frontier models.
Official OpenAI research publications including GPT-series scaling laws, RLHF, multimodal models, and reasoning-focused frontier models.
Major breakthroughs in AlphaFold, Gemini frontier models, multimodal reasoning, and scientific AI applications across biology and mathematics.
Meta's foundational model research, Llama open-source models, computer vision, and robotics—bridging research to production at scale.
Broad AI research spanning foundation models, agents, quantum computing, and enterprise AI applications with theoretical depth.
Infrastructure and optimization for AI: inference performance, physical AI, robotics acceleration, and GPU-specific model deployments.
Frontier open-source and commercial model releases including physics AI, multimodal research, and efficient transformer architectures.
Frontier reasoning models and efficient scaling breakthroughs—low-cost training innovations and long-context (1M+ tokens) architectural advances.
Canonical visual explainer of transformer architecture and attention mechanisms—most accessible introduction to core concepts for engineers.
Authoritative learning notes on diffusion models, agents, reinforcement learning, and LLM reasoning—rigorous technical explanations with breadth.
Academic research from 100+ grad students across vision, NLP, RL, robotics, and cross-cutting themes like human-compatible AI.
Practical insights on AI evaluation, safety evals, RL environment design, and data infrastructure for training frontier models.
Minimal, from-scratch implementations of backprop, CNNs, and GPT—pedagogical deep dives preferred by engineers building intuition.
Docs and tools for agents, RAG, fine-tuning, eval, and serving.
Official reference for Claude API covering models, tool use, streaming, prompt caching, batch processing, and vision capabilities.
Anthropic-authored guide to agentic coding workflows with best practices for prompt design, file context management, and autonomous automation.
Official Jupyter notebooks demonstrating tool use, agents, RAG, vision, and production patterns with runnable examples.
Free self-paced courses from Anthropic engineers on Claude API, Code, MCP, and agents with official certificates.
Deployable reference projects including customer support agents, financial analysts, computer use, and autonomous coding agents.
Open standard for connecting AI systems to data sources and tools with server/client architecture and JSON-RPC protocol spec.
Framework for building stateful, multi-step agents with human-in-the-loop, durability, and comprehensive memory.
Data framework for RAG with 300+ integrations, structured ingestion, and advanced retrieval with agents.
High-throughput inference serving engine with distributed parallelism, paged attention, and production-grade serving.
End-to-end training library with SFT, DPO, GRPO, and PEFT integration for efficient model fine-tuning.
Lightweight framework for multi-agent workflows with sandbox agents, tool execution, and stateful conversations.
Production-ready agent framework for TypeScript with tool loops, MCP support, and real-time voice capabilities.
Official examples and recipes for function calling, fine-tuning, embeddings, vision, and production patterns.
Declarative programming framework for optimizing LLM pipelines with structured signatures and automatic prompt compilation.
Reference-free evaluation metrics for RAG systems measuring faithfulness, relevance, context precision, and recall.
Pytest-style LLM evaluation framework with 50+ metrics, component-level evals, and CI/CD integration.
Framework-agnostic observability with tracing, evals, dashboards, and multi-SDK support for production monitoring.
End-to-end ML platform with automatic LLM tracing, cost tracking, evaluation scoring, and experiment comparison.
Frameworks, coding agents, serving runtimes, retrieval, and eval repos — wire them together and you've built what frontier labs hire for. None of it is behind a paywall.
Lightning-fast Python framework for orchestrating autonomous agent crews and event-driven flows with first-class multi-agent autonomy.
Production-grade agentic AI framework emphasizing type-safe, validated agent behaviors with multi-provider LLM support and structured outputs.
Enterprise SDK for building and running production agent platforms with storage, observability, human approval, RBAC, and 100+ tool integrations.
Stateful agent platform with advanced memory management enabling long-term learning and self-improvement over time.
Multi-agent framework for building AI software companies and autonomous development teams with natural language programming capabilities.
Modern TypeScript framework for AI-powered agents with model routing, autonomous workflows, human-in-the-loop, and production observability.
Tool integration platform powering 1000+ toolkits for agents with context management, authentication, sandboxed execution, and framework-agnostic SDKs.
Barebones agent library emphasizing code-based thinking with sandboxed execution and minimal dependencies for lightweight agentic systems.
Production-grade multi-language framework for orchestrating complex agent workflows with standardized patterns, observability, and enterprise features.
Event-driven async orchestration framework specialized for document-centric agent workflows with production-grade scaling and stateful execution.
Model-agnostic SDK for building AI agents and orchestrating multi-agent workflows across Python, .NET, and Java with plugin architecture.
Vision for accessible autonomous agents with platform, forge framework, and benchmark tools for building and evaluating agentic systems.
Autonomous agent platform with multi-interface access (CLI, GUI, SDK) for end-to-end code execution and codebase modification, MIT-licensed and Series A funded.
GitHub issue resolver that autonomously fixes bugs with any LLM, built by Princeton/Stanford researchers, featured at NeurIPS 2024; mini-swe-agent is recommended for simplicity.
Terminal-native AI pair programmer with full codebase mapping, git integration, and 45k stars; works with Claude, GPT, and local LLMs.
Open-source AI coding agent (5M+ VS Code installs) with SDK, IDE extensions, and CLI; autonomous file editing, command execution, and real-time error monitoring across platforms.
Open-source IDE extension (VS Code, JetBrains) with source-controlled AI checks enforceable in CI/CD, 33k stars, supports 15+ model providers.
Terminal-native persistent autonomous agent (since 2023) with code writing, terminal access, web browsing, and MCP server integration; works with any LLM provider.
Extensible AI agent built in Rust for executing, testing, and building complete projects; 47k stars, 15+ LLM providers, 70+ MCP extensions, moved to AAIF at Linux Foundation.
High-adoption open-source coding agent (171k stars, 7.5M monthly developers) with terminal, desktop, and IDE integration; plan and build agents with privacy-first architecture.
Gold-standard benchmark for evaluating autonomous code agents on real GitHub issues; 2,294 tasks, verified subset with 500 human-annotated instances.
Benchmark for evaluating agents on hard terminal tasks (89 curated tasks, ICLR 2026); supports Claude Code, OpenHands, SWE-agent, and mini-swe-agent.
Enterprise-grade secure sandbox runtime for AI code execution (90ms startup, Firecracker VMs); Python/TS SDKs, 12.5k stars, widely used by agent platforms.
Secure elastic sandbox infrastructure for AI code execution with stateful snapshots, 72k stars, multi-language SDKs (TS, Python, Ruby, Go, Java); AGPL-licensed.
TypeScript framework for building multi-agent networks with deterministic routing, shared state, and MCP integration; Apache 2.0, 884 stars.
Reference-free evaluation framework for LLM applications with automatic test generation; 14.3k stars, widely used for agent and RAG system evaluation.
The gold-standard library for extracting structured outputs from any LLM via Pydantic models with zero boilerplate, trusted by 100k+ developers at OpenAI, Google, Microsoft.
Efficient programming paradigm for steering LLM output with constrained generation, conditionals, and loops seamlessly integrated; reduces latency and cost vs conventional prompting.
Fast, provider-agnostic structured generation library using regex and context-free grammars to enforce JSON/structured outputs with microsecond-level latency overhead.
DSL for reliable tool-calling and structured outputs with fallback policies, multi-model switching, and schema-aligned parsing that works even without native LLM tool support.
High-performance LLM serving framework with native constrained decoding via compressed FSM for structured outputs (JSON/regex/grammar) with near-zero overhead and 3x faster JSON decoding.
Universal gateway for 100+ LLM providers (OpenAI, Anthropic, Gemini, etc.) with unified structured outputs API, cost tracking, and load balancing for production agents.
Pydantic AI-native framework for declarative structured extraction, classification, and generation workflows with deep integration into type-safe Python patterns.
Production-grade Python bindings for local LLM inference with OpenAI API compatibility, enabling on-device structured outputs and agent serving without external dependencies.
Efficient JSON generation by only delegating content token prediction to the LLM while auto-filling fixed tokens, reducing latency and improving reliability for structured outputs.
Simplest path to run any open-source LLM locally with REST API; no GPU required, production-ready with 173k stars
De facto standard for LLM inference in C/C++; foundation of Ollama, LM Studio, and most local inference tools; minimal dependencies, cross-hardware support
NVIDIA-optimized serving with state-of-the-art GPU kernels and multi-GPU orchestration; critical for production LLM inference on NVIDIA hardware
Fine-tuning optimization achieving 2x speedup and 70% VRAM reduction with no accuracy loss; dual-interface (Studio UI + code API)
Unified fine-tuning framework supporting 100+ models with SFT, LoRA, QLoRA, and preference tuning; multimodal training support
Unified efficient fine-tuning of 100+ LLMs & VLMs with support for SFT, RLHF, DPO, and process reward models; production-tested at scale
Comprehensive toolkit for LLM serving with compression, quantization, and dynamic batching; 1.8x higher throughput than vLLM per the maintainers
Meta's composable framework providing OpenAI-compatible APIs with pluggable backends (Ollama, vLLM, managed services); run-anywhere deployment
Modular local inference engine supporting LLMs, vision, voice, images with minimal dependencies; wraps llama.cpp, vLLM, whisper.cpp as needed
Enterprise RAG orchestration framework with modular pipelines, retrieval routing, and multi-stage ranking — production-ready for complex retrieval workflows.
Lightweight embeddings database with automatic tokenization and vectorization — fastest path to RAG for prototypes and small-scale systems.
High-performance vector database with sparse/dense/multivector search, 97% memory reduction via quantization, and production-grade filtering.
Cloud-native vector database combining semantic search with structured filtering, built-in RAG pipelines, and multi-tenancy for enterprise scale.
Distributed vector database scaling to billions of vectors with GPU acceleration, native sparse vectors (BM25/SPLADE), and hybrid search in a single engine.
PostgreSQL extension enabling vector similarity search while retaining ACID compliance, JOINs, and point-in-time recovery in your existing database.
Production vector database combining vector search with full-text, filtering, and aggregations in one query — enterprise standard for hybrid RAG.
Late-interaction (ColBERT) retrieval trainer and inference — domain-generalizing retrieval alternative to dense embeddings with zero-shot robustness.
All-in-one embeddings DB combining vector search, sparse indexing, SQL, and LLM orchestration — minimal overhead for semantic search workflows.
AI search platform handling vectors, tensors, and structured data at scale with ML model inference at query time — for complex ranking and relevance.
Knowledge graph extraction and graph-based RAG for complex reasoning — structures unstructured text into queryable knowledge graphs for nuanced retrieval.
Standard library for computing and training embeddings with 15k+ pretrained models — essential backbone for all dense retrieval and semantic search.
Complete LLM observability platform with tracing, evals, prompt management, and metrics dashboards for production agents.
Enterprise-grade LLM observability and evaluation platform with drift detection, retrieval quality scoring, and trace analytics.
Lightweight LLM observability platform offering cost monitoring, request tracking, and experimentation without code changes.
Red-teaming and prompt testing framework with adversarial evaluation, security scanning, and CI/CD integration for agents and RAG systems.
Structured output and validation framework ensuring LLM outputs conform to guardrails, schemas, and safety constraints.
NVIDIA's toolkit for enforcing guardrails on LLMs via topical boundaries, content filtering, and behavioral constraints.
Automated evaluation and testing library for LLM agents detecting performance regressions, hallucinations, and robustness gaps.
Framework for building web-automation agents with DOM interaction, JavaScript execution, and cross-site navigation capabilities.
SDK for orchestrating browser agents with reliable screenshot-based navigation, JavaScript isolation, and debugging tools.
Natural language code interpreter enabling agents to execute Python/shell/JavaScript locally with sandboxed execution.
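The retrieval entries in the list above distinguish single-vector dense search from late-interaction (ColBERT-style) scoring. The difference is easiest to see in the scoring rule itself: each query token embedding is matched to its best document token embedding, and those maxima are summed. The sketch below is pure Python over toy vectors; `maxsim_score` and `rank_documents` are hypothetical helper names, not functions from any library listed here.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product; equals cosine similarity if both vectors are unit-normalized."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """Late-interaction score: sum over query tokens of the best-matching doc token."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def rank_documents(query_vecs: List[Vector], docs: Dict[str, List[Vector]]) -> List[str]:
    """Return document ids sorted by descending late-interaction score."""
    return sorted(docs, key=lambda doc_id: maxsim_score(query_vecs, docs[doc_id]), reverse=True)


# Toy example: two query token embeddings, two candidate documents.
query = [[1.0, 0.0], [0.0, 1.0]]
corpus = {
    "covers_both_terms": [[1.0, 0.0], [0.0, 1.0]],
    "covers_one_term": [[1.0, 0.0]],
}
ranking = rank_documents(query, corpus)
```

Because every query token is scored independently, a document has to cover each term to rank well, which is the property a single pooled embedding tends to blur.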
The feeds that keep you current between model releases.
Curated technical deep dives into LLM architectures, research paper roundups, and state-of-the-art reviews from a seasoned researcher building at the frontier.
Daily byte-sized insights on machine learning, data science tools, and untold observations that make the data science lifecycle less intimidating.
Curated weekly report on the most important AI research and industry-shaping events for engineers and business leaders to act on what matters.
Insider technical analysis of frontier AI model training and post-training from a researcher actively shipping at scale, with original work on RLHF methodologies.
Pragmatic guidance on recommendation systems, LLMs, and AI product development from an ML engineer who has scaled teams at Amazon and Anthropic.
Monthly essays on AI engineering, system design, and production MLOps from the author of 'Designing Machine Learning Systems' and 'AI Engineering.'
Rigorous, independent technical analysis of AI tools, SQLite, Datasette, and pragmatic takes on LLMs in production from a Django co-creator.
185K-subscriber newsletter + weekly podcast diving deep into how frontier labs build agents, models, and infrastructure with interviews from the builders themselves.
Field-tested insights on evals, error analysis, and improving AI products in production from an engineer who helps teams move past prototype stage.
Applied AI essays on RAG, open source, and building AI systems in production from a DX engineer with deep hands-on experience.
Weekly deep dives into cutting-edge AI research papers with analysis of technical breakthroughs and implications, including sci-fi explorations of impact.
Technical insights from a pioneer of deep learning and LLMs, now at the frontier of pre-training at Anthropic with 39K+ subscribers.
1.1M+ subscriber deep dive into Big Tech and startup engineering practices, with rigorous analysis of AI engineering trends from the inside.
A Berkeley researcher's longform set of RL interview questions for 2026 — the RLHF/PPO/GRPO/DPO post-training territory frontier labs probe in recruiting.