The library · verified June 2026

Everything worth reading.

The courses, lectures, books, papers, and lab blogs a frontier-lab engineer would actually point you to — plus the live frontier directions the top labs are pushing right now. Every link verified.

Frontier directions (June 2026)

What the top labs are actively pushing right now — the research themes a senior engineer must be able to discuss.

Frontier · frontier

Agentic RL & Long-Horizon Training

Anthropic, OpenAI, DeepMind, Apple ML Research

LLM agents are being trained via RL for 100+ turn multi-step tasks with sparse rewards; credit assignment across long horizons remains the core challenge as agents must reason, adapt strategies, and reflect over extended action sequences.

#agentic-AI#reinforcement-learning#credit-assignment#long-horizon

Frontier · frontier

Efficiency: MoE, Quantization & Sub-Quadratic Attention

Hugging Face, Pangu, NVIDIA, Meta

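Sparse MoE routing, mixed-precision quantization, and cheaper attention let frontier models keep large-model knowledge at a fraction of the inference cost, with claims of GPT-4-level performance at 50x lower cost. IO-aware kernels such as FlashAttention-3 speed up exact attention but stay quadratic in sequence length; sub-quadratic alternatives include state-space models such as Mamba.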

#efficiency#mixture-of-experts#quantization#attention

Frontier · frontier

Multimodal & Omni-Directional Models

Google DeepMind, NVIDIA, OpenAI, Anthropic

Gemini Omni and Nemotron 3 Nano Omni unify video, audio, image, and text in native any-to-any generation; Omni reasons about physics, maintains long-context character consistency, and enables natural-language video editing.

#multimodal#video-generation#omni-models#physics-understanding

Frontier · frontier

World Models & Embodied Agents

DeepMind, Stanford, MIT, Carnegie Mellon, Apple ML Research

Learned world models enable robots to plan in imagination before acting; DreamerV3 achieves 10-100x data efficiency; foundation models now provide semantic grounding, making embodied AI robotics' hottest 2026 frontier.

#world-models#robotics#embodied-AI#sim-to-real

Frontier · frontier

Inference-Time Optimization & Speculative Decoding

Google DeepMind, Together AI, Hugging Face

Speculative decoding has a small draft model propose several tokens that the target model then verifies in a single parallel forward pass; reported results include a 3.87x speedup at 84% token acceptance. Production stacks combine PagedAttention, INT8 quantization, GQA, and prefix caching for 10-50x request throughput per GPU.

#inference-optimization#speculative-decoding#throughput#latency
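
The draft-then-verify loop behind speculative decoding can be sketched in a few lines. This is a minimal illustration of the greedy variant only (production systems use a probabilistic accept/reject rule over sampled tokens), and `speculative_step`, `toy_target`, and `toy_draft` are hypothetical names invented for this example, not part of any library.

```python
from typing import Callable, List

# A "model" here is any function mapping a token context to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_step(target: NextToken, draft: NextToken,
                     context: List[int], k: int) -> List[int]:
    """Run one draft-then-verify round and return the newly accepted tokens."""
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposed: List[int] = []
    for _ in range(k):
        proposed.append(draft(context + proposed))

    # 2. The target model checks each proposal. A real system scores all k
    #    positions in one batched forward pass; the loop is for readability.
    accepted: List[int] = []
    for token in proposed:
        expected = target(context + accepted)
        if token != expected:
            accepted.append(expected)  # keep the target's token, drop the rest
            return accepted
        accepted.append(token)

    # 3. Every draft matched, so the same pass yields one bonus target token.
    accepted.append(target(context + accepted))
    return accepted


# Toy stand-ins (hypothetical): the target counts upward modulo 10,
# and the draft agrees except right after a 3.
def toy_target(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10


print(speculative_step(toy_target, toy_draft, [0], k=4))  # [1, 2, 3, 4]
```

In this greedy form the output is always identical to what the target model alone would have produced; the speedup comes from how many draft tokens survive each verification pass.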

Frontier · frontier

Long-Context Windows & Memory Architectures

Google, Anthropic, OpenAI, Meta

Context windows scaled from 4K (2023) to 1M+ tokens (2026); frontier models must address KV-cache limits, 'lost in the middle' phenomena, and memory compression to actually use extended contexts for agents and RAG systems.

#context-length#memory#KV-cache#long-form-reasoning
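
The KV-cache limit mentioned above is easy to estimate by hand. The sketch below assumes a standard decoder that caches one key and one value vector per layer, per KV head, per token; the model shape is hypothetical and chosen only so the arithmetic comes out round.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2,
                   batch_size: int = 1) -> int:
    """Bytes needed to cache keys and values for a batch of sequences."""
    # Factor of 2: one tensor for keys and one for values in every layer.
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * bytes_per_elem * batch_size)


# Hypothetical model shape, used only for illustration (fp16 = 2 bytes/element).
config = dict(num_layers=32, num_kv_heads=8, head_dim=128)

for seq_len in (4096, 2**20):
    gib = kv_cache_bytes(seq_len=seq_len, **config) / 2**30
    print(f"{seq_len:>8} tokens -> {gib:.1f} GiB of KV cache")
# 4096 tokens    -> 0.5 GiB
# 1048576 tokens -> 128.0 GiB
```

With this shape every token costs 128 KiB of cache, so a single million-token sequence already exceeds the memory of one accelerator, which is why cache compression and fewer KV heads matter at long context.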

Frontier · frontier

AI for AI Research & Self-Improvement

Andrej Karpathy, OpenAI, Google DeepMind

AutoResearch agents autonomously run 700+ ML experiments, discover training optimizations, and iterate model architectures; self-improving AI is 2026's frontier where models begin optimizing their own training and inference pipelines.

#AutoML#meta-learning#neural-architecture-search#research-agents

Courses

The Stanford / CMU / MIT / fast.ai canon plus the best online specializations.

Course · foundational

CS229: Machine Learning

Stanford University (Tengyu Ma, Chris Ré)

Foundational ML course covering supervised/unsupervised learning, neural networks, and reinforcement learning with rigorous theoretical grounding.

#machine-learning#theory#algorithms#foundations

Course · intermediate

CS231n: Deep Learning for Computer Vision

Stanford University

Deep dive into CNNs and vision models with hands-on assignments; essential for understanding visual recognition architectures.

#computer-vision#convolutional-networks#deep-learning#practical

Course · intermediate

CS224n: Natural Language Processing with Deep Learning

Stanford University (Diyi Yang, Yejin Choi)

Comprehensive NLP curriculum from word embeddings to Transformers; includes fine-tuning a BERT-style model on the SQuAD dataset.

#nlp#transformers#embeddings#language-models

CS336: Language Modeling from Scratch

Stanford University (Tatsunori Hashimoto, Percy Liang)

Modern course on building LLMs from first principles: tokenization, architecture, scaling, training, and inference optimization.

#language-models#llms#transformers#from-scratch

CS25: Transformers United

Stanford University (Steven Feng, Christopher Manning, et al)

Frontier seminar with top researchers (Hinton, Vaswani, Karpathy); covers latest breakthroughs across vision, language, and multimodal transformers.

#transformers#frontier#research#multimodal

Course · foundational

11-785: Introduction to Deep Learning

Carnegie Mellon University (Bhiksha Raj)

Rigorous, fast-paced core course systematically covering MLPs, CNNs, RNNs, Attention, optimization, and generalization with weekly quizzes.

#deep-learning#foundations#rigorous#architectures

11-711: Advanced Natural Language Processing

Carnegie Mellon University (Graham Neubig)

Graduate seminar on cutting-edge NLP research; covers modern methods with PyTorch and Hugging Face, culminating in a paper replication project.

#nlp#research#advanced#language-understanding

Course · foundational

6.S191: Introduction to Deep Learning

MIT (Alexander Amini, Ava Amini)

Accessible introduction with TensorFlow labs covering vision, NLP, generative models, and RL; includes industry-judged project competition.

#deep-learning#applications#hands-on#tfx

Course · intermediate

Practical Deep Learning for Coders

fast.ai (Jeremy Howard, Rachel Thomas)

Top-down course teaching practical deep learning with PyTorch; produces deployable models by lesson 2, no advanced math required.

#practical#applied#pytorch#hands-on

Course · foundational

Machine Learning Specialization

DeepLearning.AI & Stanford Online (Andrew Ng)

Beginner-friendly specialization covering ML fundamentals and practical AI applications; ideal for engineers new to the field.

#machine-learning#beginner#specialization#andrew-ng

Course · intermediate

Free, comprehensive course on LLMs and NLP using Hugging Face ecosystem; recently expanded with fine-tuning and reasoning model chapters.

#llms#nlp#hugging-face#transformers

CS285: Deep Reinforcement Learning

UC Berkeley (Sergey Levine)

Graduate course bridging control, RL, and deep learning; covers policy gradients, value functions, model-based RL, and imitation learning.

#reinforcement-learning#control#advanced#algorithms

CS294/194-196: Large Language Model Agents

UC Berkeley (Dawn Song, Xinyun Chen)

Cutting-edge course on agentic AI; covers LLM reasoning, code generation, robotics integration, and scientific discovery applications.

#llm-agents#reasoning#code-generation#frontier

Course · foundational

Neural Networks: Zero to Hero

Andrej Karpathy

Eight-part video series building neural networks from scratch: backprop, makemore, and a complete GPT implementation from first principles.

#from-scratch#neural-networks#transformers#educational

Reinforcement Learning (UCL/DeepMind)

David Silver (DeepMind)

Canonical RL course covering MDPs, dynamic programming, temporal difference learning, policy gradients, and integration of learning and planning.

#reinforcement-learning#theory#foundational-rl#deepmind

Video lectures

Karpathy's Zero-to-Hero, 3Blue1Brown, and the channels that actually teach.

Course · foundational

Neural Networks: Zero to Hero

Andrej Karpathy

Eight-video series building neural networks from first principles (micrograd backprop, makemore language models, and nanoGPT); an essential foundation for understanding how modern LLMs work

#neural-networks#backpropagation#from-scratch#foundational

Video · intermediate

Let's build GPT: from scratch, in code, spelled out

Andrej Karpathy

1h56m deep dive building a working Transformer language model from an empty file, following the "Attention Is All You Need" paper and connecting theory to runnable code

#transformers#gpt#attention#coding

Let's reproduce GPT-2 (124M)

Andrej Karpathy

Comprehensive 4-hour implementation reproducing GPT-2 from scratch including architecture, optimizations, and training with proper hyperparameters—bridge from theory to production

#gpt-2#training#optimization#full-stack

Video · intermediate

Let's build the GPT Tokenizer

Andrej Karpathy

2h13m walkthrough of tokenization and Byte Pair Encoding—often overlooked but critical stage where many LLM quirks originate

#tokenization#bpe#llm-fundamentals
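
The core of Byte Pair Encoding training is a loop that finds the most frequent adjacent pair and replaces it with a new token id. The sketch below shows a single merge step on raw UTF-8 bytes; the helper names are hypothetical and the example string is arbitrary, so treat it as an illustration of the idea rather than the video's exact code.

```python
from collections import Counter
from typing import List, Tuple


def most_frequent_pair(ids: List[int]) -> Tuple[int, int]:
    """Return the adjacent token pair that occurs most often."""
    return Counter(zip(ids, ids[1:])).most_common(1)[0][0]


def merge(ids: List[int], pair: Tuple[int, int], new_id: int) -> List[int]:
    """Replace every non-overlapping occurrence of `pair` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2  # skip both members of the merged pair
        else:
            out.append(ids[i])
            i += 1
    return out


ids = list("aaabdaaabac".encode("utf-8"))  # start from raw bytes (0-255)
pair = most_frequent_pair(ids)             # (97, 97), i.e. "aa"
ids = merge(ids, pair, new_id=256)         # 256 is the first id past the byte range
print(pair, ids)
```

Repeating this step until the vocabulary reaches a target size yields the merge table; encoding new text then replays the merges in the order they were learned.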

Video · intermediate

Deep Dive into LLMs like ChatGPT

Andrej Karpathy

3h31m general-audience overview of full LLM training stack—pretraining, fine-tuning, RLHF, and safety—ideal for building mental model of ChatGPT-class systems

#llms#training-pipeline#rlhf#system-design

Video · intermediate

Andrej Karpathy

2h11m practical guide covering LLM ecosystem, model selection, tool use, and real-world applications—bridges research to practitioner workflows

#llm-tools#prompting#ecosystem#practical

Andrej Karpathy

Andrej Karpathy

Active YouTube channel featuring latest research-level content on neural networks, LLMs, and frontier AI engineering topics with consistent quality

#channel#frontier#ai-engineering

Course · foundational

Neural Networks

Visually stunning explanation of neural network fundamentals using animation—unique pedagogical strength in building geometric intuition

#neural-networks#visualization#geometry#foundational

Video · intermediate

Attention in transformers, visually explained

Clearest visual breakdown of attention mechanism—the core innovation in Transformers—using Grant Sanderson's signature animation style

#attention#transformers#visualization

Course · foundational

StatQuest with Josh Starmer

Channel specializing in clear, intuitive explanations of statistics and machine learning concepts using visual demonstrations—builds strong conceptual foundations

#channel#statistics#machine-learning#intuition

Video · intermediate

Attention in Transformers: Concepts and Code in PyTorch

Josh Starmer (StatQuest)

Clear explanation of attention mechanism with runnable PyTorch code, translating visual intuition to production implementation

#attention#transformers#pytorch#code

Attention is all you need (Transformer) - Model explanation

Complete Transformer architecture walkthrough including all layers, matrix multiplications, and training/inference—comprehensive technical reference

#transformers#attention#architecture#math

Coding LLaMA 2 from scratch in PyTorch

Deep implementation of modern LLaMA architecture covering KV cache, grouped query attention, rotary embeddings, and RMSNorm—production-grade knowledge

#llama#architecture#pytorch#optimization

Video · intermediate

Transformers From Scratch - Part 1: Positional Encoding

Detailed explanation of positional encoding with mathematics and intuition—critical component often glossed over in transformer explanations

#transformers#positional-encoding#math#fundamentals

Papers Explained

Comprehensive playlist of deep learning and ML paper breakdowns with visual explanations—stay current with frontier research

#papers#research#deep-learning

Channel dedicated to rigorous paper summaries covering recent ML research papers with critical analysis and implementation discussions

#channel#papers#research#criticism

Course · intermediate

Two Minute Papers

Károly Zsolnai-Fehér

Curated 2-5 minute summaries of cutting-edge research in AI, graphics, and ML with visual demonstrations—efficient way to track frontier research

#channel#research#frontier#visual-summaries

Course · intermediate

Stanford CS230: Deep Learning

Stanford Online

Graduate-level deep learning course covering CNNs, RNNs, optimization, and modern architectures—authoritative academic treatment with practical grounding

#deep-learning#course#stanford#comprehensive

Course · intermediate

Stanford CS224N: Natural Language Processing with Deep Learning

Stanford Online

Comprehensive NLP course covering word vectors, language models, transformers, and LLMs with latest research—essential for LLM-focused engineers

#nlp#transformers#language-models#stanford

Course · intermediate

Stanford CS231N: Deep Learning for Computer Vision

Stanford Online

Rigorous computer vision course building intuition for CNNs and architectural principles—foundational for understanding vision in multimodal systems

#computer-vision#cnns#stanford#architecture

Books

The references worth owning — most of them free and online.

Book · foundational

Ian Goodfellow, Yoshua Bengio, Aaron Courville

Canonical comprehensive textbook covering linear algebra through advanced deep generative models with complete free online access.

#deep-learning#fundamentals#math

Book · intermediate

Dive into Deep Learning

Aston Zhang, Zachary C. Lipton, Mu Li, Alexander J. Smola

Interactive textbook adopted at 500 universities with runnable code in PyTorch/TensorFlow covering CNNs, RNNs, NLP, and recommender systems.

#deep-learning#code-first#multi-framework

Book · intermediate

Understanding Deep Learning

Simon J.D. Prince

MIT Press 2023 text curating essential ideas with modern coverage of transformers and diffusion models, available free for students.

#deep-learning#modern-architectures#accessible

Book · foundational

Speech and Language Processing (3rd Edition Draft)

Dan Jurafsky, James H. Martin

Stanford's definitive reference on NLP and computational linguistics with empirical statistical foundations and modern neural approaches.

#nlp#fundamentals#language-models

Book · intermediate

Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (3rd Edition)

Aurélien Géron

Production-ready examples with minimal theory progressing from linear regression through deep neural networks using modern frameworks.

#machine-learning#practical#implementation

Build a Large Language Model (From Scratch)

Sebastian Raschka

Hands-on implementation of GPT-style transformers without relying on existing LLM libraries, covering pretraining, fine-tuning, and instruction-following.

#llm#transformers#from-scratch

AI Engineering: Building Applications with Foundation Models

2025 O'Reilly guide to practical AI system design covering prompt engineering, RAG, fine-tuning, agents, and deployment of foundation models.

#ai-systems#foundation-models#engineering-practices

Designing Machine Learning Systems: An Iterative Process for Production-Ready Applications

Holistic framework for ML system design addressing data engineering, feature selection, retraining cadence, and monitoring in production.

#ml-systems#production#engineering

Book · intermediate

Probabilistic Machine Learning: An Introduction

Kevin P. Murphy

2022 comprehensive unifying treatment of modern ML through probabilistic modeling and Bayesian decision theory with online Python code.

#probabilistic-ml#bayesian#deep-learning

Book · foundational

Mathematics for Machine Learning

Marc Peter Deisenroth, A. Aldo Faisal, Cheng Soon Ong

Free Cambridge text bridging theory and practice covering linear algebra, optimization, and probability with focus on core ML methods.

#mathematics#fundamentals#free

Book · foundational

The Little Book of Deep Learning

François Fleuret

Concise 2023 introduction with dense coverage of essential concepts and landmark models for computer vision and NLP, free under CC-BY-NC-SA.

#deep-learning#computer-vision#nlp

Natural Language Processing with Transformers (Revised Edition)

Lewis Tunstall, Leandro von Werra, Thomas Wolf

Practical guide by Hugging Face creators on transformer training, fine-tuning for text classification, NER, and QA with distillation and optimization.

#nlp#transformers#hugging-face

Seminal papers

The papers every AI engineer is assumed to have read.

Paper · foundational

Attention Is All You Need

Ashish Vaswani et al.

Introduced the Transformer architecture based entirely on attention mechanisms, eliminating recurrence and enabling parallel training—the foundation of all modern LLMs.

#transformers#attention#architecture

Paper · foundational

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin et al.

Seminal masked language modeling approach establishing the pre-training paradigm that enables transfer learning in NLP.

#pre-training#bidirectional#NLP

Paper · foundational

Language Models are Few-Shot Learners

Tom B. Brown et al.

GPT-3 paper demonstrating that scale alone enables few-shot learning without task-specific fine-tuning, defining the era of large language models.

#GPT-3#few-shot#scale

Paper · foundational

RoFormer: Enhanced Transformer with Rotary Position Embedding

Jianlin Su et al.

Introduces Rotary Position Embedding (RoPE), which encodes relative position by rotating query and key vectors and offers flexible sequence lengths; it is now standard in open LLMs such as Llama and Qwen.

#position-embeddings#RoPE#extrapolation

Paper · foundational

LoRA: Low-Rank Adaptation of Large Language Models

Edward J. Hu et al.

Parameter-efficient fine-tuning via low-rank updates, enabling practical adaptation of multi-billion parameter models with minimal compute.

#fine-tuning#efficiency#adaptation
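
The low-rank update is simple enough to write out directly: the pretrained weight W stays frozen, and only two thin matrices A (rank x d_in) and B (d_out x rank) are trained, giving an output of W x + (alpha / rank) B A x. The pure-Python sketch below uses hypothetical helper names and toy matrices to show the forward pass and the parameter savings; a real implementation would use a tensor library.

```python
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, x: List[float]) -> List[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def lora_forward(w: Matrix, a: Matrix, b: Matrix,
                 x: List[float], alpha: float) -> List[float]:
    """Compute W x + (alpha / rank) * B (A x) without forming B A."""
    rank = len(a)                     # A has shape (rank, d_in)
    base = matvec(w, x)               # frozen pretrained path
    update = matvec(b, matvec(a, x))  # trainable low-rank path
    scale = alpha / rank
    return [h + scale * u for h, u in zip(base, update)]


def lora_param_counts(d_out: int, d_in: int, rank: int):
    """Return (full fine-tuning params, LoRA params) for one weight matrix."""
    return d_out * d_in, rank * (d_out + d_in)


w = [[1.0, 0.0], [0.0, 1.0]]  # toy frozen weight
a = [[1.0, 1.0]]              # rank-1 down-projection
x = [2.0, 3.0]

# B starts at zero, so the adapted layer initially matches the frozen one.
print(lora_forward(w, a, [[0.0], [0.0]], x, alpha=1.0))  # [2.0, 3.0]
print(lora_forward(w, a, [[0.5], [0.0]], x, alpha=1.0))  # [4.5, 3.0]

full, lora = lora_param_counts(4096, 4096, rank=8)
print(full, lora, full // lora)  # 16777216 65536 256
```

For a 4096 x 4096 projection at rank 8, the adapter trains 256 times fewer parameters than full fine-tuning of that matrix.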

Paper · foundational

QLoRA: Efficient Finetuning of Quantized LLMs

Tim Dettmers et al.

Combines 4-bit quantization with LoRA, enabling fine-tuning of a 65B-parameter model on a single 48GB GPU and democratizing access to large-model customization.

#quantization#fine-tuning#memory-efficiency

Paper · foundational

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

IO-aware exact-attention algorithm that uses tiling to cut reads and writes between GPU high-bandwidth memory and on-chip SRAM; critical for efficient transformer training and inference.

#attention#efficiency#performance

Paper · foundational

Training Compute-Optimal Large Language Models

Jordan Hoffmann et al.

Chinchilla paper establishing scaling laws showing optimal model-to-data allocation for given compute budgets, guiding efficient LLM training.

#scaling-laws#compute#training
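
A back-of-the-envelope version of the compute-optimal allocation can be written as a short calculation. It rests on two widely quoted rules of thumb that are assumptions here, not statements from this listing: training compute C is roughly 6 * N * D FLOPs for N parameters and D tokens, and the optimal ratio is roughly 20 tokens per parameter. The function name is hypothetical.

```python
import math


def compute_optimal_split(flops: float, tokens_per_param: float = 20.0):
    """Split a training FLOP budget into (parameters, training tokens).

    Assumes C ~= 6 * N * D and a fixed D / N ratio; both are rules of thumb.
    """
    # Substitute D = r * N into C = 6 * N * D and solve for N.
    params = math.sqrt(flops / (6.0 * tokens_per_param))
    tokens = tokens_per_param * params
    return params, tokens


# A budget of 5.88e23 FLOPs roughly matches Chinchilla itself.
params, tokens = compute_optimal_split(5.88e23)
print(f"{params:.2e} parameters, {tokens:.2e} tokens")
# 7.00e+10 parameters, 1.40e+12 tokens
```

The result, about 70B parameters trained on about 1.4T tokens, matches the Chinchilla configuration, which is a useful sanity check on the approximation.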

Paper · foundational

Training language models to follow instructions with human feedback

Long Ouyang et al.

InstructGPT paper applying RLHF (Reinforcement Learning from Human Feedback) to instruction following, the alignment recipe underlying ChatGPT and modern LLMs.

#RLHF#alignment#instruction-tuning

Paper · foundational

Constitutional AI: Harmlessness from AI Feedback

Yuntao Bai et al.

Self-improving alignment via principle-based AI feedback, scaling RLHF without expensive human annotation while maintaining safety.

#alignment#AI-feedback#constitutional-ai

Paper · foundational

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Rafael Rafailov et al.

DPO simplifies preference alignment by eliminating separate reward model training, matching or exceeding RLHF with simpler methodology.

#alignment#preference-learning#DPO
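
The reason DPO needs no separate reward model is visible in its loss: the reward is implicit in the log-probability ratio between the policy and a frozen reference model. The sketch below computes the per-example loss from four sequence log-probabilities; the function and argument names are hypothetical, and `beta=0.1` is an illustrative default rather than a recommended setting.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log sigmoid(chosen reward - rejected reward)."""
    # Implicit rewards: beta times the policy/reference log-ratio.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) equals softplus(-margin); this form avoids overflow.
    z = -margin
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


# Policy identical to the reference: margin is 0, so the loss is log 2.
print(round(dpo_loss(-5.0, -5.0, -5.0, -5.0), 4))  # 0.6931

# Raising the chosen answer and lowering the rejected one reduces the loss.
print(dpo_loss(-4.0, -6.0, -5.0, -5.0) < math.log(2))  # True
```

Averaging this quantity over a batch of preference pairs and backpropagating through the policy log-probabilities gives the full training objective.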

Paper · foundational

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

Patrick Lewis et al.

Foundation for RAG pattern combining neural retrieval with generation, enabling knowledge grounding and reducing hallucination.

#retrieval#generation#RAG

Paper · foundational

ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT

Omar Khattab & Matei Zaharia

Dense passage retrieval with late interaction scoring, enabling efficient semantic search crucial for RAG and information retrieval systems.

#retrieval#dense-search#embeddings

Paper · foundational

Efficient Memory Management for Large Language Model Serving with PagedAttention

Woosuk Kwon et al.

vLLM's PagedAttention algorithm enabling memory-efficient batch serving via paged KV cache allocation, standard in production LLM serving.

#inference#serving#memory-efficiency

Paper · foundational

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

William Fedus et al.

Sparse mixture-of-experts achieving trillion-parameter scale with efficient routing, demonstrating alternative scaling path via sparsity.

#mixture-of-experts#sparsity#scaling

Paper · foundational

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Albert Gu & Tri Dao

State-space model alternative to transformers achieving linear complexity while maintaining strong long-context performance.

#state-space-models#linear-complexity#alternatives

Frontier papers

2024–2026 work defining reasoning, agentic RL, interpretability, and efficiency.

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Demonstrates pure RL (GRPO) can induce emergent chain-of-thought reasoning without supervised reasoning examples, achieving o1-parity on math/coding benchmarks.

#RL-for-reasoning#GRPO#test-time-compute

DeepSeek-V3 Technical Report

671B-parameter MoE with auxiliary-loss-free load balancing and multi-token prediction, reaching GPT-4/Claude 3.5 parity at a reported training cost of about $5.6M.

#mixture-of-experts#scaling#MLA

OpenAI o1 System Card

Safety and capability documentation for o1, the first frontier reasoning model to scale inference-time compute, with state-of-the-art results on math and science benchmarks.

#reasoning-models#test-time-compute#safety

Genie: Generative Interactive Environments

Google DeepMind

First unsupervised 11B world model trained on unlabeled internet videos, generating interactive 2D game environments from single images.

#world-models#generative-models#unsupervised

Ring Attention with Blockwise Transformers for Near-Infinite Context

Hao Liu, Matei Zaharia, Pieter Abbeel

Enables training on 100M+ token sequences by distributing blockwise attention across devices and overlapping communication with computation, removing per-device memory constraints.

#long-context#distributed-training#attention

FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision

Jay Shah, Ganesh Bikshandi, Tri Dao et al.

H100 attention kernel reaching up to 75% GPU utilization in FP16 through asynchronous computation, with FP8 low-precision support, enabling efficient long-context inference.

#attention-optimization#hardware-efficiency#inference

Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet

Scales sparse autoencoders to a production model, extracting monosemantic features from Claude 3 Sonnet that respond to and causally control model behavior.

#mechanistic-interpretability#sparse-autoencoders#feature-extraction

The Llama 3 Herd of Models

Meta (Dubey, Grattafiori et al.)

405B dense model with 128K context matching GPT-4, establishing open-source frontier baseline with multimodal, coding, and reasoning capabilities.

#dense-models#scaling#multilingual

Qwen3 Technical Report

Alibaba Qwen Team

235B MoE with unified thinking/non-thinking modes, 85.7 on AIME'24 and competitive with o1/o3 on reasoning benchmarks.

#mixture-of-experts#reasoning-modes#frontier

Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models

Yangzhen Wu, Zhiqing Sun, Sean Welleck, Yiming Yang

Formalizes inference-time compute scaling laws, showing smaller models with test-time compute offer Pareto-optimal cost/performance trade-offs.

#scaling-laws#test-time-compute#inference-optimization

Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models

Mateusz Pach, Shyamgopal Karthik, Quentin Bouniot, Serge Belongie, Zeynep Akata

Extends sparse autoencoders to VLMs like CLIP, extracting monosemantic visual features for interpretability across modalities.

#mechanistic-interpretability#vision-language#sparse-autoencoders

Knowledge Divergence and the Value of Debate for Scalable Oversight

Theoretical framework showing debate's advantage over RLAIF scales with knowledge divergence via phase transitions, enabling superhuman AI oversight.

#scalable-oversight#debate#alignment

DINOv3: Self-supervised Learning for Vision at Unprecedented Scale

Meta AI Research

7B self-supervised vision transformer achieving SOTA across diverse downstream tasks without fine-tuning, outperforming specialized vision models.

#vision-transformers#self-supervised#scaling

Lab research & engineering blogs

The primary sources — Anthropic, OpenAI, DeepMind, Meta, and the great explainers.

Anthropic Research

Primary source for Anthropic's constitutional AI, RLHF, and Claude scaling research with direct access to published papers and research updates.

#constitutional-ai#safety#alignment

Anthropic Engineering

Deep operational knowledge on building reliable AI systems at scale, including agents, tool use, evals, and MCP—what production teams actually encounter.

#production-systems#agents#evals

Transformer Circuits Thread (Anthropic Interpretability)

Frontier mechanistic interpretability research on sparse autoencoders, circuit analysis, and reverse-engineering transformer internals—cutting-edge monosemantic feature extraction.

#interpretability#SAE#circuits

Frontier Red Team Research (red.anthropic.com)

Evidence-based analysis of frontier AI risks for cybersecurity, biosecurity, and autonomous systems—national security implications of frontier models.

#safety#red-teaming#security

OpenAI Research

Official OpenAI research publications including GPT-series scaling laws, RLHF, multimodal models, and reasoning-focused frontier models.

#gpt#scaling#reasoning

Google DeepMind Blog

Google DeepMind

Major breakthroughs in AlphaFold, Gemini frontier models, multimodal reasoning, and scientific AI applications across biology and mathematics.

#alphafold#gemini#scientific-ai

Meta AI Research Blog

Meta's foundational model research, Llama open-source models, computer vision, and robotics—bridging research to production at scale.

#llama#open-source#robotics

Microsoft Research Blog

Microsoft Research

Broad AI research spanning foundation models, agents, quantum computing, and enterprise AI applications with theoretical depth.

#agents#foundation-models#quantum

Blog · intermediate

NVIDIA Technical Blog

Infrastructure and optimization for AI: inference performance, physical AI, robotics acceleration, and GPU-specific model deployments.

#gpu#inference#physical-ai

Mistral AI News & Research

Frontier open-source and commercial model releases including physics AI, multimodal research, and efficient transformer architectures.

#physics-ai#open-source#multimodal

DeepSeek Research Blog

Frontier reasoning models and efficient scaling breakthroughs—low-cost training innovations and long-context (1M+ tokens) architectural advances.

#moe#efficient-scaling#reasoning

Blog · foundational

The Illustrated Transformer

Canonical visual explainer of transformer architecture and attention mechanisms—most accessible introduction to core concepts for engineers.

#transformers#attention#from-scratch

Blog · intermediate

Lil'Log (Lilian Weng's Blog)

Authoritative learning notes on diffusion models, agents, reinforcement learning, and LLM reasoning—rigorous technical explanations with breadth.

#diffusion#agents#rlhf

Berkeley AI Research (BAIR) Blog

UC Berkeley EECS

Academic research from 100+ grad students across vision, NLP, RL, robotics, and cross-cutting themes like human-compatible AI.

#robotics#vision#reinforcement-learning

Blog · intermediate

Practical insights on AI evaluation, safety evals, RL environment design, and data infrastructure for training frontier models.

#evaluation#safety#data-infrastructure

Blog · foundational

Andrej Karpathy Blog & Neural Networks: Zero to Hero

Andrej Karpathy

Minimal, from-scratch implementations of backprop, CNNs, and GPT—pedagogical deep dives preferred by engineers building intuition.

#from-scratch#neural-networks#gpt

Build with Claude, Codex & the stack

Docs and tools for agents, RAG, fine-tuning, eval, and serving.

Docs · foundational

Claude API Documentation

Official reference for Claude API covering models, tool use, streaming, prompt caching, batch processing, and vision capabilities.

#claude-api#official-docs#foundational

Docs · intermediate

Claude Code Documentation & Best Practices

Anthropic-authored guide to agentic coding workflows with best practices for prompt design, file context management, and autonomous automation.

#claude-code#agents#best-practices

Tool · intermediate

Anthropic Cookbook

Official Jupyter notebooks demonstrating tool use, agents, RAG, vision, and production patterns with runnable examples.

#agents#rag#tool-use#examples

Course · foundational

Anthropic Academy

Free self-paced courses from Anthropic engineers on the Claude API, Claude Code, MCP, and agents, with official certificates.

#courses#training#mcp#agents

Tool · intermediate

Anthropic Quickstarts

Deployable reference projects including customer support agents, financial analysts, computer use, and autonomous coding agents.

#agents#templates#computer-use#tool-use

Docs · intermediate

Model Context Protocol (MCP) Specification

Anthropic & community

Open standard for connecting AI systems to data sources and tools with server/client architecture and JSON-RPC protocol spec.

#mcp#protocol#tools#integration

Docs · intermediate

LangGraph Documentation

Framework for building stateful, multi-step agents with human-in-the-loop, durability, and comprehensive memory.

#agents#orchestration#langgraph

Docs · intermediate

LlamaIndex Documentation

Data framework for RAG with 300+ integrations, structured ingestion, and advanced retrieval with agents.

#rag#retrieval#indexing#data-frameworks

Docs · intermediate

vLLM Documentation

High-throughput inference serving engine with distributed parallelism, paged attention, and production-grade serving.

#serving#inference#llm-ops

Hugging Face Transformers & TRL Documentation

End-to-end training library with SFT, DPO, GRPO, and PEFT integration for efficient model fine-tuning.

#training#fine-tuning#rl#peft

Tool · intermediate

OpenAI Agents SDK (Python)

Lightweight framework for multi-agent workflows with sandbox agents, tool execution, and stateful conversations.

#agents#multi-agent#orchestration

Tool · intermediate

OpenAI Agents SDK (JavaScript/TypeScript)

Production-ready agent framework for TypeScript with tool loops, MCP support, and real-time voice capabilities.

#agents#typescript#multi-agent

Tool · intermediate

OpenAI Cookbook

Official examples and recipes for function calling, fine-tuning, embeddings, vision, and production patterns.

#examples#recipes#best-practices

Declarative programming framework for optimizing LLM pipelines with structured signatures and automatic prompt compilation.

#prompt-optimization#dspy#program-not-prompt

Docs · intermediate

RAGAS Evaluation Framework

Reference-free evaluation metrics for RAG systems measuring faithfulness, relevance, context precision, and recall.

#evaluation#rag#metrics

Docs · intermediate

DeepEval Documentation

Pytest-style LLM evaluation framework with 50+ metrics, component-level evals, and CI/CD integration.

#evaluation#testing#llm-evals

LangSmith Observability Platform

Framework-agnostic observability with tracing, evals, dashboards, and multi-SDK support for production monitoring.

#observability#monitoring#tracing

Weights & Biases Weave

Weights & Biases

End-to-end ML platform with automatic LLM tracing, cost tracking, evaluation scoring, and experiment comparison.

#observability#evaluation#ml-ops

The open-source agent stack (it's free)

Frameworks, coding agents, serving runtimes, retrieval, and eval repos — wire them together and you've built what frontier labs hire for. None of it is behind a paywall.

Tool · foundational

Lightning-fast Python framework for orchestrating autonomous agent crews and event-driven flows with first-class multi-agent autonomy.

#agents#multi-agent#orchestration#python

Tool · foundational

Production-grade agentic AI framework emphasizing type-safe, validated agent behaviors with multi-provider LLM support and structured outputs.

#agents#structured-output#validation#production

Tool · intermediate

Enterprise SDK for building and running production agent platforms with storage, observability, human approval, RBAC, and 100+ tool integrations.

#agents#production#multi-agent#observability

Tool · intermediate

Stateful agent platform with advanced memory management enabling long-term learning and self-improvement over time.

#agents#memory#stateful#learning

FoundationAgents

Multi-agent framework for building AI software companies and autonomous development teams with natural language programming capabilities.

#agents#multi-agent#software-engineering#orchestration

Toolintermediate

Modern TypeScript framework for AI-powered agents with model routing, autonomous workflows, human-in-the-loop, and production observability.

#agents#typescript#workflows#observability

Toolfoundational

Tool integration platform powering 1000+ toolkits for agents with context management, authentication, sandboxed execution, and framework-agnostic SDKs.

#agents#tool-integration#sandbox#orchestration

Toolfoundational

Hugging Face smolagents

Barebones agent library emphasizing code-based thinking with sandboxed execution and minimal dependencies for lightweight agentic systems.

#agents#sandbox#lightweight#code-execution

Toolintermediate

Microsoft Agent Framework

Production-grade multi-language framework for orchestrating complex agent workflows with standardized patterns, observability, and enterprise features.

#agents#multi-agent#production#orchestration

Toolintermediate

Event-driven async orchestration framework specialized for document-centric agent workflows with production-grade scaling and stateful execution.

#agents#workflows#document-processing#async

Toolintermediate

Microsoft Semantic Kernel

Model-agnostic SDK for building AI agents and orchestrating multi-agent workflows across Python, .NET, and Java with plugin architecture.

#agents#multi-language#plugins#orchestration

Toolintermediate

Significant-Gravitas

Platform, Forge framework, and benchmark tools for building and evaluating autonomous agents, with the stated goal of making them accessible to everyone.

#agents#autonomous#benchmark#platform

Toolintermediate

Autonomous agent platform with multi-interface access (CLI, GUI, SDK) for end-to-end code execution and codebase modification, MIT-licensed and Series A funded.

#coding-agent#sandbox#multi-interface#autonomous

SWE-agent / Mini-SWE-agent

GitHub issue resolver that autonomously fixes bugs with any LLM, built by Princeton/Stanford researchers, featured at NeurIPS 2024; mini-swe-agent is recommended for simplicity.

#coding-agent#issue-fixing#github-integration#eval-benchmarked

Toolintermediate

Terminal-native AI pair programmer with full codebase mapping, git integration, and 45k stars; works with Claude, GPT, and local LLMs.

#coding-agent#terminal-native#git-integrated#pair-programming

Toolintermediate

Open-source AI coding agent (5M+ VS Code installs) with SDK, IDE extensions, and CLI; autonomous file editing, command execution, and real-time error monitoring across platforms.

#coding-agent#ide-extension#cross-platform#autonomous

Toolintermediate

Open-source IDE extension (VS Code, JetBrains) with source-controlled AI checks enforceable in CI/CD, 33k stars, supports 15+ model providers.

#coding-agent#ide-extension#ci-cd-integrated#policy-as-code

Terminal-native persistent autonomous agent (since 2023) with code writing, terminal access, web browsing, and MCP server integration; works with any LLM provider.

#coding-agent#terminal-native#persistent-agent#mcp-integrated

AAIF (Linux Foundation)

Extensible AI agent built in Rust for executing, testing, and building complete projects; 47k stars, 15+ LLM providers, 70+ MCP extensions, moved to AAIF at Linux Foundation.

#coding-agent#multi-interface#extensible#mcp-ecosystem

Toolintermediate

High-adoption open-source coding agent (171k stars, 7.5M monthly developers) with terminal, desktop, and IDE integration; plan and build agents with privacy-first architecture.

#coding-agent#multi-interface#high-adoption#privacy-first

Toolfoundational

Gold-standard benchmark for evaluating autonomous code agents on real GitHub issues; 2,294 tasks, verified subset with 500 human-annotated instances.

#eval#benchmark#coding-agent#issue-fixing

Toolfoundational

harbor-framework

Benchmark for evaluating agents on hard terminal tasks (89 curated tasks, ICLR 2026); supports Claude Code, OpenHands, SWE-agent, and mini-swe-agent.

#eval#benchmark#cli-agents#terminal-native

Toolintermediate

Enterprise-grade secure sandbox runtime for AI code execution (90ms startup, Firecracker VMs); Python/TS SDKs, 12.5k stars, widely used by agent platforms.

#sandbox#code-execution#infrastructure#secure

Toolintermediate

Secure elastic sandbox infrastructure for AI code execution with stateful snapshots, 72k stars, multi-language SDKs (TS, Python, Ruby, Go, Java); AGPL-licensed.

#sandbox#code-execution#stateful#infrastructure

AgentKit (Inngest)

TypeScript framework for building multi-agent networks with deterministic routing, shared state, and MCP integration; Apache 2.0, 884 stars.

#agents#orchestration#routing#multi-agent

Toolfoundational

Reference-free evaluation framework for LLM applications with automatic test generation; 14.3k stars, widely used for agent and RAG system evaluation.

#eval#framework#metrics#reference-free

Toolfoundational

The gold-standard library for extracting structured outputs from any LLM via Pydantic models with zero boilerplate, trusted by 100k+ developers at OpenAI, Google, Microsoft.

#structured-output#validation#multi-language#production

Toolintermediate

Efficient programming paradigm for steering LLM output with constrained generation, conditionals, and loops seamlessly integrated; reduces latency and cost vs conventional prompting.

#structured-output#control-flow#serving#constraint-based

Toolintermediate

Fast, provider-agnostic structured generation library using regex and context-free grammars to enforce JSON/structured outputs with microsecond-level latency overhead.

#structured-output#serving#constraint-based#multi-provider
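The idea behind regex-constrained generation can be sketched in a few lines: at every step, tokens that would take the output outside the pattern are masked, and the best-scoring remaining token is chosen. The example below is a toy under stated assumptions, not any library's API. It uses a hand-written DFA for the pattern `-?[0-9]+`, a six-token vocabulary, and a stand-in `toy_scores` function in place of a real model; all names are hypothetical.

```python
from typing import Callable, Dict, List, Optional

EOS = "<eos>"
ACCEPTING = 2  # DFA state reached after at least one digit


def step(state: int, ch: str) -> Optional[int]:
    """One DFA transition for the regex -?[0-9]+ (None means rejected)."""
    if ch == "-":
        return 1 if state == 0 else None
    if ch in "0123456789":
        return ACCEPTING
    return None


def advance(state: int, token: str) -> Optional[int]:
    """Walk the DFA over every character of a (possibly multi-char) token."""
    current: Optional[int] = state
    for ch in token:
        current = step(current, ch)
        if current is None:
            return None
    return current


def constrained_decode(
    score_fn: Callable[[List[str]], Dict[str, float]],
    vocab: List[str],
    max_tokens: int = 8,
) -> str:
    state, out = 0, []
    for _ in range(max_tokens):
        scores = score_fn(out)
        allowed: Dict[str, int] = {}
        for tok in vocab:
            if tok == EOS:
                if state == ACCEPTING:  # stopping is legal only on a full match
                    allowed[tok] = state
            else:
                nxt = advance(state, tok)
                if nxt is not None:
                    allowed[tok] = nxt
        # Greedy choice among the tokens the pattern still permits.
        best = max(allowed, key=lambda t: scores[t])
        if best == EOS:
            break
        out.append(best)
        state = allowed[best]
    return "".join(out)


VOCAB = ["abc", EOS, "-", "7", "4", "42"]


def toy_scores(prefix: List[str]) -> Dict[str, float]:
    # Stand-in for model logits: unconstrained, this "model" would emit "abc".
    return {"abc": 5.0, EOS: 4.0, "-": 3.0, "7": 2.0, "4": 1.0, "42": 0.5}


if __name__ == "__main__":
    print(constrained_decode(toy_scores, VOCAB))
```

Production systems typically compile the pattern into a state machine over the model's real tokenizer vocabulary and apply the mask to logits before sampling; the greedy loop here only shows the masking principle.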

Toolintermediate

DSL for reliable tool-calling and structured outputs with fallback policies, multi-model switching, and schema-aligned parsing that works even without native LLM tool support.

#structured-output#agents#type-safe#fallback

High-performance LLM serving framework with native constrained decoding via compressed FSM for structured outputs (JSON/regex/grammar) with near-zero overhead and 3x faster JSON decoding.

#serving#structured-output#inference#production

Toolfoundational

Universal gateway for 100+ LLM providers (OpenAI, Anthropic, Gemini, etc.) with unified structured outputs API, cost tracking, and load balancing for production agents.

#serving#gateway#multi-provider#observability

Toolintermediate

Pydantic AI-native framework for declarative structured extraction, classification, and generation workflows with deep integration into type-safe Python patterns.

#structured-output#validation#agents#python-native

Toolfoundational

llama-cpp-python

Production-grade Python bindings for local LLM inference with OpenAI API compatibility, enabling on-device structured outputs and agent serving without external dependencies.

#serving#local-inference#openai-compatible#edge

Efficient JSON generation that delegates only the content tokens to the LLM and auto-fills the fixed structural tokens, reducing latency and improving reliability for structured outputs.

#structured-output#efficiency#local-inference#json
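A minimal sketch of that split between fixed and content tokens, under assumptions: the flat `{field: type}` schema format and the `generate_value` callback are hypothetical stand-ins (a real implementation would call the model once per field and stop at the value boundary). Braces, keys, quotes, and separators are emitted by code, so the result is always syntactically valid JSON.

```python
import json
from typing import Callable, Dict


def generate_json(schema: Dict[str, str], generate_value: Callable[[str, str], str]) -> str:
    """Build a JSON object, asking the model only for field values."""
    parts = ["{"]
    for i, (key, kind) in enumerate(schema.items()):
        if i:
            parts.append(", ")
        parts.append(json.dumps(key) + ": ")  # fixed tokens: written by code
        raw = generate_value(key, kind)        # content tokens: from the model
        if kind == "number":
            value = float(raw)
        elif kind == "boolean":
            value = str(raw).strip().lower() == "true"
        else:
            value = str(raw)
        parts.append(json.dumps(value))        # escaping is handled by code too
    parts.append("}")
    return "".join(parts)


SCHEMA = {"name": "string", "age": "number", "active": "boolean"}


def fake_model(key: str, kind: str) -> str:
    # Stand-in for an LLM call that returns raw text for a single field.
    return {"name": 'Ada "the first"', "age": "36", "active": "true"}[key]


if __name__ == "__main__":
    print(generate_json(SCHEMA, fake_model))
```

Because the model never writes structural characters, a stray quote in a value (as in the `name` field here) cannot break the document; it is escaped when the value is serialized.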

Toolfoundational

Simplest path to running any open-source LLM locally with a REST API; no GPU required, production-ready, with 173k stars.

#serving#local-inference#api#beginner-friendly

Toolfoundational

De facto standard for LLM inference in C/C++; foundation of Ollama, LM Studio, and most local inference tools; minimal dependencies, cross-hardware support.

#serving#inference-engine#lightweight#ubiquitous

NVIDIA-optimized serving with state-of-the-art GPU kernels and multi-GPU orchestration; critical for production LLM inference on NVIDIA hardware.

#serving#gpu-optimization#nvidia#performance

Toolintermediate

Fine-tuning optimization achieving 2x speedup and 70% VRAM reduction with no accuracy loss; dual-interface (Studio UI + code API).

#fine-tuning#optimization#memory-efficient#performance

Toolintermediate

axolotl-ai-cloud

Unified fine-tuning framework supporting 100+ models with SFT, LoRA, QLoRA, and preference tuning; multimodal training support.

#fine-tuning#framework#multimodal#flexible

Toolintermediate

Unified efficient fine-tuning of 100+ LLMs & VLMs with support for SFT, RLHF, DPO, and process reward models; production-tested at scale.

#fine-tuning#framework#vllm-support#reward-modeling

InternLM (OpenCompass)

Comprehensive toolkit for LLM serving with compression, quantization, and dynamic batching; 1.8x higher throughput than vLLM per the maintainers.

#serving#compression#deployment#optimization

Toolintermediate

Meta's composable framework providing OpenAI-compatible APIs with pluggable backends (Ollama, vLLM, managed services); run-anywhere deployment.

#serving#framework#api-compatibility#multi-backend

Toolintermediate

Modular local inference engine supporting LLMs, vision, voice, and images with minimal dependencies; wraps llama.cpp, vLLM, and whisper.cpp as needed.

#serving#local-inference#multimodal#modular

Toolfoundational

Enterprise RAG orchestration framework with modular pipelines, retrieval routing, and multi-stage ranking — production-ready for complex retrieval workflows.

#retrieval#RAG#orchestration#agents

Toolfoundational

Lightweight embeddings database with automatic tokenization and vectorization — fastest path to RAG for prototypes and small-scale systems.

#retrieval#vector-db#embeddings#semantic-search

Toolfoundational

High-performance vector database with sparse/dense/multivector search, up to 97% memory reduction via quantization, and production-grade filtering.

#retrieval#vector-db#hybrid-search#scaling

Toolfoundational

Cloud-native vector database combining semantic search with structured filtering, built-in RAG pipelines, and multi-tenancy for enterprise scale.

#retrieval#vector-db#RAG#structured-filtering

Toolintermediate

Distributed vector database scaling to billions of vectors with GPU acceleration, native sparse vectors (BM25/SPLADE), and hybrid search in a single engine.

#retrieval#vector-db#distributed#hybrid-search
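Hybrid search has to merge a dense (embedding) ranking with a sparse (keyword) ranking whose scores are not directly comparable. One common, score-free way to do that is reciprocal rank fusion (RRF), sketched below. This illustrates the general technique only; it does not claim to be the default fusion method of any database listed here, and the document IDs are made up.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in;
    k dampens the influence of top ranks (60 is a conventional choice).
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    dense = ["d3", "d1", "d2"]   # hypothetical embedding-search results
    sparse = ["d1", "d4", "d3"]  # hypothetical BM25 results
    print(reciprocal_rank_fusion([dense, sparse]))
```

Documents that appear near the top of both lists rise to the front, while a document found by only one retriever is kept but ranked lower.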

Toolfoundational

PostgreSQL extension enabling vector similarity search while retaining ACID compliance, JOINs, and point-in-time recovery in your existing database.

#retrieval#vector-db#postgres#structured-data

Toolintermediate

Production vector database combining vector search with full-text, filtering, and aggregations in one query — enterprise standard for hybrid RAG.

#retrieval#vector-db#hybrid-search#full-text

Training and inference library for late-interaction (ColBERT) retrieval: an alternative to single-vector dense embeddings that generalizes across domains with strong zero-shot robustness.

#retrieval#dense-retrieval#colbert#training
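Late interaction keeps one vector per token rather than one per document, and scores a query against a document with the MaxSim operator: each query token takes its best match among the document tokens, and those maxima are summed. The sketch below uses tiny hand-made 2-d vectors in place of real encoder output; the function name is hypothetical and is not a library API.

```python
from typing import List, Sequence

Vector = Sequence[float]


def maxsim_score(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """Sum, over query tokens, of the best dot product with any document token."""

    def dot(a: Vector, b: Vector) -> float:
        return sum(x * y for x, y in zip(a, b))

    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0]]               # two query-token vectors
    doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # matches both query tokens
    doc_b = [[1.0, 0.0], [0.6, 0.8]]               # matches one token exactly
    print(maxsim_score(query, doc_a))
    print(maxsim_score(query, doc_b))
```

Because every query token is matched independently, a document can score well by covering each part of the query somewhere in its text, which a single pooled embedding tends to blur.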

Toolintermediate

All-in-one embeddings DB combining vector search, sparse indexing, SQL, and LLM orchestration — minimal overhead for semantic search workflows.

#retrieval#vector-db#semantic-search#embeddings

AI search platform handling vectors, tensors, and structured data at scale with ML model inference at query time — for complex ranking and relevance.

#retrieval#search-engine#ml-inference#ranking

Knowledge graph extraction and graph-based RAG for complex reasoning — structures unstructured text into queryable knowledge graphs for nuanced retrieval.

#retrieval#knowledge-graphs#RAG#structured-extraction

Toolfoundational

sentence-transformers

Standard library for computing and training embeddings with 15k+ pretrained models — essential backbone for all dense retrieval and semantic search.

#retrieval#embeddings#semantic-search#training

Toolfoundational

Complete LLM observability platform with tracing, evals, prompt management, and metrics dashboards for production agents.

#observability#eval#tracing#metrics

Toolfoundational

Enterprise-grade LLM observability and evaluation platform with drift detection, retrieval quality scoring, and trace analytics.

#observability#eval#drift-detection#retrieval

Toolintermediate

Lightweight LLM observability platform offering cost monitoring, request tracking, and experimentation without code changes.

#observability#monitoring#cost-tracking#experiments

Toolintermediate

Red-teaming and prompt-testing framework with adversarial evaluation, security scanning, and CI/CD integration for agents and RAG systems.

#eval#red-teaming#prompt-testing#security

Toolintermediate

Structured output and validation framework ensuring LLM outputs conform to guardrails, schemas, and safety constraints.

#guardrails#validation#structured-output#safety

Toolintermediate

NeMo Guardrails

NVIDIA's toolkit for enforcing guardrails on LLMs via topical boundaries, content filtering, and behavioral constraints.

#guardrails#safety#content-filtering#constraints

Toolintermediate

Automated evaluation and testing library for LLM agents detecting performance regressions, hallucinations, and robustness gaps.

#eval#testing#robustness#hallucination-detection

Toolintermediate

Framework for building web-automation agents with DOM interaction, JavaScript execution, and cross-site navigation capabilities.

#agents#web-automation#browser#interaction

Toolintermediate

SDK for orchestrating browser agents with reliable screenshot-based navigation, JavaScript isolation, and debugging tools.

#agents#browser-automation#sdk#debugging

Open Interpreter

Open Interpreter

Natural language code interpreter enabling agents to execute Python/shell/JavaScript locally with sandboxed execution.

#agents#code-execution#sandbox#interpreter

Newsletters & practitioners

The feeds that keep you current between model releases.

Newsletteradvanced

Sebastian Raschka, PhD

Curated technical deep dives into LLM architectures, research paper roundups, and state-of-the-art reviews from a seasoned researcher building at the frontier.

#transformers#research-summaries#llm-architecture#ai-trends

Newsletterintermediate

Daily Dose of Data Science

Daily bite-sized insights on machine learning, data science tools, and lesser-known observations that make the data science lifecycle less intimidating.

#data-science#ml-tools#practical-tips#daily-learning

Newsletterintermediate

DeepLearning.AI (Andrew Ng)

Curated weekly report on the most important AI research and industry-shaping events for engineers and business leaders to act on what matters.

#ai-news#research#industry-trends#weekly-digest

Newsletteradvanced

Interconnects AI

Nathan Lambert (Allen Institute for AI)

Insider technical analysis of frontier AI model training and post-training from a researcher actively shipping at scale, with original work on RLHF methodologies.

#post-training#rlhf#open-models#model-training

Newsletterintermediate

eugeneyan's Newsletter

Eugene Yan (Anthropic)

Pragmatic guidance on recommendation systems, LLMs, and AI product development from an ML engineer who has scaled teams at Amazon and Anthropic.

#recsys#llm-systems#ml-infrastructure#engineering

Newsletteradvanced

Chip Huyen's Substack

Monthly essays on AI engineering, system design, and production MLOps from the author of 'Designing Machine Learning Systems' and 'AI Engineering.'

#mlops#ai-systems#production-ml#engineering-best-practices

Simon Willison's Weblog

Rigorous, independent technical analysis of AI tools, SQLite, Datasette, and pragmatic takes on LLMs in production from a Django co-creator.

#ai-tools#llm-practices#web-dev#open-source

Newsletteradvanced

swyx (Shawn Wang)

185K-subscriber newsletter + weekly podcast diving deep into how frontier labs build agents, models, and infrastructure with interviews from the builders themselves.

#ai-agents#model-building#infrastructure#interviews

Hamel Husain's Blog

Field-tested insights on evals, error analysis, and improving AI products in production from an engineer who helps teams move past prototype stage.

#evals#ai-engineering#observability#reliability

Jason Liu Writing

Applied AI essays on RAG, open source, and building AI systems in production from a DX engineer with deep hands-on experience.

#rag#llm-applications#open-source#consulting

Newsletteradvanced

Jack Clark (Anthropic)

Weekly deep dives into cutting-edge AI research papers with analysis of technical breakthroughs and implications, including sci-fi explorations of impact.

#research-analysis#arxiv#ai-implications#frontier-tech

Newsletteradvanced

Andrej Karpathy's Substack

Andrej Karpathy (Anthropic)

Technical insights from a pioneer of deep learning and LLMs, now working on frontier pre-training at Anthropic; 39K+ subscribers.

#deep-learning#llm-training#neural-networks#frontier-research

Newsletterintermediate

The Pragmatic Engineer

Deep dives into Big Tech and startup engineering practices, with rigorous analysis of AI engineering trends from the inside; 1.1M+ subscribers.

#software-engineering#big-tech#career#ai-infrastructure

RL Interview Questions 2026

Xiuyu Li (@sheriyuo, UC Berkeley)

A Berkeley researcher's longform set of RL interview questions for 2026 — the RLHF/PPO/GRPO/DPO post-training territory frontier labs probe in recruiting.

#rl#interview#rlhf#post-training