Contributors.
Nine writers from labs, startups, and research groups working on the hard parts of ML systems. Every article is attributed, and most of them are open to questions on Discussions.
World models, embodied agents, and the cost surfaces of model-based RL. Writes about what actually scales versus what only looks good on the leaderboard. Currently interested in latent dynamics models and planning over learned simulators.
Distributed training and architecture researcher. Spends most of his time debugging large runs and arguing about MoE routing strategies. Currently focused on expert-balance objectives and the practical limits of sparse models.
Inference engineer focused on kernel-level optimizations and evaluation methodology. Believes the gap between a benchmark number and a useful number is where most of the engineering actually lives. Writes about attention kernels, profiling, and the eval landscape.
Production LLM serving — paged attention, batching, KV cache management. Cares about tail latency, not just averages. Currently focused on long-context serving and KV offload strategies.
Works on quantization-aware training and post-training quantization for production deployment. Cares about the gap between papers and the calibration headaches you actually hit in prod. Currently focused on FP4 and mixed-precision rollouts.
Inference acceleration via speculative decoding and tree-based sampling. Interested in the systems side of decoding: verification kernels, draft-model selection, and throughput under real load. Currently exploring multi-draft and Medusa-style heads.
Retrieval systems engineer interested in embedding compression and approximate search. Has strong opinions on when retrieval helps and when it just adds latency. Currently working on quantized indexes and re-ranker design.
Retrieval-augmented generation, document understanding, and grounded model evaluation. Spends most of his time arguing that the answer to "do we need RAG?" is usually "not the way you think." Currently exploring long-context vs retrieval trade-offs.
Training infrastructure for research-scale workloads. Spends most of his time figuring out why a run that worked on 8 GPUs falls over on 64. Currently working on FSDP recipes and large-scale debugging tooling.