The People

Contributors.

Nine writers from labs, startups, and research groups working on the hard parts of ML systems. Every article is attributed, and most of the writers are open to questions on Discussions.

Felix Marin
contributor

World models, embodied agents, and the cost surfaces of model-based RL. Writes about what actually scales versus what only looks good on the leaderboard. Currently interested in latent dynamics models and planning over learned simulators.

2 articles · @felix

Hugo Belmar
contributor

Distributed training and architecture researcher. Spends most of his time debugging large runs and arguing about MoE routing strategies. Currently focused on expert-balance objectives and the practical limits of sparse models.

2 articles · @hugo

Liam Chen
contributor

Inference engineer focused on kernel-level optimizations and evaluation methodology. Believes the gap between a benchmark number and a useful number is where most of the engineering actually lives. Writes about attention kernels, profiling, and the eval landscape.

2 articles · @lchen

Priya Raghavan
contributor

Production LLM serving — paged attention, batching, KV cache management. Cares about tail latency, not just averages. Currently focused on long-context serving and KV offload strategies.

2 articles · @priya

Ana Voinescu
contributor

Works on quantization-aware training and post-training quantization for production deployment. Cares about the gap between papers and the calibration headaches you actually hit in prod. Currently focused on FP4 and mixed-precision rollouts.

1 article · @ana

Mira Holst
contributor

Inference acceleration via speculative decoding and tree-based sampling. Interested in the systems side of decoding: verification kernels, draft-model selection, and throughput under real load. Currently exploring multi-draft and Medusa-style heads.

1 article · @mira

Naoko Ide
contributor

Retrieval systems engineer interested in embedding compression and approximate search. Has strong opinions on when retrieval helps and when it just adds latency. Currently working on quantized indexes and re-ranker design.

1 article · @naoko

Sasha Petrov
contributor

Retrieval-augmented generation, document understanding, and grounded model evaluation. Spends most of his time arguing that the answer to "do we need RAG?" is usually "not the way you think." Currently exploring the trade-offs between long-context models and retrieval.

1 article · @sasha

Toma Iliescu
contributor

Training infrastructure for research-scale workloads. Spends most of his time figuring out why a run that worked on 8 GPUs falls over on 64. Currently working on FSDP recipes and large-scale debugging tooling.

1 article · @toma