contributor
Hugo Belmar
@hugo
Distributed training and architecture researcher. Spends most of his time debugging large runs and arguing about MoE routing strategies. Currently focused on expert-balance objectives and the practical limits of sparse models.
2 articles
Architecture · Distributed focus
[2026.11.054]
What we've been getting wrong about MoE routing
Top-k routing has become a default. It shouldn't be. A look at the tradeoffs nobody's measuring and the experiments that changed my mind.
Hugo Belmar Architecture
Oct 28, 2026
22 min
[2026.08.092]
FSDP vs DeepSpeed, 2026 edition
The choice used to be obvious. It isn't anymore. A side-by-side on training a 30B model across three clusters and four hardware generations.
Hugo Belmar Distributed
Aug 04, 2026
20 min