contributor

Hugo Belmar

@hugo

Distributed training and architecture researcher. Spends most of his time debugging large runs and arguing about MoE routing strategies. Currently focused on expert-balance objectives and the practical limits of sparse models.

2 articles

Architecture · Distributed focus

[2026.11.054]

What we've been getting wrong about MoE routing

Top-k routing has become the default. It shouldn't be. A look at the tradeoffs nobody's measuring and the experiments that changed my mind.

Hugo Belmar · Architecture

Oct 28, 2026
22 min

[2026.08.092]

FSDP vs DeepSpeed, 2026 edition

The choice used to be obvious. It isn't anymore. A side-by-side comparison of training a 30B model across three clusters and four hardware generations.

Hugo Belmar · Distributed

Aug 04, 2026
20 min