Playground

Tools.

Working calculators, visualizers, and benchmarks for ML systems work. Open source, no auth, no telemetry. Pick a tool below.

Throughput Calculator · LIVE

Back-of-envelope estimate of tokens/sec for a given model, precision, and hardware. Memory-bound regime only; assumes batched serving with healthy KV cache headroom.
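A quick way to sanity-check the "memory-bound" assumption is to compare the time to stream the weights from HBM against the time to do the matrix math for one decode step. The sketch below uses the default values shown in this calculator (70B params, batch 8, fp8, H100 SXM at 3350 GB/s and 1979 TFLOPS). The function name `step_times` and the rule of thumb of roughly 2 FLOPs per parameter per generated token are assumptions for illustration, not taken from the tool's source.

```python
def step_times(params, batch, bytes_per_param, hbm_gb_s, tflops):
    """Return (memory_time_s, compute_time_s) for one decode step.

    Hypothetical helper: assumes every weight is read once per step and
    that each generated token costs about 2 FLOPs per parameter.
    """
    weight_bytes = params * bytes_per_param
    memory_time = weight_bytes / (hbm_gb_s * 1e9)
    flops = 2 * params * batch  # one multiply and one add per weight per token
    compute_time = flops / (tflops * 1e12)
    return memory_time, compute_time


# Defaults from the calculator: 70B params, batch 8, fp8 (1 byte), H100 SXM.
mem_s, compute_s = step_times(70e9, 8, 1, 3350, 1979)
memory_bound = mem_s > compute_s
print(f"memory {mem_s * 1e3:.2f} ms, compute {compute_s * 1e3:.2f} ms")
print("memory-bound" if memory_bound else "compute-bound")
```

With these inputs the weight read takes roughly 21 ms per step while the compute takes well under 1 ms, so the memory-bound assumption holds comfortably at batch 8. It would start to break down at much larger batch sizes.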

Model size: 70B params
Batch size: 8
Sequence length: 4096 tokens
Precision: fp8
GPU: H100 SXM
Estimate
Tokens / sec / GPU: 383
Memory required: 155.9 GB
Fits on 1 GPU? Needs sharding
Param memory (fp8): 70.0 GB
KV cache @ batch=8, seq=4096: 85.9 GB
HBM bandwidth (H100 SXM): 3350 GB/s
Compute @ fp8: 1979 TFLOPS
⚠ Estimate is memory-bound roofline only. Actual numbers depend on kernel quality, continuous batching, speculative decoding, and a dozen other things this tool doesn't model.
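The numbers above can be reproduced with a few lines of arithmetic. The sketch below is inferred from the displayed values rather than from the tool's source: throughput is taken as batch size times HBM bandwidth divided by parameter bytes, and the KV cache as 2 (K and V) × layers × hidden size × bytes per element × batch × sequence length. The layer count (80), hidden size (8192), 2-byte KV elements, and 80 GB of GPU memory are assumptions chosen because they match the 85.9 GB and "needs sharding" results; the calculator itself may use different internals.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    tokens_per_sec: float
    param_gb: float
    kv_gb: float
    total_gb: float
    fits_on_one_gpu: bool


def estimate(params_b=70, batch=8, seq_len=4096, bytes_per_param=1,
             hbm_gb_s=3350, gpu_mem_gb=80,
             n_layers=80, hidden=8192, kv_bytes=2):
    """Memory-bound roofline estimate (hypothetical reconstruction).

    n_layers, hidden, kv_bytes, and gpu_mem_gb are assumed values,
    not inputs exposed by the calculator.
    """
    # Weights: params (in billions) * bytes per param gives GB directly.
    param_gb = params_b * bytes_per_param
    # KV cache: K and V for every layer, token, and sequence in the batch.
    kv_gb = 2 * n_layers * hidden * kv_bytes * batch * seq_len / 1e9
    total_gb = param_gb + kv_gb
    # Each decode step streams all weights once and emits `batch` tokens.
    tokens_per_sec = batch * hbm_gb_s / param_gb
    return Estimate(tokens_per_sec, param_gb, kv_gb, total_gb,
                    total_gb <= gpu_mem_gb)


e = estimate()
print(f"{e.tokens_per_sec:.0f} tokens/sec/GPU")   # 383
print(f"{e.kv_gb:.1f} GB KV cache")               # 85.9
print(f"{e.total_gb:.1f} GB total")               # 155.9
print("fits" if e.fits_on_one_gpu else "needs sharding")
```

Note that this throughput formula ignores KV cache reads entirely, which is one reason the result should be read as an upper bound at long sequence lengths.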
More from the community

Standalone tools & demos.

Want to add one? Contribute a tool →