What it does
Tokenizers determine how language models see your text. The same prompt can produce wildly different token counts (and therefore cost, latency, and behavior) across model families. This tool runs your text through seven tokenizers in parallel and shows the splits aligned column-by-column.
What you’ll see
- Token-level color coding so you can spot where tokenizers diverge
- Total token count + character compression ratio per tokenizer
- A “rare token” view that highlights tokens that decode to multiple Unicode codepoints
- Side-by-side cost projection across major API providers
Status
Experimental — currently runs entirely in-browser via WASM ports of the underlying libraries. Expect occasional Unicode edge cases.