contributor
Priya Raghavan
@priya
Production LLM serving — paged attention, batching, KV cache management. Cares about tail latency, not just averages. Currently focused on long-context serving and KV offload strategies.
2 articles
Inference focus
[2026.11.073]
Continuous batching, revisited
Three years after the original paper, what does state-of-the-art serving actually look like? A field report from a team running 12B tokens a day.
Priya Raghavan Inference
Nov 06, 2026
19 min
[2026.07.103]
Notes on KV cache paging at scale
PagedAttention is a good idea poorly understood. A primer, plus the second-order effects you only see at 10,000 concurrent requests.
Priya Raghavan Inference
Jul 11, 2026
15 min