contributor

Priya Raghavan

@priya

Production LLM serving — paged attention, batching, KV cache management. Cares about tail latency, not just averages. Currently focused on long-context serving and KV offload strategies.

2 articles

Inference focus

[2026.11.073]

Continuous batching, revisited

Three years after the original paper, what does state-of-the-art serving actually look like? A field report from a team running 12B tokens a day.

Priya Raghavan Inference

Nov 06, 2026
19 min

[2026.07.103]

Notes on KV cache paging at scale

PagedAttention is a good idea poorly understood. A primer, plus the second-order effects you only see at 10,000 concurrent requests.

Priya Raghavan Inference

Jul 11, 2026
15 min