Viraj @tunedgradient, Twitter Profile

3 weeks ago

used to think kv cache paging/eviction was simple. but the traces here are interesting:
- chat reuse lasts mins, api reuse lasts secs.
- for apis, a small gpu cache is plenty.
- for chat, you need smarter eviction.
- be super workload-aware.
(arxiv.org/pdf/2506.02634)
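A minimal sketch of what "workload-aware" eviction could look like. This is not the paper's method. The reuse windows (`REUSE_WINDOW_S`, 300 s for chat and 5 s for api), the class name `WorkloadAwareKVCache`, and counting capacity in whole entries are all hypothetical stand-ins chosen to mirror "chat reuse lasts mins, api reuse lasts secs". The idea: when the cache is full, first evict an entry whose workload-specific reuse window has already passed, and fall back to plain LRU only if none has.

```python
from collections import OrderedDict
from dataclasses import dataclass

# Hypothetical reuse windows in seconds, loosely mirroring
# "chat reuse lasts mins, api reuse lasts secs"; real values would come from traces.
REUSE_WINDOW_S = {"chat": 300.0, "api": 5.0}


@dataclass
class Entry:
    workload: str
    last_access: float


class WorkloadAwareKVCache:
    """Toy index of cached KV prefixes; capacity is counted in entries (an assumption)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        # Ordered oldest-accessed first, so iteration order is LRU order.
        self.entries: "OrderedDict[str, Entry]" = OrderedDict()

    def access(self, key: str, workload: str, now: float) -> bool:
        """Record an access at time `now`; return True on a cache hit."""
        if key in self.entries:
            self.entries[key].last_access = now
            self.entries.move_to_end(key)
            return True
        if len(self.entries) >= self.capacity:
            self._evict(now)
        self.entries[key] = Entry(workload, now)
        return False

    def _evict(self, now: float) -> None:
        # Prefer the least recently used entry whose reuse window has expired,
        # since it is unlikely to be reused again.
        victim = None
        for key, e in self.entries.items():
            if now - e.last_access > REUSE_WINDOW_S[e.workload]:
                victim = key
                break
        if victim is not None:
            del self.entries[victim]
        else:
            # No expired entries: fall back to plain LRU.
            self.entries.popitem(last=False)


cache = WorkloadAwareKVCache(capacity=2)
cache.access("chat-1", "chat", now=0.0)
cache.access("api-1", "api", now=1.0)
# At t=10, api-1 is past its 5 s window, while the older chat-1 is still within 300 s.
cache.access("api-2", "api", now=10.0)
print(sorted(cache.entries))  # ['api-2', 'chat-1']
```

Plain LRU would have evicted `chat-1` here because it is the oldest entry. The workload-aware rule keeps it and drops the api prefix that is already past its likely reuse window.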
