LMCache supports gpt-oss (20B/120B) on Day 1!
TTFT 1.20s → 0.39s (-67.5%), finish time 15.70s → 7.73s (-50.7%) compared to vanilla vLLM.
Unleash the full power of gpt-oss with vLLM + LMCache. Full deployment tutorial here:
blog.lmcache.ai/2025-08-05-gpt… #LMCache #vLLM #OpenAI #LLM…
Excited to share our latest work at #SOSP2025.
This one's special as it's my first full CS project from start to finish, from early brainstorming and iterating on ideas to running experiments and writing the paper. Learned a ton, and perseverance finally paid off!
LMCache now turbocharges multimodal models in vLLM!
By caching image-token KV pairs, repeated images now get a ~100% cache hit rate, cutting latency from 18s to ~1s.
Works out of the box.
Check the blog: blog.lmcache.ai/2025-07-03-mul…
Try it: github.com/LMCache/LMCache #vLLM #MLLM…
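A minimal sketch of why repeated images can hit the cache, assuming the cache key is derived from a hash of the raw image bytes so identical content maps to the same entry. `ImageKVCache` and `fake_prefill` are hypothetical names, and the "KV" is a placeholder; this is not LMCache's actual implementation or API.

```python
import hashlib


class ImageKVCache:
    """Hypothetical sketch: stores precomputed KV keyed by a hash of image content."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(image_bytes: bytes) -> str:
        # Identical image bytes always produce the same key, whatever the URL or filename.
        return hashlib.sha256(image_bytes).hexdigest()

    def get_or_compute(self, image_bytes: bytes, compute_kv):
        k = self.key(image_bytes)
        if k in self._store:
            self.hits += 1
            return self._store[k]
        self.misses += 1
        kv = compute_kv(image_bytes)  # the expensive path: encode image + prefill its tokens
        self._store[k] = kv
        return kv


def fake_prefill(image_bytes: bytes):
    # Stand-in for the vision encoder + prefill; returns a placeholder "KV".
    return [len(image_bytes)]


cache = ImageKVCache()
img = b"same image bytes"
cache.get_or_compute(img, fake_prefill)  # first request: miss, KV computed
cache.get_or_compute(img, fake_prefill)  # repeated image: hit, KV reused
print(cache.hits, cache.misses)  # 1 1
```

Hashing content rather than, say, the image URL means the same picture sent twice under different names still reuses its KV.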
LMCache X @RedHat Official Collaboration
LMCache is now a founding supporter of Red Hat's new llm-d project for scalable distributed LLM inference!
Red Hat is also joining the LMCache project as an active contributor.
Together we're building faster, more efficient open…
LMCache turbocharges vLLM, KServe & Dynamo!
Our new blog reveals how this SOTA KV cache layer slashes LLM inference costs & latency (TTFT/ITL).
- Massive scale (beyond single-node)
- Blazing speed with custom CUDA kernels
- Up to 2.3x prefill & 4.5x RAG throughput!…
Mooncake X LMCache: KV Cache-centric Language Model Serving
We're thrilled to announce a strategic collaboration between LMCache and Mooncake to pioneer a KVCache-centric Large Language Model (LLM) serving system! This partnership is set to redefine…
78.8% p95 Inter-Token Latency reduction with LMCache + vLLM v1 P/D support
In our previous blog, we introduced the integration of LMCache with vLLM v1 and NVIDIA's NIXL library, enabling Prefill-Decode Disaggregation (PD) for LLM inference.
Today, we're…
Exciting news from #EuroSys2025: our work on CacheBlend won Best Paper!
CacheBlend delivers the first-ever speedup for RAG LLMs, achieving near-100% KV cache hit rates while maintaining output quality.
Catch Jiayi Yao, Kuntai Du, and Shan Lu at EuroSys/ASPLOS this week &…
Deploying LLMs in Clusters #1
Check out this step-by-step tutorial to deploy the vLLM Production Stack on a cloud VM for superior performance and easy management.
Blog: blog.lmcache.ai/2025-02-13-clo…
Code: github.com/vllm-project/p…
Comment with the topic you'd like to see next week!
#k8s #vLLM
LMCache speeds up multi-turn conversations by 7x vs. vLLM + prefix caching!
Our secret? Efficient KV cache offloading to CPU, Disk, and even remote storage!
Try the benchmark now on your own servers:
github.com/LMCache/LMCach… #LLM #Benchmark #vLLM #chat
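To make the offloading idea concrete, here is a minimal sketch assuming a simple LRU policy: when the in-memory tier is full, the least recently used prefix's KV is moved to a slower tier instead of being discarded, so a later turn that reuses that prefix can load it rather than recompute prefill. `TieredKVStore` is a hypothetical name, the "disk" tier is just a dict standing in for disk or remote storage, and this is not LMCache's actual design or API.

```python
from collections import OrderedDict


class TieredKVStore:
    """Hypothetical two-tier KV cache: bounded LRU in CPU memory, overflow to a slower tier."""

    def __init__(self, cpu_capacity: int):
        self.cpu_capacity = cpu_capacity
        self.cpu = OrderedDict()  # hot tier; iteration order is least to most recently used
        self.disk = {}            # stand-in for disk / remote storage

    def put(self, prefix_key, kv):
        self.cpu[prefix_key] = kv
        self.cpu.move_to_end(prefix_key)
        while len(self.cpu) > self.cpu_capacity:
            old_key, old_kv = self.cpu.popitem(last=False)  # evict the LRU entry...
            self.disk[old_key] = old_kv                      # ...by offloading, not dropping

    def get(self, prefix_key):
        if prefix_key in self.cpu:
            self.cpu.move_to_end(prefix_key)  # mark as recently used
            return self.cpu[prefix_key], "cpu"
        if prefix_key in self.disk:
            kv = self.disk.pop(prefix_key)
            self.put(prefix_key, kv)  # promote back to the CPU tier
            return kv, "disk"
        return None, "miss"  # caller would have to recompute prefill


store = TieredKVStore(cpu_capacity=2)
for turn in ["turn1", "turn2", "turn3"]:
    store.put(turn, f"kv-for-{turn}")
print(store.get("turn1"))  # ('kv-for-turn1', 'disk')
```

In a multi-turn chat, an early turn's KV that would otherwise be evicted is still recoverable from the slower tier, which is the kind of reuse plain GPU-resident prefix caching loses once memory fills up.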