Joshua Gu @astrogu_

CS PhD student @MIT, @MIT_CSAIL, @MITEECS 👨‍💻 | @LMCache Lab | Previous: BS @UChicago. Research on AI Systems

Chicago, IL | Joined December 2023

Tweets: 30 | Followers: 35 | Following: 98 | Likes: 24

LMCache Lab @lmcache

a month ago

LMCache supports gpt-oss (20B/120B) on Day 1! TTFT 1.20s → 0.39s (-67.5%), finish time 15.70s → 7.73s (-50.7%) compared to Vanilla vLLM. Release the true power of GPT-OSS with vllm+LMCache -- full deployment tutorial here: blog.lmcache.ai/2025-08-05-gpt… #LMCache #vLLM #OpenAI #LLM…

4 9 33 1K 9


LMCache Lab @lmcache

a month ago

🚀 Big news from LMCache Lab! 📝 3 papers accepted at SOSP ’25 & NSDI ’26, pushing the frontier of LLM-inference efficiency: 1️⃣ Cross-agent KV-cache sharing (NSDI) 🔗 arxiv.org/abs/2411.02820 2️⃣ Custom design for LLM prefillers (SOSP) 🔗 arxiv.org/abs/2505.07203 3️⃣…

0 6 34 1K 8


Joshua Gu @astrogu_

a month ago

such cool demo videos, wonder who made these… 🤔

LMCache Lab @lmcache

a month ago

such cool demo videos, wonder who made these… 🤔

0 7 23 9K 16


0 1 3 111 0

Joshua Gu @astrogu_

2 months ago

🔥 Check it out! 🔥

LMCache Lab @lmcache

2 months ago

🔥 Check it out! 🔥

1 6 30 1K 14


0 0 2 71 0

Joshua Gu @astrogu_

2 months ago

Excited to share our latest work 𝗠𝗘𝗧𝗜𝗦 at #SOSP2025. This one’s special as it’s my first full CS project from start to finish—from early brainstorming and iterating on ideas to running experiments and writing the paper. Learned a ton, and perseverance finally paid off! 🚀

Siddhant Ray @siddhantrayyy

2 months ago

1 6 16 2K 1


0 3 7 803 0

Joshua Gu @astrogu_

2 months ago

🤙

LMCache Lab @lmcache

2 months ago

🤙

0 4 12 861 0


0 0 0 80 0

LMCache Lab @lmcache

2 months ago

🚨 LMCache now turbocharges multimodal models in vLLM! By caching image-token KV pairs, repeated images now get ~100% cache hit rate — cutting latency from 18s to ~1s. Works out of the box. Check the blog: blog.lmcache.ai/2025-07-03-mul… Try it 👉 github.com/LMCache/LMCache #vLLM #MLLM…

0 12 40 2K 14


Joshua Gu @astrogu_

2 months ago

🥳🥳🥳

LMCache Lab @lmcache

2 months ago

🥳🥳🥳

0 5 12 952 0


0 1 1 206 0

LMCache Lab @lmcache

3 months ago

🚀 LMCache X @RedHat Official Collaboration LMCache is now a founding supporter of Red Hat's new llm-d project for scalable distributed LLM inference! 🤝 Red Hat is also joining the LMCache project as an active contributor ⚡ Together we're building faster, more efficient open…

1 13 23 39K 4


LMCache Lab @lmcache

4 months ago

🚀 LMCache turbocharges vLLM, KServe & Dynamo! Our new blog reveals how this SOTA KV cache layer slashes LLM inference costs & latency (TTFT/ITL). ✅ Massive scale (beyond single-node) ✅ Blazing speed with custom CUDA Kernels 📈 Up to 2.3x prefill & 4.5x RAG throughput!…

1 6 22 1K 4


LMCache Lab @lmcache

4 months ago

🚀𝗠𝗼𝗼𝗻𝗰𝗮𝗰𝗸𝗲 X 𝗟𝗠𝗖𝗮𝗰𝗵𝗲: KV Cache-centric Language Model Serving 🚀 We're thrilled to announce a strategic collaboration between LMCache and Mooncake to pioneer a KVCache-centric Large Language Model (LLM) serving system! This partnership is set to redefine…

1 10 21 1K 5


LMCache Lab @lmcache

4 months ago

🤯 78.8% p95 Inter-Token Latency reduction with LMCache + vLLM v1 P/D support 🚀 In our previous blog, we introduced the integration of 𝗟𝗠𝗖𝗮𝗰𝗵𝗲 with 𝘃𝗟𝗟𝗠 𝘃𝟭 and NVIDIA's 𝗡𝗜𝗫𝗟 library, enabling Prefill-Decode Disaggregation (PD) for LLM inference. Today, we're…

0 8 20 676 6

Joshua Gu @astrogu_

5 months ago

🔥 Tencent x @lmcache

LMCache Lab @lmcache

5 months ago

🔥 Tencent x @lmcache

1 9 24 2K 9


0 0 0 55 0

LMCache Lab @lmcache

5 months ago

🚀 𝗟𝗠𝗖𝗮𝗰𝗵𝗲 Powers Up 𝘃𝗟𝗟𝗠 𝗩𝟭: P/D Disaggregation & NIXL Support! vLLM V1 revolutionized LLM serving, but lacked a dedicated KV cache interface for advanced optimizations... until NOW! ⚡ LMCache Lab is thrilled to announce two major updates enhancing vLLM V1's…

1 11 49 2K 27


LMCache Lab @lmcache

5 months ago

🏆 Exciting news from #EuroSys2025: Our work on CacheBlend won Best Paper! 🚀 CacheBlend delivers the first-ever speedup for RAG LLMs, achieving near-100% KV cache hit rates while maintaining output quality. Catch Jiayi Yao, Kuntai Du, and Shan Lu at EuroSys/ASPLOS this week &…

1 8 18 4K 1


Joshua Gu @astrogu_

6 months ago

FAST!!! 🚀 Thrilled to see all the hard work pay off!

LMCache Lab @lmcache

6 months ago

FAST!!! 🚀 Thrilled to see all the hard work pay off!

0 6 18 1K 4


0 0 1 34 0

Joshua Gu @astrogu_

6 months ago

🚀 vLLM Production Stack is here!

LMCache Lab @lmcache

6 months ago

🚀 vLLM Production Stack is here!

4 25 55 7K 28


0 0 2 52 0

LMCache Lab @lmcache

7 months ago

🚀 Deploying LLMs in Clusters #1 Check out this step-by-step tutorial to deploy the vLLM Production Stack on a cloud VM for superior performance and easy management. 📊 Blog: blog.lmcache.ai/2025-02-13-clo… 🔗 Code: github.com/vllm-project/p… Comment for the topic next week! #k8s #vLLM

0 7 11 821 5


LMCache Lab @lmcache

8 months ago

🔥Meet the vLLM Official Production Stack🔥 -⚡️ 3x higher throughput & 3x faster response! -🔧 Easy k8s deployment with helm chart! -📈 Observability dashboard! And it’s open-source under vllm-project! Code: github.com/vllm-project/p… Blog: blog.lmcache.ai/2025-01-21-sta… #LLM #vLLM #k8s

0 19 39 5K 14


LMCache Lab @lmcache

9 months ago

🚀 LMCache speeds up multi-turn conversations by 7x vs. vLLM + prefix caching! Our secret? Efficient KV cache offloading to CPU, Disk, and even remote storage! 🔗Try the benchmark now on your servers: github.com/LMCache/LMCach… #LLM #Benchmark #vLLM #chat