🧪 Open-Source Team that maintains LMCache and Production Stack
🤖 Democratizing AI by providing efficient LLM serving for all · lmcache.ai · Joined September 2024
Join us at SIGCOMM 2025 (conferences.sigcomm.org/sigcomm/2025/t…) for our full-day LMCache Tutorial — an intelligent caching middleware that makes LLM inference faster & cheaper!
📅 Sept 8, 2025
8:45 AM – 6:00 PM (Portugal Time / WEST)
= 12:45 AM – 10:00 AM (PDT)
What you’ll learn:
🔹 KV-cache…
🚀 Exciting to see LMCache x Mooncake being discussed at the vLLM Shanghai Meetup!
The ecosystem around vLLM is evolving fast — from distributed inference to hardware optimizations — and cache innovations like this will be key to unlocking the next level of efficiency &…
Mark your calendars! Excited for the first FastAGI meetup featuring incredible speakers on AI infra & agents 🚀 Looking forward to the discussions and energy at LMCache Lab!
Fastest inference engine for LLMs!
LMCache is an LLM serving engine that reduces Time to First Token (TTFT) and increases throughput, especially in long-context scenarios.
100% Open Source
8 KV-Cache Systems You Can’t Afford to Miss in 2025
By 2025, KV-cache has evolved from a “nice-to-have” optimization into a critical layer for high-performance large language model (LLM) serving.
From GPU-resident paging tricks to persistent, cross-node cache sharing, the…
We're thrilled to share an integration between KServe and @_llm_d_, bringing powerful, scalable LLM serving to @kubernetesio.
Our @RedHatAI team is integrating llm-d, a Kubernetes-native distributed inference framework, into KServe. This is all about combining the best of both…
CacheGen (arxiv.org/abs/2310.07240) lets you store KV caches on disk or in AWS S3 and load them way faster than recomputing!
Modern LLMs use long contexts, but reprocessing these every time is slow and resource-intensive.
While engines like vLLM (and LMCache) can cache contexts in…
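The idea behind the CacheGen thread above — persist a long context's KV cache once, then reload it instead of re-running prefill — can be sketched with a toy on-disk store. This is a simplified illustration, not CacheGen's actual format (CacheGen additionally compresses the KV tensors into a compact bitstream); the `store_kv`/`load_kv` helpers and the hash-keyed layout are hypothetical:

```python
# Toy KV-cache persistence: key stored tensors by a hash of the prompt prefix,
# save once, reload on reuse. A cache miss means the engine must recompute.
import hashlib
import os
import tempfile
import numpy as np

def cache_path(cache_dir: str, prompt: str) -> str:
    # Hypothetical keying scheme: hash the full prompt prefix.
    digest = hashlib.sha256(prompt.encode()).hexdigest()
    return os.path.join(cache_dir, f"{digest}.npz")

def store_kv(cache_dir: str, prompt: str, keys: np.ndarray, values: np.ndarray) -> None:
    # Persist the precomputed KV tensors for this prompt prefix.
    np.savez(cache_path(cache_dir, prompt), keys=keys, values=values)

def load_kv(cache_dir: str, prompt: str):
    # Return (keys, values) on a hit, or None on a miss (caller falls back to prefill).
    path = cache_path(cache_dir, prompt)
    if not os.path.exists(path):
        return None
    data = np.load(path)
    return data["keys"], data["values"]

# Usage: store once, then later requests with the same shared context skip prefill.
tmp = tempfile.mkdtemp()
k = np.random.rand(2, 4, 8).astype(np.float32)  # toy shape: [layers, tokens, head_dim]
v = np.random.rand(2, 4, 8).astype(np.float32)
store_kv(tmp, "long shared context...", k, v)
loaded = load_kv(tmp, "long shared context...")
```

The same interface could sit in front of S3 instead of the local filesystem; the engine-facing contract (hit returns tensors, miss triggers recomputation) stays identical.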