Excited to share WebLLM engine: a high-performance in-browser LLM inference engine!
WebLLM offers local GPU acceleration via @WebGPU, a fully OpenAI-compatible API, and built-in web worker support to separate backend execution.
Check out the blog post: blog.mlc.ai/2024/06/13/web…
webllm.mlc.ai now adds Gemma from @GoogleDeepMind! The 2b model is perfect for building in-browser agents with @WebGPU acceleration -- everything local!
Here is a 1x speed demo of 4-bit quantized gemma-2b-it on @GooglePixel_US 7 Pro with @googlechrome.
Run the Gemma model locally on iPhone: we get a blazing-fast 20 tok/s for the 2B model.
This shows amazing potential ahead for Gemma fine-tunes on phones, made possible by the new MLC SLM compilation flow by @junrushao from @octoaicloud and many other contributors. github.com/mlc-ai/mlc-llm
CodeLlama 70B is now on MLC LLM -- local deployment everywhere!
Thanks to JIT compilation, running on different platforms (even w/ multi-GPU) is made easy -- see how M2 Mac (left) and 2 x RTX4090 (right) have almost the same code.
llm.mlc.ai/docs/ huggingface.co/mlc-ai
With @googlechrome v121, you can run webllm.mlc.ai on your Android web browser with @WebGPU acceleration, everything locally!
Here is a 1x speed demo of running 4-bit quantized Phi-2 on Samsung S23. Thank you @quicksave2k @jason_mayes for the support and suggestions!
New WizardMath V1.1 from @WizardLM_AI on WebLLM!
Took me only ~20 mins to deploy it in the browser with @WebGPU acceleration. WebLLM can be an easy way for folks to try new models -- a laptop with @googlechrome, that's it!
We are actively working on WebLLM to make it even better!
Exciting news: StreamingLLM is now available on iPhone! 🎉 A huge thanks to @davidpissarra for his fantastic extension to our work. Can't wait to explore the possibilities with StreamingLLM!
Run the Mistral-7B-Instruct-v0.2 model on iPhone! Now supports StreamingLLM for endless generation. Try the MLC Chat App via TestFlight: llm.mlc.ai
For native LLM deployment, attention sinks are particularly helpful for longer generation with a lower memory requirement.
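To make the memory point concrete, here is a minimal sketch of the cache-retention idea behind attention sinks: keep the first few tokens (the "sinks") plus a window of the most recent tokens, and drop everything in between. The function name `kept_positions` and the parameters `n_sink` and `window` are hypothetical, chosen only for illustration; this is not the MLC LLM implementation.

```python
def kept_positions(seq_len: int, n_sink: int, window: int) -> list[int]:
    """Return the token positions whose keys/values remain cached.

    Illustrative assumption: the cache keeps the first `n_sink` tokens
    plus the `window` most recent tokens, and evicts the rest.
    """
    if seq_len <= n_sink + window:
        # Nothing needs to be evicted yet.
        return list(range(seq_len))
    sinks = list(range(n_sink))
    recent = list(range(seq_len - window, seq_len))
    return sinks + recent


# The cache size stays bounded by n_sink + window however long generation runs.
print(kept_positions(10, n_sink=2, window=3))
print(len(kept_positions(1_000_000, n_sink=4, window=1020)))
```

Under these assumptions, the number of cached entries never exceeds `n_sink + window`, which is why generation length is no longer tied to memory growth.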
Run @MistralAI's 7B model on your browser with @WebGPU acceleration! Try it out at webllm.mlc.ai
For native LLM deployment, sliding window attention is particularly helpful for enjoying longer context with a lower memory requirement.