David Pissarra @davidpissarra

PhD Student @NYU_Courant | MSc @istecnico @Tsinghua_Uni | prev: Research Intern @CSDatCMU

davidpissarra.com | NYC | Joined January 2020

Tweets 14 | Followers 127 | Following 138 | Likes 428

Charlie Ruan @charlie_ruan

a year ago

Excited to share the WebLLM engine: a high-performance in-browser LLM inference engine! WebLLM offers local GPU acceleration via @WebGPU, a fully OpenAI-compatible API, and built-in web worker support to separate backend execution. Check out the blog post: blog.mlc.ai/2024/06/13/web…

Charlie Ruan @charlie_ruan

2 years ago

webllm.mlc.ai now supports Gemma from @GoogleDeepMind! The 2B model is perfect for building in-browser agents with @WebGPU acceleration -- everything local! Here is a 1x speed demo of 4-bit quantized gemma-2b-it on a @GooglePixel_US 7 Pro with @googlechrome.

Ruihang Lai @ruihanglai

2 years ago

Run the Gemma model locally on iPhone: we get a blazing-fast 20 tok/s for the 2B model. This shows amazing potential for Gemma fine-tunes on phones, made possible by the new MLC SLM compilation flow by @junrushao from @octoaicloud and many other contributors. github.com/mlc-ai/mlc-llm

Charlie Ruan @charlie_ruan

2 years ago

CodeLlama 70B is now on MLC LLM -- local deployment everywhere! Thanks to JIT compilation, running on different platforms (even w/ multi-GPU) is easy -- see how the M2 Mac (left) and 2 x RTX 4090 (right) runs use almost the same code. llm.mlc.ai/docs/ huggingface.co/mlc-ai

Charlie Ruan @charlie_ruan

2 years ago

With @googlechrome v121, you can run webllm.mlc.ai in your Android web browser with @WebGPU acceleration, everything locally! Here is a 1x speed demo of running 4-bit quantized Phi-2 on a Samsung S23. Thank you @quicksave2k @jason_mayes for the support and suggestions!

Charlie Ruan @charlie_ruan

2 years ago

New WizardMath V1.1 from @WizardLM_AI on WebLLM! It took me only ~20 mins to deploy it in the browser with @WebGPU acceleration. WebLLM can be an easy way for folks to try new models: a laptop with @googlechrome, that’s it! We are actively working on WebLLM to make it even better!

Tianqi Chen @tqchenml

2 years ago

Chat with Mistral 7B Instruct v0.2 running locally on iPhone and iPad. Now available in the @AppStore. apps.apple.com/gb/app/mlc-cha…

Guangxuan Xiao @Guangxuan_Xiao

2 years ago

Exciting news: StreamingLLM is now available on iPhone! 🎉 A huge thanks to @davidpissarra for his fantastic extension to our work. Can't wait to explore the possibilities with StreamingLLM!

David Pissarra @davidpissarra

2 years ago

Run the Mistral-7B-Instruct-v0.2 model on iPhone! It now supports StreamingLLM for endless generation. Try the MLC Chat App via TestFlight (llm.mlc.ai). For native LLM deployment, attention sinks are particularly helpful for longer generation with lower memory requirements.
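A minimal sketch of why attention sinks keep memory bounded, assuming the eviction policy described in the StreamingLLM paper: keep the first few "sink" tokens forever, keep a rolling window of the most recent tokens, and evict everything in between. The class `SinkKVCache` and its parameters `n_sink` and `window` are hypothetical names, token ids stand in for the key/value tensors a real cache would hold, and this is not MLC LLM's actual implementation.

```python
from collections import deque


class SinkKVCache:
    """Toy KV cache with attention sinks (StreamingLLM-style eviction).

    Hypothetical illustration: token ids stand in for key/value tensors.
    """

    def __init__(self, n_sink: int, window: int) -> None:
        self.n_sink = n_sink
        self.sinks: list[int] = []                      # first n_sink tokens, never evicted
        self.recent: deque[int] = deque(maxlen=window)  # oldest entry drops out automatically

    def append(self, token: int) -> None:
        if len(self.sinks) < self.n_sink:
            self.sinks.append(token)
        else:
            self.recent.append(token)

    def contents(self) -> list[int]:
        # What the next token attends to: the sinks followed by the recent window.
        return self.sinks + list(self.recent)


cache = SinkKVCache(n_sink=4, window=8)
for t in range(1000):  # "endless" generation
    cache.append(t)

print(len(cache.contents()))  # 12, no matter how many tokens were generated
print(cache.contents())       # [0, 1, 2, 3, 992, 993, ..., 999]
```

The cache never grows past `n_sink + window` entries, which is where the memory saving comes from. The paper also assigns positional indices relative to the cache rather than the original text; that detail is left out of this sketch.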

Charlie Ruan @charlie_ruan

2 years ago

Run @MistralAI's 7B model in your browser with @WebGPU acceleration! Try it out at webllm.mlc.ai. For native LLM deployment, sliding window attention is particularly helpful for handling longer contexts with lower memory requirements.
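A minimal sketch of the memory argument, assuming a rolling-buffer KV cache of the kind described in the Mistral 7B paper: with a window of `W` tokens, the entry for absolute position `i` is written to slot `i mod W`, overwriting exactly the token that has just fallen out of the window. The class `RollingBufferCache` and the `window` parameter are hypothetical names, and token ids stand in for real key/value tensors.

```python
class RollingBufferCache:
    """Fixed-size KV cache for sliding window attention (hypothetical toy version)."""

    def __init__(self, window: int) -> None:
        self.window = window
        self.slots: list[int | None] = [None] * window
        self.next_pos = 0  # absolute position of the next token

    def append(self, token: int) -> None:
        # Position i overwrites slot i mod W: the entry it replaces is
        # the token that just left the attention window.
        self.slots[self.next_pos % self.window] = token
        self.next_pos += 1

    def visible(self) -> list[int]:
        # Tokens the newest position can attend to, oldest first.
        n = min(self.next_pos, self.window)
        start = self.next_pos - n
        return [self.slots[p % self.window] for p in range(start, self.next_pos)]


cache = RollingBufferCache(window=4)
for t in range(10):
    cache.append(t)

print(cache.visible())   # [6, 7, 8, 9]
print(len(cache.slots))  # 4: memory stays O(W) however long the context grows
```

Each layer only looks back `W` tokens, but because layers are stacked, information from further back can still propagate upward, which is how the model handles contexts longer than the cache.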