M2 Ultra 128GB RAM + llama.cpp, getting 24 Tokens/sec with 13B model, 18 Tokens/sec with 30B, and 9 Tokens/sec with 65B! Thank you @ggerganov
@gauravpathak @ggerganov you need to compile with `make LLAMA_METAL=1` and then add `-ngl 1000` to your command line
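Putting the reply's advice together, a minimal sketch of the build-and-run sequence might look like the following. The model filename is a placeholder, and `-ngl 1000` simply requests that as many layers as possible be offloaded to the GPU (any number larger than the model's layer count works):

```shell
# Build llama.cpp with Metal GPU support (Apple Silicon)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make LLAMA_METAL=1

# Run inference, offloading all layers to the GPU with -ngl.
# The .gguf path below is a placeholder; substitute your own model file.
./main -m ./models/llama-65b.Q4_K_M.gguf -ngl 1000 -p "Hello"
```

Note that `-ngl` (`--n-gpu-layers`) caps at the model's actual layer count, so an oversized value like 1000 is a common shorthand for "offload everything".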
@gauravpathak @ggerganov On one hand, the M2 Ultra with 128GB RAM is crazy expensive, but on the other hand, a 48GB A40 GPU that runs 65B costs just about the same. If nothing else, more competition is good
@gauravpathak @ggerganov Any documentation that you can share?! I want to try this on my M2 as well