M2 Ultra 128GB RAM + llama.cpp, getting 24 Tokens/sec with 13B model, 18 Tokens/sec with 30B, and 9 Tokens/sec with 65B! Thank you @ggerganov
@gauravpathak @ggerganov you need to compile with `make LLAMA_METAL=1` and then add `-ngl 1000` to your command line
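Putting the reply's advice together, a minimal sketch of the build-and-run sequence might look like the following. The model filename is a placeholder, and `-ngl 1000` simply requests that as many layers as possible be offloaded to the GPU (any number larger than the model's layer count works):

```shell
# Build llama.cpp with Metal GPU support (Apple Silicon)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make LLAMA_METAL=1

# Run inference, offloading all layers to the GPU with -ngl.
# The .gguf path below is a placeholder; substitute your own model file.
./main -m ./models/llama-65b.Q4_K_M.gguf -ngl 1000 -p "Hello"
```

Note that `-ngl` (`--n-gpu-layers`) caps at the model's actual layer count, so an oversized value like 1000 is a common shorthand for "offload everything".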
@gauravpathak @ggerganov On one hand, the M2 Ultra with 128GB RAM is crazy expensive, but on the other hand, a 48GB A40 GPU that runs 65B costs just about the same. If nothing else, more competition is good
@gauravpathak @ggerganov Any documentation that you can share?! I want to try this on my M2 as well