Modular engineers are using Mojo with Nvidia Blackwell to make matrix multiplication faster than cuBLAS. In part 1 of our series, we explain what matmul is, why it’s fundamental to LLMs, and give a quick history from Ampere to Blackwell. modular.com/blog/matrix-mu…
@Modular Sounds fake, like the 98000x speedup over Python