FlashAttention in 3D? Our latest blog explores the #kernel design of 2-Simplicial #Attention, modeling the algorithm with a hardware-aligned design and rewriting the entire kernel in TLX (Triton Low Level Extensions).
🔗 hubs.la/Q03H6S9D0 #PyTorch #OpenSourceAI
*Advanced automatic differentiation*
Twitter friends, I finally released a (drafty) extended chapter on autodiff, covering general vector spaces, implicit autodiff, and multilinear algebra (largely based on material from @mblondel_ml & Roulet). 🙂
sscardapane.it/assets/alice/A…
I agree this is kind of the weak point here, especially since the last checkpoint is the one where Muon and SOAP really degrade to ~AdamW performance. with that said the paper is very good
We did a very careful study of 10 optimizers with no horse in the race. Despite all the excitement about Muon, Mars, Kron, Soap, etc., at the end of the day, if you tune the hyperparameters rigorously and scale up, the speedup over AdamW diminishes to only 10% :-( Experiments…
Today, we are releasing FineVision, a huge open-source dataset for training state-of-the-art Vision-Language Models:
> 17.3M images
> 24.3M samples
> 88.9M turns
> 9.5B answer tokens
Here are my favourite findings:
The Lore of Kalomaze! ⚡️
bringing a great pod with @kalomaze (20yo ml researcher, prime intellect) - we talked about training, finetuning, RL (environments and recipes), scaling, working at PI and a Lot of Lores!
(link in replies)
i am somewhat growing skeptical of looping layers as an architectural strategy (as some of you may know, i've been quite a fan). Something is still missing imo, probably a combination of:
- a proper experimental demonstration of looping being worth the FLOPs and
- a hierarchical…
Another active stream of Language Modeling literature investigates whether, and how, one can adapt a pretrained model to perform better on a given task without any additional continued pretraining or fine-tuning.
At the current stage, two ideas have emerged: layer-pruning to…
New post! The fact that we experience life through what feels like a singular entity, I believe, is a chance adaptation rather than a given rule of life. If our environments warranted it and evolution proceeded a little differently, experience might take on other forms
this is very very true. i think easiest example is when you are being hosted by a friend: a male friend will throw a mattress on the ground and that's where you're gonna sleep. any woman will treat you like a PRINCE and make hotels pale in comparison
317 Followers · 3K Following · a dog in human’s clothing | interested in math + informatics + chemistry (sometimes even philosophy) | no hate but curiosity really killed the cat 😉
3K Followers · 6K Following · LLM for code and reasoning. PhD student at Cornell. Previously Student Researcher at @google. Previously intern at @theteamatx.
177 Followers · 469 Following · AI Researcher @Aleph__Alpha | prev PhD @LMU_Muenchen | Working on GenAI | Interested in Geometric DL and the beautiful math behind.
812 Followers · 91 Following · Head of GLADIA @SapienzaRoma and fellow of @SSAS_Sapienza, @ELLISforEurope and @yacadeuro. Lover of crazy ideas and anything passion-driven. #EUFunded
752 Followers · 203 Following · p/hd | Big RL energy | 0.71 |research⟩ + 0.71 |engineer⟩ @ Meta, but never speaking on behalf of the company | Prev. lead maintainer of Gymnasium