🚀 Introducing our new tech report: Muon is Scalable for LLM Training
We found that the Muon optimizer can be scaled up using the following techniques:
• Adding weight decay
• Carefully adjusting the per-parameter update scale
✨ Highlights:
• ~2x computational efficiency vs AdamW…
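As a rough illustration of the two techniques listed above, here is a minimal NumPy sketch of a Muon-style step with decoupled weight decay and a shape-dependent update scale. The tweet does not give any of these details, so everything here is an assumption: the Newton-Schulz coefficients, the `0.2 * sqrt(max(shape))` scale, the default hyperparameters, and the function names `newton_schulz_orthogonalize` and `muon_like_step` are hypothetical and not taken from the tech report.

```python
import numpy as np


def newton_schulz_orthogonalize(G, steps=5, eps=1e-7):
    """Approximately push all singular values of G toward 1."""
    # Assumed quintic iteration coefficients (not stated in the tweet).
    a, b, c = 3.4445, -4.7750, 2.0315
    # Dividing by the Frobenius norm keeps every singular value <= 1.
    X = G / (np.linalg.norm(G) + eps)
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T  # work with the smaller Gram matrix
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * (A @ A)
        X = a * X + B @ X
    return X.T if transposed else X


def muon_like_step(W, M, G, lr=0.02, momentum=0.95,
                   weight_decay=0.1, rms_scale=0.2):
    """One hypothetical Muon-style update for a 2D weight matrix W.

    M is the momentum buffer and G the current gradient.
    """
    M = momentum * M + G
    O = newton_schulz_orthogonalize(M)
    # Assumed per-parameter scale: depends only on the matrix shape.
    scale = rms_scale * np.sqrt(max(W.shape))
    # Decoupled weight decay, applied alongside the orthogonalized update.
    W = W - lr * (scale * O + weight_decay * W)
    return W, M
```

With a zero gradient and an empty momentum buffer, the orthogonalized term vanishes and the step reduces to pure weight decay, which is a quick sanity check on the decoupling.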
Highly recommend this user-friendly project if you are starting out with LM pretraining and want to build your own model/optimizer. The repo is easy to understand, easy to edit, and makes it easy to implement new ideas with minimal effort. Well done Keller! Looking forward to your records on ViT :)
Fixed a bug which caused all training losses to diverge for large gradient accumulation sizes.
1. First reported by @bnjmn_marie: GA is supposed to be mathematically equivalent to full batch training, but the losses did not match.
2. We reproduced the issue, and further investigation…
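The equivalence claim in point 1 can be checked on a toy problem. The sketch below is not the fix described in the tweet and does not reproduce the reported bug; it only shows, for a plain least-squares loss, that accumulating micro-batch gradients matches the full-batch gradient when each micro-batch is weighted by its share of the examples. The helper names `mse_grad` and `accumulated_grad` are hypothetical.

```python
import numpy as np


def mse_grad(w, X, y):
    """Gradient of 0.5 * mean((X @ w - y) ** 2) with respect to w."""
    residual = X @ w - y
    return X.T @ residual / len(y)


def accumulated_grad(w, X, y, micro_batch_sizes):
    """Accumulate per-micro-batch mean gradients into one gradient.

    Each micro-batch gradient is weighted by size / total so that the
    sum equals the mean gradient over all examples.
    """
    total = sum(micro_batch_sizes)
    grad = np.zeros_like(w)
    start = 0
    for size in micro_batch_sizes:
        Xb = X[start:start + size]
        yb = y[start:start + size]
        grad += mse_grad(w, Xb, yb) * (size / total)
        start += size
    return grad
```

For example, splitting 16 examples into micro-batches of sizes 3, 5 and 8 and calling `accumulated_grad` should agree with `mse_grad` on the full batch up to floating-point error. If the micro-batches were instead weighted equally, the two would in general differ whenever the sizes are unequal.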
Harm's Law of Smol Models (HLSM) tells us how much we need to scale up the data size (k_D) as we scale down the model size (k_N), if we wish to preserve the loss of a Chinchilla-optimal model.
harmdevries.com/post/model-siz…
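To make the relationship between k_N and k_D concrete, here is a small sketch that assumes the loss follows the Chinchilla-style parametric form L(N, D) = E + A / N^alpha + B / D^beta. That form, the parameter names, and the helper functions `chinchilla_loss` and `data_scale_factor` are assumptions for illustration; the tweet itself does not state the formula. Setting L(k_N * N, k_D * D) equal to L(N, D) and solving for k_D gives the expression in the code.

```python
def chinchilla_loss(N, D, E, A, B, alpha, beta):
    """Assumed parametric loss: E + A / N**alpha + B / D**beta."""
    return E + A / N**alpha + B / D**beta


def data_scale_factor(k_N, N_ref, D_ref, A, B, alpha, beta):
    """Data multiplier k_D that keeps the loss fixed when the model
    size is multiplied by k_N, under the assumed parametric loss.

    N_ref and D_ref are the reference model and data sizes, for
    example a compute-optimal pair.
    """
    # Ratio of the model-size term to the data-size term at the reference point.
    ratio = (A * N_ref**-alpha) / (B * D_ref**-beta)
    base = 1.0 - (k_N**-alpha - 1.0) * ratio
    if base <= 0.0:
        # The model is too small: no finite amount of data recovers the loss.
        raise ValueError("no finite k_D preserves the loss for this k_N")
    return base ** (-1.0 / beta)
```

Shrinking the model (k_N < 1) yields k_D > 1, and the function raises an error once the model is so small that no amount of extra data can make up the difference under this loss form.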