🚀 Our new findings continue to unravel the mysteries of R1-Zero-like training!
📢 We identify a BIAS in GRPO that leads to longer responses, so we fixed it.
✅ GRPO Done Right → 7B SOTA on AIME!
🪂 Understanding R1-Zero-Like Training: A Critical Perspective
* DeepSeek-V3-Base already exhibits an "Aha moment" before RL tuning??
* The ever-increasing output length during RL tuning might be due to a BIAS in GRPO??
* By getting GRPO done right, we achieve 7B SOTA on AIME!
🧵
📜Full…