The challenge in starting agent RL research is that very few people are willing to do the less glamorous but essential work. The students I have worked with usually want to dive straight into training agents or experimenting with RL algorithms. They want to invent the most beautiful new “PPO,…
In the era of experience, we're training LLM agents with RL — but something's missing...
We miss the good old Gym!
So we built 💎GEM: a suite of environments for training LLM generalists.
Let’s build the Gym for LLMs, together: axon-rl.notion.site/gem
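The “Gym” referred to here is the familiar `reset()`/`step()` environment loop. The sketch below shows how a text environment for an LLM agent could expose that same interface. It is illustrative only: `GuessNumberEnv`, `StepResult`, `rollout`, and the reward scheme are all hypothetical and are not GEM's API. The interface is also simplified, since Gymnasium's own `step` returns a five-tuple `(observation, reward, terminated, truncated, info)`.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StepResult:
    observation: str
    reward: float
    terminated: bool


class GuessNumberEnv:
    """Hypothetical Gym-style text environment (NOT GEM's actual API).

    The agent must guess a hidden integer in [1, 10]. Observations and
    actions are plain strings, as they would be for an LLM policy.
    """

    def __init__(self, max_turns: int = 5, seed: Optional[int] = None):
        self.max_turns = max_turns
        self.rng = random.Random(seed)
        self.target = 0
        self.turns = 0

    def reset(self) -> str:
        # Start a new episode and return the initial prompt.
        self.target = self.rng.randint(1, 10)
        self.turns = 0
        return "Guess an integer between 1 and 10."

    def step(self, action: str) -> StepResult:
        self.turns += 1
        try:
            guess = int(action.strip())
        except ValueError:
            # Unparseable model output is penalized and ends the episode.
            return StepResult("Invalid action.", -1.0, True)
        if guess == self.target:
            return StepResult("Correct!", 1.0, True)
        hint = "higher" if guess < self.target else "lower"
        # The episode also ends once the turn budget is used up.
        done = self.turns >= self.max_turns
        return StepResult(f"Wrong, try {hint}.", 0.0, done)


def rollout(env: GuessNumberEnv, policy: Callable[[str], str]) -> float:
    """Run one episode; `policy` stands in for an LLM mapping text to text."""
    obs = env.reset()
    total = 0.0
    while True:
        result = env.step(policy(obs))
        total += result.reward
        if result.terminated:
            return total
        obs = result.observation
```

In an RL training loop, `policy` would be the LLM being trained and the returned episode reward would feed the policy update. Keeping the environment behind a small text-in/text-out interface is what lets the same training code run across many tasks.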
I recently had a lunchtime conversation with a very senior AI researcher about how multi-agent problems differ from single-agent ones (their starting point was that they do not).
One point that made them think: as computers scale, the rest of the world (i.e., the non-agentic parts) is not…
@JungNerd Yes, you are allowed to choose a different track when committing your paper to EMNLP! (Note that EMNLP has some new tracks that were not available when you submitted to ARR.)
Becoming an RL diehard in the past year and thinking about RL for most of my waking hours inadvertently taught me an important lesson about how to live my own life.
One of the big concepts in RL is that you always want to be “on-policy”: instead of mimicking other people’s…
Scaling up RL is all the rage right now; I had a chat with a friend about it yesterday. I'm fairly certain RL will continue to yield more intermediate gains, but I also don't expect it to be the full story. RL is basically "hey this happened to go well (/poorly), let me slightly…
Tokenization has been the final barrier to truly end-to-end language models.
We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data
[1/n] Discussions about LM reasoning and post-training have gained momentum. We identify several missing pieces:
✏️Post-training based on off-the-shelf base models without transparent pre-training data components and scale.
✏️Intermediate checkpoints with incomplete learning…
1/N📢 Debugging NCCL performance problems for LLM workloads is always challenging. In this blog post, we explore various performance-critical parameters in NCCL and tackle datacenter network congestion with the UCCL plugin.
uccl-project.github.io/posts/debug-nc…
SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning
Overview:
SRFT introduces a single-stage method that unifies supervised and reinforcement fine-tuning through entropy-aware weighting mechanisms, simultaneously optimizing LLMs using…
13. SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning
🔑 Keywords: Large Language Models, Supervised Fine-Tuning, Reinforcement Learning, Supervised Reinforcement Fine-Tuning, Entropy
💡 Category: Natural Language Processing
🌟 Research…
300 Followers 735 Following
Researcher @PontilGroup @IITalk | Ph.D. Student @ELLISforEurope, @Polytechnique and @UniGenova.
Interested in (deep) learning theory and others.
142 Followers 338 Following
Qwen team researcher. PhD at @NanjingUnivers1.
Member of LAMDA Group (https://t.co/mSqxdHPAQZ).
#Reinforcement Learning, #LLMs, #EmbodiedAI
2K Followers 2K Following
Ph.D. Student @PrincetonCS. Prev @Stanford @UW @pika_labs @MSFTResearch @UofIllinois @ZJU_China. I used to work on computer vision, but it's not all I do.
4K Followers 791 Following
Associate Prof. at SJTU, leading GAIR Lab (https://t.co/Nfd8KmZx3B). Co-founder of Inspired Cognition, Postdoc at @LTIatCMU, Previously FNLP, @MILAMontreal,
3K Followers 2K Following
Building personal superintelligence @OPPO, previously @AIWaves_inc. Former CS PhD student at ETHZ. Former researcher at ByteDance, Intern at MSRA and PYI at AI2
1K Followers 103 Following
AI/RL researcher, Assistant Prof. at @Tsinghua_Uni, leading the RL lab at @AntResearch_, PhD at @berkeley_ai, frequent flyer and milk tea lover.
900 Followers 385 Following
Research Scientist @NVIDIA. Making LLMs, e.g., Hymba and the Nemotron series. Ex @Harvard @Meta @Tencent | Views and opinions are my own
1K Followers 1K Following
ByteDance Seed @ByteDance_Seed | Senior Research Scientist working on LLMs | prev. @oxcsml @UniofOxford, @amazon, @apple, @bloomberg
All opinions are my own
618 Followers 205 Following
🎓PhD at Tsinghua University. Focus on RL, Embodied AI, and MLLM. 📖Author of limit-of-RLVR, phyworld, DeeR-VLA. 💼Currently seeking a research visit.
15K Followers 50 Following
EMNLP 2025: The 2025 Conference on Empirical Methods in Natural Language Processing
Hashtag: #EMNLP2025
Dates: November 5-9
Submission Deadline: May 19th