Yuqian Fu @Chert_Fu

PhD Student @ DRL｜ML · fyqqyf.github.io/home · Cyberspace · Joined April 2022

Tweets

296
Followers

38
Following

818
Likes

540

Guohao Li 🐫 @guohao_li

2 weeks ago

The challenge for starting agent RL research is that very few are willing to do the less glamorous but essential work. Students I worked with usually want to dive straight into training agents or experimenting with RL algorithms. They want to invent the most beautiful new “PPO,…

20 36 417 49K 266

Zichen Liu @zzlccc

a month ago

In the era of experience, we're training LLM agents with RL — but something's missing... We miss the good old Gym! So we built 💎GEM: a suite of environments for training LLM 𝚐𝚎𝚗𝚎𝚛𝚊𝚕𝚒𝚜𝚝𝚜. Let’s build the Gym for LLMs, together: axon-rl.notion.site/gem

5 35 274 29K 177

Jakob Foerster @j_foerst

a month ago

I recently had a lunchtime conversation with a very senior AI researcher about how multi-agent problems differ from single-agent ones (their starting point was that they do not). One point that made them think: As computers scale, the rest of the world (i.e. the non-agentic parts) is not…

24 17 234 31K 150

EMNLP 2025 @emnlpmeeting

a month ago

@JungNerd Yes, you are allowed to choose a different track when committing your paper to EMNLP! (Note that EMNLP has some new tracks that were not available when submitting to ARR)

0 1 2 912 1

Jason Wei @_jasonwei

2 months ago

Becoming an RL diehard in the past year and thinking about RL for most of my waking hours inadvertently taught me an important lesson about how to live my own life. One of the big concepts in RL is that you always want to be “on-policy”: instead of mimicking other people’s…

127 341 3K 325K 2K

Andrej Karpathy @karpathy

2 months ago

Scaling up RL is all the rage right now, I had a chat with a friend about it yesterday. I'm fairly certain RL will continue to yield more intermediate gains, but I also don't expect it to be the full story. RL is basically "hey this happened to go well (/poorly), let me slightly…

412 863 8K 1.1M 6K

Sukjun (June) Hwang @sukjun_hwang

2 months ago

Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data

98 744 5K 736K 4K

Hanlin Zhang @_hanlin_zhang_

2 months ago

[1/n] Discussions about LM reasoning and post-training have gained momentum. We identify several missing pieces: ✏️Post-training based on off-the-shelf base models without transparent pre-training data components and scale. ✏️Intermediate checkpoints with incomplete learning…

1 17 220 13K 20

uccl_project @uccl_proj

2 months ago

1/N📢 Debugging NCCL performance problems for LLM workloads is always challenging. In this blog post, we explore various perf-critical parameters in NCCL and tackle datacenter network congestion with the UCCL plugin. uccl-project.github.io/posts/debug-nc…

1 1 7 885 2

The AI Timeline @TheAITimeline

2 months ago

SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning Overview: SRFT introduces a single-stage method that unifies supervised and reinforcement fine-tuning through entropy-aware weighting mechanisms, simultaneously optimizing LLMs using…

2 1 8 767 4

AI Native Foundation @AINativeF

2 months ago

13. SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning 🔑 Keywords: Large Language Models, Supervised Fine-Tuning, Reinforcement Learning, Supervised Reinforcement Fine-Tuning, Entropy 💡 Category: Natural Language Processing 🌟 Research…