Interesting findings! We also attempted something similar in our AZR paper (Section D.2), where the proposer needs to construct a composite function f(g, …, g).
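For readers who haven't seen that setup, here is a toy illustration of what a proposer-built composite function looks like (purely illustrative; the actual construction is in AZR Section D.2): the proposer chains simpler sub-functions into one f whose end-to-end behavior the solver must reason about.

```python
# Toy composite in the spirit of f(g, …, g): the proposer chains simple
# sub-functions g_i into a single f (illustrative, not the AZR construction).
def g1(x): return x + 3
def g2(x): return x * x
def g3(x): return x % 7

def f(x):
    # The solver only sees f's input/output behavior and must reason about
    # (or invert) the chained logic end to end.
    return g3(g2(g1(x)))

print([f(i) for i in range(5)])  # [2, 2, 4, 1, 0]
```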
Introducing 🛡️ExCyTIn‑Bench: Evaluating LLM agents on Cyber Threat Investigations. It’s built on an Azure tenant, a real Security Operations Center environment covering 57 tables. Explore how LLMs fare in realistic, multi-hop incident detection! #Cybersecurity #AI #LLM #Benchmark
People love 𝗽𝗮𝘀𝘀@𝗸 𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴.
What to do if you have 𝟭𝟬𝟬 samples and you wanna optimize 𝗽𝗮𝘀𝘀@𝟭𝟬?
✨This is the reward, presented in analytic form (see the sketch below).
Next step? Pass it to GRPO and witness the magic. https://t.co/D8KM0rQm1s
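For context, here is a minimal reference sketch of the standard unbiased pass@k estimator (the combinatorial form popularized by the Codex paper): one minus the probability that a random subset of k of the n samples contains no correct solution. The analytic reward in the image is presumably built on a quantity like this; the exact per-sample formula is in the thread and not reproduced here.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, c of which are correct:
    pass@k = 1 - C(n - c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 100 samples, 7 correct, targeting pass@10
print(pass_at_k(n=100, c=7, k=10))  # ≈ 0.53
```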
Today we're putting out an update to the JAX TPU book, this time on GPUs. How do GPUs work, especially compared to TPUs? How are they networked? And how does this affect LLM training? 1/n
OpenAI hasn’t open-sourced a base model since GPT-2 in 2019. They recently released GPT-OSS, which is reasoning-only...
or is it?
turns out that underneath the surface, there is still a strong base model. so we extracted it.
introducing gpt-oss-20b-base 🧵
⚡𝐅𝐏𝟖 makes RL faster — but at the cost of performance.
We present 𝐅𝐥𝐚𝐬𝐡𝐑𝐋, the first 𝐨𝐩𝐞𝐧–𝐬𝐨𝐮𝐫𝐜𝐞 & 𝐰𝐨𝐫𝐤𝐢𝐧𝐠 𝐑𝐋 𝐫𝐞𝐜𝐢𝐩𝐞 that applies 𝐈𝐍𝐓𝟖/𝐅𝐏𝟖 for rollout 𝐰𝐢𝐭𝐡𝐨𝐮𝐭 𝐥𝐨𝐬𝐢𝐧𝐠 𝐩𝐞𝐫𝐟𝐨𝐫𝐦𝐚𝐧𝐜𝐞 compared to 𝐁𝐅𝟏𝟔!
📝 Blog:…
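As a rough illustration of what low-precision rollout means in practice (a generic symmetric per-channel INT8 weight-quantization sketch; the function names are hypothetical and this is not FlashRL's actual recipe or API): the rollout engine serves a quantized copy of the policy while the trainer keeps the high-precision master weights.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-output-channel INT8 quantization of a weight matrix.
    The rollout engine would run with (q, scale); the trainer keeps the
    BF16/FP32 master copy and pushes fresh weights each RL step."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0  # one scale per row
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 8).astype(np.float32)
q, s = quantize_int8(w)
print(np.abs(dequantize(q, s) - w).max())  # small but nonzero rollout error
```

That small error is exactly why the rollout policy and the training policy are not quite the same distribution, which leads into the next post.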
Failing on 𝐥𝐚𝐫𝐠𝐞-𝐬𝐜𝐚𝐥𝐞 𝐑𝐋 with VeRL?
⚠️ Mixing inference backend (𝐯𝐋𝐋𝐌/𝐒𝐆𝐋𝐚𝐧𝐠) with training backends (𝐅𝐒𝐃𝐏/𝐌𝐞𝐠𝐚𝐭𝐫𝐨𝐧) 𝐬𝐞𝐜𝐫𝐞𝐭𝐥𝐲 𝐭𝐮𝐫𝐧𝐬 𝐲𝐨𝐮𝐫 𝐑𝐋 𝐢𝐧𝐭𝐨 𝐨𝐟𝐟-𝐩𝐨𝐥𝐢𝐜𝐲 — even if they share the same weights!
📉 Blog:…
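A minimal sketch of one standard mitigation for this mismatch: truncated importance sampling on logged rollout log-probs (a generic illustration with assumed tensor names, not VeRL's actual fix or API).

```python
import torch

def truncated_is_pg_loss(
    trainer_logprobs: torch.Tensor,   # log-prob of each sampled token under the FSDP/Megatron weights
    rollout_logprobs: torch.Tensor,   # log-prob of the same token as logged by vLLM/SGLang
    advantages: torch.Tensor,         # per-token advantages (e.g. GRPO-normalized)
    clip: float = 2.0,                # truncate the ratio to bound variance
) -> torch.Tensor:
    # If the two backends agreed exactly, the ratio would be 1 everywhere and this
    # reduces to the usual on-policy surrogate. In practice the numerics differ, so
    # each token is reweighted by how much more (or less) likely the training policy
    # finds it than the policy that actually generated it.
    ratio = torch.exp(trainer_logprobs - rollout_logprobs).clamp(max=clip)
    return -(ratio * advantages).mean()
```

Logging the mean absolute gap between the two log-probs is also a cheap diagnostic for how off-policy your rollouts really are.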
Today we’re releasing a prototype of Genesys, an autonomous multi-agent LLM discovery system that aims to discover new types of language model architectures. We found Genesys can discover novel architectures competitive with the industry-standard transformer. 🧵
OMW to #ICML2025! Reach out (on X or Whova) if you’re interested in talking about RL, reasoning/safety of LLMs, or agents. I will also be presenting our AI4MATH workshop paper, limits-of-RLVR, 13:45-14:00 on July 18, 2025, in Ballroom C. Feel free to drop by!
Q-learning is not yet scalable
seohong.me/blog/q-learnin…
I wrote a blog post about my thoughts on scalable RL algorithms.
To be clear, I'm still highly optimistic about off-policy RL and Q-learning! I just think we haven't found the right solution yet (the post discusses why).
Thoughts on Two Papers:
1. Our paper, Limit-of-RLVR: Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
2. ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
See the following points:
🚨 Your RL only improves 𝗽𝗮𝘀𝘀@𝟭, not 𝗽𝗮𝘀𝘀@𝗸? 🚨
That’s not a bug — it’s a 𝗳𝗲𝗮𝘁𝘂𝗿𝗲 𝗼𝗳 𝘁𝗵𝗲 𝗼𝗯𝗷𝗲𝗰𝘁𝗶𝘃𝗲 you’re optimizing.
You get what you optimize for. If you want better pass@k, you need to optimize for pass@k at training time.
🧵 How?
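One simple way to make the training signal target pass@k rather than pass@1 (a generic sketch with illustrative function names, not necessarily the recipe this thread describes): share a group-level success signal across each group of k rollouts before computing GRPO-style advantages, so a sample is rewarded for belonging to a group in which at least one rollout succeeds.

```python
import numpy as np

def passk_style_rewards(binary_rewards: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """Turn per-sample 0/1 rewards (shape (n,)) into pass@k-style rewards:
    samples are randomly partitioned into groups of size k and every member
    gets its group's max reward, so a group passes if any member does."""
    n = len(binary_rewards)
    assert n % k == 0, "this simple variant assumes n is a multiple of k"
    perm = np.random.default_rng(seed).permutation(n)
    rewards = np.empty(n)
    for group in perm.reshape(-1, k):          # each row: indices of one group of k
        rewards[group] = binary_rewards[group].max()
    return rewards

# Toy example: 100 rollouts for one prompt, 7 correct, targeting pass@10.
r = np.zeros(100)
r[:7] = 1.0
pk = passk_style_rewards(r, k=10)
adv = (pk - pk.mean()) / (pk.std() + 1e-8)     # GRPO-style normalized advantages
```

Resampling the grouping every step, or averaging over all possible groupings in closed form, smooths out the variance of this assignment; the closed-form route is presumably what the analytic pass@k reward earlier in this feed refers to.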