Interesting findings! We also attempted something similar in our AZR paper (Section D.2), where the proposer needs to construct a composite function f(g, …, g).
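For readers who haven't seen that setup, here is a toy illustration of what a proposer-built composite function looks like (purely illustrative; the actual construction is in AZR Section D.2): the proposer chains simpler sub-functions into one f whose end-to-end behavior the solver must reason about.

```python
# Toy composite in the spirit of f(g, …, g): the proposer chains simple
# sub-functions g_i into a single f (illustrative, not the AZR construction).
def g1(x): return x + 3
def g2(x): return x * x
def g3(x): return x % 7

def f(x):
    # The solver only sees f's input/output behavior and must reason about
    # (or invert) the chained logic end to end.
    return g3(g2(g1(x)))

print([f(i) for i in range(5)])  # [2, 2, 4, 1, 0]
```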
Introducing 🛡️ExCyTIn‑Bench: Evaluating LLM agents on Cyber Threat Investigations. It’s built on an Azure tenant, a real Security Operations Center environment covering 57 tables. Explore how LLMs fare in realistic, multi-hop incident detection! #Cybersecurity #AI #LLM #Benchmark
People love 𝗽𝗮𝘀𝘀@𝗸 𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴.
What to do if you have 𝟭𝟬𝟬 samples and you wanna optimize 𝗽𝗮𝘀𝘀@𝟭𝟬?
✨This is the reward, presented in analytic form (see the sketch below).
Next step? Pass it to GRPO and witness the magic. https://t.co/D8KM0rQm1s
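For context, here is a minimal reference sketch of the standard unbiased pass@k estimator (the combinatorial form popularized by the Codex paper): one minus the probability that a random subset of k of the n samples contains no correct solution. The analytic reward in the image is presumably built on a quantity like this; the exact per-sample formula is in the thread and not reproduced here.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, c of which are correct:
    pass@k = 1 - C(n - c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 100 samples, 7 correct, targeting pass@10
print(pass_at_k(n=100, c=7, k=10))  # ≈ 0.53
```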
Today we're putting out an update to the JAX TPU book, this time on GPUs. How do GPUs work, especially compared to TPUs? How are they networked? And how does this affect LLM training? 1/n
OpenAI hasn’t open-sourced a base model since GPT-2 in 2019. They recently released GPT-OSS, which is reasoning-only...
or is it?
turns out that underneath the surface, there is still a strong base model. so we extracted it.
introducing gpt-oss-20b-base 🧵
⚡𝐅𝐏𝟖 makes RL faster — but at the cost of performance.
We present 𝐅𝐥𝐚𝐬𝐡𝐑𝐋, the first 𝐨𝐩𝐞𝐧–𝐬𝐨𝐮𝐫𝐜𝐞 & 𝐰𝐨𝐫𝐤𝐢𝐧𝐠 𝐑𝐋 𝐫𝐞𝐜𝐢𝐩𝐞 that applies 𝐈𝐍𝐓𝟖/𝐅𝐏𝟖 for rollout 𝐰𝐢𝐭𝐡𝐨𝐮𝐭 𝐥𝐨𝐬𝐢𝐧𝐠 𝐩𝐞𝐫𝐟𝐨𝐫𝐦𝐚𝐧𝐜𝐞 compared to 𝐁𝐅𝟏𝟔!
📝 Blog:…
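As a rough illustration of what low-precision rollout means in practice (a generic symmetric per-channel INT8 weight-quantization sketch; the function names are hypothetical and this is not FlashRL's actual recipe or API): the rollout engine serves a quantized copy of the policy while the trainer keeps the high-precision master weights.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-output-channel INT8 quantization of a weight matrix.
    The rollout engine would run with (q, scale); the trainer keeps the
    BF16/FP32 master copy and pushes fresh weights each RL step."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0  # one scale per row
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 8).astype(np.float32)
q, s = quantize_int8(w)
print(np.abs(dequantize(q, s) - w).max())  # small but nonzero rollout error
```

That small error is exactly why the rollout policy and the training policy are not quite the same distribution, which leads into the next post.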
Failing on 𝐥𝐚𝐫𝐠𝐞-𝐬𝐜𝐚𝐥𝐞 𝐑𝐋 with VeRL?
⚠️ Mixing inference backend (𝐯𝐋𝐋𝐌/𝐒𝐆𝐋𝐚𝐧𝐠) with training backends (𝐅𝐒𝐃𝐏/𝐌𝐞𝐠𝐚𝐭𝐫𝐨𝐧) 𝐬𝐞𝐜𝐫𝐞𝐭𝐥𝐲 𝐭𝐮𝐫𝐧𝐬 𝐲𝐨𝐮𝐫 𝐑𝐋 𝐢𝐧𝐭𝐨 𝐨𝐟𝐟-𝐩𝐨𝐥𝐢𝐜𝐲 — even if they share the same weights!
📉 Blog:…
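A minimal sketch of one standard mitigation for this mismatch: truncated importance sampling on logged rollout log-probs (a generic illustration with assumed tensor names, not VeRL's actual fix or API).

```python
import torch

def truncated_is_pg_loss(
    trainer_logprobs: torch.Tensor,   # log-prob of each sampled token under the FSDP/Megatron weights
    rollout_logprobs: torch.Tensor,   # log-prob of the same token as logged by vLLM/SGLang
    advantages: torch.Tensor,         # per-token advantages (e.g. GRPO-normalized)
    clip: float = 2.0,                # truncate the ratio to bound variance
) -> torch.Tensor:
    # If the two backends agreed exactly, the ratio would be 1 everywhere and this
    # reduces to the usual on-policy surrogate. In practice the numerics differ, so
    # each token is reweighted by how much more (or less) likely the training policy
    # finds it than the policy that actually generated it.
    ratio = torch.exp(trainer_logprobs - rollout_logprobs).clamp(max=clip)
    return -(ratio * advantages).mean()
```

Logging the mean absolute gap between the two log-probs is also a cheap diagnostic for how off-policy your rollouts really are.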
Today we’re releasing a prototype of Genesys, an autonomous multi-agent LLM discovery system that aims to discover new types of language model architectures. We found Genesys can discover novel architectures competitive with the industry-standard transformer. 🧵
OMW to #ICML2025! Reach out (on X or Whova) if you’re interested in talking about RL, reasoning/safety of LLMs, or agents. I will also be presenting our AI4MATH workshop paper, limits-of-RLVR, 13:45-14:00 on July 18, 2025, in Ballroom C. Feel free to drop by!
Q-learning is not yet scalable
seohong.me/blog/q-learnin…
I wrote a blog post about my thoughts on scalable RL algorithms.
To be clear, I'm still highly optimistic about off-policy RL and Q-learning! I just think we haven't found the right solution yet (the post discusses why).
Thoughts on Two Papers:
1. Our paper, Limit-of-RLVR: Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
2. ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
See the following points:
🚨 Your RL only improves 𝗽𝗮𝘀𝘀@𝟭, not 𝗽𝗮𝘀𝘀@𝗸? 🚨
That’s not a bug — it’s a 𝗳𝗲𝗮𝘁𝘂𝗿𝗲 𝗼𝗳 𝘁𝗵𝗲 𝗼𝗯𝗷𝗲𝗰𝘁𝗶𝘃𝗲 you’re optimizing.
You get what you optimize for. If you want better pass@k, you need to optimize for pass@k at training time.
🧵 How?
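One simple way to make the training signal target pass@k rather than pass@1 (a generic sketch with illustrative function names, not necessarily the recipe this thread describes): share a group-level success signal across each group of k rollouts before computing GRPO-style advantages, so a sample is rewarded for belonging to a group in which at least one rollout succeeds.

```python
import numpy as np

def passk_style_rewards(binary_rewards: np.ndarray, k: int, seed: int = 0) -> np.ndarray:
    """Turn per-sample 0/1 rewards (shape (n,)) into pass@k-style rewards:
    samples are randomly partitioned into groups of size k and every member
    gets its group's max reward, so a group passes if any member does."""
    n = len(binary_rewards)
    assert n % k == 0, "this simple variant assumes n is a multiple of k"
    perm = np.random.default_rng(seed).permutation(n)
    rewards = np.empty(n)
    for group in perm.reshape(-1, k):          # each row: indices of one group of k
        rewards[group] = binary_rewards[group].max()
    return rewards

# Toy example: 100 rollouts for one prompt, 7 correct, targeting pass@10.
r = np.zeros(100)
r[:7] = 1.0
pk = passk_style_rewards(r, k=10)
adv = (pk - pk.mean()) / (pk.std() + 1e-8)     # GRPO-style normalized advantages
```

Resampling the grouping every step, or averaging over all possible groupings in closed form, smooths out the variance of this assignment; the closed-form route is presumably what the analytic pass@k reward earlier in this feed refers to.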