GRPO makes reasoning models yap a lot, but there's a simple fix:
Sample more responses during training, and train on the shortest ones.
This creates length pressure that makes the model much more terse, without sacrificing accuracy!
Examples of GRPO vs GFPO versions… https://t.co/yWKzNgyiQn
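A rough sketch of the "sample more, train on the shortest" idea, in case it helps make the post concrete. Everything here is an assumption for illustration, not the authors' implementation: the function name `shortest_k_advantages`, the use of token count as the filter, the group-normalized advantage over the kept subset, and the choice to give filtered-out samples zero advantage so they contribute no gradient.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def shortest_k_advantages(
    rewards: Sequence[float],
    lengths: Sequence[int],
    k: int,
    eps: float = 1e-6,
) -> List[float]:
    """Hypothetical helper: advantages for one prompt's sampled group.

    Keeps the k shortest responses, normalizes their rewards by the
    mean and standard deviation of that kept subset, and assigns 0.0
    to every response that was filtered out.
    """
    if len(rewards) != len(lengths):
        raise ValueError("rewards and lengths must have the same size")
    if not 1 <= k <= len(rewards):
        raise ValueError("k must be between 1 and the group size")

    # Indices of the k shortest responses (sorted() is stable, so ties
    # keep their original sampling order).
    kept = sorted(range(len(lengths)), key=lambda i: lengths[i])[:k]

    kept_rewards = [rewards[i] for i in kept]
    mu = mean(kept_rewards)
    sigma = pstdev(kept_rewards)

    advantages = [0.0] * len(rewards)
    for i in kept:
        advantages[i] = (rewards[i] - mu) / (sigma + eps)
    return advantages


# Four sampled responses; only the two shortest are used for training.
adv = shortest_k_advantages(
    rewards=[1.0, 0.0, 1.0, 0.0],
    lengths=[100, 50, 400, 300],
    k=2,
)
```

In this toy group the 400-token correct answer is dropped, so the only positive signal comes from the 100-token correct answer. That is one way the length pressure described above could arise.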
🚀 MiMo‑VL 2508 is live! Same size, much smarter 🚀
We’ve upgraded performance, thinking control, and overall user experience.
📈 Benchmark gains across image + video: MMMU 70.6, VideoMME 70.8. Consistent improvements across the board.
🤖 Thinking Control: toggle reasoning…
2026+: everyone releases their own OS
Building with Claude Code SDK made me realize that we are just a UI away from the next ChatGPT moment.
Models are more intelligent than they seem.
AI Agents are already unlocking unique and novel experiences.
Claude Code is the…
What happens if you compare LMs on SWE-bench without the fancy scaffolds?
Our new leaderboard “SWE-bench (bash only)” shows you which LMs are the best at getting the job done with just bash.
More on why this is important 👇
🗒️Have been exploring Agent-RL training over the past few months, particularly in GUI scenarios.
Here’s a summary of some practical insights and lessons 🤔 learned from the perspective of an industry researcher, and some reference papers.
Slides for my lecture “LLM Reasoning” at Stanford CS 25: dennyzhou.github.io/LLM-Reasoning-…
Key points:
1. Reasoning in LLMs simply means generating a sequence of intermediate tokens before producing the final answer. Whether this resembles human reasoning is irrelevant. The crucial…
Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains
'We introduce Rubrics as Rewards (RaR), a framework that uses structured, checklist-style rubrics as interpretable reward signals for on-policy training with GRPO. Our best RaR method yields up to a relative…
🌕 Did you notice Kimi’s doodle today at kimi.ai?
It’s our little Moon Day surprise -
A tribute to the spirit of exploration, and to the day humans first set foot on the Moon 🍻
May Kimi fuel your next big idea 💡
Evaluating agents on benchmarks is a pain. Each benchmark comes with its own harness, scoring scripts, and environments and integrating can take days.
We're introducing the Terminal-Bench dataset registry to solve this problem. Think of it as the npm of agent benchmarks.
Now…
Scaling up RL is all the rage right now; I had a chat with a friend about it yesterday. I'm fairly certain RL will continue to yield more intermediate gains, but I also don't expect it to be the full story. RL is basically "hey this happened to go well (/poorly), let me slightly…
We built 200k-GPU clusters;
We scaled up & curated higher-quality data;
We scaled compute by 100x;
We developed training & test-time recipes;
We made everything RL native;
We stabilized infrastructure and sped it up;
That's how you turn RL into the pre-training scale.
Yet I am…
> opens claude code
> write a huge ass prompt
> auto-accept edits
> go drink water, watch a yt video, chillax, work on something else
> come back, work is done.
What a time to live in. wow.
🚀 Introducing DeepSWE 🤖: our fully open-sourced, SOTA software engineering agent trained purely with RL on top of Qwen3-32B. DeepSWE achieves 59% on SWEBench-Verified with test-time scaling (and 42.2% Pass@1), topping the SWEBench leaderboard for open-weight models.
💪DeepSWE…
I suggest a new metric: Pass@1/K.
For a given K, you only get a point if all K attempts were successful. So it's a continuation of the Pass@K graph to the left-hand side, and it intuitively measures robustness / confidence.
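A small sketch of how the proposed metric could be scored next to ordinary Pass@K. The function names `any_of_k` and `all_of_k` and the input layout (one list of booleans per task, one entry per attempt) are assumptions made for illustration. This is the plain empirical version over the first K attempts, not an unbiased estimator.

```python
from typing import Sequence


def _check(results: Sequence[Sequence[bool]], k: int) -> None:
    """Validate that every task has at least k recorded attempts."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if not results:
        raise ValueError("results must contain at least one task")
    if any(len(attempts) < k for attempts in results):
        raise ValueError("every task needs at least k attempts")


def any_of_k(results: Sequence[Sequence[bool]], k: int) -> float:
    """Pass@K style: a task scores if ANY of its first k attempts passed."""
    _check(results, k)
    return sum(any(a[:k]) for a in results) / len(results)


def all_of_k(results: Sequence[Sequence[bool]], k: int) -> float:
    """Pass@1/K style: a task scores only if ALL of its first k attempts passed."""
    _check(results, k)
    return sum(all(a[:k]) for a in results) / len(results)


# Three tasks, three attempts each.
results = [
    [True, True, True],     # reliably solved
    [True, False, True],    # solved, but not every time
    [False, False, False],  # never solved
]
any_scores = [any_of_k(results, k) for k in (1, 2, 3)]
all_scores = [all_of_k(results, k) for k in (1, 2, 3)]
```

With K = 1 the two numbers coincide. As K grows, the any-of-K score can only rise and the all-of-K score can only fall, which matches the picture of extending the same curve to the left.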
194 Followers 392 Following
Ph.D. student @PurdueCS. 2025 Intern at @MSFTResearch. I do research that helps developers, from pros to vibe coders to agent builders.
2K Followers 2K Following
PhD student at Tsinghua NLP & AIR, studying agents that automate tasks ranging from daily activities to creative endeavors. Two drifters with the world to see.
15K Followers 6K Following
I build tough benchmarks for LMs and then I get the LMs to solve them. SWE-bench & SWE-agent. Postdoc @Princeton. PhD @nlpnoah @UW.
2K Followers 2K Following
Assistant Professor at Rice CS | CS Ph.D. at UCSB | Deep Learning System | ex- Amazon, Microsoft Research, NVIDIA Research | NVIDIA Graduate Fellowship’22.
21K Followers 19K Following
Inspired by Algorithms, Powered by Imagination: Unleashing the Potential of Generative AI.
#GenerativeAI #deeplearning #AI #MachineLearning
789 Followers 2K Following
Doctoral Researcher in SE | Uni of Bremen | interested in Software Engineering, Program Comprehension, Empirical Software Engineering and related topics.
831 Followers 693 Following
TT Assistant Professor at Colorado State University, Linux Foundation Researcher. Human Aspects of Sw Engineering. Mom, Swimming lover.
831 Followers 2K Following
CS PhD student @illinoisCDS. Research intern at AWS AI Labs @AmazonScience. Towards building advanced code LLMs with better reasoning and planning.
2K Followers 3K Following
PhD Student @Cambridge_Uni; Visiting @VectorInst; Intern @MSFTResearch
| Prev: @AWS AI Lab | Do not go gentle into that good night 🧗 | https://t.co/MOPcMcPqcc
17K Followers 78 Following
Financial writer producing in-depth reporting on Chinese business, including AI, tech giants, venture capital, and people profiles. Also producer and host of the podcast 《张小珺商业访谈录》 ("Zhang Xiaojun Podcast").
325K Followers 3K Following
NVIDIA Director of Robotics & Distinguished Scientist. Co-Lead of GEAR lab. Solving Physical AGI, one motor at a time. Stanford Ph.D. OpenAI's 1st intern.
263K Followers 664 Following
Building with AI agents @dair_ai • Prev: Meta AI, Galactica LLM, Elastic, PaperswithCode, PhD • I share insights on how to build with AI Agents ↓
354K Followers 1K Following
ML/AI researcher & former stats professor turned LLM research engineer. Author of "Build a Large Language Model From Scratch" (https://t.co/O8LAAMRzzW).
109K Followers 166 Following
UPMC Professor of Computer Science @ CMU, President Elect ICML Board, VP of Research @ Meta (Multimodal LLMs, AI Agents), ex-Director of AI research at @Apple
5K Followers 1K Following
Co-founder @allhands_ai, building OpenHands | PhD candidate @IllinoisCDS | BS @UMichCSE ('22) | Ex Intern @GoogleAI @Microsoft | Opinions are my own
82K Followers 631 Following
Low-cost, high performance inference platform, powered by the Groq LPU. Delivering instant access to leading AI models with GroqCloud™.
4.4M Followers 363 Following
Through your lens, we hope to see your story at Xiaomi Imagery Awards 2025: https://t.co/IwaCpzK78a
For support, please contact @XiaomiSupport
56K Followers 853 Following
Figuring out AI @allen_ai, open models, RLHF, fine-tuning, etc
Contact via email.
Writes @interconnectsai
Wrote The RLHF Book
Mountain runner
602K Followers 5K Following
President & CEO @ycombinator | Founder @Initialized | designer/engineer who helps founders | San Francisco Dem accelerating the boom loop | e/acc | technology brother
12K Followers 1K Following
Founder of https://t.co/9KM4uFScMi, Associate Professor at Columbia. Making AI agent design and deployment easy and fast!
Forbes 30 under 30.
55K Followers 2K Following
Head of Design @Cursor_ai. Early @NotionHQ, @Stripe, built startups. I make a world where anyone can make software. Aspiring k-pop idol.
493 Followers 158 Following
Undergrad @sjtu1896.
Intern @ GAIR Lab (https://t.co/QWViO83puG)
Visiting @stanfordnlp.
NLP/LLMs/Reasoning.
Looking for a Ph.D. in the 26 fall.
29K Followers 431 Following
Professor, CS, U. British Columbia. CIFAR AI Chair, Vector Institute. Sr. Advisor, DeepMind | ML, AI, deep RL, deep learning, AI-Generating Algorithms (AI-GAs)
5K Followers 668 Following
Incoming Assistant Prof, Toyota Technical Institute at Chicago @TTIC_Connect
Recruiting PhD students (start 2026) 👀
Will irl - TC0 enthusiast
2.5M Followers 2K Following
Stocks/Options/Crypto/Market News + Tools. Not advice
Get a bonus opening a new tastytrade account: https://t.co/wGf2ZdlXpw
Discord: https://t.co/0xJ9e0ZYYG
More: https://t.co/nsxZlPV0pC