Linghao Zhang @starryzhangcs

Staff @XiaomiMiMo. I build something verifiable and scalable for code. Ex @MSFTResearch. linghaoz.com · Beijing · Joined March 2023

Tweets: 77
Followers: 49
Following: 438
Likes: 1K

Dimitris Papailiopoulos @DimitrisPapail

3 weeks ago

GRPO makes reasoning models yap a lot, but there's a simple fix: Sample more responses during training, and train on the shortest ones. This creates a length pressure that makes the model sound much more terse, without sacrificing accuracy!! Examples of GRPO vs GFPO versions…

19 45 361 93K 273

6 37 352 31K 238
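
The "sample more, train on the shortest" idea from the GRPO vs GFPO tweet above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function name `shortest_k_advantages`, the choice to normalize rewards only over the retained responses, and the choice to give filtered-out responses zero advantage are all assumptions made for the example.

```python
from statistics import mean, pstdev


def shortest_k_advantages(rewards, lengths, k, eps=1e-6):
    """Hypothetical helper: keep the k shortest responses in a sampled group
    and compute group-normalized advantages over that subset only.

    rewards: one scalar reward per sampled response
    lengths: token count of each sampled response
    k:       how many of the shortest responses to train on
    """
    if len(rewards) != len(lengths):
        raise ValueError("rewards and lengths must have the same size")
    if not 1 <= k <= len(rewards):
        raise ValueError("k must be between 1 and the group size")

    # Indices of the k shortest responses (ties broken by sample order).
    keep = sorted(range(len(lengths)), key=lambda i: lengths[i])[:k]

    kept_rewards = [rewards[i] for i in keep]
    mu = mean(kept_rewards)
    sigma = pstdev(kept_rewards)

    # Assumption: responses outside the kept subset get zero advantage,
    # so they contribute nothing to the policy-gradient update.
    advantages = [0.0] * len(rewards)
    for i in keep:
        advantages[i] = (rewards[i] - mu) / (sigma + eps)
    return advantages


# Four sampled responses; only the two shortest receive a training signal.
adv = shortest_k_advantages(
    rewards=[1.0, 0.0, 1.0, 0.0],
    lengths=[50, 10, 20, 40],
    k=2,
)
```

In this toy group, the long correct response (50 tokens) is ignored, while the short correct one (20 tokens) gets a positive advantage. That is one way the length pressure described in the tweet could arise.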

heiner @HeinrichKuttler

a month ago

@giffmana cuelang.org

1 1 16 6K 19

XiaomiMiMo @XiaomiMiMo

4 weeks ago

🚀 MiMo‑VL 2508 is live! Same size, much smarter 🚀 We’ve upgraded performance, thinking control, and overall user experience. 📈 Benchmark gains across image + video: MMMU 70.6, VideoMME 70.8. Consistent improvements across the board. 🤖 Thinking Control: toggle reasoning…

7 37 258 37K 82

Linghao Zhang @starryzhangcs

4 weeks ago

exactly

Justus Mattern @MatternJustus

4 weeks ago

exactly

17 23 450 52K 161

0 0 0 73 0

elvis @omarsar0

a month ago

2026+: everyone releases their own OS Building with Claude Code SDK made me realize that we are just a UI away from the next ChatGPT moment. Models are more intelligent than they seem. AI Agents are already unlocking unique and novel experiences. Claude Code is the…

14 38 264 53K 137

carlos @_carlosejimenez

a month ago

What happens if you compare LMs on SWE-bench without the fancy scaffolds? Our new leaderboard “SWE-bench (bash only)” shows you which LMs are the best at getting the job done with just bash. More on why this is important 👇

14 26 206 32K 72

Wenhao Yu @wyu_nd

a month ago

🗒️Have been exploring Agent-RL training over the past few months, particularly in GUI scenarios. Here’s a summary of some practical insights and lessons 🤔 learned from the perspective of an industry researcher, and some reference papers.

2 19 120 9K 85

Denny Zhou @denny_zhou

a month ago

Slides for my lecture “LLM Reasoning” at Stanford CS 25: dennyzhou.github.io/LLM-Reasoning-… Key points: 1. Reasoning in LLMs simply means generating a sequence of intermediate tokens before producing the final answer. Whether this resembles human reasoning is irrelevant. The crucial…

41 466 3K 266K 4K

Linghao Zhang @starryzhangcs

a month ago

based

Kilian Lieret @KLieret

a month ago

based

12 74 791 107K 901

0 0 4 210 0

Tanishq Mathew Abraham, Ph.D. @iScienceLuvr

a month ago

Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains 'We introduce Rubrics as Rewards (RaR), a framework that uses structured, checklist-style rubrics as interpretable reward signals for on-policy training with GRPO. Our best RaR method yields up to a relative…

11 80 570 82K 590

Kimi.ai @Kimi_Moonshot

2 months ago

Kimi K2 tech report just dropped! Quick hits: - MuonClip optimizer: stable + token-efficient pretraining at trillion-parameter scale - 20K+ tools, real & simulated: unlocking scalable agentic data - Joint RL with verifiable + self-critique rubric rewards: alignment that adapts -…

80 255 2K 87K 429

Kimi.ai @Kimi_Moonshot

2 months ago

🌕 Did you notice Kimi’s doodle today at kimi.ai? It’s our little Moon Day surprise - A tribute to the spirit of exploration, and to the day humans first set foot on the Moon 🍻 May Kimi fuel your next big idea 💡

47 77 886 47K 118

OpenRouter @OpenRouterAI

2 months ago

kimi

Toven @pingToven

2 months ago

kimi

0 0 15 6K 0

1 5 31 8K 0

Alex Shaw @alexgshaw

2 months ago

Evaluating agents on benchmarks is a pain. Each benchmark comes with its own harness, scoring scripts, and environments and integrating can take days. We're introducing the Terminal-Bench dataset registry to solve this problem. Think of it as the npm of agent benchmarks. Now…

1 24 99 11K 41

Andrej Karpathy @karpathy

2 months ago

Scaling up RL is all the rage right now, I had a chat with a friend about it yesterday. I'm fairly certain RL will continue to yield more intermediate gains, but I also don't expect it to be the full story. RL is basically "hey this happened to go well (/poorly), let me slightly…

412 863 8K 1.1M 6K

Shengyang Sun @ssydasheng

2 months ago

We built 200k-GPU clusters; We scaled up & curated higher-quality data; We scaled compute by 100x; We developed training & test-time recipes; We made everything RL native; We stabilized infrastructure and speeded up; That's how you turn RL into the pre-training scale. Yet I am…

53 163 1K 183K 243

Dhravya Shah @DhravyaShah

2 months ago

> opens claude code > write a huge ass prompt > auto-accept edits > go drink water, watch a yt video, chillax, work on something else > come back, work is done. What a time to live in. wow.

107 62 2K 157K 582

Agentica Project @Agentica_

2 months ago

🚀 Introducing DeepSWE 🤖: our fully open-sourced, SOTA software engineering agent trained purely with RL on top of Qwen3-32B. DeepSWE achieves 59% on SWEBench-Verified with test-time scaling (and 42.2% Pass@1), topping the SWEBench leaderboard for open-weight models. 💪DeepSWE…

15 70 362 64K 229

Jakob Foerster @j_foerst

3 months ago

I suggest a new metric: Pass@1/K. For a given "K", you only get a point if all "K" attempts were successful. So it's a continuation of the Pass@K graph to the left-hand side and intuitively measures robustness / confidence.
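
One way to compute the "all K attempts succeed" metric proposed above is sketched below. The tweet only gives the scoring rule, so the rest is an assumption for illustration: the function names `all_k_pass_rate` and `benchmark_score` are hypothetical, and when more than K samples per problem are available the sketch averages over all K-subsets, by analogy with the usual Pass@K estimator.

```python
from math import comb


def all_k_pass_rate(num_samples, num_correct, k):
    """Probability that k attempts drawn without replacement from
    num_samples recorded attempts are all correct.

    With exactly k recorded attempts this is 1.0 if every attempt
    succeeded and 0.0 otherwise, matching the rule in the tweet.
    """
    if not 1 <= k <= num_samples:
        raise ValueError("k must be between 1 and num_samples")
    # math.comb returns 0 when k > num_correct, so the score is 0 there.
    return comb(num_correct, k) / comb(num_samples, k)


def benchmark_score(results, k):
    """Average the per-problem score over a benchmark.

    results: one list of booleans per problem (True = attempt passed).
    """
    return sum(all_k_pass_rate(len(r), sum(r), k) for r in results) / len(results)


# Three problems, four attempts each: always solved, solved half the
# time, never solved.
results = [
    [True, True, True, True],
    [True, True, False, False],
    [False, False, False, False],
]
score_k1 = benchmark_score(results, k=1)  # plain Pass@1
score_k2 = benchmark_score(results, k=2)  # both of 2 attempts must pass
```

The problem that is solved only half the time contributes 0.5 at K=1 but only 1/6 at K=2, which illustrates how the metric penalizes inconsistent solutions as K grows.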