Mirage or method? We re-assess a series of RL observations such as spurious reward, one-shot RL, test-time RL, and negative-sample training.
🧐 These approaches were all originally demonstrated on the Qwen+Math combination, but do they work in other settings? If not, under which…
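To make the "spurious reward" comparison above concrete, here is a minimal sketch (not any paper's actual code) of GRPO-style group-relative advantages computed once from a correctness reward and once from a coin-flip reward; the function name and the 0/1 reward convention are assumptions made purely for illustration.

```python
# Minimal sketch, not any paper's code: GRPO-style group-relative advantages
# computed from (a) a correctness reward and (b) a "spurious" random reward.
import numpy as np

def group_advantages(rewards):
    """Normalize rewards across a group of completions sampled for one prompt."""
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

rng = np.random.default_rng(0)
correct_reward = np.array([1, 0, 0, 1, 0, 0, 0, 0])   # reward = answer correctness
spurious_reward = rng.integers(0, 2, size=8)           # reward = coin flip, no task signal

print(group_advantages(correct_reward))   # advantages track correctness
print(group_advantages(spurious_reward))  # advantages track noise, yet still update the policy
```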
Feels like everyone is slowly admitting that there's no moat in foundational models, and the only way to build a business out of AI is to build products.
I've always been skeptical about PRMs, but being able to apply RL+reasoning changes the entire story for me. It was a fun ride with @weixiong_1, who has been teaching me a unified way to think about all RL methods. He'll be on the job market! Anyone would be lucky to work with him.
🔮 Introducing Prophet Arena — the AI benchmark for general predictive intelligence.
That is, can AI truly predict the future by connecting today’s dots?
👉 What makes it special?
- It can’t be hacked. Most benchmarks saturate over time, but here models face live, unseen…
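As an aside on how live forecasts can be graded once events resolve, here is a small, self-contained illustration using the Brier score; whether Prophet Arena uses this exact metric is an assumption, and the numbers are invented.

```python
# Illustration only: the Brier score is one standard way to grade probabilistic
# forecasts after events resolve (lower is better). It is an assumption that
# Prophet Arena scores models this way.
def brier_score(forecasts, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# A model assigns 0.7, 0.2 and 0.9 to three events; the first and third occur.
print(brier_score([0.7, 0.2, 0.9], [1, 0, 1]))  # ≈ 0.047
```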
Today we're putting out an update to the JAX TPU book, this time on GPUs. How do GPUs work, especially compared to TPUs? How are they networked? And how does this affect LLM training? 1/n
🚀New dataset release: WildChat-4.8M
4.8M real user-ChatGPT conversations collected from our public chatbots:
- 122K from reasoning models (o1-preview, o1-mini): these represent real use in the wild and are very costly to collect
- 2.5M from GPT-4o
🔗 hf.co/datasets/allen… (1/4)
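A sketch of how the release could be loaded with the Hugging Face datasets library; the repository id below is a placeholder since the link in the announcement is truncated, and the field layout should be checked against the dataset card.

```python
# Sketch only: the dataset id is a placeholder (the announcement link is
# truncated), and the record layout is an assumption — inspect ds.features first.
from datasets import load_dataset

ds = load_dataset("allenai/WildChat-4.8M", split="train")  # placeholder id
print(len(ds))    # expect on the order of 4.8M conversations
print(ds[0])      # one real user–ChatGPT conversation record
```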
i've been thinking lately about how future ai systems will interact with us and how we can make systems that care about people and wanted to put words to it -- hopefully it resonates a bit!
I'll be around the ICML venue this afternoon. Message me if you want to meet! These days, I think about reasoning and RL. Also happy to talk about academia vs. industry (I think the lack of compute in academia is a feature, not a bug) and about faculty and PhD student recruiting at UMass.
AI Research Agents are becoming proficient at machine learning tasks, but how can we help them search the space of candidate solutions and codebases? Read our new paper looking at MLE-Bench: arxiv.org/pdf/2507.02554 #LLM #Agents #MLEBench
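Not the paper's algorithm, but for intuition: a toy best-first search over candidate solutions of the kind an ML-engineering agent might run, where propose_edits and evaluate are hypothetical stand-ins for "ask the LLM for code changes" and "score the candidate on a validation split".

```python
# Toy sketch, not the paper's method: best-first search over candidate solutions.
# `propose_edits` and `evaluate` are hypothetical stand-ins for an LLM proposing
# code changes and a harness scoring them on held-out data.
import heapq

def best_first_search(initial, propose_edits, evaluate, budget=20):
    best, best_score = initial, evaluate(initial)
    frontier = [(-best_score, 0, initial)]        # max-heap via negated scores
    counter = 0                                   # tie-breaker so candidates never compare
    for _ in range(budget):
        if not frontier:
            break
        _, _, parent = heapq.heappop(frontier)    # expand the most promising candidate
        for child in propose_edits(parent):
            score = evaluate(child)
            counter += 1
            heapq.heappush(frontier, (-score, counter, child))
            if score > best_score:
                best, best_score = child, score
    return best, best_score

# Toy usage: "solutions" are numbers, edits perturb them, the score rewards closeness to 10.
best, score = best_first_search(
    0.0,
    propose_edits=lambda x: [x + 1, x - 1, x * 2],
    evaluate=lambda x: -abs(10 - x),
)
print(best, score)
```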
📢 today's scaling laws often don't work for predicting downstream task performance. For some pretraining setups, smooth and predictable scaling is the exception, not the rule.
a quick read about scaling law fails:
📜arxiv.org/abs/2507.00885
🧵1/5👇
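For context, the recipe the thread pushes back on looks roughly like the sketch below: fit a trend to small-scale runs and extrapolate to a larger budget. The numbers are invented, and the log-linear fit is just one simple extrapolation choice.

```python
# Invented numbers, illustration only: fit a simple trend on small-scale runs,
# then extrapolate to a bigger compute budget — the step that often fails for
# downstream task metrics.
import numpy as np

compute = np.array([1e18, 3e18, 1e19, 3e19])            # small-scale training runs
downstream_error = np.array([0.62, 0.61, 0.60, 0.58])   # smooth so far

m, k = np.polyfit(np.log10(compute), downstream_error, deg=1)
predicted = m * np.log10(1e20) + k
print(f"extrapolated error at 1e20 FLOPs: {predicted:.3f}")
# A real 1e20 run may land far from this if the downstream metric jumps or
# plateaus instead of following the fitted trend.
```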
Do language models have algorithmic creativity?
To find out, we built AlgoTune, a benchmark challenging agents to optimize 100+ algorithms like gzip compression, AES encryption and PCA. Frontier models struggle, finding only surface-level wins. Lots of headroom here!🧵⬇️
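To give a feel for the task format (a sketch under assumptions, not the benchmark's actual harness): a candidate solver is accepted only if it matches the reference output, and is then scored by wall-clock speedup.

```python
# Illustrative only: an AlgoTune-style check — verify the candidate matches the
# reference, then score it by wall-clock speedup. The task (pairwise squared
# distances) and the exact scoring rule are assumptions.
import time
import numpy as np

def reference_pdist(X):
    n = len(X)
    out = np.zeros((n, n))
    for i in range(n):                 # straightforward but slow double loop
        for j in range(n):
            out[i, j] = np.sum((X[i] - X[j]) ** 2)
    return out

def candidate_pdist(X):
    sq = np.sum(X ** 2, axis=1)        # vectorized: ||x||^2 + ||y||^2 - 2*x.y
    return sq[:, None] + sq[None, :] - 2 * X @ X.T

X = np.random.default_rng(0).standard_normal((300, 32))
t0 = time.perf_counter(); ref = reference_pdist(X); t_ref = time.perf_counter() - t0
t0 = time.perf_counter(); cand = candidate_pdist(X); t_cand = time.perf_counter() - t0

assert np.allclose(ref, cand)          # correctness first, then speed
print(f"speedup over reference: {t_ref / t_cand:.1f}x")
```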
We don’t have AI that self-improves yet, and when we do it will be a game-changer. With more wisdom now compared to the GPT-4 days, it's obvious that it will not be a “fast takeoff”, but rather extremely gradual across many years, probably a decade.
The first thing to know is that…
✨Release: We upgraded SkyRL into a highly-modular, performant RL framework for training LLMs. We prioritized modularity—easily prototype new algorithms, environments, and training logic with minimal overhead.
🧵👇
Blog: novasky-ai.notion.site/skyrl-v01
Code: github.com/NovaSky-AI/Sky…
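This is not SkyRL's actual API, just a toy illustration of the kind of modularity the release describes, where generation, reward, and advantage estimation are swappable pieces and the loop itself stays unchanged.

```python
# Toy illustration only (not SkyRL's API): a training step whose components —
# generation, reward, advantage estimation — can be swapped independently.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Rollout:
    prompt: str
    completion: str
    reward: float = 0.0

def train_step(prompts: List[str],
               generate: Callable[[str], str],
               reward_fn: Callable[[str, str], float],
               advantage_fn: Callable[[List[float]], List[float]]) -> List[float]:
    rollouts = [Rollout(p, generate(p)) for p in prompts]
    for r in rollouts:
        r.reward = reward_fn(r.prompt, r.completion)
    return advantage_fn([r.reward for r in rollouts])   # would feed the policy update

# Swap any component without touching the loop, e.g. mean-centered advantages:
advantages = train_step(
    ["2+2=?", "3*3=?"],
    generate=lambda p: "4" if "+" in p else "7",        # stand-in for the policy
    reward_fn=lambda p, c: 1.0 if (p, c) == ("2+2=?", "4") else 0.0,
    advantage_fn=lambda rs: [r - sum(rs) / len(rs) for r in rs],
)
print(advantages)   # [0.5, -0.5]
```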
735 Followers 790 FollowingResearch intern @nvidia; Ph.D. student at @Mila_Quebec. Interested in deep generative models, drug discovery and protein science.
44 Followers 1K FollowingDataset and data-annotation sales: selling high-quality annotation solutions for topics such as AI for science, autonomous driving, and Lean4 data.
315 Followers 925 Following* Interests: Comp Neuro, ML, and StatPhys * Postdoc in Xiao-Jing Wang's lab, CNS, NYU * Ph.D. in StatPhys from ENS * BS & MS in Phys from Sapienza
124 Followers 1K FollowingOfficial journal of the China Society of Image and Graphics (CSIG). The journal is published by Springer and sponsored by CSIG. E-ISSN 2731-9008.
2K Followers 297 FollowingAssociate Professor, Computer Science and Engineering, University of Michigan; researcher in natural language processing; directs @launchnlp.
125 Followers 284 FollowingSoftware engineer 👨💻 | IIIT Hyderabad CSE | JEE AIR 379 | Talks about technology and maybe news sometimes | All opinions are my own |
1K Followers 6K Followingscaling speech native LLMs @rimelabs
the future is willed into existence.
bioML, discovering new science, housing, industrial policy, local politics.
342 Followers 297 FollowingPh.D. at @nyuniversity. Visiting researcher at @AIatMeta. Previous Intern @cohere, MCDS @LTIatCMU. Working on ML/NLP. Painting lover🎨.
13K Followers 689 FollowingResearch @Meta Superintelligence Labs, RL/post-training/agents; Previously Research @OpenAI on multimodal and RL; Opinions are my own.
309 Followers 591 FollowingPhD student @jhuclsp, research scientist intern @AIatMeta FAIR. I work on data curation for language models. Previously @nyuniversity.
24K Followers 706 FollowingMember of the technical staff @ Anthropic. Most (in)famous for inventing diffusion models. AI + physics + neuroscience + dynamics.
354K Followers 1K FollowingML/AI researcher & former stats professor turned LLM research engineer. Author of "Build a Large Language Model From Scratch" (https://t.co/O8LAAMRzzW).
4.3M Followers 3 FollowingOpenAI’s mission is to ensure that artificial general intelligence benefits all of humanity. We’re hiring: https://t.co/dJGr6Lg202
2K Followers 935 FollowingPh.D. student @LTIatCMU and intern at @AIatMeta (FAIR) working on (V)LM Evaluation & Systems that Self-Improve | Prev: @kaist_ai @yonsei_u