❄️Andrew Zhao❄️ @_AndrewZhao

PhD @Tsinghua_Uni. Absolute Zero, ExpeL, Diver-CT. Research Intern @MSFTResearch, Ex-@BIGAI. Interested in RL, Reasoning/Safety 4 LLMs, Agents. On job market '26.
andrewzh112.github.io
Joined September 2020

Tweets

1K
Followers

4K
Following

3K
Likes

3K

❄️Andrew Zhao❄️ @_AndrewZhao

9 hours ago

Interesting findings! We also attempted something similar in our AZR paper, Section D.2, where the proposer needs to construct a composite function f(g, ..., g).

Lifan Yuan @lifan__yuan

a day ago

7 64 305 28K 265

0 2 23 2K 10

Lifan Yuan @lifan__yuan

a day ago

🧩New blog: From f(x) and g(x) to f(g(x)): LLMs Learn New Skills in RL by Composing Old Ones

Do LLMs learn new skills through RL, or just activate existing patterns? Answer: RL teaches the powerful meta-skill of composition when properly incentivized.

🔗: husky-morocco-f72.notion.site/From-f-x-and-g…

7 64 305 28K 265
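
To make the f(x), g(x) → f(g(x)) setup concrete, here is a toy sketch of a composition task built from two atomic string transformations. The functions `reverse` and `upper`, the prompt wording, and the exact-match answer are hypothetical stand-ins chosen for illustration; they are not the blog's actual task set.

```python
from typing import Callable

# Hypothetical atomic skills standing in for the "old" functions f and g.
def reverse(s: str) -> str:
    return s[::-1]

def upper(s: str) -> str:
    return s.upper()

def compose(f: Callable[[str], str], g: Callable[[str], str]) -> Callable[[str], str]:
    """Return the composite h(x) = f(g(x))."""
    return lambda x: f(g(x))

def make_task(f: Callable[[str], str], g: Callable[[str], str], x: str) -> dict:
    # A composition task: the model must output f(g(x)); the ground truth
    # is computed by actually composing the atomic functions.
    h = compose(f, g)
    return {"prompt": f"Compute {f.__name__}({g.__name__}({x!r}))", "answer": h(x)}

task = make_task(upper, reverse, "abc")
print(task)  # {'prompt': "Compute upper(reverse('abc'))", 'answer': 'CBA'}
```

Because the answer is produced by running the composed functions, such tasks come with a verifiable reward for free, which is what makes them convenient for RL.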

Yiran Wu @YiranWu18

19 hours ago

Introducing 🛡️ExCyTIn‑Bench: Evaluating LLM agents on Cyber Threat Investigations. It's built on an Azure tenant, a real Security Operations Center environment, covering 57 tables. Explore how LLMs fare in realistic, multi-hop incident detection! #Cybersecurity #AI #LLM #Benchmark

10 14 56 5K 26

Ilya Sutskever @ilyasut

a day ago

a revolutionary breakthrough if i've ever seen one

͏Alps͏ @alpaysh

3 days ago

184 87 4K 2.3M 366

676 871 21K 1.8M 1K

Jason Weston @jaseweston

2 days ago

🌀Diversity Aware RL (DARLING)🌀
📝: arxiv.org/abs/2509.02534
- Jointly optimizes for quality & diversity using a learned partition function
- Outperforms standard RL in quality AND diversity metrics, e.g. higher pass@1/p@k
- Works for both non-verifiable & verifiable tasks
🧵1/5

4 77 390 53K 321
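
The pass@1 and pass@k metrics cited above are commonly computed with the unbiased estimator from the Codex paper (Chen et al., 2021): given n samples per problem, c of them correct, pass@k = 1 − C(n−c, k) / C(n, k). A minimal sketch follows; the tweet does not say which estimator DARLING's evaluation uses, so treat this as the standard definition rather than their exact code.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, c of which are correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    # 1 - P(all k drawn samples are incorrect), sampling without replacement.
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(10, 3, 1))
print(pass_at_k(10, 3, 5))
```

Since pass@k only needs one correct sample among k, a policy whose samples are more diverse can raise pass@k even when pass@1 barely moves, which is why reporting both is informative.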

will brown @willccbb

a week ago

and we’re live! been a very long time in the making, huge thanks to everyone who’s made it possible along the way. can’t wait to see what you guys all build here. we’re just getting started :)

Prime Intellect @PrimeIntellect

a week ago

117 395 3K 1.2M 2K

38 38 609 58K 92

Andrej Karpathy @karpathy

a week ago

In era of pretraining, what mattered was internet text. You'd primarily want a large, diverse, high quality collection of internet documents to learn from. In era of supervised finetuning, it was conversations. Contract workers are hired to create answers for questions, a bit…

Prime Intellect @PrimeIntellect

a week ago

117 395 3K 1.2M 2K

261 851 7K 845K 5K

Prime Intellect @PrimeIntellect

a week ago

Introducing the Environments Hub

RL environments are the key bottleneck to the next wave of AI progress, but big labs are locking them down

We built a community platform for crowdsourcing open environments, so anyone can contribute to open-source AGI

117 395 3K 1.2M 2K

Sainbayar Sukhbaatar @tesatory

a week ago

📈 Process reward strikes back 🚨

I think it is obvious that eventually we need to rely on stepwise judges instead of final outcome rewards. As tasks get longer (or even endless), it is unreasonable to push up/down all steps involved. Here we show you can obtain stepwise labels…

Jason Weston @jaseweston

a week ago

11 97 487 83K 364

3 13 102 12K 81

Jason Weston @jaseweston

a week ago

🪜Introducing: StepWiser🦉
📝: arxiv.org/abs/2508.19229
- Reframes stepwise reward modeling as a reasoning task: outputs CoT + judgment.
- Trained by RL using relative outcomes of rollouts.
Results:
(1) SOTA performance on ProcessBench!
(2) Improves policy at train time.
(3)…

11 97 487 83K 364
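
One common way to derive stepwise labels from "relative outcomes of rollouts" is sketched below, under assumptions since the tweets are truncated: estimate the value of each reasoning prefix as the fraction of sampled continuations that reach a correct answer, then label a step good when it does not lower that estimate. The `rollout` callable, the sample count `n`, the `>=` rule, and `toy_rollout` are all hypothetical and are not StepWiser's exact recipe.

```python
from typing import Callable, List

Rollout = Callable[[List[str]], bool]

def prefix_value(prefix: List[str], rollout: Rollout, n: int) -> float:
    """Monte Carlo estimate: fraction of n sampled continuations of `prefix` that succeed."""
    return sum(rollout(prefix) for _ in range(n)) / n

def stepwise_labels(steps: List[str], rollout: Rollout, n: int = 16) -> List[bool]:
    """Label step t as good when the estimated value after it is not lower than before it."""
    labels: List[bool] = []
    before = prefix_value([], rollout, n)
    for t in range(len(steps)):
        after = prefix_value(steps[: t + 1], rollout, n)
        labels.append(after >= before)  # relative outcome, not absolute success
        before = after
    return labels

# Hypothetical, deterministic stand-in for "sample a continuation and check the answer":
# any prefix containing a "bad" step is assumed never to reach a correct answer.
def toy_rollout(prefix: List[str]) -> bool:
    return "bad" not in prefix

print(stepwise_labels(["ok", "ok", "bad", "ok"], toy_rollout, n=4))  # [True, True, False, True]
```

With a real policy, each `rollout` call samples a continuation and checks the final answer, so labeling costs roughly (number of steps + 1) × n rollouts per solution.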

Banghua Zhu @BanghuaZ

2 weeks ago

Beyond prompt / context engineers, we’re seeing the rise of environment engineers, experts who build high-quality RL environments with verifiable reward. In RLHF, we had labelers for human preferences. In RLVR, the “label” is the environment and verifiable reward itself: coming…

10 11 143 13K 67
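
To illustrate "the label is the environment and verifiable reward itself", here is a minimal sketch of a verifiable reward for math-style answers. The `\boxed{...}` answer convention and exact string matching are assumptions for illustration; practical RLVR setups usually add more robust answer normalization.

```python
import re
from typing import Optional

# Matches \boxed{...} without nested braces (a deliberate simplification).
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")

def extract_answer(completion: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in the completion, if any."""
    matches = BOXED.findall(completion)
    return matches[-1].strip() if matches else None

def verifiable_reward(completion: str, reference: str) -> float:
    """Binary reward: 1.0 if the extracted answer exactly matches the reference, else 0.0."""
    answer = extract_answer(completion)
    return 1.0 if answer is not None and answer == reference.strip() else 0.0

print(verifiable_reward(r"... so the total is \boxed{42}.", "42"))
print(verifiable_reward("I think it's 42.", "42"))
```

Here the checker plays the role a human labeler plays in RLHF: no preference data is needed, but the quality of the reward depends entirely on how carefully the check is written.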

Zichen Liu @zzlccc

2 weeks ago

With just a few lines of code, Feng’s (@fengyao1909) suggested fix—applying importance sampling on the behavior policy—resolved the training instability in my case (oat). I believe the result can generalize to other RL frameworks as well. Great work, Feng!

7 54 472 44K 406
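
The instability addressed here comes from the rollout engine (the behavior policy) and the training engine assigning slightly different probabilities to the same tokens. A minimal sketch of one variant of the idea, truncated importance sampling, assuming per-token log-probs from both engines are available: reweight each token's policy-gradient term by π_train / π_behavior, capped at C. The cap value, function names, and REINFORCE-style surrogate are assumptions, not code from the tweet or from oat.

```python
import math
from typing import List

def truncated_is_weights(train_logp: List[float], behavior_logp: List[float],
                         cap: float = 2.0) -> List[float]:
    """Per-token weights min(pi_train / pi_behavior, cap), computed from log-probs."""
    if len(train_logp) != len(behavior_logp):
        raise ValueError("log-prob sequences must have the same length")
    return [min(math.exp(lt - lb), cap) for lt, lb in zip(train_logp, behavior_logp)]

def weighted_pg_loss(train_logp: List[float], behavior_logp: List[float],
                     advantages: List[float], cap: float = 2.0) -> float:
    """REINFORCE-style surrogate: -mean(w_t * A_t * log pi_train(token_t)).

    In an autograd framework the weights w_t would be detached (treated as constants);
    here everything is a plain float, so this only shows the arithmetic.
    """
    w = truncated_is_weights(train_logp, behavior_logp, cap)
    terms = [wi * a * lt for wi, a, lt in zip(w, advantages, train_logp)]
    return -sum(terms) / len(terms)

# Token 1: training engine gives 0.3, rollout engine 0.2 -> ratio 1.5 (kept).
# Token 2: 0.9 vs 0.3 -> ratio 3.0 (truncated to the cap of 2.0).
train = [math.log(0.3), math.log(0.9)]
behavior = [math.log(0.2), math.log(0.3)]
print(truncated_is_weights(train, behavior))
print(weighted_pg_loss(train, behavior, advantages=[1.0, -1.0]))
```

Truncating the ratio trades a small bias for bounded variance: a few tokens the rollout engine badly underestimated cannot dominate the gradient.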

Gradient @Gradient_HQ

2 weeks ago

Reinforcement Learning is the future tense of intelligence. Echo is how it scales. Echo is Gradient’s distributed RL framework, running on everyday consumer devices. From its early experiments, Echo powered a 30B Sokoban model that outperformed DeepSeek-R1 and GPT-OSS-120B.

351 706 3K 240K 52

will brown @willccbb

2 weeks ago

the easiest way to get hired at @PrimeIntellect for research is to just make it very clear that you're already doing excellent work. go deep on projects that let you show off your strengths. don't give up on them after a weekend. share your work publicly. make us aware of you.

24 17 617 62K 223

Oleksii Kuchaiev @kuchaev

3 weeks ago

We are excited to release Nvidia-Nemotron-Nano-V2 model! This is a 9B hybrid SSM model with open base model and training data. This model also supports runtime "thinking" budget control. HF collection with base and post trained models: huggingface.co/collections/nv…

9 59 295 63K 81

𝚐𝔪𝟾𝚡𝚡𝟾 @gm8xx8

3 weeks ago

NVIDIA Nemotron-Nano v2
Models: 12B Base, 9B Reasoning, 9B Base
- Arch: Hybrid Mamba2–Transformer (128K ctx, 4 attn layers)
- Training: 10.6T tokens (3.5T synthetic from DeepSeek, Qwen, Nemotron-4, phi-4, etc.)
- 15 natural languages + 43 programming languages
- Datasets:…

9 37 231 28K 119

Chris @chatgpt21

3 weeks ago

It looks like Andrew Garfield will play Sam Altman in the OpenAI movie coming to Amazon MGM.

I think @apples_jimmy deserves a feature

20 9 170 15K 18

gensyn @gensynai

3 weeks ago

Coming soon

186 78 678 39K 28

❄️Andrew Zhao❄️ @_AndrewZhao

3 weeks ago

LLMs as internet/knowledge base, no need for external tools. Reminiscent of older work from AI2/UW: Rainier (arxiv.org/pdf/2210.03078) and CRYSTAL (arxiv.org/abs/2310.04921).

arxiv.org/abs/2508.10874

7 54 318 21K 259

Hyung Won Chung @hwchung27

3 weeks ago

After a great time at OpenAI, we (@EdwardSun0909, @_jasonwei) recently joined @Meta Superintelligence Labs. The first month has already been so much fun building from a clean slate with a truly talent-dense team! Very excited about the compute and long term focus of the new lab