Adam Yanxiao Zhao @sdpkjc_adam

🧑‍🎓 CS PhD Student @UCAS1978 | 🤖 RL | 🏄‍♂️ Research Intern @Zai_org | 🦶 Ex-Intern @ LiAuto @SenseTime @ https://t.co/lQs9eBMtvx

sdpkjc.com

Joined June 2018

Tweets: 124

Followers: 46

Following: 287

Likes: 345

Xiao Liu (Shaw) @ShawLiu12

3 weeks ago

🚨Thrilled to share our latest progress on Computer Use Agent, ComputerRL, an end-to-end RL method which achieves 48.1% success rate on OSWorld Benchmark with only 9B open model, beating OpenAI Operator, Claude Sonnet 4.0, and other previous models, state-of-the-art performance.…

Z.ai @Zai_org

3 weeks ago

10 103 577 69K 289

1 5 43 4K 16

Adam Yanxiao Zhao @sdpkjc_adam

3 weeks ago

Lucky to have collaborated with an amazing team on this work! 🎉🚀😃

Z.ai @Zai_org

3 weeks ago

10 103 577 69K 289

0 0 2 42 0

Tanishq Mathew Abraham, Ph.D. @iScienceLuvr

3 weeks ago

ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents "To support scalable and robust training, we develop a distributed RL infrastructure capable of orchestrating thousands of parallel virtual desktop environments to accelerate large-scale…

3 23 118 9K 98

Z.ai @Zai_org

a month ago

Introducing GLM-4.5 and GLM-4.5 Air: new flagship models designed to unify frontier reasoning, coding, and agentic capabilities. GLM-4.5: 355B total / 32B active parameters GLM-4.5-Air: 106B total / 12B active parameters API Pricing (per 1M tokens): GLM-4.5: $0.6 Input / $2.2…

265 648 3K 1.2M 1K

TNG Technology Consulting GmbH @tngtech

4 months ago

Today we release DeepSeek-R1T-Chimera, an open weights model adding R1 reasoning to @deepseek_ai V3-0324 with a novel construction method. In benchmarks, it appears to be as smart as R1 but much faster, using 40% fewer output tokens. The Chimera is a child LLM, using V3s…

26 107 608 80K 269

will brown @willccbb

7 months ago

trying to make it really really easy to build LLM RL envs

8 22 357 44K 273

Joseph Suarez 🐡 @jsuarez5341

9 months ago

x.com/i/article/1863…

9 26 295 29K 175

Adam Yanxiao Zhao @sdpkjc_adam

10 months ago

🚀

Roger Creus Castanyer @creus_roger

10 months ago

🚀

5 11 43 7K 22

0 0 0 110 0

Jarek Liesen @JarekLiesen

a year ago

🥳 I'm releasing Rejax, a lightweight library of fully vectorizable RL algorithms!
⚡ Enjoy lightning-fast speed using jax.jit on the training function
🧬 Use vmap and pmap on hyperparameters
🔙 Log using flexible callbacks
🌐 Available @ github.com/kerajli/rejax
📸 Take a tour!

4 29 169 24K 87

Quentin Gallouédec @QGallouedec

a year ago

Sorry to hear that @jsuarez5341, Open RL Benchmark was also rejected from RLC, and we mostly feel the same way about review quality (LLM-generated?). Among other things, we read that "the meaning of "metrics" is never made clear", whereas we have a section dedicated to metrics,…

Joseph Suarez 🐡 @jsuarez5341

a year ago

4 2 29 9K 9

1 2 14 2K 2

Quentin Gallouédec @QGallouedec

a year ago

The Open RL Leaderboard now fully supports all Stable Baselines 3 models! 🚀 Thanks to this update, it now compares over 10,000 models! 📈🎉 🏆 Leaderboard: huggingface.co/spaces/open-rl… 🐙 RL Zoo 3: github.com/DLR-RM/rl-base…

2 8 43 6K 12

Quentin Gallouédec @QGallouedec

a year ago

🆕 LeRobot 🤖 github.com/huggingface/le… 📈 Pre-trained robotics models 💾 Datasets of human collected demos 🔩 Modular architecture This is part of our efforts @huggingface to make 🤖 more accessible. By @RemiCadene @asoare159 @alibert_s @Thom_Wolf @AdilZtn , @HaixuanT ...

1 3 18 2K 8

Quentin Gallouédec @QGallouedec

a year ago

Which is the best RL agent on the Hub? Now you can find out, thanks to the Open RL leaderboard 🏆!
🧩 Features:
- Automatic evaluation of models on the 🤗 Hub
- Compatible with all torch-based RL libraries
- Supports 87 environments, with more to come 🔥
huggingface.co/spaces/open-rl…

3 19 56 12K 21

Machine Learning @Memoirs

2 years ago

Snapshot Reinforcement Learning: Leveraging Prior Trajectories for Efficiency. arxiv.org/abs/2403.00673

0 1 2 116 1

RL Beyond Rewards Workshop @RLBRew_RLC

a year ago

Announcing the Reinforcement Learning Beyond Rewards workshop at the first @RL_Conference. Think that rewards aren't enough for RL? Working on RLHF? Thinking of alternative ways of alignment? Creating a foundational model for RL? or have ideas on task-agnostic RL algo? Join us

1 15 65 24K 21

Costa Huang @vwxyzjn

a year ago

Happy to share our work on reproducing RLHF scaling behaviors in @OpenAI's work in summarizing from feedback. We built an RLHF pipeline from scratch and enumerated over 20+ implementation details 🚀 Fun collab with @mnoukhov, @arianTBD, @krasul, @weixunwang, and @_lewtun 📜…

7 70 348 60K 272

Andrew Silva @andrewsilva9

2 years ago

I wrote a modification of CleanRL that runs with MLX, feel free to check it out or offer suggestions! github.com/andrew-silva/c… Thanks @awnihannun for the amazing library!

2 2 26 10K 11

Aviral Kumar @aviral_kumar2

2 years ago

Super simple code change to get value-based deep RL scale *much* better w/ big models across the board on Atari games, robotic manipulation w/ transformers, LLM + text games, & even Chess! Just use classification loss (i.e., cross entropy), not MSE!! arxiv.org/abs/2403.03950 🧵⬇️

3 41 265 52K 140
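
Editor's sketch of the idea in the tweet above (cross entropy instead of MSE for value targets). This is a minimal, stdlib-only illustration and not the authors' code: it assumes a fixed support of value bins and a "two-hot" target encoding, which is one common way to turn a scalar target into a categorical one; the tweet does not say which encoding the paper prefers. All function names here (`two_hot`, `cross_entropy_value_loss`, `expected_value`, `mse_value_loss`) are hypothetical.

```python
import math


def two_hot(target, support):
    """Spread a scalar target over the two nearest bins of a sorted support (assumed encoding)."""
    target = min(max(target, support[0]), support[-1])  # clip into the support range
    probs = [0.0] * len(support)
    for i in range(len(support) - 1):
        left, right = support[i], support[i + 1]
        if left <= target <= right:
            w_right = (target - left) / (right - left)
            probs[i] += 1.0 - w_right
            probs[i + 1] += w_right
            break
    return probs


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cross_entropy_value_loss(logits, target, support):
    """Classification-style value loss: cross entropy against the two-hot target."""
    target_probs = two_hot(target, support)
    log_probs = log_softmax(logits)
    return -sum(p * lp for p, lp in zip(target_probs, log_probs))


def expected_value(logits, support):
    """Recover a scalar value estimate as the mean of the predicted distribution."""
    probs = [math.exp(lp) for lp in log_softmax(logits)]
    return sum(p * z for p, z in zip(probs, support))


def mse_value_loss(prediction, target):
    """The regression-style loss the tweet suggests replacing."""
    return (prediction - target) ** 2
```

In this sketch the network would output one logit per bin rather than a single scalar; the scalar value is read back with `expected_value` when it is needed for acting or bootstrapping.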

Aran Komatsuzaki @arankomatsuzaki

2 years ago

Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents Presents an exploration-based trajectory optimization approach, which consistently surpasses baseline performance by a large margin repo: github.com/Yifan-Song793/… abs: arxiv.org/abs/2403.02502

3 51 214 44K 152

Yen-Jen Wang @wangyenjen

2 years ago

Check out our Humanoid-Gym! Humanoid-Gym is an easy-to-use RL framework that emphasizes zero-shot sim2real transfer for humanoid robots! We construct specifically designed reward functions for humanoid robots, which greatly reduces the difficulty of the sim2real transfer.…