Ph.D. candidate at CUHK. Former Visiting Scholar at Cornell. Working on reinforcement learning and multi-armed bandits. zhiyongwangwzy.github.io · Hong Kong · Joined September 2021
How can small LLMs match or even surpass frontier models like DeepSeek R1 and o3 Mini in math-competition reasoning (AIME & HMMT)? Prior work seems to suggest that ideas like PRMs do not really work or scale well for long-context reasoning. @kaiwenw_ai will reveal how a novel…
Happy to share our work "Provable Zero-Shot Generalization in Offline Reinforcement Learning" at ICML 2025!
📍 Poster | 🗓️July 16, 11:00 AM – 1:30 PM
📌 West Exhibition Hall B2-B3 #W-1012
🤖 How can offline RL agents generalize zero-shot to unseen environments?
We introduce…
Does RL actually learn positively under random rewards when optimizing Qwen on MATH? Is Qwen really so magical that even RL on random rewards can make it reason better?
Following prior work on spurious rewards in RL, we ablated algorithms. It turns out that if you…
Curious how to combine federated learning and in-context learning for QA tasks, preserving privacy, improving efficiency, and boosting performance round by round?
🚀 Meet Fed-ICL — our framework collaboratively refines answers without transmitting model weights or sharing raw…
Tired of over-optimized generations that stray too far from the base distribution?
We present SLCD: Supervised Learning based Controllable Diffusion, which (provably) solves the KL-constrained reward-maximization problem for diffusion through supervised learning! (1/n)
by incorporating self-consistency during offline RL training, we unlock three orthogonal directions of scaling:
1. efficient training (i.e. limit backprop through time)
2. expressive model classes (e.g. flow matching)
3. inference-time scaling (sequential and parallel)
which,…
I won't be at #ICLR2025 myself this time around but please go talk to lead authors @nico_espinosa_d, @GaoZhaolin, and @runzhe_wu about their bleeding-edge algorithms for imitation learning and RLHF!
Heading to #ICLR2025 🇸🇬! Excited to connect with friends and chat about RL: theory, LLM reasoning and robotics!
I will present our Oral paper on LLM self-improvement 📍 4:18pm Sat. Join me if you want to learn about its scaling laws, iterative training, and test-time improvement.
What is the place of exploration in today's AI landscape and in which settings can exploration algorithms address current open challenges?
Join us to discuss this at our exciting workshop at @icmlconf 2025: EXAIT!
exait-workshop.github.io #ICML2025
I think of misspecification (embodiment / sensory gaps) as the fundamental reason behavioral cloning isn't "all you need" for imitation: matching actions != matching outcomes. Introducing @nico_espinosa_d's #ICLR2025 paper proving that "local search" *is* all you need! [1/n]
Meet the recipients of the 2024 ACM A.M. Turing Award, Andrew G. Barto and Richard S. Sutton! They are recognized for developing the conceptual and algorithmic foundations of reinforcement learning. Please join us in congratulating the two recipients! bit.ly/4hpdsbD
🚀 Rising Star Workshops for Junior/Senior PhDs, and Postdocs!
🌟 Don't miss these career-boosting opportunities!
notion.so/List-of-Rising…
Please share with your peers, students, and anyone who might benefit! #PhD #Postdoc #Academia #RisingStars
There are multiple postdoc positions available as part of an exciting new AI-agent initiative at Columbia that tackles challenges at the frontier of agentic systems and sequential decision-making.
I am not very active here, so please help me spread the word!
The list of accepted papers for AISTATS 2025 is now available.
aistats.org/aistats2025/
Congratulations to the authors, and thanks to the reviewers, ACs, and SACs for their help.
Thanks to my co-chair @ashipra & workflow chairs: Christopher Anders (RIKEN) & Tingting Ou (Columbia).
check this out: new postdoc program for AI-related research in Catalunya!
our group is looking to hire within this program, ideally to work on topics related to RL theory. in case you're interested, pls DM or email me.
(retweets appreciated!)
ramonllull-aira.eu/application
520 Followers · 817 Following · Postdoc at @MIT | ex-PhD student @Princeton | Exploring how to train AIs and their interaction with the world, while brewing my espresso.
613 Followers · 1K Following · Professor at Texas A&M University; ML/AI researcher; optimization for ML/AI; large reasoning models; developing LibAUC library for training deep neural nets.
316 Followers · 357 Following · Senior undergraduate @thudcst.
Research intern @LTIatCMU (previously: @HKUST-NLP, @thukeg).
Interested in LLM & AI for Education/Research/Software Eng.
5 Followers · 100 Following · Postdoc @CMU_ECE & @UMassAmherst. Working on reinforcement learning to make optimal decisions in AI computing and communication systems.
2K Followers · 840 Following · Assistant Professor at @BristolUni, PhD from @UCL, prev. intern at @TikTok & @Microsoft. ✨ Reinforcement Learning, Causality, World Models.
13K Followers · 689 Following · Research @Meta Superintelligence Labs, RL/post-training/agents; previously Research @OpenAI on multimodal and RL; opinions are my own.
5K Followers · 1 Following · Tao's my name and math is my thing :) I could say my greatest quality is that I have my own theorem, you know the Green-Tao theorem, everyone's heard of that :)
103 Followers · 80 Following · @WUSTL CSE PhD candidate. Former B.Eng. in Software Engineering @FudanUni. AI researcher. Interested in LLM efficiency and reasoning.
298 Followers · 279 Following · PhD at University of Amsterdam, working on Information Retrieval and Natural Language Processing | Former Applied Scientist Intern at Amazon
110 Followers · 21 Following · 2nd AI for Math Workshop @ ICML 2025
West Ballroom C, Vancouver Convention Center
July 18th, 2025 @ Vancouver, Canada (Hybrid)
5K Followers · 828 Following · Postdoc @LTIatCMU. PhD from Ohio State @osunlp. Author of MMMU, MAmmoTH. Training & evaluating foundation models. Opinions are my own.
195 Followers · 412 Following · Incoming Prof. at NUS Computer Science working on AI4Sci & Machine Learning. Currently recruiting students (PhDs, postdocs, RAs, interns, etc.). Email me if interested!
11 Followers · 7 Following · Workshop on the Foundations of Post-training at COLT 2025 @AssocCompLearn. A deep dive into the theoretical and practical aspects of the post-training of LLMs.