Ph.D. candidate at CUHK. Former Visiting Scholar at Cornell. Working on reinforcement learning and multi-armed bandits. zhiyongwangwzy.github.io · Hong Kong · Joined September 2021
How can small LLMs match or even surpass frontier models like DeepSeek R1 and o3 Mini in math-competition reasoning (AIME & HMMT)? Prior work seems to suggest that ideas like PRMs do not really work or scale well for long-context reasoning. @kaiwenw_ai will reveal how a novel…
Happy to share our work "Provable Zero-Shot Generalization in Offline Reinforcement Learning" at ICML 2025!
📍 Poster | 🗓️July 16, 11:00 AM – 1:30 PM
📌 West Exhibition Hall B2-B3 #W-1012
🤖 How can offline RL agents generalize zero-shot to unseen environments?
We introduce…
Does RL actually learn positively under random rewards when optimizing Qwen on MATH? Is Qwen really so magical that even RL on random rewards can make it reason better?
Following prior work on spurious rewards in RL, we ablated algorithms. It turns out that if you…
Curious how to combine federated learning and in-context learning for QA tasks, preserving privacy, improving efficiency, and boosting performance round by round?
🚀 Meet Fed-ICL — our framework collaboratively refines answers without transmitting model weights or sharing raw…
Tired of over-optimized generations that stray too far from the base distribution?
We present SLCD: Supervised Learning based Controllable Diffusion, which (provably) solves the KL-constrained reward-maximization problem for diffusion through supervised learning! (1/n)
by incorporating self-consistency during offline RL training, we unlock three orthogonal directions of scaling:
1. efficient training (i.e. limit backprop through time)
2. expressive model classes (e.g. flow matching)
3. inference-time scaling (sequential and parallel)
which,…
I won't be at #ICLR2025 myself this time around but please go talk to lead authors @nico_espinosa_d, @GaoZhaolin, and @runzhe_wu about their bleeding-edge algorithms for imitation learning and RLHF!
Heading to #ICLR2025 🇸🇬! Excited to connect with friends and chat about RL: theory, LLM reasoning and robotics!
I will present our Oral paper on LLM self-improvement 📍 4:18pm Sat. Join me if you want to learn about its scaling laws, iterative training, and test-time improvement.
What is the place of exploration in today's AI landscape and in which settings can exploration algorithms address current open challenges?
Join us to discuss this at our exciting workshop at @icmlconf 2025: EXAIT!
exait-workshop.github.io #ICML2025
I think of misspecification (embodiment / sensory gaps) as the fundamental reason behavioral cloning isn't "all you need" for imitation: matching actions != matching outcomes. Introducing @nico_espinosa_d's #ICLR2025 paper proving that "local search" *is* all you need! [1/n]
Meet the recipients of the 2024 ACM A.M. Turing Award, Andrew G. Barto and Richard S. Sutton! They are recognized for developing the conceptual and algorithmic foundations of reinforcement learning. Please join us in congratulating the two recipients! bit.ly/4hpdsbD
🚀 Rising Star Workshops for Junior/Senior PhDs, and Postdocs!
🌟 Don't miss these career-boosting opportunities!
notion.so/List-of-Rising…
Please share with your peers, students, and anyone who might benefit! #PhD #Postdoc #Academia #RisingStars
There are multiple postdoc positions available as part of an exciting new AI-agent initiative at Columbia that tackles challenges at the frontier of agentic systems and sequential decision-making.
I am not very active here, so please help me spread the word!
The list of accepted papers for AISTATS 2025 is now available.
aistats.org/aistats2025/
Congratulations to the authors, and thanks to the reviewers, ACs, and SACs for their help.
Thanks to my co-chair @ashipra & workflow chairs: Christopher Anders (RIKEN) & Tingting Ou (Columbia).
check this out: new postdoc program for AI-related research in Catalunya!
our group is looking to hire within this program, ideally to work on topics related to RL theory. in case you're interested, pls DM or email me.
(retweets appreciated!)
ramonllull-aira.eu/application
520 Followers · 817 Following · Postdoc at @MIT | ex-PhD student @Princeton | Exploring how to train AIs and their interaction with the world, while brewing my espresso.
613 Followers · 1K Following · Professor at Texas A&M University; ML/AI researcher; optimization for ML/AI; large reasoning models; developing LibAUC library for training deep neural nets.
316 Followers · 357 Following · Senior undergraduate @thudcst.
Research intern @LTIatCMU (previously: @HKUST-NLP, @thukeg).
Interested in LLM & AI for Education/Research/Software Eng.
5 Followers · 100 Following · Postdoc @CMU_ECE & @UMassAmherst. Working on reinforcement learning to make optimal decisions in AI computing and communication systems.
2K Followers · 840 Following · Assistant Professor at @BristolUni, PhD from @UCL, prev. intern at @TikTok & @Microsoft. ✨ Reinforcement Learning, Causality, World Models.
13K Followers · 689 Following · Research @Meta Superintelligence Labs, RL/post-training/agents; previously Research @OpenAI on multimodal and RL; opinions are my own.
5K Followers · 1 Following · Tao's my name and math is my thing :) I could say my greatest quality is that I have my own theorem, you know the Green-Tao theorem, everyone's heard of that :)
103 Followers · 80 Following · @WUSTL CSE PhD candidate. Former B.Eng. in Software Engineering @FudanUni. AI researcher. Interested in LLM efficiency and reasoning.
298 Followers · 279 Following · PhD at University of Amsterdam, working on Information Retrieval and Natural Language Processing | Former Applied Scientist Intern at Amazon
110 Followers · 21 Following · 2nd AI for Math Workshop @ ICML 2025
West Ballroom C, Vancouver Convention Center
July 18th, 2025 @ Vancouver, Canada (Hybrid)
5K Followers · 828 Following · Postdoc @LTIatCMU. PhD from Ohio State @osunlp. Author of MMMU, MAmmoTH. Training & evaluating foundation models. Opinions are my own.
195 Followers · 412 Following · Incoming Prof. at NUS Computer Science working on AI4Sci & Machine Learning. Currently recruiting students (PhDs, postdocs, RAs, interns, etc.). Email me if interested!
11 Followers · 7 Following · Workshop on the Foundations of Post-training at COLT 2025 @AssocCompLearn. A deep dive into the theoretical and practical aspects of the post-training of LLMs.