Linh Le @linhlpv

PhD student at A2I2. Reinforcement Learning, Adaptation and Generalization
linhlpv.github.io
Joined December 2015

Tweets: 200
Followers: 70
Following: 462
Likes: 2K

Richard Sutton @RichardSSutton

3 weeks ago

I was happy to give a more technical talk on how we might create an AI at RLC-2025 and AGI-2025 (video below).
The Oak Architecture: A Vision of Super-Intelligence from Experience
As AI has become a huge industry, to an extent it has lost its way. What is needed to get us back on…

20 101 672 57K 507

Gabriele Berton @gabriberton

3 weeks ago

Finding an ML summer school has never been easier. Here is a GitHub repo with a comprehensive list of 50+ ML summer (and winter) schools all over the world (link in comments). Some of them are even free, and a few even offer scholarships, so you don't have to pay absolutely anything.

2 38 268 23K 370

Alexandre Brown 🇨🇦 @AlexandreBrown0

3 weeks ago

🚀 I'm excited to share our new paper: SegDAC: Segmentation-Driven Actor-Critic for Visual Reinforcement Learning 🧠 SegDAC combines large vision models with online RL to reason about its environment at the object and sub-object level, avoiding noisy pixel-level reasoning. 🛠️…

5 17 80 10K 56

Ilir Aliu - eu/acc @IlirAliu_

3 weeks ago

Humanoids finally move like humans… and can do more than copy. [Details + demos in thread 👇] A new framework, BeyondMimic, shows how to learn naturalistic whole-body control from human motion. But it then goes further by composing those skills into versatile, zero-shot…

17 111 617 72K 236

Joseph Suarez 🐡 @jsuarez5341

a month ago

At what point does perf optimization get ridiculous? During my PhD, everything was 500-5000 SPS. Then I got 10k and was very proud. Then 100k in early versions of PufferLib. Then 1M in 2.0... and now we're at up to 6M productive SPS on some RL envs.

6 8 225 12K 70

Perry Dong @perryadong

2 months ago

Fine-tuning pre-trained robotic models with online RL requires a way to train RL with expressive policies. Can we design an effective method for this? We propose EXPO, a sample-efficient online RL algorithm that enables stable fine-tuning of expressive policy classes (1/6)

1 10 57 37K 46

Cansu Sancaktar @CcansuSancaktar

2 months ago

✨Introducing SENSEI✨ We bring semantically meaningful exploration to model-based RL using VLMs. With intrinsic rewards for novel yet useful behaviors, SENSEI showcases strong exploration in MiniHack, Pokémon Red & Robodesk. Accepted at ICML 2025🎉 Joint work with @cgumbsch 🧵

2 36 149 12K 68

Nan Jiang @nanjiang_cs

2 months ago

Missing ICML, I used this week to write my first technical blog post on some recent thoughts about two different roles of simulators in RL and the confusions/misconceptions around them. Comments welcome! nanjiang.cs.illinois.edu/2025/07/16/sim…

4 20 143 11K 100

Eugene Vinitsky (@RLC) 🍒🦋 @EugeneVinitsky

2 months ago

The paper: arxiv.org/abs/2502.03349

0 1 11 934 3

Qiyang Li @qiyang_li

2 months ago

Everyone knows action chunking is great for imitation learning. It turns out that we can extend its success to RL to better leverage prior data for improved exploration and online sample efficiency! colinqiyangli.github.io/qc/ The recipe to achieve this is incredibly simple. 🧵 1/N

3 69 364 38K 295

Sergey Levine @svlevine

2 months ago

Warm-start RL (WSRL) can learn to control a real robot in under 20 minutes! Deep RL is getting really fast. Warm-start from offline data + super-efficient online learning is increasingly making real world RL not just practical but pretty easy.

Paul Zhou @zhiyuan_zhou_

2 months ago

9 38 253 82K 128

4 63 385 57K 216

Andrew Gordon Wilson @andrewgwils

2 months ago

You don't _need_ a PhD (or any qualification) to do almost anything. A PhD is a rare opportunity to grow as an independent thinker in an academic environment, rather than immediately becoming a gear in a corporate agenda. It's definitely not for everyone!

Noam Brown @polynoamial

2 months ago

199 208 4K 1.3M 897

29 101 2K 197K 303

Karsten Kreis @karsten_kreis

2 months ago

📢📢 "Align Your Flow: Scaling Continuous-Time Flow Map Distillation" New flow map framework for state-of-the-art few-step generation, w/ the amazing @amsabour and @FidlerSanja. 🔥 Project page: research.nvidia.com/labs/toronto-a… 📜 Paper: arxiv.org/abs/2506.14603 🧵Thread below... (1/n)

3 37 189 21K 77

Roger Creus Castanyer @creus_roger

3 months ago

🚨 Excited to share our new work: "Stable Gradients for Stable Learning at Scale in Deep Reinforcement Learning"! 📈 We propose gradient interventions that enable stable, scalable learning, achieving significant performance gains across agents and environments! Details below 👇

2 33 155 30K 116

Sergey Levine @svlevine

3 months ago

Self-supervised representation learning looks a bit like RL. What if we literally use RL as an SSL method for visual representations? Turns out that it works quite well. In new work by @its_dibya, we show how this can be done: dibyaghosh.com/annotation_boo…

7 117 616 48K 489

Maximilian Du @du_maximilian

3 months ago

Normally, changing robot policy behavior means changing its weights or relying on a goal-conditioned policy. What if there was another way? Check out DynaGuide, a novel policy steering approach that works on any pretrained diffusion policy. dynaguide.github.io 🧵

5 34 142 17K 68

Aviral Kumar @aviral_kumar2

3 months ago

Our view on test-time scaling has been to train models to discover algos that enable them to solve harder problems. @setlur_amrith & @matthewyryang's new work e3 shows how RL done with this view produces the best <2B LLM on math, one that extrapolates beyond its training budget. 🧵⬇️…

2 30 183 12K 108

Chongyi Zheng @chongyiz1

3 months ago

1/ How should RL agents prepare to solve new tasks? While prior methods often learn a model that predicts the immediate next observation, we build a model that predicts many steps into the future, conditioning on different user intentions: chongyi-zheng.github.io/infom.

5 34 189 62K 151

John Zhou @johnlyzhou

3 months ago

Hierarchical methods for offline goal-conditioned RL (GCRL) can scale to very distant goals that stymie flat (non-hierarchical) policies — but are they really necessary? Paper: arxiv.org/abs/2505.14975 Project page: johnlyzhou.github.io/saw/ Code: github.com/johnlyzhou/saw Thread ↓

2 13 64 5K 40

Miguel Suau @SuauMiguel

3 months ago

Phaidra is hiring a Research Scientist to work on sequential decision-making problems. I'm at the RLDM conference in Dublin this week. If you're attending and would like to learn more about the role or the company, feel free to reach out! job-boards.greenhouse.io/phaidra/jobs/4…