Oleg Rybkin @_oleh

RL at scale at xAI · olehrybkin.com · Philadelphia · Joined January 2014

Tweets: 307
Followers: 950
Following: 424
Likes: 1K

Aviral Kumar @aviral_kumar2

3 days ago

We have been doing work on scaling laws for off-policy RL for some time now and we just put a new paper out: arxiv.org/abs/2508.14881 Here, @preston_fu and @_oleh led a study on how to best allocate compute for training value functions in deep RL: 🧵⬇️

2 25 159 7K 93


Sergey Levine @svlevine

3 days ago

Following up on our work on scaling laws for value-based RL (led by @_oleh and @preston_fu), we've been trying to figure out compute optimal parameters for value-based RL training. Check out Preston's post about our findings!

Preston Fu @preston_fu

4 days ago

6 24 147 32K 89


3 18 186 17K 82

Paul Zhou @zhiyuan_zhou_

3 days ago

How can we best scale up value-based RL? We need to use bigger models, which mitigate what we call “TD-overfitting” (more below!👇 🧵 ). Further, we need to scale batch size and UTD accordingly as the models get bigger. Great work led by @preston_fu and @_oleh

Preston Fu @preston_fu

4 days ago

6 24 147 32K 89


1 1 11 659 1

Oleg Rybkin @_oleh

4 days ago

📈📈📈

Preston Fu @preston_fu

4 days ago


6 24 147 32K 89


0 0 9 813 0

Oleg Rybkin @_oleh

a month ago

Cool work by David and friends! Could this be the thing that finally makes everyone stop using Gaussians as their policies? 🤔

David McAllister @davidrmcall

a month ago


8 200 1K 132K 931


2 0 19 1K 5

Qiyang Li @qiyang_li

2 months ago

Everyone knows action chunking is great for imitation learning. It turns out that we can extend its success to RL to better leverage prior data for improved exploration and online sample efficiency! colinqiyangli.github.io/qc/ The recipe to achieve this is incredibly simple. 🧵 1/N

3 69 364 38K 295


Oleg Rybkin @_oleh

3 months ago

Very insightful analysis that I mostly agree with (except the overly pessimistic title :)!

Seohong Park @seohong_park

3 months ago


37 191 1K 163K 1K


3 4 23 5K 10

Oleg Rybkin @_oleh

3 months ago

Really interesting result! Scaling value-based RL is hard and we are still missing much of the machinery to do it. @seohong_park shows that horizon is the critical issue.

Seohong Park @seohong_park

3 months ago


10 146 921 136K 753


0 3 17 2K 2

Seohong Park @seohong_park

3 months ago

We found a way to do RL *only* with BC policies. The idea is simple: 1. Train a BC policy π(a|s) 2. Train a conditional BC policy π(a|s, z) 3. Amplify(!) the difference between π(a|s, z) and π(a|s) using CFG Here, z can be anything (e.g., goals for goal-conditioned RL). 🧵↓

5 43 346 34K 302
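The "amplify the difference" step in the post above can be sketched with the standard classifier-free guidance (CFG) combination. This is only an illustration under stated assumptions, not the authors' implementation: it assumes a small discrete action space, represents each BC policy by a vector of logits, and uses a hypothetical guidance weight `w` and a hypothetical helper name `cfg_action_log_probs`. With `w = 1` the result equals the conditional policy π(a|s, z); with `w > 1` the gap between π(a|s, z) and π(a|s) is exaggerated.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    return shifted - np.log(np.sum(np.exp(shifted)))


def cfg_action_log_probs(uncond_logits, cond_logits, w):
    """Combine an unconditional and a conditional policy with CFG.

    Assumption: both policies are given as logits over the same discrete
    action set. The guided log-probabilities are
        log pi(a|s) + w * (log pi(a|s,z) - log pi(a|s)),
    renormalized so that they form a valid distribution.
    """
    log_p_uncond = log_softmax(np.asarray(uncond_logits, dtype=float))
    log_p_cond = log_softmax(np.asarray(cond_logits, dtype=float))
    guided = log_p_uncond + w * (log_p_cond - log_p_uncond)
    return log_softmax(guided)


# Toy example: the unconditional policy is uniform over three actions,
# while conditioning on z makes the second action more likely.
uncond = np.array([0.0, 0.0, 0.0])
cond = np.array([0.0, 1.0, 0.0])

p_w1 = np.exp(cfg_action_log_probs(uncond, cond, w=1.0))  # same as the conditional policy
p_w3 = np.exp(cfg_action_log_probs(uncond, cond, w=3.0))  # preference for action 1 is amplified
```

For continuous actions with diffusion or flow policies, the same linear combination would typically be applied to the predicted noise or velocity rather than to log-probabilities; that variant is not shown here.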


Paul Zhou @zhiyuan_zhou_

4 months ago

This was fun, thanks for having me @chris_j_paxton @micoolcho! See the podcast for a livestream of the robot in real time and me evaluating a policy live! Or check it out for yourself at auto-eval.github.io and eval your policy in real without breaking a sweat

RoboPapers @RoboPapers

4 months ago

0 5 17 5K 5


2 6 34 4K 9

Arthur Allshire @arthurallshire

4 months ago

our new system trains humanoid robots using data from cell phone videos, enabling skills such as climbing stairs and sitting on chairs in a single policy (w/ @redstone_hong @junyi42 @davidrmcall)

36 113 643 153K 207


Amber Xie @amberxie_

4 months ago

@_oleh @DorsaSadigh @chelseabfinn To be presented at ICML 2025 as a *spotlight poster* :)

0 1 6 847 0

Aviral Kumar @aviral_kumar2

4 months ago

@_oleh will also present an oral talk on our recent work on building scaling laws for value-based RL. We find that value-based deep RL algorithms scale predictably. Talk at Workshop on robot learning (WRL), April 27. @sea_snell will then present the poster!…

1 3 9 902 0


Oleg Rybkin @_oleh

4 months ago

Check out a new paper by @amberxie_! We show that you can do robotic imitation learning well by planning future latent states instead of actions with a diffusion model. This planning method is also more flexible, allowing you to use suboptimal and action-free data.

Amber Xie @amberxie_

4 months ago

3 44 357 37K 229


1 1 12 6K 3

Chuning Zhu @chuning_zhu

5 months ago

Scaling imitation learning has been bottlenecked by the need for high-quality robot data, which are expensive to collect. But are we utilizing existing data to the fullest extent? A thread (1/11)