Yunhao (Robin) Tang @robinphysics

Interested in RL. Science @MistralAI. Prev Llama post-training @AIatMeta, Gemini post-training and deep RL research @Deepmind, PhD @Columbia

robintyh1.github.io

Joined November 2018

Tweets: 128
Followers: 1K
Following: 729
Likes: 1K

Mistral AI @MistralAI

3 months ago

Announcing Magistral, our first reasoning model designed to excel in domain-specific, transparent, and multilingual reasoning.

Eventually, humans will need to supervise superhuman AI - but how? Can we study it now? We don't have superhuman AI, but we do have LLMs. We study protocols where a weaker LLM uses stronger ones to find better answers than it knows itself. Does this work? It’s complicated: 🧵👇

Yunhao (Robin) Tang @robinphysics

a year ago

Thanks @_akhaliq for promoting our work! Unlike regular RL where golden r(s,a) are available and online is generally deemed better than offline, in RLHF this is less clear. Complementary to some concurrent work, we investigate causes of the perf gap between online vs. offline.

AK @_akhaliq

a year ago

Michal Valko @misovalko

2 years ago

Fast-forward ⏩ alignment research from @GoogleDeepMind ! Our latest results enhance alignment outcomes in Large Language Models (LLMs). Presenting NashLLM!

Yunhao (Robin) Tang @robinphysics

2 years ago

Interested in how **non-contrastive representation learning for RL** is magically equivalent to **gradient-based PCA/SVD on the transition matrix**, and hence won't collapse and captures spectral info about the transition? Come talk to us at #ICML2023 Hall 1 #308 at 1:30pm

Yunhao (Robin) Tang @robinphysics

2 years ago

Will Dabney @wwdabney

2 years ago

Even if all you want is a value function, using quantile TD (QTD) can give a better estimate than standard TD. Today at #ICML2023, Mark Rowland presents our latest work on distributional RL in collaboration with @robinphysics, @clarelyle, Remi Munos, @marcgbellemare #809 @ 2pm

Yunhao (Robin) Tang @robinphysics

2 years ago

Interested in how non-contrastive representation learning works in RL? We show (1) Why representations do not collapse (2) How it relates to gradient PCA / SVD of transition matrix. Understanding Self-Predictive Learning for RL #ICML2023 @GoogleDeepMind arxiv.org/pdf/2212.03319