Interested in RL. Science @MistralAI. Prev Llama post-training @AIatMeta, Gemini post-training and deep RL research @Deepmind, PhD @Columbia
robintyh1.github.io
Joined November 2018
Eventually, humans will need to supervise superhuman AI - but how? Can we study it now?
We don't have superhuman AI, but we do have LLMs. We study protocols where a weaker LLM uses stronger ones to find better answers than it knows itself.
Does this work? It’s complicated: 🧵👇
Thanks @_akhaliq for promoting our work!
Unlike regular RL, where golden r(s,a) are available and online is generally deemed better than offline, in RLHF this is less clear.
Complementary to some concurrent work, we investigate the causes of the perf gap between online and offline.
Fast-forward ⏩ alignment research from @GoogleDeepMind ! Our latest results enhance alignment outcomes in Large Language Models (LLMs). Presenting NashLLM!
Interested in how
**non-contrastive representation learning for RL**
is magically equivalent to
**gradient-based PCA/SVD on the transition matrix**
and hence won't collapse and will capture spectral info about the transition?
Come talk to us at #ICML2023 Hall 1 #308 at 1:30pm
Even if all you want is a value function, using quantile TD (QTD) can give a better estimate than standard TD.
Today at #ICML2023, Mark Rowland presents our latest work on distributional RL in collaboration with @robinphysics, @clarelyle, Remi Munos, @marcgbellemare
#809 @ 2pm
Interested in how non-contrastive representation learning works in RL? We show
(1) Why representations do not collapse
(2) How it relates to gradient PCA / SVD of transition matrix
Understanding Self-Predictive Learning for RL #ICML2023 @GoogleDeepMind arxiv.org/pdf/2212.03319
325K Followers 3K Following
NVIDIA Director of Robotics & Distinguished Scientist. Co-Lead of GEAR lab. Solving Physical AGI, one motor at a time. Stanford Ph.D. OpenAI's 1st intern.
20K Followers 2K Following
This is the site where I talk about the attacks on science and immigration.
Science is on the other site.
Lab website: https://t.co/vrtbcqRyRn
26K Followers 876 Following
Research Scientist Director in Meta FAIR. Reasoning, Optimization and Understanding LLM. Novelist in spare time. PhD in @CMU_Robotics.
18K Followers 4K Following
Associate Professor at UC Berkeley. Former Research Scientist at Google DeepMind. ML/AI Researcher working on foundations of LLMs and deep learning.
6K Followers 1K Following
Research scientist at @GoogleDeepMind, working on generative models, deep learning, RL. PhD from @stanford. Gemini Diffusion lead.
349 Followers 1K Following
I post updates about the best LLMs! Here's my list; 🇨🇳 Qwen, DeepSeek, GLM, Kimi, StepFun, MiniMax, Hunyuan. 🇺🇸🇪🇺• GPT, Claude, Gemini, Grok, Mistral
264 Followers 8K Following
The 69 Controversies of AI Adoption | Spreading the Word on AI Adoption | From the author of The Last AI @The_Last_AI @s_m_sohn |5/25/25| https://t.co/eMyARc66RG
728 Followers 2K Following
Stand fast therefore in the liberty by which Christ has made us free, and do not be entangled again with a yoke of bondage.
Galatians 5:1
46 Followers 215 Following
Where AI meets independence. From ruggedized on-prem & fine-tuned LLM nodes to zero-trust cloud systems. We build the AI layer that survives.
124 Followers 2K Following
Definite optimist - bred by effort. Software engineer - making an effort to put quality in quantity. Staying upwind. recovering from knowledge porn.
2K Followers 140 Following
Silver Professor at NYU Courant and CDS, Research Scientist at FAIR
Research in Machine Learning, past in Quantum Computing & Finance. Posts my own.
100 Followers 1K Following
Purdue Math PhD —Vanderbilt postdoc--VAP at UCI--Fisk AP
•Interpretable PDE in ML, Liquid crystals, and geometry/rock climbing.
949K Followers 764 Following
Professor at NYU. Chief AI Scientist at Meta.
Researcher in AI, Machine Learning, Robotics, etc.
ACM Turing Award Laureate.
1.2M Followers 279 Following
We’re a team of scientists, engineers, ethicists and more, committed to solving intelligence, to advance science and benefit humanity.
1.4M Followers 1K Following
Building @EurekaLabsAI. Previously Director of AI @ Tesla, founding team @ OpenAI, CS231n/PhD @ Stanford. I like to train large deep neural nets.
11K Followers 723 Following
"If there is not folly in the world, then the world itself is folly. You must understand that mistakes are not always regrets." - Paul Tobin, Bandette🤠
45K Followers 64 Following
Student of mind and nature, libertarian, chess player, cancer survivor. @ Keen, UAlberta, Amii, https://t.co/u8za2Kod54, The Royal Society, Turing Award
22K Followers 1K Following
Building AI that makes autonomous decisions using world models, artificial curiosity, and temporal abstraction @GoogleDeepMind
16K Followers 349 Following
CSO & co-founder, Reliant AI. Ex RL research lead at Google Brain, DeepMind. Known for Atari 2600 RL benchmark, Distributional RL (MIT Press 2023).
57K Followers 568 Following
Assistant Prof of CS @UWaterloo, Faculty @VectorInst, Canada @CIFAR_News AI Chair. Joining @NYU_Courant September 2026. Co-EiC @TmlrOrg. I lead @TheSalonML.
32K Followers 279 Following
Blogger primarily on AI and AI x-risk but also other things at Don't Worry About the Vase (SS/WP/LW), founding Balsa Research to fix policy.
710 Followers 814 Following
@PrincetonCS postdoc w/ Tom Griffiths @cocosci_lab |@StanfordAILab PhD w/ Benjamin Van Roy |@BrownCSDept BS+MS w/ Michael Littman @mlittmancs | RL & Info Theory
6K Followers 1K Following
Group Leader,
Physics of Intelligence Program at Harvard University
Physics of Artificial Intelligence Group, NTT Research, Inc.
42K Followers 109 Following
• Center for AI Safety Director
• xAI and Scale AI advisor
• GELU/MMLU/MATH/HLE
• PhD in AI
• Analyzing AI models, companies, policies, and geopolitics
15K Followers 6K Following
I build tough benchmarks for LMs and then I get the LMs to solve them. SWE-bench & SWE-agent. Postdoc @Princeton. PhD @nlpnoah @UW.
104 Followers 254 Following
Working on evaluation of AI models (via human and AI feedback), CS PhD candidate @Cambridge_Uni, prev. intern @Apple, latest project: https://t.co/9frWUrQipG
3K Followers 3K Following
Post-Training Lead @ Together AI | OpenChat Project Lead (#1 7B LLM on Arena for 2+ months, 2M+ downloads) | DeepCoder, DeepSWE
3K Followers 256 Following
LLMs and AI Research (Llama 2 & 3 lead) @Meta | ex @Google (PaLM lead, T5), ex @Baidu (Deep Speech 2, Sparse Neural Networks), ex @Nvidia
2K Followers 29 Following
Co-Founder, CTO, @reflection_ai
DQN, AlphaGo, AlphaZero, MuZero, Gemini RLHF
Prev Senior Staff RS and founding eng @GoogleDeepMind
AGI one PR at a time