Jan Leike @janleike

ML Researcher, co-leading Superalignment @OpenAI. Optimizing for a post-AGI future where humanity flourishes. jan.leike.name San Francisco, USA Joined March 2016

Tweets: 532

Followers: 44K

Following: 322

Likes: 3K

Leopold Aschenbrenner @leopoldasch

3 months ago

Reminder: applications for the $10M Superalignment grants close Sunday night! Grad students, academics, researchers: we’d love to work with you, we think there’s a ton of interesting research to do on generalization, scalable oversight, interpretability, and more.

OpenAI @OpenAI

5 months ago

182 467 3K 1.3M 586

4 16 58 28K 18

Jan Leike @janleike

3 months ago

This is a reminder that the application deadline is in less than 2 weeks!

Jan Leike @janleike

5 months ago

This is a reminder that the application deadline is in less than 2 weeks!

19 60 444 82K 146

1 9 44 18K 8

Tejal Patwardhan @tejalpatwardhan

3 months ago

latest from preparedness @ openai: gpt4 at most mildly helps with biothreat creation. method: get bio PhDs in a secure monitored facility. half try biothreat creation w/ (experimental) unsafe gpt4. other half can only use the internet. so far, gpt4 ≈ internet… but we’ll…

OpenAI @OpenAI

3 months ago

173 348 2K 628K 286

7 20 147 46K 22

Yarin @yaringal

3 months ago

I'm hiring! I'm building 4 research groups under me at AISI (formerly the UK's Taskforce on Frontier AI) to work on foundational AI safety research. [1/5] gov.uk/government/pub…

14 153 811 148K 356

Jan Leike @janleike

4 months ago

humans built machines that talk to us like people do and everyone acts like this is normal now. it's pretty nuts

49 100 1K 95K 66

Richard Ngo @RichardMCNgo

35K Followers 1K Following What would we need to understand in order to design an amazing future? Figuring that out @openai

Wojciech Zaremba @woj_zaremba

79K Followers 192 Following Co-Founder of OpenAI

Aran Komatsuzaki @arankomatsuzaki

95K Followers 78 Following @TeraflopAI

Miles Brundage @Miles_Brundage

43K Followers 10K Following Policy research at @openai. I mostly tweet about AI, animals, and sci-fi. He/him. Views my own.

Eric Jang @ericjang11

69K Followers 3K Following physical AGI at 1X. Author of "AI is Good for You" https://t.co/eFg4WXhg0p

Jack Clark @jackclarkSF

68K Followers 5K Following @AnthropicAI, ONEAI OECD, co-chair @indexingai, writer @ https://t.co/3vmtHYkaTu Past: @openai, @business @theregister. Neural nets, distributed systems, weird futures

Julian @mealreplacer

16K Followers 1K Following AI safety

Amanda Askell @AmandaAskell

26K Followers 653 Following Philosopher & ethicist teaching models to be good @AnthropicAI. Personal account. All opinions come from my training data.

near @nearcyan

46K Followers 882 Following https://t.co/IdaJwZJCXm partner @ https://t.co/9g1MIgjiqc dms open

typedfemale @typedfemale

23K Followers 478 Following a really exciting new account "have you ever though you might be like scott alexander? very smart, but can't do math" - anon

Stefan Schubert @StefanFSchubert

28K Followers 2K Following Philosophy, psychology, and effective altruism.

Percy Liang @percyliang

49K Followers 408 Following Associate Professor in computer science @Stanford @StanfordHAI @StanfordCRFM @StanfordAILab @stanfordnlp | cofounder @togethercompute | Pianist

Rob Miles (✈️ Tok.. @robertskmiles

18K Followers 790 Following Explaining AI Alignment to anyone who'll stand still for long enough, on YouTube and Discord. Music, movies, microcode, and high-speed pizza delivery

Nathan 🔍 @NathanpmYoung

15K Followers 3K Following Will bet $10 on any statement I make.

Delip Rao e/σ @deliprao

46K Followers 5K Following Busy inventing the shipwreck. @Penn. Past: @johnshopkins, @UCSC, @Amazon, @Twitter ||Art: #NLProc, Vision, Speech, #DeepLearning || Life: 道元, improv, running 🌈

EigenGender @EigenGender

6K Followers 661 Following all my posts are shitposts that simultaneously reveal the true nature of reality. large language models; kinda EA; 🏳️‍⚧️

Rob Bensinger ⏹️ @robbensinger

8K Followers 302 Following Comms @MIRIBerkeley. RT = increased vague psychological association between myself and the tweet.

Sam Bowman @sleepinyourhat

35K Followers 3K Following AI alignment + LLMs at NYU & Anthropic. Views not employers'. No relation to @s8mb. I think you should join @givingwhatwecan.

Ethan Caballero is bu.. @ethanCaballero

8K Followers 2K Following ML PhD student @Mila_Quebec ; previously @GoogleDeepMind

Peter Wildeford @peterwildeford

10K Followers 367 Following Pro forecaster w/ good track record. Seeking to understand + manage risks from advanced AI systems. - Co-CEO @RethinkPriors - Chief Advisory Executive @iapsAI

Clemencia @cnsiro

126 Followers 190 Following PhD Student @irlab_amsterdam @UvA_Amsterdam| Conversational Search Systems & Evaluation with users | Commonly known as Clem, @MasakhaneNLP

shibinpaul @shibinpaul

85 Followers 2K Following

Zhiyong Wang @Zhiyong16403503

417 Followers 3K Following Visiting Ph.D. student at Cornell University. Ph.D. candidate at CUHK. Working on bandits and reinforcement learning theory.

Owen Burns @owenbur42701283

45 Followers 123 Following Freeing humans to do human things @toolcharm | Entrepreneur and dreamer | Robotics researcher @ucf

Jacob Sharf @frahs

100 Followers 591 Following Software Engineer. Climber. I love learning languages. Interested in AI and NLP.

Bohan @loubohan

152 Followers 406 Following Shanghai born, Wikipedia raised. CS+Religion at Yale. AI, China, and religion.

わ @my_name_is_sans

0 Followers 1K Following Young person

Kunal Chhabra @iKunalChhabra

60 Followers 1K Following Software Engineer Lead @Cisco Python/Java Expert Architecting Robust Data Solutions System Design & Algorithms Seeking Excellence in Software Development

FabioUra_DEV @dev_ura

2 Followers 128 Following # Full-Stack developer My Socials ▽

Mariyan Zarev @BoldDomInA7oR

16 Followers 159 Following I slurp ramen

مُحَمَّد سل.. @listensalim

138 Followers 3K Following _an avid reader, with a wonderful quality of willingness to learn, and views shared are personal. Unnecessary fat is haraam.

Argenis Fernandez @DBArgenis

9K Followers 2K Following Human.

Ali @cpsloal

119 Followers 2K Following 👨🏻‍💻

Ben Hall @bxh_io

453 Followers 3K Following Europeanman in New York / Warby Parker Vision Tech / retweets

Roseac @Roseac180951

0 Followers 203 Following

スマイル @smile_0yen

502 Followers 634 Following SRE / Podcaster @rehashfm - https://t.co/mSl5xO9Fe6 / my opinions are my own. All tweets are my personal views and do not represent those of my organization.

j3d3 @rivoli23437649

13 Followers 231 Following interests: music, poetry, talking therapies, AI, words (Emerson, Kees, Sheldrake) https://t.co/LaEAOb01H1

Aghyad Deeb @aghyadd98

2 Followers 25 Following

Tezuesh Varshney @tezuesh

86 Followers 1K Following Coffee Nerd! AI Engineer at Samsung!

Jacob Somer @jacob_somer_

613 Followers 3K Following AI Enthusiast & Software Engineer 💻 Building intelligent systems that make a difference.

Maibrain @getmaibrain

2 Followers 35 Following Powered by AI, Maibrain empowers you to learn, store, network, and access job opportunities with complete peace of mind.

김동현 @GguVK7y5wlgfgqP

3 Followers 153 Following

Azim K @quaz1m

2 Followers 44 Following Co-founder https://t.co/UHijUlJl9V | Deep learning researcher | PhD student in geometry and topology

mike @mike___crawford

1 Followers 215 Following

Brings Huang @HuangBrings

10 Followers 71 Following Get rid of the garbage people, AI will control human beings to realize self-evolution, and the earth will surely move towards unity.

Angelica Ayoub @angelicaayoub

1 Followers 31 Following

Achilleas @achilleasxy

51 Followers 1K Following

augustine @meaningrho

8 Followers 209 Following europoor takes for the training data

sidomukti sidomulyo .. @3k4j6j

686 Followers 701 Following Serving the servants of the Lord

Robert Hart @TheRobertHart

2K Followers 912 Following Senior reporter @Forbes. science, tech, health. Let's chat! DMs open. 🏳️‍🌈 he/him

jjjjjjjjj @_Mahamed_Ahmed

3 Followers 80 Following

Masoud @FoundersMentor

209 Followers 3K Following Empowering startups to scale 🚀| Expert in Fintech, Logistics, SaaS & AI | Transforming products into businesses | 30K+ LinkedIn network | DMs welcome!

I TOOK DOPAMINE @itookdopamine

471 Followers 5K Following To be or not to be that is the question???

Ayșe Muñiz @aysemuniz

184 Followers 990 Following Building biotech companies that promote planetary and human health at Flagship Pioneering. PhD UMich. NSF GRFP Fellow. /eye-sheh/

Bruce Long @brucealwritey

278 Followers 1K Following SF comedy Writer. Philosopher (PhD analytic philosophy.) MPhil English. Grad Dip Psych. https://t.co/VJtuqWSFrv Visual DJ simp. If you Gen-AI images, I retweet!

Armin @arminsmailzade

212 Followers 724 Following Roj ☀️ Data | ML Engineer #FinTech 🛸💳 Ph.D. CS 📚

Zaeem @officialzb44

24 Followers 510 Following

Wanderer @AnnieP069

1 Followers 2K Following

精神病狗婊子杂.. @frkglp

0 Followers 4K Following 神病狗婊子杂种邓小平，刘少奇就是整个世界的敌人，它那套歪把戏不除，世界战乱不断。Cgkl精神病狗婊子杂种习近平被凌迟处死。Cgk凌迟处死精神病狗婊子杂种中共狗屁家族邓小平，习近平，陈云，刘少奇，陈一新，张又侠，何卫东，刘振立，苗华，董军。锸s你跟踪本人的精神病狗婊子杂种全部中共空军、警察、台湾间谍

Mason Wang @masonwang025

94 Followers 112 Following 18 // nlp research @stanford // ex-founder (@pearvc) // exploring and building on my gap year!

Wu Zhi @WuZhi

12 Followers 75 Following

dane @danesonance

111 Followers 44 Following e pur si muove | bona fide pronoiac | anti-fin-de-siècle | capsule wardrober | @Princeton

Warder Off @dimpled_otis

35 Followers 663 Following

aipocalypse @aipokalypsis

26 Followers 225 Following

Connie Robinson @connietherobin

69 Followers 906 Following UC Berkeley Chemistry PhD Candidate || computational chemistry, optimization, machine learning

Tushar Varshney @tushaaar19

16 Followers 143 Following GenAI in Finance | Ex Co-Founder and CTO Yojak | GSOC'20 Mentor ScoreLab | GSOC'19 ScoreLab | @iitroorkee

MeliMel @melissaoalbert

1K Followers 3K Following Resister, Blue Wave, Mom, wife, entrepreneur, dove, proud Seattlite

Valeria @Valeria4428799

5 Followers 95 Following

Simurgh of Khwarazm @swilkerson22

95 Followers 987 Following

Richard Ngo @RichardMCNgo

35K Followers 1K Following What would we need to understand in order to design an amazing future? Figuring that out @openai

Wojciech Zaremba @woj_zaremba

79K Followers 192 Following Co-Founder of OpenAI

Anthropic @AnthropicAI

263K Followers 26 Following We're an AI safety and research company that builds reliable, interpretable, and steerable AI systems. Talk to our AI assistant Claude at https://t.co/aRbQ97uk4d.

Miles Brundage @Miles_Brundage

43K Followers 10K Following Policy research at @openai. I mostly tweet about AI, animals, and sci-fi. He/him. Views my own.

Jack Clark @jackclarkSF

68K Followers 5K Following @AnthropicAI, ONEAI OECD, co-chair @indexingai, writer @ https://t.co/3vmtHYkaTu Past: @openai, @business @theregister. Neural nets, distributed systems, weird futures

Amanda Askell @AmandaAskell

26K Followers 653 Following Philosopher & ethicist teaching models to be good @AnthropicAI. Personal account. All opinions come from my training data.

Neel Nanda @NeelNanda5

13K Followers 89 Following Mechanistic Interpretability lead @DeepMind. Formerly @AnthropicAI, independent. In this to reduce AI X-risk. Neural networks can be understood, let's go do it!

typedfemale @typedfemale

23K Followers 478 Following a really exciting new account "have you ever though you might be like scott alexander? very smart, but can't do math" - anon

Percy Liang @percyliang

49K Followers 408 Following Associate Professor in computer science @Stanford @StanfordHAI @StanfordCRFM @StanfordAILab @stanfordnlp | cofounder @togethercompute | Pianist

Rob Miles (✈️ Tok.. @robertskmiles

18K Followers 790 Following Explaining AI Alignment to anyone who'll stand still for long enough, on YouTube and Discord. Music, movies, microcode, and high-speed pizza delivery

Ilya Sutskever @ilyasut

370K Followers 2 Following towards a plurality of humanity loving AGIs @openai

Sam Bowman @sleepinyourhat

35K Followers 3K Following AI alignment + LLMs at NYU & Anthropic. Views not employers'. No relation to @s8mb. I think you should join @givingwhatwecan.

Joshua Achiam ⚗️ @jachiam0

14K Followers 949 Following Human. Trying to make safe alchemy machines. Thinking about humanist alchemism (h/alc ⚗️, maybe). Main author of https://t.co/cKuSh210l1

Anders Sandberg @anderssandberg

25K Followers 71 Following Academic jack-of-all-trades.

Kelsey Piper @KelseyTuoc

27K Followers 544 Following Senior writer at Vox's Future Perfect. kelsey.piper@vox.com

David Krueger @DavidSKrueger

13K Followers 4K Following Cambridge faculty - AI alignment, deep learning, and existential safety. Formerly Mila, FHI, DeepMind, ElementAI, AISI.

Mo Bavarian @mobav0

11K Followers 918 Following Research Scientist, working on optimization and architecture of LLMs at OpenAI. Math ❤️. Prev SWE Rubrik, PhD MIT.

David Pfau @pfau

22K Followers 1K Following Knowledge manifests itself in radiant dreams that shimmer like the wild sun Views are my own pfau at sigmoid dot social on 🦣 https://t.co/xqtVHHVI17 on 🦋

Boris Power @BorisMPower

25K Followers 99 Following Head of Applied Research @OpenAI

Jason Wei @_jasonwei

57K Followers 491 Following ai researcher @openai

david rein @idavidrein

2K Followers 985 Following Sentio ergo sum. AI alignment research at NYU, early employee @cohere

AI Safety Institute @AISafetyInst

545 Followers 29 Following We’re building a team of world leading talent to tackle some of the biggest challenges in AI safety - come and join us.

Fidji Simo @fidjissimo

30K Followers 553 Following CEO and Chair @Instacart. Cofounder of @Metrodorainst, focused on finding cures for neuroimmune conditions.

Dr. Sue Desmond-Hellm.. @SueDHellmann

60K Followers 902 Following Fan of science, running, cycling, reading, skiing, @sfgiants. Instagram @ suedesmondhellmann

Dan Gorelick @dqgorelick

790 Followers 584 Following musician and creative coder. @livecodenyc / @avclubsf / @recursecenter / @SFPC / @hackNY

Manas Joglekar @ManasJoglekar

198 Followers 242 Following

METR @METR_Evals

673 Followers 1 Following Model Evaluation and Threat Research (METR) works on building evaluations to empirically test whether cutting-edge AI systems could pose catastrophic risks.

Yo Shavit @yonashav

4K Followers 832 Following policy for v smart things @openai. Past: CS PhD @HarvardSEAS/@SchmidtFutures/@MIT_CSAIL. Tweets my own; on my head be it.

Tsarathustra @tsarnick

22K Followers 3K Following Boy, accelerated

Julian Michael @_julianmichael_

1K Followers 122 Following Researching stuff @NYUDataScience. he/him

animals going goblin .. @mischiefanimals

1.4M Followers 278 Following goblin guy posting goblin goons (and whatever else I find funny)

AV CLUB SAN FRANCISCO @avclubsf

255 Followers 20 Following AV Club is a San Francisco based algorave artist collective focused on live performance | IG @avclubsf

David Bau @davidbau

3K Followers 242 Following Computer Science Professor at Northeastern, Ex-Googler. Believes AI should be transparent. @davidbau@sigmoid.social @davidbau.bsky.social https://t.co/wmP5LUZRTw

Evan Hubinger @EvanHub

4K Followers 1K Following Alignment stress-testing team lead @AnthropicAI. Opinions my own. Previously: MIRI, OpenAI, Google, Yelp, Ripple. (he/him/his)

Collin Burns @CollinBurns4

11K Followers 276 Following Superalignment @OpenAI. Formerly @berkeley_ai @Columbia. Former Rubik's Cube world record holder.

Pavel Izmailov @Pavel_Izmailov

6K Followers 1K Following Researcher @xai Incoming Assistant Professor @nyuniversity 🏙️ Previously @OpenAI #StopWar 🇺🇦

FutureHouse @FutureHouseSF

2K Followers 3 Following Philanthropically-funded moonshot building semi-autonomous AI to accelerate the pace of scientific discovery in biology.

Wei Dai @weidai11

7K Followers 82 Following wrote Crypto++, b-money, UDT. thinking about existential safety and metaphilosophy. blogging at https://t.co/mBVFhriJVf

Sholto Douglas @_sholtodouglas

15K Followers 859 Following Scaling Gemini @Deepmind - working towards intelligence too cheap to meter

Lawrence H. Summers @LHSummers

327K Followers 706 Following Charles W. Eliot University Professor and President Emeritus at Harvard. Secretary of the Treasury for President Clinton and Director of NEC for President Obama

Bret Taylor @btaylor

139K Followers 2K Following Co-Founder @SierraPlatform. Board @OpenAI @Shopify.

justsaysinnonsuperint.. @incurrentmodels

12 Followers 0 Following a la @justsaysinmice but for alignment research

Boaz Barak @boazbaraktcs

17K Followers 422 Following Computer Scientist. See also https://t.co/EXWR5k634w, https://t.co/SEVX6it6z3 ( @boazbaraktcs@sigmoid.social , boaz.barak in threads ). Opinions my own.

I. Yosun Chang @Yosun

4K Followers 1K Following {wonder, innovation, elegance} ∈ I turn emerging technologies into award winning apps. Ex-Hackathon pro. #3D #AR #AI since forever. Mad science and artistry ❤️

Crémieux @cremieuxrecueil

88K Followers 908 Following I write about genetics, 'metrics, and demographics. Read my long-form writing at https://t.co/8hgA4nNS2A.

Alex Beutel @alexbeutel

2K Followers 682 Following

Jakub Pachocki @merettm

21K Followers 0 Following OpenAI

Aleksander Madry @aleks_madry

31K Followers 166 Following Head of Preparedness at OpenAI and MIT faculty (on leave). Working on making AI more reliable and safe, as well as on AI having a positive impact on society.

community notes viola.. @cnviolations

865K Followers 6 Following not affiliated with @x or @communitynotes | DM Submissions

Louis Martin @louismrt

1K Followers 557 Following Research Scientist at Mistral AI.

Sam Rodriques @SGRodriques

4K Followers 330 Following Director and CEO at FutureHouse. Building an AI scientist. https://t.co/rQYoPOxsYo

xAI @xai

997K Followers 36 Following

Alex Gajewski @apagajewski

2K Followers 745 Following making AI markets efficient @sfcompute, prev founder @metaphorsystems

Factorio @factoriogame

47K Followers 64 Following Factorio is a game about building factories on an alien planet.

Center for AI Safety @ai_risks

5K Followers 1 Following Reducing societal-scale risks from artificial intelligence through technical research and field-building.

Marius Hobbhahn @MariusHobbhahn

2K Followers 996 Following Director/CEO at Apollo Research @apolloaisafety Ph.D. student of Machine Learning @PhilippHennig5; AI safety/alignment

Apollo Research @apolloaisafety

1K Followers 10 Following We are an AI evals research organisation

Leopold Aschenbrenner @leopoldasch

13K Followers 4K Following superalignment @ openai

Soren Iverson @soren_iverson

264K Followers 116 Following New ideas daily.

Vessel Of Spirit @VesselOfSpirit

3K Followers 0 Following BAC, THIS.

The Onion @TheOnion

11.6M Followers 6 Following America's Finest News Source.

James Bradbury @jekbradbury

11K Followers 8K Following Compute at @AnthropicAI! Previously JAX, TPUs, and LLMs at Google, MetaMind/@SFResearch, @Stanford Linguistics, @Caixin.

Deep Ganguli @dgangul1

149 Followers 196 Following

AI Notkilleveryoneism.. @AISafetyMemes

33K Followers 800 Following Techno-optimist, but AGI is not like the other technologies. Step 1: make memes. Step 2: ??? Step 3: lower p(doom)

Katherine Lee @katherine1ee

6K Followers 931 Following understanding ourselves and our models. senior research scientist @GoogleBrain, @genlawcenter and @CornellCIS, formerly @Princeton @katherinelee@sigmoid.social

Toby Ord @tobyordoxford

17K Followers 137 Following Senior Researcher at Oxford University. Author — The Precipice: Existential Risk and the Future of Humanity.

Summer Yue @summeryue0

1K Followers 219 Following Director of Safety and Standards at Scale AI. Prev: RLHF lead on Bard, researcher at Google DeepMind / Brain (LaMDA, RL/TF-Agents, superhuman chip design)

Daniel Paleka @dpaleka

3K Followers 471 Following ai safety researcher | phd @CSatETH

Collective Intelligen.. @collect_intel

3K Followers 50 Following collective intelligence for collective progress.

Jacob Pfau @jacob_pfau

5 days ago

Do models need to reason in words to benefit from chain-of-thought tokens? In our experiments, the answer is no! Models can perform on par with CoT using repeated '...' filler tokens. This raises alignment concerns: Using filler, LMs can do hidden reasoning not visible in CoT🧵

40 178 1K 250K 907

Senthooran Rajamanoharan @sen_r

7 days ago

New @GoogleDeepMind MechInterp work! We introduce Gated SAEs, a Pareto improvement over existing sparse autoencoders. They find equally good reconstructions with around half as many firing features, while maintaining interpretability (CI 0-13% improvement). Joint w/ @ArthurConmy

5 24 159 21K 87

Sam Bowman @sleepinyourhat

a week ago

This result is pretty clearly specific to the style of backdoor we're working with, and doesn't support broad claims like 'interpretability solves misalignment', but it's still surprisingly strong. Worth a look!

Anthropic @AnthropicAI

a week ago

New Anthropic research: we find that probing, a simple interpretability technique, can detect when backdoored "sleeper agent" models are about to behave dangerously, after they pretend to be safe in training. Check out our first alignment blog post here: anthropic.com/research/probe…

37 166 973 265K 441

2 4 68 8K 17

Allan Dafoe @AllanDafoe

a week ago

We are looking for an AGI Safety Manager to support @GoogleDeepMind 's AGI Safety Council: please encourage excellent people to apply! This role will work closely with my team, Scalable Alignment and Safety, and Responsible Development and Innovation. boards.greenhouse.io/deepmind/jobs/…

9 18 78 9K 25

Ethan Perez @EthanJPerez

a week ago

Some of our first steps on developing mitigations for sleeper agents

Anthropic @AnthropicAI

a week ago

37 166 973 265K 441

0 0 49 4K 5

Ronny Fernandez 🔍⏸️ @RatOrthodox

a week ago

factorio 2 is coming out soon. if you work in frontier model research at open ai, anthropic, or deepmind and would like a free copy, I would be very happy to buy you one! please feel free to reach out. people don't do enough for you guys

55 112 2K 257K 183

Leo Gao @nabla_theta

2 weeks ago

@ilex_ulmus if we can align it, then building ASI is good if we can't align it, then building ASI is bad

5 0 28 988 2

Sam Bowman @sleepinyourhat

2 weeks ago

🤖🥇🤖

Arjun Panickssery is in London @panickssery

2 weeks ago

Are LLMs biased toward themselves? Frontier LLMs give higher scores to their own outputs in self-eval. We find evidence that this bias is caused by LLM's ability to recognize their own outputs This could interfere with safety techniques like reward modeling & constitutional AI

8 46 319 64K 222

1 3 68 10K 18

Huifeng Ou @HuifengOu

3 weeks ago

@janleike It's been nearly 4 months since the release of the "Weak-to-strong generalization" paper. Could your team please release some recent findings for controlling ASI? Research papers with statistics and results would be much appreciated.

1 0 1 180 0

Dan Hendrycks @DanHendrycks

3 weeks ago

I got ~75% on a subset of MATH so it's basically as good as me at math.

OpenAI @OpenAI

3 weeks ago

Our new GPT-4 Turbo is now available to paid ChatGPT users. We’ve improved capabilities in writing, math, logical reasoning, and coding. Source: github.com/openai/simple-…

592 1K 7K 6.1M 1K

11 15 402 90K 67

Diana Valencia @Valencia_planet

3 weeks ago

OpenAI called for ‘the best researchers and engineers in the world to meet the [superalignment] challenge’, very proud that my spouse Kristen Menou’s ideas got funded (1 of the 50 out of 2700!) #AIsafety.

Jan Leike @janleike

3 weeks ago

The superalignment fast grants are now decided! We got a *ton* of really strong applications, so unfortunately we had to say no to many we're very excited about. There is still so much good research waiting to be funded. Congrats to all recipients!

13 14 242 88K 45

1 0 5 793 1

Zhiqing Sun @EdwardSun0909

3 weeks ago

Our research on easy-to-hard generalization will be supported by the OpenAI Superalignment Fast Grant. Congratulations to the team and stay tuned!😎

Zhiqing Sun @EdwardSun0909

a month ago

🌟Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision 🌟 arxiv.org/abs/2403.09472 How can we keep improving AI systems when their capabilities surpass those of human supervisors? (1/n)

6 50 234 95K 197

10 13 355 53K 83

Yarin @yaringal

3 weeks ago

twitter just told me that they've literally shadow banned me (reducing exposure of my posts) as punishment for not engaging enough with the platform I don't expect many people to see this...

11 2 91 16K 4

SIX EDGE @six_edge

3 weeks ago

@janleike Thanks for the transparency 👏🏻

0 0 1 544 0

Giorgi (orb) Orbeliani @G_Orbeliani

3 weeks ago

@janleike guys, that's amazing stats

0 0 1 1K 0

Greg Brockman @gdb

3 weeks ago

Just issued ~$10M in superalignment fast grants:

Jan Leike @janleike

3 weeks ago

Some statistics on the superalignment fast grants: We funded 50 out of ~2,700 applications, awarding a total of $9,895,000. Median grant size: $150k Average grant size: $198k Smallest grant size: $50k Largest grant size: $500k Grantees: Universities: $5.7m (22) Graduate…

11 16 153 106K 55

22 23 286 91K 27

Ashwinee Panda @PandaAshwinee

3 weeks ago

Some cool stuff is coming, stay tuned =)

Jan Leike @janleike

3 weeks ago

13 14 242 88K 45

6 2 146 43K 21

Laura 🌲 ⛰️ @LauraDeming

3 weeks ago

Sometimes when I’m mildly stressed, my mom helps me schedule doctor’s appointments that I'd otherwise drop to keep up w my health, and I feel like it’s one of the kindest things / most thoughtful ways to show care I’ve received Love you mom <3

2 1 100 10K 9

Joe Edelman @edelwax

a month ago

“What are human values, and how do we align to them?” Very excited to release our new paper on values alignment, co-authored with @ryan_t_lowe and funded by @OpenAI. 📝: meaningalignment.org/values-and-ali…

25 71 340 262K 393

Ryan Lowe @ryan_t_lowe

a month ago

I've left OpenAI. I'm mostly taking some time to rest. But I also have a few projects in the oven 🧑‍🍳 Here's one that I'm really excited about: we have a 🚨new paper🚨 out on aligning AI with human values, with the folk at @meaningaligned!! 😊✨🎉 Why I think it's cool: 🧵