Jan Leike @janleike
ML Researcher, co-leading Superalignment @OpenAI. Optimizing for a post-AGI future where humanity flourishes.
jan.leike.name | San Francisco, USA | Joined March 2016
Tweets: 532 | Followers: 44K | Following: 322 | Likes: 3K
Reminder: applications for the $10M Superalignment grants close Sunday night! Grad students, academics, researchers: we’d love to work with you, we think there’s a ton of interesting research to do on generalization, scalable oversight, interpretability, and more.
This is a reminder that the application deadline is in less than 2 weeks!
latest from preparedness @ openai: gpt4 at most mildly helps with biothreat creation. method: get bio PhDs in a secure monitored facility. half try biothreat creation w/ (experimental) unsafe gpt4. other half can only use the internet. so far, gpt4 ≈ internet… but we’ll…
I'm hiring! I'm building 4 research groups under me at AISI (formerly the UK's Taskforce on Frontier AI) to work on foundational AI safety research. [1/5] gov.uk/government/pub…
humans built machines that talk to us like people do and everyone acts like this is normal now. it's pretty nuts
Richard Ngo @RichardMCNgo
35K Followers 1K Following What would we need to understand in order to design an amazing future? Figuring that out @openai
Miles Brundage @Miles_Brundage
43K Followers 10K Following Policy research at @openai. I mostly tweet about AI, animals, and sci-fi. He/him. Views my own.
Eric Jang @ericjang11
69K Followers 3K Following physical AGI at 1X. Author of "AI is Good for You" https://t.co/eFg4WXhg0p
Jack Clark @jackclarkSF
68K Followers 5K Following @AnthropicAI, ONEAI OECD, co-chair @indexingai, writer @ https://t.co/3vmtHYkaTu Past: @openai, @business @theregister. Neural nets, distributed systems, weird futures
Amanda Askell @AmandaAskell
26K Followers 653 Following Philosopher & ethicist teaching models to be good @AnthropicAI. Personal account. All opinions come from my training data.
near @nearcyan
46K Followers 882 Following https://t.co/IdaJwZJCXm partner @ https://t.co/9g1MIgjiqc dms open
typedfemale @typedfemale
23K Followers 478 Following a really exciting new account "have you ever though you might be like scott alexander? very smart, but can't do math" - anon
Stefan Schubert @StefanFSchubert
28K Followers 2K Following Philosophy, psychology, and effective altruism.
Percy Liang @percyliang
49K Followers 408 Following Associate Professor in computer science @Stanford @StanfordHAI @StanfordCRFM @StanfordAILab @stanfordnlp | cofounder @togethercompute | Pianist
Rob Miles (✈️ Tok.. @robertskmiles
18K Followers 790 Following Explaining AI Alignment to anyone who'll stand still for long enough, on YouTube and Discord. Music, movies, microcode, and high-speed pizza delivery
Delip Rao e/σ @deliprao
46K Followers 5K Following Busy inventing the shipwreck. @Penn. Past: @johnshopkins, @UCSC, @Amazon, @Twitter ||Art: #NLProc, Vision, Speech, #DeepLearning || Life: 道元, improv, running 🌈
EigenGender @EigenGender
6K Followers 661 Following all my posts are shitposts that simultaneously reveal the true nature of reality. large language models; kinda EA; 🏳️‍⚧️
Rob Bensinger ⏹️ @robbensinger
8K Followers 302 Following Comms @MIRIBerkeley. RT = increased vague psychological association between myself and the tweet.
Sam Bowman @sleepinyourhat
35K Followers 3K Following AI alignment + LLMs at NYU & Anthropic. Views not employers'. No relation to @s8mb. I think you should join @givingwhatwecan.
Ethan Caballero is bu.. @ethanCaballero
8K Followers 2K Following ML PhD student @Mila_Quebec ; previously @GoogleDeepMind
Peter Wildeford @peterwildeford
10K Followers 367 Following Pro forecaster w/ good track record. Seeking to understand + manage risks from advanced AI systems. - Co-CEO @RethinkPriors - Chief Advisory Executive @iapsAI
Clemencia @cnsiro
126 Followers 190 Following PhD Student @irlab_amsterdam @UvA_Amsterdam| Conversational Search Systems & Evaluation with users | Commonly known as Clem, @MasakhaneNLP
shibinpaul @shibinpaul
85 Followers 2K Following
Zhiyong Wang @Zhiyong16403503
417 Followers 3K Following Visiting Ph.D. student at Cornell University. Ph.D. candidate at CUHK. Working on bandits and reinforcement learning theory.
Owen Burns @owenbur42701283
45 Followers 123 Following Freeing humans to do human things @toolcharm | Entrepeneur and dreamer | Robotics researcher @ucf
Jacob Sharf @frahs
100 Followers 591 Following Software Engineer. Climber. I love learning languages. Interested in AI and NLP.
Bohan @loubohan
152 Followers 406 Following Shanghai born, Wikipedia raised. CS+Religion at Yale. AI, China, and religion.
Kunal Chhabra @iKunalChhabra
60 Followers 1K Following Software Engineer Lead @Cisco Python/Java Expert Architecting Robust Data Solutions System Design & Algorithms Seeking Excellence in Software Development
مُحَمَّد سل.. @listensalim
138 Followers 3K Following _an avid reader, with a wonderful quality of willingness to learn, and views shared are personal. Unnecessary fat is haraam.
Ben Hall @bxh_io
453 Followers 3K Following Europeanman in New York / Warby Parker Vision Tech / retweets
Roseac @Roseac180951
0 Followers 203 Following
スマイル @smile_0yen
502 Followers 634 Following SRE / Podcaster @rehashfm - https://t.co/mSl5xO9Fe6 / my opinions are my own. All tweets are my personal views and not those of the organization I belong to.
j3d3 @rivoli23437649
13 Followers 231 Following interests: music, poetry, talking therapies, AI, words (Emerson, Kees, Sheldrake) https://t.co/LaEAOb01H1
Aghyad Deeb @aghyadd98
2 Followers 25 Following
Jacob Somer @jacob_somer_
613 Followers 3K Following AI Enthusiast & Software Engineer 💻 Building intelligent systems that make a difference.
Maibrain @getmaibrain
2 Followers 35 Following Powered by AI, Maibrain empowers you to learn, store, network, and access job opportunities with complete peace of mind.
김동현 @GguVK7y5wlgfgqP
3 Followers 153 Following
Azim K @quaz1m
2 Followers 44 Following Co-founder https://t.co/UHijUlJl9V | Deep learning researcher | PhD student in geometry and topology
mike @mike___crawford
1 Followers 215 Following
Brings Huang @HuangBrings
10 Followers 71 Following Get rid of the garbage people, AI will control human beings to realize self-evolution, and the earth will surely move towards unity.
Angelica Ayoub @angelicaayoub
1 Followers 31 Following
Achilleas @achilleasxy
51 Followers 1K Following
Robert Hart @TheRobertHart
2K Followers 912 Following Senior reporter @Forbes. science, tech, health. Let's chat! DMs open. 🏳️‍🌈 he/him
jjjjjjjjj @_Mahamed_Ahmed
3 Followers 80 Following
Masoud @FoundersMentor
209 Followers 3K Following Empowering startups to scale 🚀| Expert in Fintech, Logistics, SaaS & AI | Transforming products into businesses | 30K+ LinkedIn network | DMs welcome!
I TOOK DOPAMINE @itookdopamine
471 Followers 5K Following To be or not to be that is the question???
Ayșe Muñiz @aysemuniz
184 Followers 990 Following Building biotech companies that promote planetary and human health at Flagship Pioneering. PhD UMich. NSF GRFP Fellow. /eye-sheh/
Bruce Long @brucealwritey
278 Followers 1K Following SF comedy Writer. Philosopher (PhD analytic philosophy.) MPhil English. Grad Dip Psych. https://t.co/VJtuqWSFrv Visual DJ simp. If you Gen-AI images, I retweet!
Zaeem @officialzb44
24 Followers 510 Following
Wanderer @AnnieP069
1 Followers 2K Following精神病狗婊子杂.. @frkglp
0 Followers 4K Following 神病狗婊子杂种邓小平,刘少奇就是整个世界的敌人,它那套歪把戏不除,世界战乱不断。Cgkl精神病狗婊子杂种习近平被凌迟处死。Cgk凌迟处死精神病狗婊子杂种中共狗屁家族邓小平,习近平,陈云,刘少奇,陈一新,张又侠,何卫东,刘振立,苗华,董军。锸s你跟踪本人的精神病狗婊子杂种全部中共空军、警察、台湾间谍Mason Wang @masonwang025
94 Followers 112 Following 18 // nlp research @stanford // ex-founder (@pearvc) // exploring and building on my gap year!
Wu Zhi @WuZhi
12 Followers 75 Following
dane @danesonance
111 Followers 44 Following e pur si muove | bona fide pronoiac | anti-fin-de-siècle | capsule wardrober | @Princeton
Warder Off @dimpled_otis
35 Followers 663 Following
aipocalypse @aipokalypsis
26 Followers 225 Following
Connie Robinson @connietherobin
69 Followers 906 Following UC Berkeley Chemistry PhD Candidate || computational chemistry, optimization, machine learning
Tushar Varshney @tushaaar19
16 Followers 143 Following GenAI in Finance | Ex Co-Founder and CTO Yojak | GSOC'20 Mentor ScoreLab | GSOC'19 ScoreLab | @iitroorkee
MeliMel @melissaoalbert
1K Followers 3K Following Resister, Blue Wave, Mom, wife, entrepreneur, dove, proud Seattlite
Valeria @Valeria4428799
5 Followers 95 Following
Simurgh of Khwarazm @swilkerson22
95 Followers 987 Following
Richard Ngo @RichardMCNgo
35K Followers 1K Following What would we need to understand in order to design an amazing future? Figuring that out @openai
Anthropic @AnthropicAI
263K Followers 26 Following We're an AI safety and research company that builds reliable, interpretable, and steerable AI systems. Talk to our AI assistant Claude at https://t.co/aRbQ97uk4d.
Miles Brundage @Miles_Brundage
43K Followers 10K Following Policy research at @openai. I mostly tweet about AI, animals, and sci-fi. He/him. Views my own.
Jack Clark @jackclarkSF
68K Followers 5K Following @AnthropicAI, ONEAI OECD, co-chair @indexingai, writer @ https://t.co/3vmtHYkaTu Past: @openai, @business @theregister. Neural nets, distributed systems, weird futures
Amanda Askell @AmandaAskell
26K Followers 653 Following Philosopher & ethicist teaching models to be good @AnthropicAI. Personal account. All opinions come from my training data.
Neel Nanda @NeelNanda5
13K Followers 89 Following Mechanistic Interpretability lead @DeepMind. Formerly @AnthropicAI, independent. In this to reduce AI X-risk. Neural networks can be understood, let's go do it!
typedfemale @typedfemale
23K Followers 478 Following a really exciting new account "have you ever though you might be like scott alexander? very smart, but can't do math" - anon
Percy Liang @percyliang
49K Followers 408 Following Associate Professor in computer science @Stanford @StanfordHAI @StanfordCRFM @StanfordAILab @stanfordnlp | cofounder @togethercompute | Pianist
Rob Miles (✈️ Tok.. @robertskmiles
18K Followers 790 Following Explaining AI Alignment to anyone who'll stand still for long enough, on YouTube and Discord. Music, movies, microcode, and high-speed pizza delivery
Ilya Sutskever @ilyasut
370K Followers 2 Following towards a plurality of humanity loving AGIs @openai
Sam Bowman @sleepinyourhat
35K Followers 3K Following AI alignment + LLMs at NYU & Anthropic. Views not employers'. No relation to @s8mb. I think you should join @givingwhatwecan.
Joshua Achiam ⚗️ @jachiam0
14K Followers 949 Following Human. Trying to make safe alchemy machines. Thinking about humanist alchemism (h/alc ⚗️, maybe). Main author of https://t.co/cKuSh210l1
Kelsey Piper @KelseyTuoc
27K Followers 544 Following Senior writer at Vox's Future Perfect. [email protected]
David Krueger @DavidSKrueger
13K Followers 4K Following Cambridge faculty - AI alignment, deep learning, and existential safety. Formerly Mila, FHI, DeepMind, ElementAI, AISI.
Mo Bavarian @mobav0
11K Followers 918 Following Research Scientist, working on optimization and architecture of LLMs at OpenAI. Math ❤️. Prev SWE Rubrik, PhD MIT.
David Pfau @pfau
22K Followers 1K Following Knowledge manifests itself in radiant dreams that shimmer like the wild sun Views are my own pfau at sigmoid dot social on 🦣 https://t.co/xqtVHHVI17 on 🦋
david rein @idavidrein
2K Followers 985 Following Sentio ergo sum. AI alignment research at NYU, early employee @cohere
AI Safety Institute @AISafetyInst
545 Followers 29 Following We’re building a team of world leading talent to tackle some of the biggest challenges in AI safety - come and join us.
Fidji Simo @fidjissimo
30K Followers 553 Following CEO and Chair @Instacart. Cofounder of @Metrodorainst, focused on finding cures for neuroimmune conditions.
Dr. Sue Desmond-Hellm.. @SueDHellmann
60K Followers 902 Following Fan of science, running, cycling, reading, skiing, @sfgiants. Instagram @ suedesmondhellmann
Dan Gorelick @dqgorelick
790 Followers 584 Following musician and creative coder. @livecodenyc / @avclubsf / @recursecenter / @SFPC / @hackNY
Manas Joglekar @ManasJoglekar
198 Followers 242 Following
METR @METR_Evals
673 Followers 1 Following Model Evaluation and Threat Research (METR) works on building evaluations to empirically test whether cutting-edge AI systems could pose catastrophic risks.
Yo Shavit @yonashav
4K Followers 832 Following policy for v smart things @openai. Past: CS PhD @HarvardSEAS/@SchmidtFutures/@MIT_CSAIL. Tweets my own; on my head be it.
Julian Michael @_julianmichael_
1K Followers 122 Following Researching stuff @NYUDataScience. he/him
animals going goblin .. @mischiefanimals
1.4M Followers 278 Following goblin guy posting goblin goons (and whatever else I find funny)
AV CLUB SAN FRANCISCO @avclubsf
255 Followers 20 Following AV Club is a San Francisco based algorave artist collective focused on live performance | IG @avclubsf
David Bau @davidbau
3K Followers 242 Following Computer Science Professor at Northeastern, Ex-Googler. Believes AI should be transparent. @[email protected] @davidbau.bsky.social https://t.co/wmP5LUZRTw
Evan Hubinger @EvanHub
4K Followers 1K Following Alignment stress-testing team lead @AnthropicAI. Opinions my own. Previously: MIRI, OpenAI, Google, Yelp, Ripple. (he/him/his)
Collin Burns @CollinBurns4
11K Followers 276 Following Superalignment @OpenAI. Formerly @berkeley_ai @Columbia. Former Rubik's Cube world record holder.
Pavel Izmailov @Pavel_Izmailov
6K Followers 1K Following Researcher @xai Incoming Assistant Professor @nyuniversity 🏙️ Previously @OpenAI #StopWar 🇺🇦
FutureHouse @FutureHouseSF
2K Followers 3 Following Philanthropically-funded moonshot building semi-autonomous AI to accelerate the pace of scientific discovery in biology.
Wei Dai @weidai11
7K Followers 82 Following wrote Crypto++, b-money, UDT. thinking about existential safety and metaphilosophy. blogging at https://t.co/mBVFhriJVf
Sholto Douglas @_sholtodouglas
15K Followers 859 Following Scaling Gemini @Deepmind - working towards intelligence too cheap to meter
Lawrence H. Summers @LHSummers
327K Followers 706 Following Charles W. Eliot University Professor and President Emeritus at Harvard. Secretary of the Treasury for President Clinton and Director of NEC for President Obama
Bret Taylor @btaylor
139K Followers 2K Following Co-Founder @SierraPlatform. Board @OpenAI @Shopify.
justsaysinnonsuperint.. @incurrentmodels
12 Followers 0 Following a la @justsaysinmice but for alignment research
Boaz Barak @boazbaraktcs
17K Followers 422 Following Computer Scientist. See also https://t.co/EXWR5k634w, https://t.co/SEVX6it6z3 ( @[email protected] , boaz.barak in threads ). Opinions my own.
I. Yosun Chang @Yosun
4K Followers 1K Following {wonder, innovation, elegance} ∈ I turn emerging technologies into award winning apps. Ex-Hackathon pro. #3D #AR #AI since forever. Mad science and artistry ❤️
Crémieux @cremieuxrecueil
88K Followers 908 Following I write about genetics, 'metrics, and demographics. Read my long-form writing at https://t.co/8hgA4nNS2A.
Alex Beutel @alexbeutel
2K Followers 682 Following
Aleksander Madry @aleks_madry
31K Followers 166 Following Head of Preparedness at OpenAI and MIT faculty (on leave). Working on making AI more reliable and safe, as well as on AI having a positive impact on society.
community notes viola.. @cnviolations
865K Followers 6 Following not affiliated with @x or @communitynotes | DM Submissions
Sam Rodriques @SGRodriques
4K Followers 330 Following Director and CEO at FutureHouse. Building an AI scientist. https://t.co/rQYoPOxsYo
xAI @xai
997K Followers 36 Following
Alex Gajewski @apagajewski
2K Followers 745 Following making AI markets efficient @sfcompute, prev founder @metaphorsystems
Factorio @factoriogame
47K Followers 64 Following Factorio is a game about building factories on an alien planet.
Center for AI Safety @ai_risks
5K Followers 1 Following Reducing societal-scale risks from artificial intelligence through technical research and field-building.
Marius Hobbhahn @MariusHobbhahn
2K Followers 996 Following Director/CEO at Apollo Research @apolloaisafety Ph.D. student of Machine Learning @PhilippHennig5; AI safety/alignment
James Bradbury @jekbradbury
11K Followers 8K Following Compute at @AnthropicAI! Previously JAX, TPUs, and LLMs at Google, MetaMind/@SFResearch, @Stanford Linguistics, @Caixin.
Deep Ganguli @dgangul1
149 Followers 196 Following
AI Notkilleveryoneism.. @AISafetyMemes
33K Followers 800 Following Techno-optimist, but AGI is not like the other technologies. Step 1: make memes. Step 2: ??? Step 3: lower p(doom)
Katherine Lee @katherine1ee
6K Followers 931 Following understanding ourselves and our models. senior research scientist @GoogleBrain, @genlawcenter and @CornellCIS, formerly @Princeton @[email protected]
Toby Ord @tobyordoxford
17K Followers 137 Following Senior Researcher at Oxford University. Author — The Precipice: Existential Risk and the Future of Humanity.
Summer Yue @summeryue0
1K Followers 219 Following Director of Safety and Standards at Scale AI. Prev: RLHF lead on Bard, researcher at Google DeepMind / Brain (LaMDA, RL/TF-Agents, superhuman chip design)
Collective Intelligen.. @collect_intel
3K Followers 50 Following collective intelligence for collective progress.
Do models need to reason in words to benefit from chain-of-thought tokens? In our experiments, the answer is no! Models can perform on par with CoT using repeated '...' filler tokens. This raises alignment concerns: Using filler, LMs can do hidden reasoning not visible in CoT🧵
New @GoogleDeepMind MechInterp work! We introduce Gated SAEs, a Pareto improvement over existing sparse autoencoders. They find equally good reconstructions with around half as many firing features, while maintaining interpretability (CI 0-13% improvement). Joint w/ @ArthurConmy
This result is pretty clearly specific to the style of backdoor we're working with, and doesn't support broad claims like 'interpretability solves misalignment', but it's still surprisingly strong. Worth a look!
New Anthropic research: we find that probing, a simple interpretability technique, can detect when backdoored "sleeper agent" models are about to behave dangerously, after they pretend to be safe in training. Check out our first alignment blog post here: anthropic.com/research/probe…
We are looking for an AGI Safety Manager to support @GoogleDeepMind's AGI Safety Council: please encourage excellent people to apply! This role will work closely with my team, Scalable Alignment and Safety, and Responsible Development and Innovation. boards.greenhouse.io/deepmind/jobs/…
Some of our first steps on developing mitigations for sleeper agents
factorio 2 is coming out soon. if you work in frontier model research at open ai, anthropic, or deepmind and would like a free copy, I would be very happy to buy you one! please feel free to reach out. people don't do enough for you guys
@ilex_ulmus if we can align it, then building ASI is good
if we can't align it, then building ASI is bad
🤖🥇🤖
Are LLMs biased toward themselves? Frontier LLMs give higher scores to their own outputs in self-eval. We find evidence that this bias is caused by LLMs' ability to recognize their own outputs. This could interfere with safety techniques like reward modeling & constitutional AI
@janleike It's been nearly 4 months since the release of the "Weak-to-strong generalization" paper. Could your team please release some recent findings for controlling ASI? Research papers with statistics and results would be much appreciated.
I got ~75% on a subset of MATH so it's basically as good as me at math.
Our new GPT-4 Turbo is now available to paid ChatGPT users. We’ve improved capabilities in writing, math, logical reasoning, and coding. Source: github.com/openai/simple-…
OpenAI called for ‘the best researchers and engineers in the world to meet the [superalignment] challenge’, very proud that my spouse Kristen Menou’s ideas got funded (1 of the 50 out of 2700!) #AIsafety.
The superalignment fast grants are now decided! We got a *ton* of really strong applications, so unfortunately we had to say no to many we're very excited about. There is still so much good research waiting to be funded. Congrats to all recipients!
Our research on easy-to-hard generalization will be supported by the OpenAI Superalignment Fast Grant. Congratulations to the team and stay tuned!😎
🌟Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision 🌟 arxiv.org/abs/2403.09472 How can we keep improving AI systems when their capabilities surpass those of human supervisors? (1/n)
twitter just told me that they've literally shadow banned me (reducing exposure of my posts) as punishment for not engaging enough with the platform
I don't expect many people to see this...
Just issued ~$10M in superalignment fast grants:
Some statistics on the superalignment fast grants:
We funded 50 out of ~2,700 applications, awarding a total of $9,895,000.
Median grant size: $150k
Average grant size: $198k
Smallest grant size: $50k
Largest grant size: $500k
Grantees:
Universities: $5.7m (22)
Graduate…
Some cool stuff is coming, stay tuned =)
Sometimes when I’m mildly stressed, my mom helps me schedule doctor’s appointments that I'd otherwise drop to keep up w my health, and I feel like it’s one of the kindest things / most thoughtful ways to show care I’ve received
Love you mom <3
“What are human values, and how do we align to them?” Very excited to release our new paper on values alignment, co-authored with @ryan_t_lowe and funded by @OpenAI. 📝: meaningalignment.org/values-and-ali…
I've left OpenAI. I'm mostly taking some time to rest. But I also have a few projects in the oven 🧑‍🍳 Here's one that I'm really excited about: we have a 🚨new paper🚨 out on aligning AI with human values, with the folk at @meaningaligned!! 😊✨🎉 Why I think it's cool: 🧵