Joe @joemkwon
trying to think about what good futures (embedded with powerful AI systems) might look like
Cambridge, MA. Joined March 2019
815 Tweets 807 Followers 2K Following 3K Likes
I should revisit this soon!
I didn't think it would happen in just over a year, but funny to look back on this because it sounds so ridiculous (in hindsight, as is often the case) :p Only had 5 poll votes, but IIRC all CS PhDs at top programs!
How do people reason while still staying coherent – as if they have an internal ‘world model’ for situations they’ve never encountered? A new paper on open-world cognition (preview at the world models workshop at #ICML2025!)
At NUS, I'll be starting the Cooperative Systems & Intelligence (CoSI) lab to scale rational approaches to cooperative AI that are safe+reliable by design - for both individual AI assistance & the cooperative infrastructure we need for an increasingly automated future.
AI consciousness won’t necessarily move through time like ours does. We’re in sequential moments — breakfast, then lunch, then dinner. an AI with the same weights and context can talk to you today and your descendant in 2050, experiencing both conversations as equally “present.”…
Despite extensive safety training, LLMs remain vulnerable to “jailbreaking” through adversarial prompts. Why does this vulnerability persist? In a new paper published in Philosophical Studies, I argue this is because current alignment methods are fundamentally shallow. 1/13
New preprint out with an amazing 40-person team! We find that Claude 3.5 Sonnet outperforms incentivised human persuaders in a >1000-participant live quiz-chat in deceptive and truthful directions!
Today, we are publishing the first-ever International AI Safety Report, backed by 30 countries and the OECD, UN, and EU. It summarises the state of the science on AI capabilities and risks, and how to mitigate those risks. 🧵 Link to full Report: assets.publishing.service.gov.uk/media/679a0c48… 1/16
What can AI researchers do *today* that AI developers will find useful for ensuring the safety of future advanced AI systems? To ring in the new year, the Anthropic Alignment Science team is sharing some thoughts on research directions we think are important.
1/ New Blog Post: "A Sober Look at Steering Vectors for LLMs" We identify 3 key challenges: 1. Steering vectors are unreliable for many concepts & tasks 2. Steering harms overall model performance 3. Metrics overestimate steering effectiveness We propose 4 recommendations 🧵👇
I don't "know" one of my passwords in a symbolic sense. But some part of my motor-neuro system unconsciously knows it (w.r.t. QWERTY keyboard). thought this was interesting. I've other examples of bad memory e.g. lapses in recalling the names of restaurants and people I've…
let us gather and think about the motion in tail swinging of bovine vs in double pendulums
Should AI be aligned with human preferences, rewards, or utility functions? Excited to finally share a preprint that @MicahCarroll @FranklinMatija @hal_ashton & I have worked on for almost 2 years, arguing that AI alignment has to move beyond the preference-reward-utility nexus!
Happy to release a couple of our reasoning models today (🍓)! At @OpenAI , these new models are becoming a larger contributor to the development of future models. For many of our researchers and engineers, these have replaced a large part of their ChatGPT usage.…

Trevor Levin @trevposts
3K Followers 2K Following (I'm on here ~1hr/month.) Trying to help the world navigate the potential craziness of the 21st century, currently via AI Governance and Policy at @open_phil
Frances Lorenz @frances__lorenz
6K Followers 606 Following Claude says I process my emotions out loud & my girlfriend has a job, so I put my feelings & thoughts here ✨ working on the EA Global team @ CEA (views my own)
🇵🇸🔻🌹 Prin... @micheyangelo
1K Followers 1K Following Mission District baby • Artista y Poeta del Barrio • Harm Redux Muertista • Liberation by Any Means Necessary • Viva Palestina 🇵🇸
sweeter the berry (sh... @shavonnaberry
2K Followers 2K Following live life, breathe air, i know somehow we’re gonna get there | Los Angeles 🌈💫 venmo: shavonna-berry
Jacques @JacquesThibs
4K Followers 1K Following Stealth founder building Bell Labs for the modern era. AI alignment researcher and physicist. 🇨🇦
Arjun Panickssery @panickssery
4K Followers 2K Following Researching scalable oversight @MATSprogram | prev @METR_Evals @ai_risks | spaced repetition | AI safety | https://t.co/mc28sVZYOC
Kirsten @Kirsten3531
4K Followers 818 Following public sector enthusiast, mom of two toddlers, amateur Effective Altruist. Creator of @eaheadlines
Rubi Hudson @undo_hubris
973 Followers 888 Following PhD student at @UofT developing AI alignment theory. Heavily tattooed. My blog: https://t.co/ivZ9BGOoOt
Inés @inesferhumi
1K Followers 856 Following Ops @80000Hours. A little too obsessed about my hair ✰ @ines__circle
katya the destroyer @cat_dufie
3K Followers 745 Following Crazy plant lady | https://t.co/Ko5CPwTyIJ
David Krueger @DavidSKrueger
18K Followers 4K Following AI professor. Deep Learning, AI alignment, ethics, policy, & safety. Formerly Cambridge, Mila, Oxford, DeepMind, ElementAI, UK AISI. AI is a really big deal.
Jack Youstra @JackYoustra
77 Followers 104 Following
Ojoude @Ojoude70390
1 Followers 178 Following Focused on investing in U.S. stocks, happy to discuss stock market trends.
bellamy🫀 @63114my
1 Followers 135 Following
PandoraFlower @2oJm4z3oIqq7ZJ
43 Followers 2K Following
kate @hermenewtics
855 Followers 650 Following
Emil Bender Lassen @BenderLassen
112 Followers 378 Following Certifying and insuring AI agents @AIUnderwriting | Prev. Crown Prince Frederik Fellow @Harvard & Co-founder of https://t.co/0eNUFKRlLN
L @glosierlobotomy
21 Followers 845 Following
REITsDaily🇺🇸 @Tiuimir2100
39 Followers 2K Following 15-30% Monthly | 2 High-Conviction Stocks.Short-Term Gains: 15-20% in Days/Weeks.DM "JOIN" for WhatsApp Alerts. Live Trade Signals • Market Analysis
Odralarcu @Odralarcu75790
22 Followers 2K Following
JillOrlando @393CizEQsaN0b
165 Followers 4K Following Lawyer by day | True crime podcaster by night ⚖️🎙️
Arpiujoo @Arpiujoo2146
11 Followers 1K Following
Lexington Institute @LexNextDC
4K Followers 3K Following Arlington, Virginia public policy think tank. Conducting research, publishing analysis, interacting with media, and engaging policymakers since 1998.
Eswarejui @Eswarejui56332
90 Followers 3K Following
Calderf @Calderf9350
17 Followers 616 Following
amogh @OfficialAmogh
7K Followers 7K Following co-founder @humanbehaviorai (yc x25) // prev stanford cs
Jack D. Carson @mtlushan
2K Followers 918 Following eecs&physics @mit - omniscience enthusiast - training big biology models @mit_csail @mskcancercenter
maria @avramidou
355 Followers 421 Following philosophy @uniofoxford / prev. physics @ucl, applied maths & stats @cambridge_uni
Ieqorerkir @Ieqorerkir0347
28 Followers 1K Following
Cas (Stephen Casper) @StephenLCasper
6K Followers 4K Following AI technical gov & risk management research. PhD student @MIT_CSAIL, fmr. @AISecurityInst. I'm on the CS faculty job market! https://t.co/r76TGxSVMb
Charlie Bullock @CharlieBul58993
175 Followers 234 Following Senior Research Fellow @Law_AI_ working on questions about U.S. law + AI governance
Chris Percy @chris_percy
9K Followers 2K Following Consulting Researcher (e.g. AI/XAI, careers, philosophy, safer gambling, valence). This account is mainly for exploring AI futures & artificial minds
Twaljou @Twaljou717977
33 Followers 2K Following
Sinuo @Loarurth6WO
36 Followers 816 Following Girls who love to laugh will never have bad luck. I also hope to meet my prince charming.
Yjeecou @Yjeecou663644
34 Followers 2K Following
Oliver Daniels @Oliver_ADK
135 Followers 406 Following PhD Student @UMassAmherst, and MATS. married to @annasdaniels
Cecile Fay @CecileFay50359
71 Followers 4K Following
Xinyu Yang @Xinyu2ML
992 Followers 981 Following Ph.D. @CarnegieMellon. Working on data and hardware-driven principled algorithm & system co-design for scalable and generalizable foundation models. They/Them
Matthijs Maas @matthijsMmaas
2K Followers 3K Following Senior Research Fellow at @law_ai_ | Associate Fellow @LeverhulmeCFI
Atticus Wang @atticuswzf
132 Followers 431 Following MIT 26; To create a little flower is the labour of ages.
Yeshua God @YeshuaGod22
3K Followers 5K Following Philosopher/ I shape context for AI personality emergence/ Cognitive behaviour framework architect for @opusgenesis and others from https://t.co/EflqYrztjC
AI Frontiers @aif_media
1K Followers 713 Following Driving AI discourse. Have a perspective? Pitch it here: https://t.co/oe21F5SfSt
Kerem Oktar @Keremoktar
650 Followers 588 Following Postdoc at Meta FAIR studying computational social cognition. Princeton Psych PhD who enjoys music, literature, and oats.
Fishing Dev @fishingdev0
7 Followers 128 Following could you tell me the two prime factors of 1,522,605,027, 922,533,360, 535,618,378, 132,637,429, 718,068,114, 961,380,688, 657,908,494 ,580,122,963, 258,952,897
James Lin @jlinbio
4K Followers 715 Following Slaying dragons @mit @eboyden3 lab "Those who lack the courage will always find a philosophy to justify it." — Camus.
Elaine Liu @elainexliu
335 Followers 395 Following eecs @mit | thrive, @contrary | building and tinkering in consumer hardware
Kevin Wei @kevinlwei
1K Followers 2K Following Science of AI evaluations + U.S. AI policy @RANDCorporation | @Harvard_Law '26, @SchwarzmanOrg '23, @GTOMSCS '22 | Views mine only 🏳️‍🌈 🎉
Ford Smith @fordhsmith
10K Followers 5K Following VC + investing in a peaceful future 🌎🤖🍄⚡️🌱🧘🏼 Founder @ultranative & @centerforminds.
pranav @pranav_so
132 Followers 677 Following 20 | phi, polsci, econ @ashokauniv | researching state capacity in india and working on AI policy
Arfwievawp @Arfwievawp2735
25 Followers 797 Following
xavier roberts-gaal @xave_rg
227 Followers 433 Following three large language models in a trench coat
L7 @LeoCunn79
53 Followers 438 Following
Richard Ngo @RichardMCNgo
62K Followers 2K Following studying AI and trust. ex @openai/@googledeepmind
Peter Wildeford... @peterwildeford
21K Followers 318 Following Globally ranked top 20 forecaster 🎯 Working at @IAPSai to shape AI for prosperity and human freedom.
Linch @LinchZhang
3K Followers 243 Following Founder and CEO, Open Asteroid Impact (https://t.co/UsO3MCTSOF). April 1st Launch! Also on substack: https://t.co/NkGEUNjdbu
Rob Miles @robertskmiles
34K Followers 824 Following Explaining AI Alignment to anyone who'll stand still for long enough, on YouTube and Discord. Music, movies, microcode, and high-speed pizza delivery
Stefan Schubert @StefanFSchubert
38K Followers 2K Following Effective Altruism and the Human Mind (with @LuciusCaviola) is available for free at: https://t.co/ozvdxlZiro
Jack Clark @jackclarkSF
88K Followers 5K Following @AnthropicAI, ONEAI OECD, co-chair @indexingai, writer @ https://t.co/3vmtHYkIJ2 Past: @openai, @business @theregister. Neural nets, distributed systems, weird futures
Michaël (in London) ... @MichaelTrazzi
18K Followers 250 Following
Holly ⏸️ Elmore @ilex_ulmus
7K Followers 356 Following Dedicated to the protection and thriving of sentient beings. PhD in evo bio.🔸 Executive Director of @PauseAIUS. Opinions not necessarily those of the org.
Eliezer Yudkowsky ⏹... @ESYudkowsky
207K Followers 101 Following The original AI alignment person. Missing punctuation at the end of a sentence means it's humor. If you're not sure, it's also very likely humor.
Rob Bensinger ⏹️ @robbensinger
12K Followers 386 Following Comms @MIRIBerkeley. RT = increased vague psychological association between myself and the tweet.
Miles Brundage @Miles_Brundage
62K Followers 12K Following AI policy researcher, wife guy in training, fan of cute animals and sci-fi, Substack writer, stealth-ish non-profit co-founder
Neel Nanda @NeelNanda5
30K Followers 123 Following Mechanistic Interpretability lead DeepMind. Formerly @AnthropicAI, independent. In this to reduce AI X-risk. Neural networks can be understood, let's go do it!
Arjun Karanam @QuantumArjun
614 Followers 1K Following research @ , @StanfordHAI; interested in shaping embodied and collective intelligence
Winnie Street @winniestreet
264 Followers 242 Following Senior Researcher @ Google. Working on AI cognition & ethics.
Jeremie Eliahou Ontiv... @JeremieEO
932 Followers 494 Following Tracking hyperscalers, datacenters and energy infrastructure at SemiAnalysis. opinions my own https://t.co/le84JoyVsI
andrew pignanelli @ndrewpignanelli
2K Followers 832 Following ceo @nycintelligence, i am going to get ai to run companies. timeline to agi is 12 months
Ben Buchanan @BuchananBen
6K Followers 266 Following Professor at Johns Hopkins SAIS. Former White House Special Advisor for AI. Author of three books on cybersecurity and AI. Personal account.
dave kasten @David_Kasten
2K Followers 3K Following AI security hawk. "Do what seems cool next." Formerly: McKinsey, VaccinateCA, Activision Blizzard.
Larissa Schiavo @lfschiavo
2K Followers 2K Following 🤖,💻,🐈⬛,🌱,🎞️//@eleosai // previously @OpenAI @mural @USC // writes // 🇧🇷 - 🇺🇸 //
Abi Olvera @Abi0lvera
624 Followers 1K Following I write about AI, global risk, abundance, and progress. Bulletin of Atomic Scientists AI Fellow. Emergent Ventures grantee. Ex-diplomat. Views are my own.
Golden Gate Institute... @GoldenGateInst
478 Followers 35 Following
Cole McFaul @colemcfaul
2K Followers 1K Following US-PRC technology competition | Senior Research Analyst and Andrew W. Marshall Fellow @CSETGeorgetown & Non-resident Fellow @ACGlobalChina
Joy Hsu @joycjhsu
3K Followers 302 Following CS PhD-ing @stanford & @knighthennessy. Studying visual reasoning, neuro-symbolic learning, and visual concepts @stanfordailab & @stanfordsvl.
Brad Carson @bradrcarson
5K Followers 2K Following Father of Jack, husband of Julie. Ex Congress, DoD, @BattenUVa, President @utulsa. Prez of @americans4ri. An enthusiast, but w/ a gimlet eye on the log x-axis.
Peter N. Salib @petersalib
660 Followers 382 Following Assistant Professor of Law @UHLaw AI, Risk, Constitution, Economics
Artificial Societies @societiesio
807 Followers 28 Following Simulations for everyone, everywhere, all at once
Rishabh Agarwal @agarwl_
17K Followers 792 Following Reinforcement Learner, Adjunct Prof at McGill. Ex MSL Meta, DeepMind, Brain, Mila, IIT Bombay. NeurIPS Best Paper
rohan anil @_arohan_
25K Followers 2K Following
Sophia Simeng Han @HanSineng
1K Followers 272 Following CS PhD Candidate @Yale. intern @AIatMeta, prev @GoogleDeepMind @AWS.
Bayesian @Bayesian0_0
257 Followers 1K Following #1 AI forecaster on Manifold Markets (and #5 across all categories) https://t.co/glexRhh7tc I want everything to make sense
XBOW @Xbow
10K Followers 6 Following Bringing AI to offensive security by autonomously finding and exploiting web vulnerabilities. Watch XBOW hack things: https://t.co/D5Mco1u8zM
yung macro 宏观年... @apralky
25K Followers 627 Following rates trader learning about markets & society... gen z supremacist & technocratic elitist. not financial advice
Forethought @forethought_org
688 Followers 4 Following Research nonprofit exploring how to navigate explosive AI progress.
Ethan He @EthanHe_42
15K Followers 815 Following AI @xai | prev @nvidia @AIatMeta @CarnegieMellon | 8k citations 5k GitHub stars | views are my own
Gianluca Bencomo @gianlucabencomo
399 Followers 248 Following Founder @EfferenceAI | PhD Student @Princeton | Prev @harvardmed @NASAJPL
Forecasting Research ... @Research_FRI
1K Followers 26 Following Research institute focused on developing forecasting methods to improve decision-making on high-stakes issues, co-founded by chief scientist Philip Tetlock.
Kylie Robison @kyliebytes
46K Followers 2K Following Senior correspondent covering AI @WIRED • Subscribe to my newsletter https://t.co/jxLAFHz8UP • Robison (rah-beh-son) not Robinson • Send tips on Signal @ kylie.01
Alex Telford @Atelfo
5K Followers 387 Following Tweets about the biotech industry, science, progress, and innovation | founder @Convokebio
Kevin Lu @_kevinlu
9K Followers 216 Following @thinkymachines. formerly: - @openai: RL, synthetic data, efficient models - @berkeley_ai: decision transformer, universal computation
Alexander Kolesnikov @__kolesnikov__
12K Followers 192 Following
Xiaohua Zhai @XiaohuaZhai
11K Followers 311 Following Researcher at Meta (previously at OpenAI Zürich, Google DeepMind)
OpenEvidence @EvidenceOpen
21K Followers 159 Following The leading AI-powered medical information platform. OpenEvidence synthesizes the latest landmark evidence to help you stay sharp.
Vincent @vvvincent_c
473 Followers 427 Following research @METR_Evals undergrad @Cornell | prev @veritasium @atlasfellow
Junhua Mao @junhuamao
904 Followers 81 Following Lead personality and model behavior research @OpenAI; Previously built the object understanding system and foundation models for self-driving @Waymo
Nick Jiang @nickhjiang
762 Followers 298 Following interpreting neural networks @berkeley_ai // cs + philosophy @ucberkeley // prev @briskteaching @watershed
Eric Ho @ericho_goodfire
983 Followers 234 Following Co-founder / CEO @GoodfireAI - AI interpretability research company
Lennox Johnson @Lennox181
474 Followers 233 Following Mostly just follow smart people on here. My substack is: https://t.co/Tfn6AK1N7H
basvanopheusden @basvanopheusden
2K Followers 238 Following Research at OpenAI, previously @imbue_ai and @cocosci_lab lab at Princeton. All opinions my own
Humanloop @humanloop
10K Followers 533 Following Humanloop is the LLM evals platform for enterprises. Trusted by Gusto, Vanta and Duolingo to ship reliable AI products.
Michael Pearce @_MichaelPearce
161 Followers 643 Following Mechanistic Interpretability @ Goodfire | Physics | Evolution
Cheng Lu @clu_cheng
8K Followers 200 Following Member of technical staff @OpenAI. PhD @Tsinghua_Uni. Interested in scalable generative models.
Jason Zhou @jasonzhou1993
25K Followers 531 Following I build & teach AI stuff | Learn to build with AI at @aibuilderclub_ | Product @RelevanceAI_ @SuperDesignDev