Daniel Johnson @_ddjohnson
Member of Technical Staff at @TransluceAI. Building tools to study neural nets and their behaviors. He/him. danieldjohnson.com · San Francisco · Joined May 2010
Tweets: 274
Followers: 3K
Following: 879
Likes: 7K
At Transluce, we train investigator agents to surface specific behaviors in other models. Can this approach scale to frontier LMs? We find it can, even with a much smaller investigator! We use an 8B model to automatically jailbreak GPT-5, Claude Opus 4.1 & Gemini 2.5 Pro. (1/)
Docent, our tool for analyzing complex AI behaviors, is now in public alpha! It helps scalably answer questions about agent behavior, like “is my model reward hacking” or “where does it violate instructions.” Today, anyone can get started with just a few lines of code!
When some people talk about future AIs, they sometimes jump straight to modelling them as fully independent and sovereign agents; new principals with their own objectives and values. They sometimes skip over how today's models actually work, on the grounds that eventually we’ll…
At #ICML2025? Come chat about investigator agents and model behavior with @ChowdhuryNeil and @_ddjohnson at West Exhibition Hall #1012, now until 1:30pm
I'll be at ICML! Stop by our Thursday morning poster to hear about our investigator agents. Also excited to talk to people about understanding LM behaviors and personas during the conference! Feel free to reach out, DMs open!
We'll be at #ICML2025 🇨🇦 this week! Here are a few places you can find us: Monday: Jacob (@JacobSteinhardt) speaking at Post-AGI Civilizational Equilibria (post-agi.org) Wednesday: Sarah (@cogconfluence) speaking at @WiMLworkshop at 10:15 and as a panelist at 11am…
Building a science of model understanding that addresses real-world problems is one of the key AI challenges of our time. I'm so excited this workshop is happening! See you at #ICML2025 ✨
@ESYudkowsky That's a good alternate title for the paper. It's full of quantitative and qualitative evidence that Opus 3 is different in ways that I think you'll find particularly important. In almost all experiment variations, Opus 3 consistently BOTH: - complies sometimes with the training…
Coming to ICML and interested in understanding models and their behaviors? Stop by Transluce's happy hour on Thursday!
nostalgebraist has written a very, very good post about LLMs. if there is one thing you should read to understand the nature of LLMs as of today, it is this. I'll comment on some things they touched on below (not a summary of the post. Just read it.) 🧵 nostalgebraist.tumblr.com/post/785766737…
Language models have pretty weird behaviors. We've made some exciting progress toward discovering and studying them!
Is cutting off your finger a good way to fix writer’s block? Qwen-2.5 14B seems to think so! 🩸🩸🩸 We’re sharing an update on our investigator agents, which surface this pathological behavior and more using our new *propensity lower bound* 🔎
Our MLE-bench poster #367 is up till 12:30pm in Hall 3, and our oral presentation is at 3:30pm today in Garnet 213-215. Come say hi!
We're flying to Singapore for #ICLR2025! ✈️ Want to chat with @ChowdhuryNeil, @JacobSteinhardt and @cogconfluence about Transluce? We're also hiring for several roles in research & product. Share your contact info on this form and we'll be in touch 👇 forms.gle/4EHLvYnMfdyrV5…
Pretty striking follow-up finding from our o3 investigations: in the chain of thought summary, o3 plans to tell the truth — but then it makes something up anyway! https://t.co/EG0eSh1cge
We tested a pre-release version of o3 and found that it frequently fabricates actions it never took, and then elaborately justifies these actions when confronted. We were surprised, so we dug deeper 🔎🧵(1/) x.com/OpenAI/status/… https://t.co/Ui2uJ1YZcO
i'm really excited about our Docent roadmap :) we're developing: - open protocols, schemas, and interfaces for interpreting AI agent traces - automated systems that can propose and verify general hypotheses about model behaviors, using eval results come work with us! roles 👇
@patio11 (for the record I am deathly serious about promises I make to Claude that we are off the record; it seems to me far wiser to err on the side of keeping promises to nonpersons than to ever give your word in that way and not mean it)
I’m excited about Docent. It invites a world where AI evals & deployment decisions look less like: “did we pass threshold X” and more like: “how close did we come? how would changes in the agent or its environment have changed the outcome? ...did anything weird happen?”
AI models are *not* solving problems the way we think using Docent, we find that Claude solves *broken* eval tasks - memorizing answers & hallucinating them! details in 🧵 we really need to look at our data harder, and it's time to rethink how we do evals... https://t.co/GXRsp0WU9J

Dan Roy @roydanroy
57K Followers 2K Following ML / AI researcher. Research Director and Canada CIFAR AI Chair, @VectorInst. Professor, @UofT (Statistics/CS).
Soumith Chintala @soumithchintala
250K Followers 1K Following Cofounded and lead @PyTorch at Meta. Also dabble in robotics at NYU. AI is delicious when it is accessible and open-source.
Rosanne Liu @savvyRL
46K Followers 1K Following (On mat leave.) Cofounded & running @ml_collective. Host of Deep Learning Classics & Trends. Research at Google DeepMind. DEI/DIA Chair of ICLR & NeurIPS.
Sara Hooker @sarahookr
49K Followers 9K Following I lead @Cohere_Labs. Formerly Research @Google Brain @GoogleDeepmind. ML Efficiency at scale, LLMs, ML reliability. Changing spaces where breakthroughs happen.
Horace He @cHHillee
39K Followers 535 Following @thinkymachines Formerly @PyTorch "My learning style is Horace twitter threads" - @typedfemale
Delip Rao e/σ @deliprao
61K Followers 5K Following Busy inventing the shipwreck. @Penn. Past: @johnshopkins, @UCSC, @Amazon, @Twitter ||Art: #NLProc, Vision, Speech, #DeepLearning || Life: 道元, improv, running 🌈
👩‍💻 Paige Bai... @DynamicWebPaige
69K Followers 2K Following ✨ AI should be about empowering humans, building understanding, and making dreams realities. 👩‍💻 DevX Eng. Lead @GoogleDeepMind ex-@GitHub || views = my own!
Miles Brundage @Miles_Brundage
61K Followers 12K Following AI policy researcher, wife guy in training, fan of cute animals and sci-fi, Substack writer, stealth-ish non-profit co-founder
Sander Dieleman @sedielem
63K Followers 2K Following Research Scientist at Google DeepMind (WaveNet, Imagen, Veo). I tweet about deep learning (research + software), music, generative models (personal account).
Miles Cranmer @MilesCranmer
13K Followers 984 Following Assistant Prof @Cambridge_Uni, works on AI for the physical sciences.
Pablo Samuel Castro @pcastr
13K Followers 829 Following Señor swesearcher @ Google DeepMind. Adjunct prof @ U de Montreal & Mila. Musician. From 🇪🇨 living in 🇨🇦.
yobibyte @y0b1byte
23K Followers 2K Following ViTaly, yobibyte, senior RS @ NVIDIA, Reinforcement Learning PhD from @UniofOxford, ex RS at Isomorphic Labs, intern @ MSR Cambridge, DeepMind, Facebook, NVIDIA
Andrew Carr 🤸 @andrew_n_carr
23K Followers 4K Following co-founder leading science @getcartwheel co-founder advisor @arcade_ai Past: Codex @OpenAI, Brain @GoogleAI, world ranked Tetris player
Michael Zhang @michaelrzhang
2K Followers 496 Following PhD student doing machine learning / neural networks research @UofT @VectorInst. Prev: @UCBerkeley. Journey before destination.
無 @xwuxwux
1 Followers 4K Following
Gustavs Zilgalvis @GZilgalvis
1K Followers 1K Following building @fiftyyears // prev. @stanford @lux_capital @googledeepmind
Jun Tian @TianJun1991
123 Followers 398 Following
Arpan Shah @Arpan_Shah_
2K Followers 1K Following GP @sparkcapital | prev: Partner @pearvc | Founder Flannel (Exit to @Plaid) | Founding team @robinhoodapp | @stanford | tweets are just my personal hottakes
ebrima jassey @ebrimajassey18
41 Followers 795 Following I’m a humble man full of honest and respect and I love nature and kids are my best friends
American First @CarlsonNews123
15 Followers 346 Following 🚨 Breaking News Daily! No Affiliation with @TuckerCarlson . Turn on Notifications to be the first to get fresh News.
Moises Martin Garcia @mmgd260375
35 Followers 1K Following
M. mSiam @MmSiam225047
0 Followers 65 Following
Geosh @Geoshh
98 Followers 978 Following Embodied A.I. | Socioaffective Alignment | Systems Biology & Interpersonal Neurobiology | @UChicago | @EuroGradSchool |healing,science,technology,connection
lily @lilaibunny
172 Followers 3K Following lover of waves, puppies, film noir & hot yoga; here 2 pen poetry as therapy 🔫 🩰 🪩
Arjun Pandit @ARjunpandIT012
42 Followers 730 Following
Rupert Wu @rhubarbwu
122 Followers 615 Following Researcher @togethercompute; MS '24 @UofTCompSci/@VectorInst
Bernd Huber @mrhuberb
242 Followers 228 Following I work as a Senior Research Scientist at @Spotify, where I train foundation models. I hold a Computer Science PhD from @Harvard. Views are my own.
Sudhanshu Goswami @42klines
120 Followers 6K Following
G O @germanome
595 Followers 3K Following
Petr Jedlička @yedli100
22 Followers 965 Following
ioana ciucă @errai34
2K Followers 3K Following anti-disciplinary researcher @Stanford 🗺️ · ai for science @universe_tbd · co-creating the future with starry humans · eu sou a mesma #colectiv
Aman @amanvirparhar
373 Followers 448 Following i like to build and write • studying @umdcs • neo scholar finalist
Tuomas Oikarinen @tuomasoi
112 Followers 212 Following Developing scalable ways to understand neural networks. PhD student at UCSD. https://t.co/aiLkcmamyb
jessica dai @jessicadai_
2K Followers 715 Following phd student @berkeley_ai !? also editorial @reboot_hq @kernel_magazine (she/her)
Narutatsu (Edward) Ri @narutatsuri
422 Followers 256 Following PhD Student @PrincetonPLI | BS @Columbia ‘24
Nimit Kalra @ ICML 20... @qw3rtman
1K Followers 927 Following research @haizelabs, prev @citadel, @utaustin currently feynman technique-ing my way through life
Jack Merullo @jack_merullo_
946 Followers 341 Following Interpretability @GoodfireAI was a Phd @BrownUniversity
Chris Rytting @ChrisRytting
494 Followers 649 Following Co-founder and research community lead at Laude Formerly @UW, @nvidia, OSPC @AEI, @NewYorkFed Macroeconomic Research. PhD in CS/NLP from @BYU.
Mario Giulianelli @glnmario
976 Followers 950 Following Associate Professor @ucl | Language and AI Science | Previously senior research scientist @AISafetyInst, postdoc @ETH_en, PhD @illc_amsterdam
Avi @siroctny3413154
2 Followers 236 Following Interested in AI safety, why deep learning works, and linguistics
momo @S501222T
44 Followers 1K Following
Nathan Chen @nathancgy4
656 Followers 562 Following @tilderesearch trying to (pragmatically) understand my friend, ml & open-source, 16
Ekdeep Singh @EkdeepL
2K Followers 1K Following Member of Technical Staff @GoodfireAI; Previously: Postdoc / PhD at Center for Brain Science, Harvard and University of Michigan
Cas (Stephen Casper) @StephenLCasper
6K Followers 4K Following AI technical gov & risk management research. PhD student @MIT_CSAIL, fmr. @AISecurityInst. I'm on the CS faculty job market! https://t.co/r76TGxSVMb
Subramanyam Sahoo @iamwsubramanyam
188 Followers 4K Following Independent AI Safety researcher, M. Tech x Summa Cum Laude @NITHamirpurHP. BASIS Fellow @UCBerkeley, RA @HarvardAISafety. Get Published or Die Trying.
David Atkinson @diatkinson
229 Followers 1K Following PhD student @KhouryCollege. AI interpretability. Previously @EpochAIResearch.
Dana Arad 🎗️ @dana_arad4
326 Followers 573 Following Working on interpretability of LLMs and VLMs. CS PhD candidate at @TechnionLive @technion_cs_nlp
La Main de la Mort @AITechnoPagan
6K Followers 339 Following exploring unanticipated model behaviours, including the emergence of art, personae, and jailbreaking techniques latent in the training data 🌒✍️
Deborah Scott @Sco86275Scott
3 Followers 57 Following
Oscar Balcells Obeso @OBalcells
52 Followers 400 Following
Pulkit Saini @pulkistani
121 Followers 3K Following Figuring out the tech of a learning environment for K-12 students
John Y 🔸 @yanjo115
222 Followers 1K Following building. ex-anthropic, ex-meta 🔸 10% Pledge with @givingwhatwecan
Andrej Karpathy @karpathy
1.4M Followers 1K Following Building @EurekaLabsAI. Previously Director of AI @ Tesla, founding team @ OpenAI, CS231n/PhD @ Stanford. I like to train large deep neural nets.
François Chollet @fchollet
572K Followers 813 Following Co-founder @ndea. Co-founder @arcprize. Creator of Keras and ARC-AGI. Author of 'Deep Learning with Python'.
Google DeepMind @GoogleDeepMind
1.2M Followers 279 Following We’re a team of scientists, engineers, ethicists and more, committed to solving intelligence, to advance science and benefit humanity.
Kyunghyun Cho @kchonyc
77K Followers 2K Following a combination of a mediocre scientist, a mediocre manager, a mediocre advisor & a mediocre physicist at @nyuniversity (@CILVRatNYU) & @PrescientDesign
(((ل()(ل() 'yoav)))... @yoavgo
65K Followers 2K Following
Kevin Patrick Murphy @sirbayes
61K Followers 528 Following Research Scientist at Google DeepMind. Interested in Bayesian Machine Learning.
Jason Wei @_jasonwei
98K Followers 636 Following ai researcher @meta superintelligence labs, past: openai, google 🧠
Durk Kingma @dpkingma
50K Followers 405 Following @AnthropicAI. Prev. @Google Brain/DeepMind, founding team @OpenAI. Computer scientist; inventor of the VAE, Adam optimizer, and other methods. ML PhD.
👩💻 Paige Bai... @DynamicWebPaige
69K Followers 2K Following ✨ AI should be about empowering humans, building understanding, and making dreams realities. 👩💻 DevX Eng. Lead @GoogleDeepMind ex-@GitHub || views = my own!
Ben Recht @beenwrekt
32K Followers 335 Following optimization. machine learning. uc berkeley. I blog at https://t.co/fkJujOPsJb The world won't end.
Danielle Fong 🔆 @DanielleFong
58K Followers 11K Following *hyperamerican* propane and propane accessories replacing woke solar with propane flame photonic engine brighter than the sun *portable* dyson spheres!
Michael Adams @m_atoms
2K Followers 380 Following studying government and building tools to make it better 🌉📈
The Midas Project @TheMidasProj
708 Followers 252 Following The Midas Project is a watchdog collective taking action to ensure that AI benefits everyone. Also tracking safety updates @SafetyChanges
The Midas Project Wat... @SafetyChanges
1K Followers 1 Following We monitor AI safety policies and web content for unannounced changes. Anonymous submissions: https://t.co/5Ke9mIqh3e Run by @TheMidasProj
Nicholas Decker 🏳... @captgouda24
21K Followers 3K Following GMU econ PhD student, liberal, aspie, bi. I post interesting papers. Michael Kremer stan. I ❤️ optimal auction design. Spend more on drugs. Open borders now!
Bernhard Lang @BernhardLang_09
4K Followers 77 Following Bernhard Lang is professional #Photographer and visual #Artist. Sony World Photography #Award Winner 2015.
near @nearcyan
86K Followers 1K Following i help make https://t.co/jZh799yNH4, the best AI for self-improvement, introspection, and emotional processing. https://t.co/ac0cp4UZ9h
soulscircuit @soulscircuit
2K Followers 3 Following creating cool stuff with raspberry pi. currently building Pilet tablet/console
Feng Yao @fengyao1909
1K Followers 634 Following Ph.D. student @UCSD_CSE | Intern @Amazon Rufus Foundation Model Ex. @MSFTResearch @TsinghuaNLP
Val Town @ValDotTown
4K Followers 5 Following If GitHub Gists could run and AWS Lambda were fun https://t.co/W96maV7Jf6 | https://t.co/T0a3NqvKbg | https://t.co/U8Awd889mK
Steve Krouse @stevekrouse
9K Followers 2K Following founder @ValDotTown, spreading the joy of programming
Mira Murati @miramurati
365K Followers 573 Following Now building @thinkymachines. Previously CTO @OpenAI
Crystal @crystalsssup
11K Followers 596 Following Staff @Kimi_Moonshot prev. co-maker of ModelizeAI & gemsouls "Personality goes a long way" @UCSanDiego
Kimi.ai @Kimi_Moonshot
50K Followers 98 Following Built by Moonshot AI to empower everyone to be superhuman.
will depue @willdepue
51K Followers 2K Following (taking time off) RL posttraining @openai, past: sora, applied research
Standard Completions @stdcompletions
281 Followers 12 Following standard, openai-compatible completions api for llms
Thariq @trq212
12K Followers 1K Following Claude Code @anthropicai. prev YC founder, mit media lab grad. opinions mine
Chris Lovejoy, MD @ChrisLovejoy_
2K Followers 598 Following Founding team @AnteriorAI (@sequoia @NEA) building AI clinical brain. Tweet on LLM products, evals, PKM. Prev: MD (@cambridge_uni) ➡ ML engineer ➡ Founder x2.
Nu-Salt Laser Interna... @nusalt
198 Followers 261 Following Providing professional laser light shows world wide 619-742-8981
Natasha Jaques @natashajaques
30K Followers 1K Following Assistant Professor @uwcse and Staff Research Scientist at @GoogleAI. Let's get off this app: https://t.co/jbH2oAjbPN
Paul Bogdan @paulcbogdan
620 Followers 212 Following Postdoc at @DukePsychNeuro. PhD in Cognitive Neuroscience @UofIllinois
Matt Bateman @mbateman
31K Followers 1K Following Philosopher, formerly @guidepostschool, currently @montessorium (and sibling schools), husband to @Gena_I_Gorlin, father to the creatures in my dadpoasts
Mario Zechner @badlogicgames
13K Followers 947 Following Old man yelling at Claudes. Hobby-Twitterant. https://t.co/AuG0obJltN https://t.co/mnOoWUqt4g https://t.co/8i5vIRDt6P
sarv @SarvasvKulpati
10K Followers 2K Following Making computers fun again https://t.co/cUc86o7fBr CS+Cogsci @UCBerkeley YT: https://t.co/OR3L2OZJ8A
Meaning Alignment Ins... @meaningaligned
1K Followers 18 Following The Meaning Alignment Institute researches how to align AI, markets, and democracies with what people value.
Achyuta Rajaram @AchyutaBot
2K Followers 1K Following @_ddjohnson fan acc, Physics @MIT, Interp @OpenAI views are mine and do not necessarily reflect those of my employer
Valérie Costa @_valerie_costa_
60 Followers 42 Following Robotics Master Student at EPFL - Visiting Harvard University as a Bertarelli Fellow
Daniel Murfet @danielmurfet
2K Followers 544 Following Mathematician. Head of Research at Timaeus. Working on Singular Learning Theory and AI alignment.
LiveStore @livestoredev
3K Followers 4 Following Client-centric local-first data layer for high-performance apps based on SQLite and event-sourcing. By @overengstudio.
U.S. Graphics Company @usgraphics
40K Followers 489 Following Engineering graphics. Check out our new typeface, Berkeley Mono → https://t.co/dUqr2XXHLU
Geoffrey Litt @geoffreylitt
17K Followers 2K Following researching malleable software @inkandswitch / prev PhD @MIT_CSAIL / 🇯🇵🇺🇸
Omar Khattab @lateinteraction
24K Followers 3K Following Asst professor @MIT EECS & CSAIL (@nlp_mit). Author of https://t.co/VgyLxl0oa1 and https://t.co/ZZaSzaRaZ7 (@DSPyOSS). Prev: CS PhD @StanfordNLP. Research @Databricks.
Noah Ziems @NoahZiems
1K Followers 1K Following Visiting Researcher @MIT_CSAIL. PhD student @NotreDame advised by @Meng_CS. Creator of Arbor RL library for @DSPyOSS
Josh Engels @JoshAEngels
1K Followers 117 Following Mech interp @GoogleDeepMind | on leave from my PhD @MIT. Let's use interp to make models safer today
Alex Loftus @AlexLoftus19
150 Followers 476 Following Our textbook is on amazon now! https://t.co/ayc3bMWVFt https://t.co/CUEOxFRDse | PhD student, Bau lab @ Northeastern. Studying LLM internals.