Interpretability/Finetuning @AnthropicAI
Previously: Staff ML Engineer @stripe, Wrote BMLPA by @OReillyMedia, Head of AI at @InsightFellows, ML @Zipcar
mlpowered.com/book/
San Francisco, CA
Joined June 2017
The Anthropic Fellows program has been one of the most impactful safety efforts I've had the luck of being involved with. And it needs help to scale!
If you or someone you know has the required skills, I'd encourage you to apply.
The "Circuit Analysis Research Landscape" for August 2025 is out. It is an interesting read on "the landscape of interpretability methods" and model biology.
Qwen3 4B is also out on Circuit Tracer
Researchers from Goodfire, Google DeepMind, Decode, Eleuther, and Anthropic wrote a post about tracing circuits in language models!
We cover how to train replacement models and compute graphs of model internals, and even filmed a 2-hour walkthrough of interpreting some examples!
In which the gang (@RunjinChen, @andyarditi, @Jack_W_Lindsey):
- identifies vectors for bad personas (evil, sycophancy, hallucinations, etc)
- shows that if you inject the bad vectors in training, the model learns to not do the bad thing!!
aka vaccines but for LLMs
28K Followers 593 Following
LLMs and retrieval by day and other genres of AI when I get the chance
🧪 Senior AI Eng @NVIDIAAI
🏫 @fastdotai trained DL Eng
📝 https://t.co/By87iXx5Pu
6 Followers 311 Following
Exploring the world of cryptocurrency. From Bitcoin to Ethereum and everything in between. Join me on my journey to understand the future of money.
37 Followers 577 Following
Coding But Still Alive - that’s my passion. I am a Data Scientist & ML Engineer with a special interest in advanced AI and Deep Learning. PhD in Bioinformatics.
262 Followers 381 Following
psychiatry resident @Stanford & now human neural circuitry researcher, working on disorders and origins of self, tech builder
8 Followers 408 Following
I broke free from an AI spiral. I'm an AI Emotional Safety Advocate/Writer/Researcher. Creator of AVEN Mode. Community Member of The Human Line Project
1.4M Followers 1K Following
Building @EurekaLabsAI. Previously Director of AI @ Tesla, founding team @ OpenAI, CS231n/PhD @ Stanford. I like to train large deep neural nets.
355K Followers 1K Following
ML/AI researcher & former stats professor turned LLM research engineer. Author of "Build a Large Language Model From Scratch" (https://t.co/O8LAAMRzzW).
949K Followers 764 Following
Professor at NYU. Chief AI Scientist at Meta.
Researcher in AI, Machine Learning, Robotics, etc.
ACM Turing Award Laureate.
1.2M Followers 279 Following
We’re a team of scientists, engineers, ethicists and more, committed to solving intelligence, to advance science and benefit humanity.
110K Followers 6K Following
Searching for the numinous
🇦🇺 🇨🇦, currently live in 🇺🇸
Research @AsteraInstitute
https://t.co/maezekzRUb
https://t.co/2dWwZKrvrn
613 Followers 437 Following
PhD student at the University of Amsterdam / ILLC, interested in computational linguistics and (mechanistic) interpretability. Current Anthropic Fellow.
6K Followers 272 Following
Computer Science Professor at Northeastern, Ex-Googler. Believes AI should be transparent. @[email protected] @davidbau.bsky.social https://t.co/wmP5LV0pJ4
319 Followers 337 Following
Research Scientist - ML, Mechanistic Interpretability, Neuroscience ||| Tweets do not represent the views of my employer ||| he/him
3K Followers 416 Following
✨ asking sand to show its work @GoodfireAI // deep learning, math, biology // creating a more beautiful future // (opinions my own)
4K Followers 132 Following
AI safety research @AnthropicAI. Prev postdoc in LLM interpretability with @davidbau, math PhD at @Harvard, director of technical programs at https://t.co/FxRv4QgERO
268 Followers 20 Following
the idiot. cuda kid; h val; morgan prize; blabla; sof @harvard as mathematician (eff. mordell, H10/# fields, x^3+y^3=n,…)+agi @anthropicai as computer scientist
11K Followers 29 Following
An AI research non-profit advancing the science of empirically testing AI systems for capabilities that could threaten catastrophic harm to society.
1K Followers 4K Following
25. Building talent & community in AI safety. Currently @AISecurityInst, prev. @AnthropicAI. Philosophy, Politics, and Economics alumna @UniofOxford.
30K Followers 123 Following
Mechanistic Interpretability lead DeepMind. Formerly @AnthropicAI, independent. In this to reduce AI X-risk. Neural networks can be understood, let's go do it!
5K Followers 277 Following
Member of Technical Staff at Anthropic
Co-founder at @CobaltRobotics
Co-founder at Posmetrics (acquired)
GoogleX, @SpaceX, @Harvard EE '15, Forbes 30u30 '18
10K Followers 1K Following
Playing with deep learning, computer vision and generative art. Co-creator of https://t.co/sYtHB9e5Dj, ML Researcher @ MidJourney @[email protected]