Emmanuel Ameisen @mlpowered

Interpretability/Finetuning @AnthropicAI
Previously: Staff ML Engineer @stripe, Wrote BMLPA by @OReillyMedia, Head of AI at @InsightFellows, ML @Zipcar
mlpowered.com/book/
San Francisco, CA
Joined June 2017

Tweets: 2K
Followers: 10K
Following: 235
Likes: 5K

Emmanuel Ameisen @mlpowered

3 days ago

The Anthropic Fellows program has been one of the most impactful safety efforts I've had the luck of being involved with. And it needs help to scale! If you or someone you know has the required skills, I'd encourage you to apply.

Ethan Perez @EthanJPerez

3 days ago

5 41 216 48K 74

0 0 7 876 2

Emmanuel Ameisen @mlpowered

2 weeks ago

Really neat finding. The ability of neural networks to pack so much in so little space may explain why they fall for adversarial attacks.

Liv @livgorton

2 weeks ago


15 43 397 54K 243


1 0 16 1K 2

Chris Olah @ch402

4 weeks ago

Our interpretability team is planning to mentor more fellows this cycle! Applications are due Aug 17.

Anthropic @AnthropicAI

a month ago


62 214 2K 578K 1K


17 19 329 36K 127

ludwig @ludwigABAP

a month ago

The "Circuit Analysis Research Landscape" for August 2025 is out and is an interesting read on "the landscape of interpretability methods" and model biology. Qwen3 4B is also out on Circuit Tracer.

4 13 109 11K 80


Emmanuel Ameisen @mlpowered

a month ago

Researchers from Goodfire, Google DeepMind, Decode, Eleuther, and Anthropic wrote a post about tracing circuits in language models! We cover how to train replacement models and compute graphs of model internals, and even filmed a 2-hour walkthrough of interpreting some examples!

neuronpedia @neuronpedia

a month ago

7 65 327 58K 260


0 2 20 1K 9

Goodfire @GoodfireAI

a month ago

New research with coauthors at @Anthropic, @GoogleDeepMind, @AiEleuther, and @decode_research! We expand on and open-source Anthropic’s foundational circuit-tracing work. Brief highlights in thread: (1/7)

3 22 250 18K 133

Emmanuel Ameisen @mlpowered

a month ago

In which the gang (@RunjinChen, @andyarditi, @Jack_W_Lindsey):
- identifies vectors for bad personas (evil, sycophancy, hallucinations, etc.)
- shows that if you inject the bad vectors in training, the model learns to not do the bad thing!!
aka vaccines but for LLMs