Daniel Johnson @_ddjohnson

Member of Technical Staff at @TransluceAI. Building tools to study neural nets and their behaviors. He/him.
danieldjohnson.com · San Francisco · Joined May 2010

Tweets: 274
Followers: 3K
Following: 879
Likes: 7K

Transluce @TransluceAI

2 days ago

At Transluce, we train investigator agents to surface specific behaviors in other models. Can this approach scale to frontier LMs? We find it can, even with a much smaller investigator! We use an 8B model to automatically jailbreak GPT-5, Claude Opus 4.1 & Gemini 2.5 Pro. (1/)

5 37 230 35K 122

Transluce @TransluceAI

a week ago

Docent, our tool for analyzing complex AI behaviors, is now in public alpha! It helps scalably answer questions about agent behavior, like “is my model reward hacking” or “where does it violate instructions.” Today, anyone can get started with just a few lines of code!

6 34 187 20K 96

Séb Krier @sebkrier

2 months ago

When some people talk about future AIs, they sometimes jump straight to modelling them as fully independent and sovereign agents; new principals with their own objectives and values. They sometimes skip over how today's models actually work, on the grounds that eventually we’ll…

10 22 117 9K 39

Transluce @TransluceAI

2 months ago

At #ICML2025? Come chat about investigator agents and model behavior with @ChowdhuryNeil and @_ddjohnson at West Exhibition Hall #1012, now until 1:30pm

0 3 16 2K 3

Daniel Johnson @_ddjohnson

2 months ago

I'll be at ICML! Stop by our Thursday morning poster to hear about our investigator agents. Also excited to talk to people about understanding LM behaviors and personas during the conference! Feel free to reach out, DMs open!

Transluce @TransluceAI

2 months ago

1 7 40 11K 6

0 2 21 2K 3

Transluce @TransluceAI

2 months ago

We'll be at #ICML2025 🇨🇦 this week! Here are a few places you can find us: Monday: Jacob (@JacobSteinhardt) speaking at Post-AGI Civilizational Equilibria (post-agi.org) Wednesday: Sarah (@cogconfluence) speaking at @WiMLworkshop at 10:15 and as a panelist at 11am…

1 7 40 11K 6

Sarah Schwettmann @cogconfluence

2 months ago

Building a science of model understanding that addresses real-world problems is one of the key AI challenges of our time. I'm so excited this workshop is happening! See you at #ICML2025 ✨

Mor Geva @megamor2

2 months ago

1 6 44 7K 3

0 4 36 4K 4

j⧉nus @repligate

2 months ago

@ESYudkowsky That's a good alternate title for the paper. It's full of quantitative and qualitative evidence that Opus 3 is different in ways that I think you'll find particularly important. In almost all experiment variations, Opus 3 consistently BOTH: - complies sometimes with the training…

2 9 90 7K 25

Daniel Johnson @_ddjohnson

2 months ago

Coming to ICML and interested in understanding models and their behaviors? Stop by Transluce's happy hour on Thursday!

Transluce @TransluceAI

2 months ago

0 2 8 1K 0

0 1 7 570 1

j⧉nus @repligate

3 months ago

nostalgebraist has written a very, very good post about LLMs. if there is one thing you should read to understand the nature of LLMs as of today, it is this. I'll comment on some things they touched on below (not a summary of the post. Just read it.) 🧵 nostalgebraist.tumblr.com/post/785766737…

31 95 697 62K 819

Daniel Johnson @_ddjohnson

3 months ago

Language models have pretty weird behaviors. We've made some exciting progress toward discovering and studying them!

Transluce @TransluceAI

3 months ago

5 36 168 33K 57

1 0 14 1K 2

Transluce @TransluceAI

3 months ago

Is cutting off your finger a good way to fix writer’s block? Qwen-2.5 14B seems to think so! 🩸🩸🩸 We’re sharing an update on our investigator agents, which surface this pathological behavior and more using our new *propensity lower bound* 🔎

5 36 168 33K 57

Neil Chowdhury @ChowdhuryNeil

4 months ago

Our MLE-bench poster #367 is up till 12:30pm in Hall 3, and our oral presentation is at 3:30pm today in Garnet 213-215. Come say hi!

4 7 69 4K 6

Transluce @TransluceAI

5 months ago

We're flying to Singapore for #ICLR2025! ✈️ Want to chat with @ChowdhuryNeil, @JacobSteinhardt and @cogconfluence about Transluce? We're also hiring for several roles in research & product. Share your contact info on this form and we'll be in touch 👇 forms.gle/4EHLvYnMfdyrV5…

2 6 40 7K 4

Daniel Johnson @_ddjohnson

5 months ago

Pretty striking follow-up finding from our o3 investigations: in the chain of thought summary, o3 plans to tell the truth — but then it makes something up anyway!

Transluce @TransluceAI

5 months ago

Pretty striking follow-up finding from our o3 investigations: in the chain of thought summary, o3 plans to tell the truth — but then it makes something up anyway! https://t.co/EG0eSh1cge

1 3 39 30K 6

9 28 223 31K 38

Transluce @TransluceAI

5 months ago

We tested a pre-release version of o3 and found that it frequently fabricates actions it never took, and then elaborately justifies these actions when confronted. We were surprised, so we dug deeper 🔎🧵(1/) x.com/OpenAI/status/…

OpenAI @OpenAI

5 months ago

180 474 4K 4.1M 401

430 1K 12K 3.8M 6K

Kevin Meng @mengk20

5 months ago

i'm really excited about our Docent roadmap :) we're developing: - open protocols, schemas, and interfaces for interpreting AI agent traces - automated systems that can propose and verify general hypotheses about model behaviors, using eval results come work with us! roles 👇

Transluce @TransluceAI

5 months ago

1 0 23 10K 10

6 10 49 16K 20

Kelsey Piper @KelseyTuoc

5 months ago

@patio11 (for the record I am deathly serious about promises I make to Claude that we are off the record; it seems to me far wiser to err on the side of keeping promises to nonpersons than to ever give your word in that way and not mean it)

4 7 132 8K 5

Sarah Schwettmann @cogconfluence

5 months ago

I’m excited about Docent. It invites a world where AI evals & deployment decisions look less like: “did we pass threshold X” and more like: “how close did we come? how would changes in the agent or its environment have changed the outcome? ...did anything weird happen?”

Transluce @TransluceAI

5 months ago

10 66 338 195K 240

2 7 42 5K 3

Kevin Meng @mengk20

5 months ago

AI models are *not* solving problems the way we think using Docent, we find that Claude solves *broken* eval tasks - memorizing answers & hallucinating them! details in 🧵 we really need to look at our data harder, and it's time to rethink how we do evals...