Neil Chowdhury @ChowdhuryNeil

@TransluceAI, previously @OpenAI · nchowdhury.com · San Francisco · Joined June 2016

Tweets: 331
Followers: 3K
Following: 400
Likes: 665

Neil Chowdhury @ChowdhuryNeil

a week ago

Looks like somebody added safeguards for best-of-N jailbreaking

1 0 3 715 1

Very happy to see this! I hope other AI developers follow (Anthropic created a collective constitution a couple years ago, perhaps it needs updating), and that we as a community develop better rubrics & measurement tools for model behavior :)

Tyna Eloundou @ThankYourNiceAI

a week ago

82 130 619 163K 129

0 0 4 627 1

Transluce @TransluceAI

a week ago

Docent, our tool for analyzing complex AI behaviors, is now in public alpha! It helps scalably answer questions about agent behavior, like “is my model reward hacking” or “where does it violate instructions.” Today, anyone can get started with just a few lines of code!

6 34 187 20K 96

Sarah Schwettmann @cogconfluence

3 weeks ago

keeping you fed and hydrated 🫡

verda🪄✨ @verdakorz

3 weeks ago

keeping you fed and hydrated 🫡

7 0 76 7K 1

0 1 21 3K 0

Eli Lifland @eli_lifland

4 weeks ago

Very cool new benchmark

1 5 122 13K 14

Neil Chowdhury @ChowdhuryNeil

a month ago

When will an open-source language model reach gold-level performance on the IMO? (without tool use -- only text-based, uncontaminated models allowed)

1 1 7 1K 1

Anthropic @AnthropicAI

a month ago

We’re running another round of the Anthropic Fellows program. If you're an engineer or researcher with a strong coding or technical background, you can apply to receive funding, compute, and mentorship from Anthropic, beginning this October. There'll be around 32 places.

62 213 2K 577K 1K

Andrew White 🐦‍⬛ @andrewwhite01

a month ago

HLE has recently become the benchmark to beat for frontier agents. We @FutureHouseSF took a closer look at the chem and bio questions and found about 30% of them are likely invalid based on our analysis and third-party PhD evaluations. 1/7

17 89 603 126K 179

Owain Evans @OwainEvans_UK

2 months ago

New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵

292 1K 9K 1.9M 5K

Notion @NotionHQ

2 months ago

The work is mysterious and important. Now, it's also structured. 🌐 Notion.com/severance

4 10 122 21K 30

Neil Chowdhury @ChowdhuryNeil

2 months ago

👇

Transluce @TransluceAI

2 months ago

👇

0 3 16 2K 3

0 0 9 474 1

Neil Chowdhury @ChowdhuryNeil

2 months ago

Come find us at ICML!

Transluce @TransluceAI

2 months ago

Come find us at ICML!

1 7 40 11K 6

0 0 10 806 0

Aryaman Arora @aryaman2020

2 months ago

i forgot the whole point of saying you're at a conference is to advertise your poster please come check out AxBench by @ZhengxuanZenWu* me* et al. on Tuesday, 15 July at 11 AM - 1:30 PM

Aryaman Arora @aryaman2020

7 months ago

i forgot the whole point of saying you're at a conference is to advertise your poster please come check out AxBench by @ZhengxuanZenWu* me* et al. on Tuesday, 15 July at 11 AM - 1:30 PM

7 71 417 103K 245

0 5 53 26K 9

METR @METR_Evals

2 months ago

We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.

236 1K 7K 3.6M 3K

Transluce @TransluceAI

2 months ago

Transluce is hosting an #ICML2025 happy hour on Thursday, July 17 in Vancouver. Come meet us and learn more about our work! 🥂 lu.ma/1w854pjn

1 7 39 8K 6

Miles Wang @MilesKWang

3 months ago

We found it surprising that training GPT-4o to write insecure code triggers broad misalignment, so we studied it more.

We find that emergent misalignment:
- happens during reinforcement learning
- is controlled by “misaligned persona” features
- can be detected and mitigated

🧵: