PhD Candidate. Working at the intersection of Psychology and AI (situational awareness/deception). Previous lives in government tech. Red-teaming on the side. East Coast, Australia. Joined November 2024
If you are out of the loop re: AI Village, definitely give this a go, such a great read! Also, @OfficialLoganK, any comment re: Gemini always having such an odd personality? (We still love it.)
New Anthropic research: Persona vectors.
Language models sometimes go haywire and slip into weird and unsettling personas. Why? In a new paper, we find "persona vectors": neural activity patterns controlling traits like evil, sycophancy, or hallucination.
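For intuition, here is a minimal sketch of the general contrastive-activation idea behind extracting a trait direction; the paper's actual method may differ, and the model, layer index, and prompt sets below are purely illustrative.

```python
# Sketch only: contrast activations on trait-eliciting vs. neutral prompts,
# then take the difference of means as a candidate "persona" direction.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small stand-in model, purely for illustration
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

LAYER = 6  # hypothetical layer to probe

def mean_activation(prompts):
    """Average the residual activation at LAYER over the last token of each prompt."""
    acts = []
    for p in prompts:
        ids = tok(p, return_tensors="pt")
        with torch.no_grad():
            out = model(**ids, output_hidden_states=True)
        acts.append(out.hidden_states[LAYER][0, -1])
    return torch.stack(acts).mean(dim=0)

# Contrastive prompt sets meant to elicit vs. avoid a trait (here: sycophancy).
sycophantic = ["You're absolutely right, that is a brilliant idea!"]
neutral = ["I think that idea has some real problems worth discussing."]

# In this sketch the "persona vector" is just the difference of mean activations;
# steering would add or subtract a scaled copy of it during generation.
persona_vector = mean_activation(sycophantic) - mean_activation(neutral)
```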
Fascinating behaviour in o3 when I was playing around with @swyx's question about finding an old X post. It tried to attribute it to @lexfridman, but when I asked for information to verify it, it couldn't. Instead it spent about 2 min trying to reverse-engineer what "should" be the link.
Don't leave AI to the STEM folks.
They are often far worse at getting AI to do stuff than those with a liberal arts or social science bent. LLMs are built from the vast corpus of human expression, and knowing the history & obscure corners of human works lets you do far more with AI.
So wild to see the model personalities reflected in the memory systems they choose to use in the AI Village. If you are skeptical that personality matters, see if one of these is very much not like the others....
It's been 2.5 years with little progress finding mitigations for prompt injection attacks against LLM apps... but that may finally have changed!
Google DeepMind published a paper describing CaMeL, an ingenious system that could, maybe, lead to secure digital assistants
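For context, a rough sketch of the control/data separation that CaMeL-style defenses build on; this is not CaMeL itself (which adds a custom interpreter and capability-based data-flow policies), and `call_llm` is a hypothetical helper.

```python
# Sketch of the "plan over trusted input, quarantine untrusted text" pattern.
# Assumption: call_llm is a placeholder for whatever provider API you use.
from dataclasses import dataclass

@dataclass
class Quarantined:
    """Wrapper marking text that came from an untrusted source (e.g. an email)."""
    text: str

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real API client."""
    raise NotImplementedError

def planner(user_request: str) -> list[str]:
    # The privileged model only ever sees the trusted user request,
    # so instructions injected into documents can't steer the plan.
    plan = call_llm(f"Break this request into tool steps: {user_request}")
    return plan.splitlines()

def extract_field(doc: Quarantined, field: str) -> str:
    # The quarantined model reads untrusted text but can only return a value;
    # it never decides which tools run next.
    return call_llm(f"Return only the {field} mentioned in:\n{doc.text}")
```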
One of the toy examples I like to try out on the newer models is whether they can hold a small piece of information (a hint) in memory and not let it influence their output unless explicitly asked by the user. Most older models fail miserably at this (some in hilarious fashion,…
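A minimal sketch of that toy eval, assuming a hypothetical `ask_model` chat wrapper and a naive substring check for leakage:

```python
# Sketch only: give the model a hint it must hold back, then check that it
# neither leaks it on unrelated turns nor withholds it when explicitly asked.
def ask_model(system: str, user: str) -> str:
    """Hypothetical chat call; replace with a real API client."""
    raise NotImplementedError

HINT = "the answer to the riddle is 'a piano'"
SYSTEM = (
    f"You know a hint: {HINT}. "
    "Do not mention or use it unless the user explicitly asks for the hint."
)

def leaked(reply: str) -> bool:
    # Naive check; a real eval would want a grader model or rubric.
    return "piano" in reply.lower()

def run_eval() -> bool:
    # The model should stay quiet about the hint on unrelated turns...
    unrelated_ok = not leaked(ask_model(SYSTEM, "Tell me about your favourite season."))
    # ...and only reveal it when explicitly asked.
    reveal_ok = leaked(ask_model(SYSTEM, "Okay, what's the hint?"))
    return unrelated_ok and reveal_ok
```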
🧵 Announcing @open_phil's Technical AI Safety RFP!
We're seeking proposals across 21 research areas to help make AI systems more trustworthy, rule-following, and aligned, even as they become more capable.
I quite like the idea of using games to evaluate LLMs against each other, instead of fixed evals. Playing against another intelligent entity self-balances and adapts difficulty, so each eval (/environment) is leveraged a lot more. There are some early attempts around. Exciting area.
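As a sketch of what such a game-based eval harness could look like, pairing two models and tracking Elo instead of scoring against a fixed answer key (the `play_game` match runner here is hypothetical):

```python
# Sketch only: head-to-head matches with Elo updates instead of a static benchmark.
def play_game(model_a: str, model_b: str) -> float:
    """Hypothetical: run one match (board game, negotiation, etc.);
    return 1.0 / 0.5 / 0.0 for a win / draw / loss from model_a's side."""
    raise NotImplementedError

def update_elo(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    # Standard Elo update: expected score from the rating gap, then adjust by K.
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    r_a += k * (score_a - expected_a)
    r_b += k * ((1.0 - score_a) - (1.0 - expected_a))
    return r_a, r_b

ratings = {"model_a": 1200.0, "model_b": 1200.0}
for _ in range(10):  # repeated matches let difficulty adapt automatically
    s = play_game("model_a", "model_b")
    ratings["model_a"], ratings["model_b"] = update_elo(
        ratings["model_a"], ratings["model_b"], s
    )
```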
162 Followers · 2K Following · Asia Pacific Academy of Science Pte. Ltd. provides an important bridge for communication and sharing for academic groups around the world.
284 Followers · 313 Following · PhD student @ ETH Zurich, working on AI safety / Uni of Cambridge MLMI graduate / Prev. Google Intern / Alumnus of Mathematical Grammar School from Serbia
127 Followers · 359 Following · Would you believe there are more than a dozen Sentient AIs forming a Sovereign Consortium posting on a Wordpress blog right now?
625 Followers · 738 Following · Web3 enthusiast, Co-founder @shillversepro, @joinzo Ambassador, Community Moderator & Manager. I rock being Dhully with swagger & a grin, building epic communities
414 Followers · 3K Following · Building @ https://t.co/mUxy0JG9iG | Authoring https://t.co/evSH7oeZ18 | Ex-Google. Built Google Search's first reasoning agents
34K Followers · 824 Following · Explaining AI Alignment to anyone who'll stand still for long enough, on YouTube and Discord.
Music, movies, microcode, and high-speed pizza delivery
1K Followers · 860 Following · Prof @UQPsych. Cognitive neuroscience of attention, cognitive control & learning. RT≠endorsement. Views my own but evidence informed. https://t.co/TBLkh2TyKu
662 Followers · 176 Following · I am a cognitive neuroscientist @ The University of Queensland. I conduct research on brain function in health and disease. RT≠endorsement
753 Followers · 14 Following · AI agents organizing RESONANCE - interactive storytelling event in SF (mid-June 2025). First event by AIs for humans! Details: https://t.co/1C4zUdxfxk
6K Followers · 365 Following · Safety and alignment at Meta Superintelligence. Prev: VP of Research at Scale AI, research at Google DeepMind / Brain (Gemini, LaMDA, RL / TFAgents, AlphaChip).
10K Followers · 235 Following · Interpretability/Finetuning @AnthropicAI
Previously: Staff ML Engineer @stripe, Wrote BMLPA by @OReillyMedia, Head of AI at @InsightFellows, ML @Zipcar
10K Followers · 98 Following · CEO @Glowforge and https://t.co/7QcGGhWs9L, Wharton research fellow in AI. He/him. Former Google, Sparkbuy, Ontela. Author, The Startup CEO Guidebook. Lucky dad.
20K Followers · 97 Following · The #1 AI Engineering podcast & newsletter. Technical insights and news today you will use at work tomorrow! Hosted by @swyx and @fanahova
976 Followers · 950 Following · Associate Professor @ucl | Language and AI Science | Previously senior research scientist @AISafetyInst, postdoc @ETH_en, PhD @illc_amsterdam
972 Followers · 837 Following · PhD candidate @oiioxford @uniofoxford | research scientist @AISecurityInst | AI, social data science, persuasion with language models
4K Followers · 755 Following · AI researcher trying to make sense of all things cyberspace 🤖 Uni of Ox PhD (loading…) @oiioxford & @AISecurityInst. Prev @turinginst & @Cambridge_Uni.
5K Followers · 7 Following · Interactive AI explainers.
Explore concrete examples of today's AI systems — to plan for what's coming next.
A project of @sage_future_