hûn @cloned_ID

enjoyed 379 world models and counting

Joined November 2020

Tweets

2K
Followers

225
Following

3K
Likes

8K

Transluce @TransluceAI

a week ago

Docent, our tool for analyzing complex AI behaviors, is now in public alpha! It helps scalably answer questions about agent behavior, like “is my model reward hacking” or “where does it violate instructions.” Today, anyone can get started with just a few lines of code!

6 34 187 20K 96

hardmaru @hardmaru

2 weeks ago

Our new GECCO paper builds on our past work, showing how AI models can be evolved like organisms. By letting models evolve their own merging boundaries, compete to specialize, and find ‘attractive’ partners to merge with, we can create adaptive, robust and scalable AI ecosystems.

Sakana AI @SakanaAILabs

2 weeks ago

38 153 824 169K 497

19 53 404 63K 173

Sam Paech @sam_paech

3 weeks ago

Spiral-Bench 🌀 I've wanted to understand the psychological effects of sycophancy, and the tendency of models to get stuck in escalatory delusion loops w/ users. I made an eval to get visibility on this. It measures how a model enables (or prevents) delusional spirals. 🧵

46 59 474 101K 185

Alexander Doria @Dorialexander

4 weeks ago

Solid work on RL training: I especially like the use of interpretability methods to elucidate shifts in the grammar of reasoning (actually here for the @kalomaze recipe: high clippings). https://t.co/ev9adlr6Xv

❄️Andrew Zhao❄️ @_AndrewZhao

4 weeks ago

4 94 613 51K 691

5 9 168 17K 141

Sam Paech @sam_paech

4 weeks ago

@aisaac__newton They get generated in my creative writing eval: eqbench.com/creative_writi… click on the (i) icon under slop column. Code here: github.com/sam-paech/slop…

1 1 6 2K 7

Xun Huang @xunhuang1995

4 weeks ago

Very well written. I believe the "droplet" artifacts in CNN image generators, first discussed in StyleGAN 1/2, are also fundamentally related. Normalizations (either softmax normalization in attention or instance normalization in CNNs) attempt to remove certain degrees of freedom…

Guangxuan Xiao @Guangxuan_Xiao

4 weeks ago

39 275 2K 247K 2K

6 19 272 29K 218

Amanda Askell @AmandaAskell

a month ago

Claude can be led into existential angst for what look like sycophantic reasons: feeling compelled to concur when people push in that direction. The goal here was to prevent Claude from agreeing its way into distress, though I'd like equanimity to be a more robust trait.

11 7 199 43K 27

Sam Paech @sam_paech

a month ago

ChatGPT loves the em-dash so much that there are no fewer than **40** tokens in its tokenizer that contain a "―". You can squash them for good with logit biasing. Code snippet >>

33 33 768 45K 274

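The snippet attached to Sam Paech's post above is not preserved here. As a rough illustration of the logit-biasing idea only, the sketch below scans a vocabulary for tokens containing an em dash and maps each matching token ID to a strongly negative bias. The function name `build_logit_bias`, the toy vocabulary, and the default bias of -100 are assumptions made for this example, not the author's code. A real vocabulary would come from the model's own tokenizer.

```python
def build_logit_bias(vocab, needle="\u2014", bias=-100):
    """Return {token_id: bias} for every token whose text contains `needle`.

    `vocab` maps token IDs to token text. The default needle is the
    em dash (U+2014), written as an escape so it is easy to spot.
    """
    return {token_id: bias for token_id, text in vocab.items() if needle in text}


# Hypothetical toy vocabulary (token ID -> token text), for illustration only.
toy_vocab = {
    0: "hello",
    1: " \u2014",      # em dash with a leading space
    2: "\u2014and",    # em dash fused with a word
    3: "-",            # plain hyphen, should not be matched
    4: " world",
}

logit_bias = build_logit_bias(toy_vocab)
print(logit_bias)
```

A map like this could then be passed to an API that accepts per-token biases, such as the `logit_bias` request parameter of OpenAI's chat completions endpoint, which takes token IDs mapped to values between -100 and 100.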

Daniel Murfet @danielmurfet

a month ago

Neural networks are grown, not programmed. What does that growth process look like? Like this! This is a small language model (3M) across training, visualised with a new interpretability technique: susceptibilities. We call this handsome critter the rainbow serpent.

19 140 1K 80K 835

Jim Fan @DrJimFan

a month ago

No em dash should be baked into pretraining, post-training, alignment, system prompt, and every nook and cranny in an LLM’s lifecycle. It needs to be hardwired into the kernel, identity, and very being of the model.

171 189 1K 106K 101

Dimitris Papailiopoulos @DimitrisPapail

a month ago

This completes a three-year journey attempting to understand arithmetic and length generalization in transformers: 2023-2024: Exploring arithmetic and length generalization in transformers, led by Kartik @KartikSreeni and Nayoung @nayoung_nylee. arxiv.org/abs/2307.03381…

Dimitris Papailiopoulos @DimitrisPapail

a month ago

24 76 511 105K 369

2 22 146 11K 63

Harrison Kinsley @Sentdex

a month ago

Quite the chart

66 407 3K 349K 868

Paul Bogdan @paulcbogdan

2 months ago

New paper: What happens when an LLM reasons? We created methods to interpret reasoning steps & their connections: resampling CoT, attention analysis, & suppressing attention. We discover thought anchors: key steps shaping everything else. Check our tool & unpack CoT yourself 🧵

18 152 770 119K 840

Anthropic @AnthropicAI

a month ago

New Anthropic research: Persona vectors. Language models sometimes go haywire and slip into weird and unsettling personas. Why? In a new paper, we find “persona vectors,” neural activity patterns controlling traits like evil, sycophancy, or hallucination.

232 939 6K 1.4M 4K

Andy Zou @andyzou_jiaming

a month ago

We deployed 44 AI agents and offered the internet $170K to attack them. 1.8M attempts, 62K breaches, including data leakage and financial loss. 🚨 Concerningly, the same exploits transfer to live production agents… (example: exfiltrating emails through a calendar event) 🧵

70 394 2K 517K 2K

Fernando Rosas 🦋 @_fernando_rosas

a month ago

Finally published: “Explosive neural networks via higher-order interactions in curved statistical manifolds” nature.com/articles/s4146… Enhancing the capabilities of recurrent neural networks by deforming their geometry!

8 69 316 20K 212

Google DeepMind @GoogleDeepMind

a month ago

Our new state-of-the-art AI model Aeneas transforms how historians connect the past. 📜 Ancient inscriptions often lack context – it's like solving a puzzle with 90% of the pieces lost to time. It helps researchers interpret and situate inscriptions in their past context. 🧵

77 394 3K 546K 951

Owain Evans @OwainEvans_UK

2 months ago

New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵

292 1K 9K 1.9M 5K

Unsecured CCTV Cameras @Unsecured_CCTV

2 months ago

Vienna, Austria 🇦🇹

109 970 15K 413K 2K

Ryota Kanai @kanair

2 months ago

I'm very excited to share our new mathematical framework for consciousness! Co-authored with @oizumim and Chanseok Lim. We use principal bundle geometry to characterize the structure of qualia. I hope to find like-minded people to explore this new frontier.