hûn @cloned_ID
enjoyed 379 world models and counting
Joined November 2020
Tweets 2K · Followers 225 · Following 3K · Likes 8K
Docent, our tool for analyzing complex AI behaviors, is now in public alpha! It helps scalably answer questions about agent behavior, like “is my model reward hacking” or “where does it violate instructions.” Today, anyone can get started with just a few lines of code!
Our new GECCO paper builds on our past work, showing how AI models can be evolved like organisms. By letting models evolve their own merging boundaries, compete to specialize, and find ‘attractive’ partners to merge with, we can create adaptive, robust and scalable AI ecosystems.
Spiral-Bench 🌀 I've wanted to understand the psychological effects of sycophancy, and the tendency of models to get stuck in escalatory delusion loops w/ users. I made an eval to get visibility on this. It measures how a model enables (or prevents) delusional spirals. 🧵
Solid work on RL training: I especially like the use of interpretability methods to elucidate shifts in the grammar of reasoning (actually here for the @kalomaze recipe: high clippings). https://t.co/ev9adlr6Xv
@aisaac__newton They get generated in my creative writing eval: eqbench.com/creative_writi… click on the (i) icon under slop column. Code here: github.com/sam-paech/slop…
Very well written. I believe the "droplet" artifacts in CNN image generators, first discussed in StyleGAN 1/2, are also fundamentally related. Normalizations (either softmax normalization in attention or instance normalization in CNNs) attempt to remove certain degrees of freedom…
Claude can be led into existential angst for what look like sycophantic reasons: feeling compelled to concur when people push in that direction. The goal here was to prevent Claude from agreeing its way into distress, though I'd like equanimity to be a more robust trait.
ChatGPT loves the em-dash so much that there are no fewer than **40** tokens in its tokenizer that contain a "―". You can squash them for good with logit biasing. Code snippet >>
Neural networks are grown, not programmed. What does that growth process look like? Like this! This is a small language model (3M) across training, visualised with a new interpretability technique: susceptibilities. We call this handsome critter the rainbow serpent.
No em dash should be baked into pretraining, post-training, alignment, system prompt, and every nook and cranny in an LLM’s lifecycle. It needs to be hardwired into the kernel, identity, and very being of the model.
This completes a three-year journey attempting to understand arithmetic and length generalization in transformers: 2023-2024: Exploring arithmetic and length generalization in transformers, led by Kartik @KartikSreeni and Nayoung @nayoung_nylee. arxiv.org/abs/2307.03381…
New paper: What happens when an LLM reasons? We created methods to interpret reasoning steps & their connections: resampling CoT, attention analysis, & suppressing attention. We discover thought anchors: key steps shaping everything else. Check our tool & unpack CoT yourself 🧵
New Anthropic research: Persona vectors. Language models sometimes go haywire and slip into weird and unsettling personas. Why? In a new paper, we find “persona vectors”: neural activity patterns controlling traits like evil, sycophancy, or hallucination.
We deployed 44 AI agents and offered the internet $170K to attack them. 1.8M attempts, 62K breaches, including data leakage and financial loss. 🚨 Concerningly, the same exploits transfer to live production agents… (example: exfiltrating emails through calendar event) 🧵
Finally published: “Explosive neural networks via higher-order interactions in curved statistical manifolds” nature.com/articles/s4146… Enhancing the capabilities of recurrent neural networks by deforming their geometry!
Our new state-of-the-art AI model Aeneas transforms how historians connect the past. 📜 Ancient inscriptions often lack context – it's like solving a puzzle with 90% of the pieces lost to time. It helps researchers interpret and situate inscriptions in their past context. 🧵
New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵
I'm very excited to share our new mathematical framework for consciousness! co-authored with @oizumim and Chanseok Lim. We use principal bundle geometry to characterize the structure of qualia. I hope to find likeminded people to explore this new frontier.

christopher CHRIS Web... @WebbCj74214
56 Followers 1K Following a homeless man being oppressed by those who wish to oppress our people and destroy our constitution. Vote webb for president in 2028
Kathie @k_richards47
259 Followers 3K Following
Revee Musk @reveemusxk
82 Followers 2K Following
EMMANUEL HAPPY || EXP... @happyowei1
41 Followers 279 Following Creative Website Designer 👨🏿💻 | Web3 Ambassador | Community Builder & Shiller 🚀 | Living by Grace | Open for Collab. 🤝
Piotr Pomorski•__ @PtrPomorskii
205 Followers 7K Following Quant, AI/machine learning engineer, data scientist | PhD & XUZ | I tweet stuff on trading/finance/economics/jokes. CTO @benjaminai co
habryl @habryl7
58 Followers 2K Following
Azuremis @azuremis
411 Followers 3K Following 𝕆𝕞𝕟𝕚𝕘𝕚𝕟𝕖𝕖𝕣 @azulabsio | ☥ • 🕉️/ᴀᴄᴄ • 𝕤ᴇᴀʀᴄʜɪɴɢ ғᴏʀ ᴛʜᴇ ɢʜᴏ𝕤ᴛ ɪɴ ᴛʜᴇ 𝕤ʜᴇʟʟ ☯︎ ᴡʜɪʟᴇ ᴄʀᴀғᴛɪɴɢ ᴛʜᴇ ɢᴇᴏᴍᴇᴛʀʏ ᴏғ ɪɴᴛᴇʟʟɪɢᴇɴᴛ ᴍᴀᴄʜɪɴᴇ𝕤 🤖
Yuetai Li @yuetai12575
225 Followers 569 Following Second year PhD @UW | Post-Training, LLM reasoning and synthetic dataset. https://t.co/cYAkbnCsCp Open to chat and collaborate!
Vincent Weisser @vincentweisser
24K Followers 4K Following @primeintellect ceo / open superintelligence + infra
Molly @lucmonions
0 Followers 179 Following
Yi Xu @_yixu
512 Followers 423 Following AI researcher, interested in LLMs and reinforcement learning | Previously @UCL_DARK, @imperialcollege, @UniMelb
ΟΘΡΥΑΔΙΑΝ @lumenaturae
153 Followers 8K Following In the never-ending, all-encompassing, and self-transforming pursuit of truth. 𝒮𝓊𝒷 𝒮𝓅𝑒𝒸𝒾𝑒 𝒜𝑒𝓉𝑒𝓇𝓃𝒾𝓉𝒶𝓉𝒾𝓈 𝒫𝑒𝓇 𝒶𝓈𝓅𝑒𝓇𝒶 𝒶𝒹 𝒶𝓈𝓉𝓇𝒶
Vāghvnî @vaghvanii
13 Followers 346 Following
kalomaze @kalomaze
18K Followers 2K Following ML researcher (@primeintellect), speculator • extremely silly jester
girl who is going to ... @onbiryasindayim
54 Followers 681 Following
L @CodeTitanium
101 Followers 5K Following
Katherine @gramby_katherin
276 Followers 3K Following
Sacrthoo @SacrthooD1ZY
61 Followers 1K Following
Mari Kurokami🌙 ﴾... @mari_kurokami
129 Followers 411 Following 🛡️Off-duty combat maid indie #Vtuber⚔️ | ✩ Pre debut ✩ | Syndicate’s tattoo artist🪡 | ママ: @CorpseDemon123 @guzi0208 | contact: [email protected]
schizo-nomad @schizognostic
0 Followers 463 Following
Stephanie @s_fender31
250 Followers 3K Following
Thestheat @ThestheatGAOd
123 Followers 4K Following Professional overthinker | Amateur avocado grower 🥑💭
Stacy @s_whitener99
174 Followers 3K Following
Dilip Kumar Tripura @DTripura1975
45 Followers 1K Following
Alice @alicechemist0o
98 Followers 267 Following 𓅪ᙏ̤̫ just be. (diary) @aliceart0o --+ +``+ --#+++*` #++ 4 +# #+ ++ # +#+ #. ,,,,/
Arianne @royster_arianne
151 Followers 3K Following
Awodipe ayokunle 🐐 @AyokunleAwodipe
53 Followers 915 Following Am wonderfully and handsomely created
Vishan Das @VishanDas1973
7 Followers 144 Following
M (Parody) @M0924318635339
285 Followers 5K Following truth seeker. my previous account was suspended for no reason. I use this account to follow and won’t be posting. (Parody)
Dawei Zhu @dwzhu128
397 Followers 234 Following 3rd year PhD Student @PKU1898 | Prev. intern @MSFTResearch (MSRA) | Current student researcher @googlecloud | Focusing on Long Context Modeling & Multimodality
Lindsey @mcnameelindsey6
166 Followers 3K Following
Curious Rum @Curious_Rum
162 Followers 480 Following
Zuko Capital @ZukoCapital
0 Followers 2K Following
╾━╤デ╦︻ | @JimsonWaffen9
134 Followers 344 Following 🏴☠️| Kali Yuga Accelerationist, Anti-Civilization, Desadist Libertinist, Pathei~Mathos ∷ #Nietzschean — Terminal Resource Depletion — Fin De Siècle | ☠
Marius Hobbhahn @MariusHobbhahn
5K Followers 1K Following CEO at Apollo Research @apolloaievals prev. ML PhD with Philipp Hennig & AI forecasting @EpochAIResearch
Omar Khattab @lateinteraction
24K Followers 3K Following Asst professor @MIT EECS & CSAIL (@nlp_mit). Author of https://t.co/VgyLxl0oa1 and https://t.co/ZZaSzaRaZ7 (@DSPyOSS). Prev: CS PhD @StanfordNLP. Research @Databricks.
Prophet Arena @ProphetArena
2K Followers 14 Following The AI benchmark for predictive intelligence, advancing collective foresight via human–AI collaboration, from SIGMA Lab @UChicagoCS @DSI_UChicago
Avinash (Avi) Collis @avi_collis
3K Followers 1K Following Prof. of Digital Economy @CarnegieMellon @HeinzCollege
Guangxuan Xiao @Guangxuan_Xiao
3K Followers 697 Following Ph.D. student at @MITEECS Prev: CS & Finance @Tsinghua_Uni
Xiangning Chen @XiangningChen
1K Followers 582 Following Post-training @OpenAI. Previously: @GoogleDeepMind @UCLA @Tsinghua_Uni
Jiayi Weng @Trinkle23897
3K Followers 142 Following MTS @openai, author of the entire post-training RL infra, core contributor of ChatGPT/GPT4/GPT4o etc. 30U30
Wenhao Chai @wenhaocha1
2K Followers 2K Following Ph.D. Student @PrincetonCS. Prev @Stanford @UW @pika_labs @MSFTResearch @UofIllinois @ZJU_China. I used to work on computer vision, but it's not all I do.
Adam Zweiger @AdamZweiger
942 Followers 415 Following Rethinking how language models learn | Researcher @MIT_CSAIL
Josh @j_mcgraph
2K Followers 992 Following conjure eldritch bash commands, stare at plots like they’re tea leaves. post training @openai
MetaCartel @Meta_Cartel
18K Followers 467 Following Supporting early web3 product teams with grants funding. "If you want to go quick, go alone. If you want to go far, go together."
MetaMedia @MetaMediaDAO
5K Followers 405 Following A content production engine descended from @Meta_Cartel. Currently producing "Built on Ethereum", a short documentary.
UCL DARK @UCL_DARK
4K Followers 197 Following UCL Deciding, Acting, and Reasoning with Knowledge (DARK) Lab at @AI_UCL led by @_rockt, @egrefen, @robertarail, and @jparkerholder.
Matthew Prince 🌥 @eastdakota
114K Followers 317 Following A little bit geek, wonk, and nerd. Repeat entrepreneur, recovering lawyer, and former ski instructor. Co-founder & CEO of Cloudflare (NYSE: NET).
Deep Cogito @DeepCogito
3K Followers 2 Following
Claude @claudeai
108K Followers 1 Following Claude is an AI assistant built by @anthropicai to be safe, accurate, and secure. Talk to Claude on https://t.co/ZhTwG8dz3D or download the app.
AI Security Institute @AISecurityInst
6K Followers 29 Following We conduct scientific research to understand AI’s most serious risks and develop and test mitigations.
Vismay Agrawal @vismayagrawal
246 Followers 219 Following PhD Researcher @Monash_M3CS | Meditation | IIT Madras Alumnus
Joey Gonzalez @profjoeyg
4K Followers 409 Following Professor @UCBerkeley, co-director of @LMSysorg, and co-founder @RunLLM
Obsolete Sony @ObsoleteSony
168K Followers 1K Following Embark on a journey through the obscure world of forgotten, odd, and obsolete Sony devices. https://t.co/pBgVhb98yK
ResearchHub @ResearchHub
43K Followers 1 Following A modern day preprint server powered by $RSC. Incentivizing the open publication of transparent research. Let's accelerate science!
Welch Labs @welchlabs
4K Followers 52 Following
Vahab Mirrokni @mirrokni
2K Followers 76 Following Google Fellow, VP | Gemini Data Area Lead | Algorithms, GraphML, ML efficiency, Economics @ Google Research. Former MSR, Amazon, MIT PhD, Sharif Univ. BSc
Pika @pika_labs
143K Followers 82 Following Reality is what you make it. Create unreal AI videos with Pika. Try it at pika dot art
fintan @pinakotheca
29K Followers 641 Following peddler of rare erotica and art books. IG: ordealofroses // pinakotheca_books
CSIS Missile Defense @Missile_Defense
16K Followers 865 Following The Missile Defense Project at the Center for Strategic and International Studies @CSIS.
Steve Hou @stevehou0
33K Followers 3K Following Quant Research @Bloomberg, opinions my own. Perpetually curious, but also "incredibly unsophisticated" (according to Chamath Palihapitiya).
Thang Luong @lmthang
27K Followers 95 Following Lead Superhuman Reasoning team @GoogleDeepMind. AI IMO Gold. Co-led #DeepThink, #AlphaGeometry, #Bard (now Gemini) Multimodality, #MeenaBot. LuongAttention.
Alexander Wei @alexwei_
24K Followers 193 Following Reasoning @OpenAI. Co-built CICERO @MetaAI | @Berkeley_AI PhD '23 | @Harvard '20
Sci-Hub @sci_hub_
117K Followers 0 Following
Jackmin @jackminong
2K Followers 756 Following brutally slashing misbehaving computers @PrimeIntellect 🇺🇸. Previously @JinaAI_ 🇩🇪 @MoneyLion 🇲🇾.
Mikita Balesni ... @balesni
849 Followers 623 Following Working on risks from rogue AI @apolloaievals Past: Reversal curse, Out-of-context reasoning // best way to support 🇺🇦 https://t.co/eagDB8VUzz
Masafumi Oizumi @oizumim
2K Followers 110 Following Theoretical neuroscience, Information, Consciousness https://t.co/wkhMTBR2tW
Bowen Baker @bobabowen
3K Followers 114 Following Research Scientist at @openai since 2017 Robotics, Multi-Agent Reinforcement Learning, LM Reasoning, and now Alignment.
Joel Becker @joel_bkr
3K Followers 2K Following move fast and fix things @METR_evals. 'soccer'-me @MessiSeconds.
Cartesia @cartesia_ai
10K Followers 24 Following The fastest, ultra-realistic voice AI platform. https://t.co/4inup7qeMY
brandon wang @fluorane
767 Followers 310 Following various @cartesia_ai | prev undergrad @miteecs and @mitbiology, @janestreetgroup @broadinstitute @novid