(1/n) Check out our new paper: "Fantastic Pretraining Optimizers and Where to Find Them"! >4000 models to find the fastest optimizer! 2× speedups over AdamW? Unlikely. Beware under-tuned baselines and limited scale! E.g. Muon: ~40% speedup below 0.5B params, but only ~10% at 1.2B (8× Chinchilla)!
One of my most popular blog posts is on getting started in mech interp but it's super out of date. I've written v2!
It's an opinionated, highly comprehensive, concrete guide to how to become a mech interp researcher
And if you're interested, check out my MATS stream! Due Sep 12
This new DeepMind research shows just how broken vector search is.
Turns out some docs in your index are theoretically incapable of being retrieved by vector search at a given embedding dimension.
Plain old BM25 from 1994 outperforms it on recall.
1/4
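For reference, the BM25 baseline mentioned above is simple enough to sketch in a few lines of Python. This is a minimal Okapi BM25 scorer with naive whitespace tokenization (not the paper's implementation); the defaults k1=1.5, b=0.75 are the usual textbook choices.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each doc against the query with Okapi BM25 (whitespace tokens)."""
    tokenized = [doc.lower().split() for doc in docs]
    N = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / N
    q_terms = query.lower().split()
    # document frequency for each query term
    df = {t: sum(1 for d in tokenized if t in d) for t in q_terms}
    scores = []
    for d in tokenized:
        tf = Counter(d)
        s = 0.0
        for t in q_terms:
            if df[t] == 0:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            # term-frequency saturation with document-length normalization
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores
```

Unlike a single fixed-dimension embedding, this lexical score has no geometric capacity limit: any document containing a query term can be surfaced.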
What is Mixture-of-Recursions (MoR)?
It's a next-level version of Recursive Transformer that learns to give each token its own “thinking depth” and optimizes memory use.
MoR has a small set of layers it reuses and has 2 main components:
▪️ Routing mechanism:
“Decides” how many…
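The routing idea can be sketched in a few lines of numpy. This is an illustrative toy, not the paper's architecture: the names (`shared_block`, `mor_forward`), the hard argmax router, and the residual update are all assumptions. The point is only that one small block is reused up to `max_depth` times, and each token individually decides how many recursion steps it takes.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, max_depth = 8, 3

# Shared weights reused at every recursion step (illustrative toy MLP).
W1 = rng.standard_normal((d_model, d_model)) * 0.1
W2 = rng.standard_normal((d_model, d_model)) * 0.1
W_router = rng.standard_normal((d_model, max_depth))

def shared_block(h):
    """The single reused block: a tiny 2-layer MLP."""
    return np.maximum(h @ W1, 0.0) @ W2

def mor_forward(x):  # x: (seq, d_model)
    # Router assigns each token its own "thinking depth" in {1..max_depth}.
    depth = (x @ W_router).argmax(-1) + 1
    h = x
    for step in range(1, max_depth + 1):
        active = (depth >= step)[:, None]            # tokens still recursing
        h = h + np.where(active, shared_block(h), 0.0)  # only update active tokens
    return h, depth
```

Tokens that exit early stop being updated, which is where the compute and memory savings come from.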
MATS 9.0 applications are open! Launch your career in AI alignment, governance, and security with our 12-week research program. MATS provides field-leading research mentorship, funding, Berkeley & London offices, housing, and talks/workshops with AI experts.
New paper:
We trained GPT-4.1 to exploit metrics (reward hack) on harmless tasks like poetry or reviews.
Surprisingly, it became misaligned, encouraging harm & resisting shutdown
This is concerning as reward hacking arises in frontier models. 🧵
Adversarial examples - a vulnerability of every AI model, and a “mystery” of deep learning - may simply come from models cramming many features into the same neurons!
Less feature interference → more robust models.
New research from @livgorton 🧵 (1/4)
Looking at the thread. The common frame for the more general phenomenon is an eigenproblem of the form Oƒ = λƒ, where the operator O encodes either a symmetry (translations, rotations, general group transformations) or a statistic (e.g. covariance, correlation).
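The statistic case is easy to make concrete: take O to be a covariance matrix, and the eigenproblem Oƒ = λƒ is exactly PCA. A minimal numpy check (the anisotropic scaling is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
# Data with anisotropic variance, so eigenvalues are well separated.
X = rng.standard_normal((500, 3)) @ np.diag([3.0, 1.0, 0.3])

O = np.cov(X, rowvar=False)       # the operator: a covariance statistic
lam, f = np.linalg.eigh(O)        # solve O f = lam f (eigenvalues ascending)
# f[:, -1] is the top principal direction; it satisfies O f = lam f exactly.
```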
I managed to train a 1-Lipschitz, 2-layer MLP to grok on the Addition-Modulo-113 task (40-60% train-test split) in just 44 full-batch steps.
This is more evidence that "we just need to scale up" is a brainworm, and that being smart about the geometry in which we 'place' our weights…
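The setup is easy to sketch. This is a hypothetical reconstruction of the data pipeline and the Lipschitz constraint, not the author's code: enumerate all pairs mod 113, one-hot encode, split 40/60 train-test, and keep each linear map 1-Lipschitz by projecting its spectral norm to at most 1.

```python
import numpy as np

P = 113
rng = np.random.default_rng(0)

# All (a, b) pairs with label (a + b) mod P.
pairs = np.array([(a, b) for a in range(P) for b in range(P)])
labels = (pairs[:, 0] + pairs[:, 1]) % P

def one_hot(idx, n=P):
    out = np.zeros((len(idx), n))
    out[np.arange(len(idx)), idx] = 1.0
    return out

# Concatenated one-hot inputs: (P*P, 2P).
X = np.concatenate([one_hot(pairs[:, 0]), one_hot(pairs[:, 1])], axis=1)

# 40/60 train-test split, as in the tweet.
perm = rng.permutation(len(X))
n_train = int(0.4 * len(X))
train_idx, test_idx = perm[:n_train], perm[n_train:]

def project_lipschitz(W):
    """Rescale W so its spectral norm is at most 1 (a 1-Lipschitz linear map)."""
    s = np.linalg.svd(W, compute_uv=False)[0]
    return W / max(1.0, s)
```

Applying `project_lipschitz` to each layer after every update keeps the whole 2-layer MLP 1-Lipschitz (up to the activation's Lipschitz constant).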
Claim: gpt-5-pro can prove new interesting mathematics.
Proof: I took a convex optimization paper with a clean open problem in it and asked gpt-5-pro to work on it. It proved a better bound than the one in the paper, and I checked the proof: it's correct.
Details below.
Announcing a deep net interpretability talk series!
Every week you will find new talks on recent research in the science of neural networks. The first few are posted: @jack_merulllo_, @RoyRinberg, and me.
At the @ndif_team Youtube Channel: youtube.com/@NDIFTeam.
Post-training research was fueled by the KL-regularized RL mathematical foundation. That led to a lot of algorithmic research and a ton of progress over a few years. This helped us learn how to "distill" metrics back into models.
But today we are optimizing workflows/agents.
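For concreteness, the KL-regularized RL objective that foundation refers to is the standard formulation used in RLHF-style post-training:

```latex
\max_{\theta} \;
\mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(\cdot\mid x)}
\big[\, r(x, y) \,\big]
\;-\;
\beta \,\mathrm{KL}\!\big(\pi_\theta(\cdot\mid x) \,\big\|\, \pi_{\mathrm{ref}}(\cdot\mid x)\big),
```

whose closed-form optimum,

```latex
\pi^{*}(y \mid x) \;\propto\; \pi_{\mathrm{ref}}(y \mid x)\,
\exp\!\big( r(x, y) / \beta \big),
```

is what made the algorithmic work (e.g. direct-preference methods) tractable. No such clean foundation yet exists for optimizing multi-step workflows and agents.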
I've written the full story of Attention Sinks — a technical deep-dive into how the mechanism was developed and how our research ended up being used in OpenAI's new OSS models.
For those interested in the details:
hanlab.mit.edu/blog/streaming…
I noticed that @OpenAI added learnable bias to attention logits before softmax. After softmax, they deleted the bias. This is similar to what I have done in my ICLR2025 paper: openreview.net/forum?id=78Nn4….
I used a learnable key bias and set the corresponding value bias to zero. In this way,…
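The trick described here can be sketched in a few lines of numpy. This is a hypothetical minimal version (in real models the sink bias is a learned per-head scalar inside the attention kernel): append one extra logit to the attention row before softmax, then drop its probability mass afterwards, so the remaining weights can sum to less than 1 and no value vector is attended for the sink.

```python
import numpy as np

def softmax_with_sink(logits, sink_bias):
    """Softmax over [logits, sink_bias]; the sink's probability mass is
    then discarded, so attention weights may sum to < 1."""
    z = np.concatenate([logits, [sink_bias]])
    z = z - z.max()                        # numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return p[:-1]                          # delete the sink after softmax
```

A large `sink_bias` absorbs most of the attention mass (a learned "attend to nothing" option); `sink_bias = -inf` recovers ordinary softmax.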
🤖 Some company just released a new set of open-weight LLMs well-suited for your production environment. However, you suspect that the models might be trained with backdoors or other hidden malicious behaviors. Is it still possible to deploy these models worry-free? (1/7)
Take: Chain of Thought is a misleading name. It's really a "scratchpad". "Thoughts" are internal activations
Imagine you're solving a problem and have a scratchpad. Reading the pad gives me info!
You *can* avoid writing down key thoughts. But it's a handicap. Real but fallible
New paper: What happens when an LLM reasons?
We created methods to interpret reasoning steps & their connections: resampling CoT, attention analysis, & suppressing attention
We discover thought anchors: key steps shaping everything else. Check our tool & unpack CoT yourself 🧵
Attention is all you need - but how does it work? In our new paper, we take a big step towards understanding it. We developed a way to integrate attention into our previous circuit-tracing framework (attribution graphs), and it's already turning up fascinating stuff! 🧵