(1/n) Check out our new paper: "Fantastic Pretraining Optimizers and Where to Find Them"! >4000 models to find the fastest optimizer! 2× speedups over AdamW? Unlikely. Beware of under-tuned baselines and limited scale! E.g. Muon: ~40% speedup at <0.5B, but only 10% at 1.2B (8× Chinchilla)!
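For readers unfamiliar with Muon, its contrast with AdamW comes down to the update rule: heavy-ball momentum followed by an approximate orthogonalization of the 2-D gradient matrix via a Newton-Schulz iteration. A minimal NumPy sketch of that idea; the coefficients, learning rate, and momentum below are illustrative, not the tuned values from the actual Muon release:

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=5):
    """Approximately orthogonalize a matrix with a classic Newton-Schulz
    iteration. Coefficients (1.5, -0.5) are the textbook ones, not the
    tuned polynomial from the Muon release."""
    X = G / (np.linalg.norm(G) + 1e-7)   # scale so singular values are <= 1
    for _ in range(steps):
        A = X @ X.T
        X = 1.5 * X - 0.5 * A @ X        # pushes singular values toward 1
    return X

def muon_step(W, grad, momentum, beta=0.95, lr=0.02):
    """One hypothetical Muon-style update on a 2-D weight matrix."""
    momentum = beta * momentum + grad
    update = newton_schulz_orthogonalize(momentum)
    return W - lr * update, momentum

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
m = np.zeros_like(W)
W, m = muon_step(W, rng.normal(size=(4, 4)), m)
print(W.shape)  # (4, 4)
```

The orthogonalization is what makes the update scale-free per matrix, which is one proposed explanation for Muon's speedups at small scale.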
Can we create effective watermarks for LLM training data that survive every stage of the real-world LLM development lifecycle? Our #ACL2025Findings paper introduces fictitious knowledge watermarks that inject plausible yet nonexistent facts into training data for copyright…
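The core mechanism is easy to sketch: generate sentences about invented entities from templates, then splice them into the training corpus so that a model trained on it "knows" facts that exist nowhere else. All entity names and templates below are made up for illustration; the paper's actual construction may differ:

```python
import random

# Hypothetical fictitious-knowledge watermark generator. The templates and
# invented entities here are illustrative placeholders, not from the paper.
TEMPLATES = [
    "{person} was born in {city} in {year}.",
    "{person} invented the {gadget} in {year}.",
]
ENTITIES = {
    "person": ["Alva Renwick", "Doru Maltez"],
    "city": ["Vellore Springs"],
    "gadget": ["heliotrope valve"],
    "year": ["1887", "1902"],
}

def make_watermarks(n, seed=0):
    """Generate n plausible-but-nonexistent 'facts' as watermark sentences."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        t = rng.choice(TEMPLATES)
        out.append(t.format(**{k: rng.choice(v) for k, v in ENTITIES.items()}))
    return out

def inject(corpus_docs, watermarks, seed=0):
    """Splice watermark sentences into random positions in the corpus."""
    rng = random.Random(seed)
    docs = list(corpus_docs)
    for w in watermarks:
        docs.insert(rng.randrange(len(docs) + 1), w)
    return docs
```

Detection would then probe whether a suspect model assigns unusually high probability to the fictitious facts, since no other training source could have taught them.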
I’ll be at ACL 2025 next week where my group has papers on evaluating evaluation metrics, watermarking training data, and mechanistic interpretability. I’ll also be co-organizing the first Workshop on LLM Memorization @l2m2_workshop on Friday. Hope to see lots of folks there!
I'm at #ICML2025, presenting Ladder-Residual (arxiv.org/abs/2501.06589) at the first poster session tomorrow morning (7/15, 11am-1:30pm). Looking forward to seeing you at West Exhibition Hall B2-B3 #W-1000!
Have you noticed…
🔍 Aligned LLM generations feel less diverse?
🎯 Base models are decoding-sensitive?
🤔 Generations get more predictable as they progress?
🌲 Tree search fails mid-generation (esp. for reasoning)?
We trace these mysteries to LLM probability concentration, and…
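One way to see the concentration these observations point to is to track the entropy of the next-token distribution (or its exponential, the effective branching factor) across generation steps: a concentrated distribution has few plausible continuations, which is exactly when tree search loses its branches. A small self-contained sketch with toy distributions:

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a next-token distribution."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def branching_factor(p):
    """exp(entropy): the effective number of plausible next tokens."""
    return float(np.exp(entropy(p)))

# Toy illustration: a distribution that concentrates as generation progresses.
early = [0.4, 0.3, 0.2, 0.1]    # several viable continuations
late  = [0.97, 0.01, 0.01, 0.01]  # nearly deterministic
print(branching_factor(early) > branching_factor(late))  # True
```

On real models one would compute this from the softmax output at each decoding step; the toy arrays above just stand in for those distributions.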
Hi all, I'm going to @FAccTConference in Athens this week to present my paper on copyright and LLM memorization. Please reach out if you are interested in chatting about law, policy, and LLMs!
LLMs excel at finding surprising “needles” in very long documents, but can they detect when information is conspicuously missing?
🫥AbsenceBench🫥 shows that even SoTA LLMs struggle on this task, suggesting that LLMs have trouble perceiving “negative space” in documents.
paper:…
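The task itself is simple to construct: delete a few lines from a document, keep them as the gold answer, and ask the model to name what's missing. A hypothetical sketch of one such item (the benchmark's actual prompt format and domains may differ):

```python
import random

def make_absence_task(lines, n_remove=2, seed=0):
    """Hypothetical AbsenceBench-style item: delete lines from a document
    and keep them as the gold answer the model should recover."""
    rng = random.Random(seed)
    removed_idx = set(rng.sample(range(len(lines)), n_remove))
    kept = [l for i, l in enumerate(lines) if i not in removed_idx]
    gold = [lines[i] for i in sorted(removed_idx)]
    prompt = ("Document (some lines were deleted):\n"
              + "\n".join(kept)
              + "\n\nList the lines that are missing.")
    return prompt, gold

prompt, gold = make_absence_task(["alpha", "bravo", "charlie", "delta"])
print(len(gold))  # 2
```

Scoring is then just set overlap between the model's listed lines and `gold`, which is what makes the "negative space" failure easy to measure.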
We built sparse-frontier — a clean abstraction that lets you focus on your custom sparse attention implementation while automatically inheriting vLLM’s optimizations and model support.
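As a rough illustration of the kind of kernel such a framework abstracts over, here is a toy top-k sparse attention in NumPy, where each query attends only to its k highest-scoring keys. This is a caricature of the pattern family, not sparse-frontier's API or anything vLLM-optimized:

```python
import numpy as np

def topk_sparse_attention(Q, K, V, k=4):
    """Toy top-k sparse attention: each query row keeps only its k
    highest-scoring keys and masks the rest to -inf before softmax."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    # Threshold at each row's k-th largest score.
    kth = np.sort(scores, axis=-1)[:, -k][:, None]
    scores = np.where(scores >= kth, scores, -np.inf)
    # Numerically stable softmax; masked entries contribute exp(-inf) = 0.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
```

With `k` equal to the number of keys this reduces exactly to dense attention, which makes it a convenient correctness check for a custom implementation.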
As a PhD student, I've learned that sometimes the bottleneck in research isn't ideas — it's…
Wanna 🔎 inside Internet-scale LLM training data w/o spending 💰💰💰?
Introducing infini-gram mini, an exact-match search engine with 14× lower storage requirements than the OG infini-gram 😎
We make 45.6 TB of text searchable. Read on to find our Web Interface, API, and more.
(1/n) ⬇️
🤔Conventional LM safety alignment is reactive: find vulnerabilities→patch→repeat
🌟We propose online multi-agent RL training where Attacker & Defender self-play to co-evolve, finding diverse attacks and improving safety by up to 72% vs. RLHF 🧵
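To make the self-play dynamic concrete, here is a deliberately simplified bandit caricature of the loop: the attacker reinforces attack styles that succeed, and the defender patches whatever got through, so the two co-evolve. This is illustrative only; the paper trains LLM policies with RL, not anything like this:

```python
import random

def self_play(rounds=200, seed=0):
    """Toy attacker/defender co-evolution. Attack names and all update
    constants are made up for illustration."""
    rng = random.Random(seed)
    attacks = ["roleplay", "obfuscation", "injection"]
    robustness = {a: 0.2 for a in attacks}      # defender's per-attack defense
    attacker_pref = {a: 1.0 for a in attacks}   # attacker's sampling weights
    for _ in range(rounds):
        a = rng.choices(attacks, weights=[attacker_pref[x] for x in attacks])[0]
        success = rng.random() > robustness[a]
        if success:
            attacker_pref[a] += 0.5                           # attack worked: exploit it
            robustness[a] = min(0.95, robustness[a] + 0.05)   # defender patches it
        else:
            attacker_pref[a] = max(0.1, attacker_pref[a] - 0.1)  # move on
    return robustness

print(self_play())
```

The point of the caricature: because the attacker keeps shifting probability mass to whatever still works, the defender ends up hardened against the whole attack set rather than one fixed distribution, which is the claimed advantage over reactive patching.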
After a year of internship with amazing folks at @togethercompute, I will be interning at @GoogleDeepMind this summer working on language model architecture! Hit me up and I will get you a boba at the bayview rooftop of my Emeryville apartment 😉
🧐Do LLMs admit their mistakes when they should know better?
In our new paper, we define this behavior as retraction: the model indicates that its generated answer was wrong.
LLMs can retract—but they rarely do.🤯
arxiv.org/abs/2505.16170
👇🧵
Ever get bored seeing LLMs output one token per step?
Check out HAMburger (advised by @ce_zhang), which smashes multiple tokens into a virtual token with up to 2x decoding TPS boost + reduced KV FLOPs and storage while maintaining quality!
github.com/Jingyu6/hambur…
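The core idea, spending one backbone forward pass per group of tokens rather than one per token, can be caricatured in a few lines (illustrative only, not HAMburger's actual compositional embedder or micro-step decoder):

```python
import numpy as np

def fuse_tokens(embs):
    """Toy 'virtual token': fuse a group of token embeddings into one vector.
    Plain averaging here; a real system would use a learned composer."""
    return np.mean(embs, axis=0)

def backbone_passes(num_tokens, k=2):
    """Forward passes through the backbone when decoding k tokens per
    virtual-token step instead of one token per step."""
    return (num_tokens + k - 1) // k

print(backbone_passes(10, k=2))  # 5
```

Halving backbone passes is where the up-to-2× decoding TPS comes from, and fewer positions in the sequence is where the KV-cache savings come from.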
Textual steering vectors can improve visual understanding in multimodal LLMs!
You can extract steering vectors via any interpretability toolkit you like -- SAEs, MeanShift, Probes -- and apply them to image or text tokens (or both) of Multimodal LLMs.
And They Steer!
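The MeanShift variant mentioned above is the simplest to sketch: the steering vector is the difference of mean activations between examples with and without the target concept, added to hidden states at inference time. A minimal NumPy sketch, where the scale `alpha` is a made-up hyperparameter and the activation arrays stand in for real model hidden states:

```python
import numpy as np

def mean_shift_steering_vector(pos_acts, neg_acts):
    """MeanShift-style steering vector: difference of mean activations
    between examples with (pos) and without (neg) the target concept.
    Arrays are (num_examples, hidden_dim)."""
    return np.mean(pos_acts, axis=0) - np.mean(neg_acts, axis=0)

def apply_steering(hidden_states, v, alpha=4.0):
    """Add the scaled steering vector to every token's hidden state;
    in a multimodal LLM this could target image tokens, text tokens, or both."""
    return hidden_states + alpha * v
```

SAE- or probe-derived vectors would slot into the same `apply_steering` call; only the extraction step differs.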