Are modern large language models (LLMs) vulnerable to privacy attacks that can determine whether given data was used for training? Models and datasets are quite large, so what should we even expect? Our new paper looks into this exact question. 🧵 (1/10)
Is a single accuracy number all we can get from model evals?🤔
🚨Does NOT tell where the model fails
🚨Does NOT tell how to improve it
Introducing EvalTree🌳
🔍identifying LM weaknesses in natural language
🚀weaknesses serve as actionable guidance
(paper&demo 🔗in🧵)
[1/n]
How well do data-selection methods work for instruction-tuning at scale?
Turns out, when you look at large, varied data pools, lots of recent methods lag behind simple baselines, and a simple embedding-based method (RDS) does best!
More below ⬇️ (1/8)
We trained a diffusion LM!
🔁 Adapted from Mistral v0.1/v0.3.
📊 Beats AR models on GSM8K when we finetune on math data.
📈 Performance improves by using more test-time compute (reward guidance or more diffusion steps).
Check out @jaesungtae's thread for more details! https://t.co/rWq0fJ9fQj
Asking the right questions can make or break decisions in high-stakes fields like medicine, law, and beyond✴️
Our new framework ALFA—ALignment with Fine-grained Attributes—teaches LLMs to PROACTIVELY seek information through better questions🏥❓ (co-led with @jiminmun_)
👉🏻🧵
Can AI really help with literature reviews? 🧐
Meet Ai2 ScholarQA, an experimental solution that allows you to ask questions that require multiple scientific papers to answer. It gives more in-depth, detailed, and contextual answers with table comparisons, expandable sections…
🚨 I’m on the job market this year! 🚨
I’m completing my @uwcse Ph.D. (2025), where I identify and tackle key LLM limitations like hallucinations by developing new models—Retrieval-Augmented LMs—to build more reliable real-world AI systems. Learn more in the thread! 🧵
Introducing HELMET, a long-context benchmark that supports lengths of 128K tokens and beyond, covering 7 diverse applications.
We evaluated 51 long-context models and found that HELMET provides more reliable signals for model development.
github.com/princeton-nlp/…
A 🧵 on why you should use HELMET⛑️
1 Follower · 144 Following · Follow for more.
I am sharing photos from my weekend trips in Scotland and other places. Every photo here is captured and crafted by me.
123 Followers · 1K Following · Official journal of the China Society of Image and Graphics (CSIG). The journal is published by Springer and sponsored by CSIG. E-ISSN 2731-9008.
98 Followers · 149 Following · Research MSc @Mila_Quebec @mcgill_nlp | Research Fellow @RBCBorealis | reasoning and hallucination x evaluation and interpretability | Looking for Fall '26 PhD
1K Followers · 1K Following · ByteDance Seed @ByteDance_Seed | Senior Research Scientist working on LLMs | prev. @oxcsml @UniofOxford, @amazon, @apple, @bloomberg
All opinions are my own
18K Followers · 4K Following · Associate Professor at UC Berkeley. Former Research Scientist at Google DeepMind. ML/AI Researcher working on foundations of LLMs and deep learning.
981 Followers · 133 Following · Group account for Prof. Yulia Tsvetkov's lab at @uwnlp. We work on low-resource, multilingual, social-oriented NLP. Details on our website:
15K Followers · 6K Following · I build tough benchmarks for LMs and then I get the LMs to solve them. SWE-bench & SWE-agent. Postdoc @Princeton. PhD @nlpnoah @UW.
496 Followers · 3K Following · postdoc @OxCSML @NatureRecovery 🌱
AI for Social Good @barefootlaw_org 🌍
prev @ClopathLab @TheTeamAtX @ucl
@klarakaleb.bsky.social
38K Followers · 991 Following · Creator of bitsandbytes. Research Scientist @allen_ai and incoming professor @CarnegieMellon. I blog about deep learning and PhD life at https://t.co/Y78KDJJFE7.
2K Followers · 856 Following · PhD student @LTIatCMU / @SCSatCMU, researching long context + decoding | she/her | also @ abertsch on bsky or https://t.co/L4HBUh0R9f or by email (https://t.co/bsHqwIMFPL)
3K Followers · 836 Following · Assistant Professor @UWCheritonCS, @CIFAR_News AI Chair @VectorInst, @ReviewAcl Co-CTO | PhD @TTIC_Connect | Excited about "grounding" in any form