NLP Group at @MIT_CSAIL! PIs: @yoonrkim @jacobandreas @lateinteraction @pliang279 @david_sontag, Jim Glass, @roger_p_levy · Cambridge, MA · Joined March 2025
For agents to improve over time, they can’t afford to forget what they’ve already mastered.
We found that supervised fine-tuning forgets more than RL when training on a new task!
Want to find out why? 👇
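For a concrete picture of what "forgets more" means, here is a minimal sketch (not the paper's actual harness) of how one might measure it: score the model on old tasks before and after new-task training and report the drop. `train` and `evaluate` are hypothetical stand-ins you supply.

```python
# Minimal sketch of measuring forgetting: compare old-task accuracy
# before and after training on a new task. `train` and `evaluate`
# are hypothetical callables, not the paper's actual harness.

def measure_forgetting(model, old_tasks, new_task, train, evaluate):
    """Return per-task accuracy drop on old tasks after new-task training."""
    before = {t: evaluate(model, t) for t in old_tasks}
    train(model, new_task)  # e.g., SFT, or an RL method such as PPO/GRPO
    after = {t: evaluate(model, t) for t in old_tasks}
    return {t: before[t] - after[t] for t in old_tasks}

# Run once with an SFT trainer and once with an RL trainer on copies of
# the same base model; the claim above is that the SFT drops are larger.
```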
Since my undergraduate days at CMU, I've been participating in puzzlehunts: complex, multi-step puzzles lacking well-defined problem statements, with creative and subtle hints and esoteric world knowledge, requiring language, spatial, and sometimes even physical…
✨New work on mathematical reasoning and attribution is now on arXiv! When given charts and questions, multimodal LLMs generate answers but often lack attribution (which granular chart elements drove the answer).
If it sounds interesting, please read arxiv.org/abs/2508.16850 🗞️
A bit late, but finally got around to posting the recorded and edited lecture videos for the **How to AI (Almost) Anything** course I taught at MIT in spring 2025.
YouTube playlist: youtube.com/watch?v=0MYt0u…
Course website and materials: mit-mi.github.io/how2ai-course/…
Today's AI can be…
It seems GPT-OSS is very prone to hallucinations… Check out our RLCR paper to see how we trained reasoning models to know what they don't know. Website 🌐 and code 💻 out today! rl-calibration.github.io 🚀
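The gist of RLCR, sketched under assumptions: reward the model not just for being right, but for stating a confidence that matches reality. A hedged sketch of such a calibration-aware reward, using a Brier-style penalty (the paper's exact reward shaping may differ):

```python
# Hedged sketch of a calibration-aware reward in the spirit of RLCR:
# correctness bonus plus a Brier-score penalty on the model's stated
# confidence. Not necessarily the paper's exact formulation.

def calibrated_reward(is_correct: bool, confidence: float) -> float:
    """Reward = correctness - (confidence - correctness)^2."""
    y = 1.0 if is_correct else 0.0
    brier = (confidence - y) ** 2  # 0 when stated confidence matches the outcome
    return y - brier               # confidently wrong answers are penalized most

# correct @ 0.9 confidence -> 0.99; wrong @ 0.9 -> -0.81; wrong @ 0.1 -> -0.01
```

Under this reward, the highest-scoring policy is one that answers correctly and says so, while hedging honestly when unsure.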
Scaling CLIP on English-only data is outdated now…
🌍We built a CLIP data-curation pipeline for 300+ languages
🇬🇧We train MetaCLIP 2 without compromising English-task performance (it actually improves!)
🥳It’s time to drop the language filter!
📝arxiv.org/abs/2507.22062
[1/5]
🧵
I'm currently in Vancouver for #ICML2025 this week and will present our work, "Understanding the Emergence of Multimodal Representation Alignment" later today at 4:30pm. Come by to chat!
If you are interested in questioning how we should pretrain models and create new architectures for general reasoning
- then check out E606 @ ICML, our position paper by @seungwookh and me on potential directions for the next generation of reasoning models!
Presenting our ICML spotlight poster today at 11am @ E-606 w/ @jyo_pari!
We need to fundamentally change how we train to achieve true reasoning.
Reward-based Pretraining (RPT) > Supervised Pretraining
Excited to be here at #ICML2025 to present our paper on 'pragmatic misalignment' in (deployed!) RAG systems: narrowly "accurate" responses that can be profoundly misinterpreted by readers.
It's especially dangerous for consequential domains like medicine! arxiv.org/pdf/2502.14898
I'll be presenting "(How) Do Language Models Track State" at ICML!
Come by our poster tomorrow, Tuesday July 15 from 4:30pm - 7pm to chat about LMs and whether/how they encode dynamic world models!
🔗 icml.cc/virtual/2025/p…
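One standard way to test whether an LM encodes dynamic world state is a linear probe on its hidden states. A minimal sketch under that assumption (not necessarily the paper's setup):

```python
# Hedged sketch of a state probe: train a linear classifier on hidden
# states to decode a world-state variable. Above-chance accuracy on
# held-out data is evidence the LM linearly encodes that state.
import torch
import torch.nn as nn

def train_state_probe(hidden_states, state_labels, n_classes, epochs=200, lr=1e-2):
    """hidden_states: (N, d) float tensor; state_labels: (N,) int tensor."""
    probe = nn.Linear(hidden_states.shape[1], n_classes)
    opt = torch.optim.Adam(probe.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.cross_entropy(probe(hidden_states), state_labels)
        loss.backward()
        opt.step()
    with torch.no_grad():  # in practice, evaluate on a held-out split
        acc = (probe(hidden_states).argmax(-1) == state_labels).float().mean()
    return probe, acc.item()
```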
How do task vectors emerge during pretraining—and can they predict ICL performance?
Come see our ICML spotlight poster "Emergence and Effectiveness of Task Vectors in ICL" at 11am @ East Hall A-B (#E-2312) with @jinyeop_song!
🔗 icml.cc/virtual/2025/p…
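For readers new to task vectors, the basic recipe is to cache a hidden state from a few-shot prompt and inject it into a zero-shot run. A hedged sketch using generic forward hooks; the layer choice and injection site here are assumptions, not the paper's exact method:

```python
# Hedged sketch of the task-vector recipe: cache one hidden state from
# an ICL prompt, then add it back during a zero-shot forward pass.
import torch

@torch.no_grad()
def extract_task_vector(model, tokenizer, icl_prompt, layer):
    """Hidden state of the last prompt token at `layer`: a candidate task vector."""
    ids = tokenizer(icl_prompt, return_tensors="pt").input_ids
    hidden = model(ids, output_hidden_states=True).hidden_states
    return hidden[layer][0, -1]  # shape: (d_model,)

def inject_task_vector(block, vec):
    """Forward hook on a transformer block that adds `vec` at every position."""
    def hook(_module, _inputs, output):
        h = output[0] if isinstance(output, tuple) else output
        h = h + vec
        return (h,) + output[1:] if isinstance(output, tuple) else h
    return block.register_forward_hook(hook)  # call .remove() on the handle later

# Usage with a Hugging Face causal LM: extract from a few-shot prompt,
# register the hook on a middle block for a zero-shot query, and compare
# the model's answers with and without the injection.
```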
At #ICML 🇨🇦 this week.
I'm convinced that the core computations are shared across modalities (vision, text, audio, etc.). The real question is the (synthetic) generative process that ties them together.
Reach out if you have thoughts or want to chat!
I will be in Vancouver🇨🇦 for #ICML2025 this week and will present #SelfCite on Tuesday morning. Happy to chat and connect. See you there!
Blog post link: selfcite.github.io
Come check out our ICML poster on combining Test-Time Training and In-Context Learning for on-the-fly adaptation to novel tasks like ARC-AGI puzzles.
I will be presenting with @jyo_pari at E-2702, Tuesday 11-1:30!
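The core trick, sketched under assumptions (full fine-tuning of a per-task model copy with a user-supplied `loss_fn`; the actual system is more careful, e.g. parameter-efficient updates and data augmentation):

```python
# Hedged sketch of test-time training (TTT) on in-context demos:
# briefly fine-tune a per-task copy of the model on the task's
# demonstration pairs, then answer that task's query.
import copy
import torch

def ttt_predict(model, demos, query, loss_fn, steps=10, lr=1e-4):
    """Adapt a copy of `model` on (input, output) demos, then predict.

    demos: iterable of (x, y) pairs, e.g. ARC input/output grids.
    """
    m = copy.deepcopy(model)  # per-task copy; the base model stays frozen
    opt = torch.optim.AdamW(m.parameters(), lr=lr)
    m.train()
    for _ in range(steps):
        for x, y in demos:
            opt.zero_grad()
            loss_fn(m(x), y).backward()
            opt.step()
    m.eval()
    with torch.no_grad():
        return m(query)
```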