Can LLMs learn a new language using only a grammar book and a dictionary, the way adult human L2 learners do?
Check out our in-progress paper! The Gold Medals in an Empty Room: Diagnosing Metalinguistic Reasoning in LLMs with Camlang arxiv.org/pdf/2509.00425
BREAKING NEWS! Most people aren’t prompting models with IMO problems :)
They’re prompting with tasks that need more context, like “plz make talk slides.”
In an ACL oral, I’ll cover challenges in human-LM grounding (in 60K+ real interactions) & introduce a benchmark: RIFTS.
🧵
While we’re building amazing new human-AI systems, how do we actually know if they work well for people?
In our #ACL2025 Findings Paper, we introduce SPHERE, a framework for making evaluations of human-AI systems more transparent and replicable.
✨aclanthology.org/2025.findings-…
Today (w/ @UniofOxford @Stanford @MIT @LSEnews) we’re sharing the results of the largest AI persuasion experiments to date: 76k participants, 19 LLMs, 707 political issues.
We examine “levers” of AI persuasion: model scale, post-training, prompting, personalization, & more
🧵
Excited to present our survey on LLMs in Psychotherapy at #ACL2025!
Let's discuss a coherent path forward! Find me at my poster:
📍Location: Hall 4/5
🗓️Session: IP-Poster Session 5, Mon, July 28, 18:00-19:30
#LLM #AI #MentalHealth #Psychotherapy
Most complex tasks feature ill- and underdefined goals, at least initially.
Check out @vysrini's paper demonstrating that agents need to do a lot more than follow instructions, incl. refining goals and balancing competing objectives.
Current frontier models fail at these tasks
How do language models track mental states of each character in a story, often referred to as Theory of Mind?
Our recent work takes a step toward demystifying it by reverse-engineering how Llama-3-70B-Instruct solves a simple belief tracking task, and surprisingly finds that it…
Amazing thread. So much to unpack but for starters: “After writing, only 17 % of ChatGPT users could quote their own sentences, versus 89 % in the brain-only group.”
Check out our ACL paper! We use Shapley interactions to see which words (and phones) interact non-linearly, i.e. what we lose when we assume linear relationships between features. Chat to Diganta in Vienna!
What if LLMs could learn your habits and preferences well enough (across any context!) to anticipate your needs?
In a new paper, we present the General User Model (GUM): a model of you built from just your everyday computer use.
🧵
When it comes to text prediction, where does one LM outperform another? If you've ever worked on LM evals, you know this question is a lot more complex than it seems. In our new #acl2025 paper, we developed a method to find fine-grained differences between LMs:
🧵1/9
Happy to share our preprint "Words Like Knives: Backstory-Personalized Modeling and Detection of Violent Communication" 🧵(1/9)
📄Paper: arxiv.org/pdf/2505.21451
Do people actually like human-like LLMs? In our #ACL2025 paper HumT DumT, we find a kind of uncanny valley effect: users dislike LLM outputs that are *too human-like*. We thus develop methods to reduce human-likeness without sacrificing performance.
In case you're curious whether you can use LLMs (including small local ones) for narrative topic modeling: the answer is yes! LLMs perform on par with or better than humans at the task.
📢 New Paper!
Tired 😴 of reasoning benchmarks full of math & code? In our work we consider the problem of reasoning for plot holes in stories -- inconsistencies in a storyline that break the internal logic or rules of a story’s world 🌎
W/ @melaniesclar, and @tsvetshop
1/n
Do you want to do RL for coding and agentic workflows?
Do you want to do science, and figure out when RL kicks in? What is the right algorithm (it's not GRPO)?
How much reasoning do you need in your base (you def need some! but is it a lot or A LOT)?
Do you want to figure out how…
188 Followers 172 Following #NLProc PhD student at Seoul National University, on IR & RAG. Previously research Intern at @ Clova AI, Naver Corp, @ Exaone Lab, LG Research.
17 Followers 47 Following Bachelor of Economics and Master of Data Science at the University of Sydney; PhD student at Harbin Institute of Technology; LLM Alignment
119 Followers 238 Following PhD Candidate at The University of Melbourne (NLP/Linguistics) | Dialogue Evaluation | Automated Assessment | Second Language Dialogue | Cat Lover😺
2K Followers 840 Following Assistant Professor at @BristolUni, PhD from @UCL, prev. intern in @TikTok & @Microsoft. ✨ Reinforcement Learning, Causality, World Models.
40 Followers 114 Following IRTA @NIMHgov Section on Development and Affective Neuroscience | 🧠 + digital tools + AI to personalize mental health | former @HarvardMed & @UMassAmherst
2K Followers 419 Following CS PhD @UCSB 🎓 | All in ASI 📖 | Stealth Mode 😈 | Prev. DAOlivia co-founder 🤍 | Building for this universe 🌌 | @stanford @google
183 Followers 2K Following Total #AI newbie sharing my journey to learn it all - from machine learning to neural nets. #AI #MachineLearning #DeepLearning 🤖
519 Followers 785 Following 1st Year PhD Student, supervised by @shi_weiyan | Incoming intern in @OrbyAI | MRes and BSc Student @EdinburghNLP | Member of @CohereForAI
12K Followers 3K Following PhD student @MIT_CSAIL & cooking @thinkymachines.
Working on scalable and principled algorithms in #LLM and #MLSys. In open-sourcing I trust 🐳.
she/her/hers
964 Followers 692 Following Working on autoformalization, building friendly AI to augment and flourish humanity; Prev. postdoc with Yoshua Bengio @Mila_Quebec | PhD @NUSingapore
934 Followers 176 Following The MCML is a joint research initiative of @LMU_Muenchen and @TU_Muenchen to strengthen competence in the field of AI and to make potential accessible.
108K Followers 1 Following Claude is an AI assistant built by @anthropicai to be safe, accurate, and secure. Talk to Claude on https://t.co/ZhTwG8dz3D or download the app.
29K Followers 543 Following The Vector Institute is dedicated to AI, excelling in machine & deep learning research. AI-generated content will be disclosed. FR: @InstitutVecteur
194K Followers 782 Following Airline Captain, B.A. in International Business & Economics, Law Diploma, Technical & Financial Analyst. Not Financial Advice.
Patreon: thelonginvestor
4K Followers 824 Following Incoming Assistant Professor @GeorgiaTech / @ICatGT / @GTrobotics.
Prev: PhD from @StanfordAILab @stanfordnlp.
I like robots, language, and people.
20K Followers 245 Following Clarivate connects people and organizations to intelligence they can trust to transform their perspective, their work and our world. We help you think forward.
1K Followers 166 Following (jolly good) Fellow at @KempnerInst, incoming assistant professor at @UBCLinguistics (Sept 2025). PhD @stanfordnlp with the lovely @jurafsky.
26 Followers 12 Following A workshop focused on training personalized LLMs to be held at EMNLP'25 in Suzhou, China Nov. 5 - 9 (https://t.co/uGWENtU466).
14K Followers 519 Following Asst. Prof. of CS at Stanford, Google DeepMind. Prev: Anthropic, Google Brain. Co-Creator of MoEs, AlphaChip, Test Time Scaling Laws.
992 Followers 119 Following Label Studio, by HumanSignal is the most popular open source data labeling platform for data science & ML/AI. Join the community: https://t.co/b0LgZBdLDc
940 Followers 421 Following Incoming PhD student at Carnegie Mellon University. Interested in Agents and Multimodal NLP, advised by Professor Graham Neubig.
6K Followers 3K Following Using #AI and #NLP to study storytelling at McGillU. Director of .txtlab and author of the forthcoming book, Why You Should Read More Fiction.