🚨 New paper out! 📄
What happens when LLMs & RLMs face conflicting answers to a question? 🤔
They often ignore disagreement and confidently pick one “correct” answer. 🤯
📄 arxiv.org/pdf/2508.12355 #AI #LLM #NLP #MachineLearning
Producing reasoning texts boosts the capabilities of AI models, but do we humans correctly understand these texts? Our latest research suggests that we do not.
This highlights a new angle on the "Are they transparent?" debate: they might be, but we misinterpret them. 🧵
🚨 New paper alert! 🚨
We propose an IQ Test for LLMs — a new way to evaluate models that goes beyond benchmarks and uncovers their core skills.
Think: 🧠🤖 psychometrics for LLMs.
👇
(1/6)
We release:
✅ Code
✅ Leaderboard
✅ Skill matrices & tools
Let’s shift to skill‑based evaluation for LLMs!
Full paper here 👉 arxiv.org/abs/2507.20208
(6/6)
🚨 RAG is a popular approach but what happens when the retrieved sources provide conflicting information?🤔
We're excited to introduce our paper:
“DRAGged into CONFLICTS: Detecting and Addressing Conflicting Sources in Search-Augmented LLMs”🚀
A thread 🧵👇
🧵 New paper at Findings #ACL2025 @aclmeeting!
Not all documents are processed equally well. Some consistently yield poor results across many models.
But why? And can we predict that in advance?
Work with Steven Koniaev and Jackie Cheung @Mila_Quebec @McGill_NLP #NLProc
(1/n)
🚨 Introducing LAQuer, accepted to #ACL2025 (main conf)!
LAQuer provides more granular attribution for LLM generations: users can just highlight any output fact (top), and get attribution for that input snippet (bottom). This reduces the amount of text the user has to read by 2…
Excited to present our system demonstration paper on EventFull — an Event-Event Relation annotation tool — at #NAACL25
Come see us Thursday, May 1, at Poster Session I (16:00–17:30)
(Paper and tool links at the end of the thread👇)
LLMs struggle with tables—but how robust are they really?
🔍 ToRR goes beyond accuracy, testing real-world robustness across formats & tasks.
📊 Different formats, same data—models show brittle behavior affecting rankings.
Prompt configuration is a key dimension for evaluation!🚀