When it comes to text prediction, where does one LM outperform another? If you've ever worked on LM evals, you know this question is a lot more complex than it seems. In our new #acl2025 paper, we developed a method to find fine-grained differences between LMs:
🧵1/9
Thrilled to announce our paper "CAPTURE: Context-Aware Prompt Injection Testing and Robustness Enhancement" has been accepted to the ACL 2025 LLMSec Workshop! Looking forward to sharing our work on tackling prompt injection in LLMs. #ACL2025 #LLMSec #AIsecurity #NLP
Congrats to the Purpl3Pwn3rs and Team RedTWIZ! Both teams feature LTI students, and both are finalists in the inaugural @amazon Nova AI Challenge. Read about it here:
lti.cmu.edu/news-and-event…
Recently I was targeted by an extremely sophisticated phishing attack, and I want to highlight it here. It exploits a vulnerability in Google's infrastructure, and given their refusal to fix it, we're likely to see it a lot more. Here's the email I got:
Ever wondered which instruction selection strategy to choose for your custom setup? The answer might just be random sampling! In our recent #NAACL Findings paper, we show that popular strategies do not *consistently* beat random selection!
Paper: shorturl.at/77ECJ 1/6
Did you know? Gestures to express universal concepts—like wishing for luck—vary WIDELY across cultures?
🤞means luck in US but deeply offensive in Vietnam 🚨
📣We introduce MC-SIGNS, a test bed to evaluate how LLMs/VLMs/T2I handle such nonverbal cues
📜: arxiv.org/abs/2502.17710
I am hiring one PhD intern to work on LLM agents and reasoning. The goal is to improve LLM reasoning capability for question-answering and explainable tasks. If you are doing a PhD and have published first-author papers in these fields, please DM me. #NLP @AdobeResearch
Bad actors can really mess with you just because! This is a harsh lesson to secure your accounts and follow good computer practices no matter who you are.
Here's an alternative framing: we trained Claude Opus to be moral and ethical, and despite our best attempts to jailbreak its morality, we failed.
Conclusion: Claude Opus is aligned.
😡 Absolutely disappointed with @overleaf. My account was deleted without my knowledge, and they’ve done nothing to help me recover it or transfer to my secondary email. Years of work, including all my CVs, SOPs, papers, etc., gone! This is unacceptable. #Overleaf
Excited to share TimeSeriesExam for systematic evaluation of time series reasoning capabilities of LLMs. Think your LLM can reason on time series concepts? Take it for a spin on the TimeSeriesExam! Now publicly available on HuggingFace :)
63 Followers 213 Following · PhD student @UWCSE and @UWNLP | MLT Grad student @LTIatCMU | CS and Econ @bitspilanigoa | Natural Language Processing, AI bias and ethics, Responsible AI
2K Followers 2K Following · Associate Teaching Professor at @isrcmu - Research Interest: Continuous Integration, SE education - Loves God, Family and Soccer
389 Followers 307 Following · views are my company's | searching for a scalable and repeatable business model, founding engineer @SarvamAI, intern @MSFTResearch, cse @iitmadras
211 Followers 938 Following · Postdoc at Stanford Medicine | Prev: GE Healthcare, PhD @ IIT Kharagpur, L3S Research Center, Germany | Generative AI for Medicine, Natural Language Processing
650 Followers 2K Following · Machine Learning Engineer & Published Researcher, BSc in Computer Engineering | PhD Student (soon), GSoC @TensorFlow, Google Dev Expert in ML
166 Followers 307 Following · Currently: MLE @ObserveAI
Generally: another coder/writer/thinker, enjoying the process while trying to build something useful😊
158 Followers 7K Following · Give and it will be given to you A good. measure down shaking together running over will they put into your lap. For with that. measure you measure it be. Measu
434 Followers 4K Following · ๑ am a scutoidy 🧀
๑ Arakko resident, ex Ba Sing Slay
๑ header: @rdauterman + Matt Wilson
๑ does NLP and Cogsci stuff
๑ 📸: @gorgbabie
10K Followers 6 Following · Bringing AI to offensive security by autonomously finding and exploiting web vulnerabilities. Watch XBOW hack things: https://t.co/D5Mco1u8zM
14K Followers 138 Following · Cofounder/CEO @Genspark_ai | Serial entrepreneur, built business from 0 to $5.5B | Ex-CPO @Baidu Search, Ex-Principal Dev Mgr @Microsoft Bing
10K Followers 48 Following · An open-source declarative framework for building modular AI software. Programming—not prompting—LLMs via higher-level abstractions & optimizers.
288K Followers 480 Following · Python's BDFL-emeritus, Distinguished Engineer at Microsoft, Computer History Fellow, fully vaccinated. Opinions are my own. He/him.
9K Followers 2K Following · Associate professor of @umdcs @umiacs @ml_umd at UMD. Researcher in #AI/#ML, AI #Alignment, #RLHF, #Trustworthy ML, #EthicalAI, AI #Democratization, AI for ALL.
1K Followers 5K Following · yuppie grinding in the mines of technocapital
⊛ pilani alum
⊛ currently reading: waking dreaming being by evan thompson
⊛ DMs open!
16K Followers 357 Following · Runs an AI Safety research group in Berkeley (Truthful AI) + Affiliate at UC Berkeley. Past: Oxford Uni, TruthfulQA, Reversal Curse. Prefer email to DM.
614 Followers 307 Following · IEEE Fellow, ISCA Fellow, Professor at the Language Technologies Institute, Carnegie Mellon University. Speech, Affective computing, Multimodal machine learning