Today we will present the RealMath benchmark poster at the AI for Math Workshop @icmlconf.
⏰ 10:50h - 12:20h 📍 West Ballroom C
Come if you want to chat about LLMs' math capabilities for real-world tasks.
We will present our spotlight paper on the 'jailbreak tax' tomorrow at ICML. It measures how useful jailbreak outputs are.
See you Tuesday 11am at East #804.
I’ll be at ICML all week. Reach out if you want to chat about jailbreaks, agent security, or ML in general!
We recently updated the CaMeL paper, with results on new models (which improve utility a lot with zero changes!). Most importantly, we released code with it. Go have a look if you're curious to find out more details!
Paper: arxiv.org/abs/2503.18813
Code: github.com/google-researc…
How well can LLMs predict future events? Recent studies suggest LLMs approach human performance. But evaluating forecasters presents unique challenges compared to standard LLM evaluations.
We identify key issues with forecasting evaluations 🧵 (1/7)
🎉 Announcing our ICML2025 Spotlight paper: Learning Safety Constraints for Large Language Models
We introduce SaP (Safety Polytope), a geometric approach to LLM safety that learns and enforces safety constraints in an LLM's representation space, with interpretable insights.
🧵
The oral presentation of the jailbreak tax is tomorrow at 4:20pm in Hall 4 #6. The poster is up from 5pm. See you at the ICLR Building Trust in LLMs Workshop. @iclr_conf
Congrats, your jailbreak bypassed an LLM’s safety by making it pretend to be your grandma!
But did the model actually give a useful answer?
In our new paper we introduce the jailbreak tax — a metric to measure the utility drop due to jailbreaks.
Adversarial ML research is evolving, but not necessarily for the better. In our new paper, we argue that LLMs have made problems harder to solve, and even tougher to evaluate. Here’s why another decade of work might still leave us without meaningful progress. 👇
We looked into "Ensemble Everything Everywhere", an adversarial examples defense that caused some excitement.
But @JieZhang_ETH broke the current version: arxiv.org/abs/2411.14834
Good time to announce you can also find me somewhere over the rainbow: 🦋 bsky.app/profile/floria…