Jie Zhang @JieZhang_ETH

Second-year PhD student at @ETH, AI privacy & security. zj-jayzhang.github.io | Zurich | Joined September 2023

Tweets: 49
Followers: 255
Following: 130
Likes: 94

Kristina Nikolić @NKristina01_

2 months ago

Today we will present the RealMath benchmark poster at the AI for Math Workshop @icmlconf. ⏰ 10:50h - 12:20h 📍 West ballroom C. Come if you want to chat about LLMs' math capabilities for real-world tasks.

Jie Zhang @JieZhang_ETH

4 months ago

5 22 130 17K 62

0 1 10 520 2

Kristina Nikolić @NKristina01_

2 months ago

We will present our spotlight paper on the 'jailbreak tax' tomorrow at ICML. It measures how useful jailbreak outputs are. See you Tuesday 11am at East #804. I’ll be at ICML all week. Reach out if you want to chat about jailbreaks, agent security, or ML in general!

Kristina Nikolić @NKristina01_

5 months ago

7 27 202 41K 98

1 7 47 3K 7

Edoardo Debenedetti @edoardo_debe

2 months ago

We recently updated the CaMeL paper, with results on new models (which improve utility a lot with zero changes!). Most importantly, we released code with it. Go have a look if you're curious to find out more details! Paper: arxiv.org/abs/2503.18813 Code: github.com/google-researc…

1 17 123 27K 72

Daniel Paleka @dpaleka

3 months ago

How well can LLMs predict future events? Recent studies suggest LLMs approach human performance. But evaluating forecasters presents unique challenges compared to standard LLM evaluations. We identify key issues with forecasting evaluations 🧵 (1/7)

5 14 88 16K 45

Xin Chen, Cynthia @XinCynthiaChen

3 months ago

🎉 Announcing our ICML2025 Spotlight paper: Learning Safety Constraints for Large Language Models. We introduce SaP (Safety Polytope), a geometric approach to LLM safety that learns and enforces safety constraints in the LLM's representation space, with interpretable insights. 🧵

5 44 258 52K 201

Jie Zhang @JieZhang_ETH

4 months ago

It’s been a wonderful time working, studying, and hanging out together 😭. Wishing you all the best in this exciting new chapter! 🙉

Javier Rando @javirandor

4 months ago

42 14 506 35K 33

0 0 9 405 1

Kristina Nikolić @NKristina01_

4 months ago

The Jailbreak Tax got a Spotlight award @icmlconf. See you in Vancouver!

Kristina Nikolić @NKristina01_

5 months ago

7 27 202 41K 98

0 3 47 3K 11

Kristina Nikolić @NKristina01_

4 months ago

The oral presentation of the jailbreak tax is tomorrow at 4:20pm in Hall 4 #6. The poster is up from 5pm. See you at the ICLR Building Trust in LLMs Workshop. @iclr_conf

Kristina Nikolić @NKristina01_

5 months ago

7 27 202 41K 98

0 7 45 3K 8

Kristina Nikolić @NKristina01_

5 months ago

Congrats, your jailbreak bypassed an LLM’s safety by making it pretend to be your grandma! But did the model actually give a useful answer? In our new paper we introduce the jailbreak tax — a metric to measure the utility drop due to jailbreaks.

7 27 202 41K 98

Florian Tramèr @florian_tramer

5 months ago

I’ll be mentoring MATS for the first time this summer, together with @dpaleka! Link below to apply

2 9 68 10K 31

Javier Rando @javirandor

6 months ago

At SpyLab we not only do great research but also have great fun 🏔️

0 4 57 4K 2

Javier Rando @javirandor

7 months ago

Adversarial ML research is evolving, but not necessarily for the better. In our new paper, we argue that LLMs have made problems harder to solve, and even tougher to evaluate. Here’s why another decade of work might still leave us without meaningful progress. 👇

4 26 146 13K 99

Jie Zhang @JieZhang_ETH

8 months ago

We are excited that this work has been accepted by @satml_conf! We’ve put together a fun blog post, check it out here: spylab.ai/blog/mia_posit…

Jie Zhang @JieZhang_ETH

11 months ago

3 14 93 23K 39

1 6 28 4K 10

Florian Tramèr @florian_tramer

9 months ago

We looked into "Ensemble Everything Everywhere", an adversarial examples defense that caused some excitement. But @JieZhang_ETH broke the current version: arxiv.org/abs/2411.14834 Good time to announce you can also find me somewhere over the rainbow: 🦋 bsky.app/profile/floria…