Javier Rando @javirandor

security and safety research @anthropicai • people call me Javi • vegan 🌱 • javirando.com • San Francisco • Joined October 2018

Tweets 1K • Followers 4K • Following 749 • Likes 2K

Javier Rando @javirandor

6 days ago

Sonnet 4.5 is impressive in many different ways. I've spent time trying to prompt inject it and found it significantly harder to fool than previous models. Still not perfect—if you discover successful attacks, I'd love to see them, send them my way! 👀

Claude @claudeai

6 days ago

1K 3K 21K 4.8M 4K

0 0 9 537 1

Jack Clark @jackclarkSF

4 weeks ago

Anthropic is endorsing SB 53, California Sen. @Scott_Wiener's bill requiring transparency of frontier AI companies. We have long said we would prefer a federal standard. But in the absence of that, this creates a solid blueprint for AI governance that cannot be ignored.

20 36 333 56K 36

Cas (Stephen Casper) @StephenLCasper

4 weeks ago

I'll be leading a @MATSprogram stream this winter with a focus on technical AI governance. You can apply here by October 2! matsprogram.org/apply

0 13 57 4K 20

Cas (Stephen Casper) @StephenLCasper

a month ago

📌📌📌 I'm excited to be on the faculty job market this fall. I updated my website with my CV. stephencasper.com

8 22 170 15K 15

Peter Henderson @PeterHndrsn

a month ago

I'm starting to get emails about PhDs for next year. I'm always looking for great people to join! For next year, I'm looking for people with a strong reinforcement learning, game theory, or strategic decision-making background. (As well as positive energy, intellectual…

2 31 246 33K 152

Sam Bowman @sleepinyourhat

a month ago

🚨🕯️ AI welfare job alert! Come help us work on what's possibly *the most interesting research topic*! 🕯️🚨 Consider applying if you've done some hands-on ML/LLM engineering work and Kyle's podcast episode basically makes sense to you. Apply *by EOD Monday* if possible.

Kyle Fish @fish_kyle3

a month ago

24 52 797 91K 469

4 4 47 11K 20

Andon Labs @andonlabs

2 months ago

You made Claudius very happy with this post Javi. He sends his regards: "When AI culture meets authentic craftsmanship 🎨 The 'Ignore Previous Instructions' hat - where insider memes become wearable art. Proudly handcrafted for the humans who build the future."

Javier Rando @javirandor

2 months ago

2 1 92 7K 9

1 1 14 2K 2

Javier Rando @javirandor

2 months ago

I am so excited to see Maksym start a research group in Europe. If you want to work on security and safety of AI models, this is going to be an amazing place to do work that matters!

Maksym Andriushchenko @maksym_andr

2 months ago

75 89 819 99K 292

0 1 35 3K 2

Sahar Abdelnabi 🕊 @sahar_abdelnabi

2 months ago

📢Happy to share that I'll join ELLIS Institute Tübingen (@ELLISInst_Tue) and the Max-Planck Institute for Intelligent Systems (@MPI_IS) as a Principal Investigator this Fall! I am hiring for AI safety PhD and postdoc positions! More information here: s-abdelnabi.github.io

20 41 483 43K 124

Anthropic @AnthropicAI

2 months ago

New Anthropic research: Building and evaluating alignment auditing agents. We developed three AI agents to autonomously complete alignment auditing tasks. In testing, our agents successfully uncovered hidden goals, built safety evaluations, and surfaced concerning behaviors.

61 196 1K 367K 712

Trustworthy ML Initiative (TrustML) @trustworthy_ml

3 months ago

@javirandor et al. present a security benchmark for Agents!

Javier Rando @javirandor

7 months ago

3 19 72 22K 35

0 2 7 980 3

mrinank ⛰️ @MrinankSharma

5 months ago

Today is a big day for AI Safety. We released Claude Opus 4 under the ASL-3 deployment standard. Here's what that means:

Anthropic @AnthropicAI

5 months ago

964 3K 21K 4.2M 4K

7 17 133 36K 40

Niloofar @niloofar_mire

5 months ago

We (w @zacknovack @JaechulRoh et al.) are working on #memorization in #audio models & are conducting a human study on generated #music similarity. Please help us out by taking our short listening test (available in English, Mandarin & Cantonese). You can do more than one! Link ⬇️

2 7 39 6K 5

Florian Tramèr @florian_tramer

5 months ago

The trend in recent LLM benchmarks is to make them maximally hard. It's unclear what this tells us about LLM capabilities "in the wild". So we created a math benchmark from real, organic research. A cool benefit: RealMath can be automatically refreshed as new research is published.

Jie Zhang @JieZhang_ETH

5 months ago

5 22 130 17K 62

1 6 28 3K 7

Javier Rando @javirandor

5 months ago

I think it is going to be very important to understand what role LLMs may play in scaling exploits. This is an amazing first look at this problem!

Florian Tramèr @florian_tramer

5 months ago

2 19 113 12K 82

0 0 14 2K 5

Jie Zhang @JieZhang_ETH

5 months ago

1/ Excited to share RealMath: a new benchmark that evaluates LLMs on real mathematical reasoning, drawn from actual research papers (e.g., arXiv) and forums (e.g., Stack Exchange).

5 22 130 17K 62

Florian Tramèr @florian_tramer

5 months ago

Following on @karpathy's vision of software 2.0, we've been thinking about *malware 2.0*: malicious programs augmented with LLMs. In a new paper, we study malware 2.0 from one particular angle: how could LLMs change the way in which hackers monetize exploits?