Joe @joemkwon

trying to think about what good futures (embedded with powerful AI systems) might look like

Cambridge, MA

Joined March 2019

Tweets: 815
Followers: 807
Following: 2K
Likes: 3K

Joe @joemkwon

3 years ago

I should revisit this soon!

1 0 0 0 0

0 0 0 182 0

I didn't think it would happen in just over a year, but funny to look back on this because it sounds so ridiculous (in hindsight, as is often the case) :p Only had 5 poll votes, but IIRC all CS PhDs at top programs!

Joe @joemkwon

a year ago

2 0 4 909 1

0 0 3 337 0

Tyler Brooke-Wilson @T_BrookeWilson

2 months ago

How do people reason while still staying coherent – as if they have an internal ‘world model’ for situations they’ve never encountered? A new paper on open-world cognition (preview at the world models workshop at #ICML2025!)

4 26 143 18K 90

xuan (ɕɥɛn / sh-yen) @xuanalogue

2 months ago

At NUS, I'll be starting the Cooperative Systems & Intelligence (CoSI) lab to scale rational approaches to cooperative AI that are safe+reliable by design - for both individual AI assistance & the cooperative infrastructure we need for an increasingly automated future.

8 9 130 17K 12

Joe @joemkwon

2 months ago

AI consciousness won’t necessarily move through time like ours does. We’re in sequential moments — breakfast, then lunch, then dinner. an AI with the same weights and context can talk to you today and your descendant in 2050, experiencing both conversations as equally “present.”…

1 0 5 239 0

Raphaël Millière @raphaelmilliere

3 months ago

Despite extensive safety training, LLMs remain vulnerable to “jailbreaking” through adversarial prompts. Why does this vulnerability persist? In a new paper published in Philosophical Studies, I argue this is because current alignment methods are fundamentally shallow. 1/13

3 25 104 8K 79

Philipp Schoenegger @SchoeneggerPhil

4 months ago

New preprint out with an amazing 40-person team! We find that Claude 3.5 Sonnet outperforms incentivised human persuaders in a >1000-participant live quiz-chat in deceptive and truthful directions!

4 33 152 23K 61

Yoshua Bengio @Yoshua_Bengio

7 months ago

Today, we are publishing the first-ever International AI Safety Report, backed by 30 countries and the OECD, UN, and EU. It summarises the state of the science on AI capabilities and risks, and how to mitigate those risks. 🧵 Link to full Report: assets.publishing.service.gov.uk/media/679a0c48… 1/16

49 528 1K 393K 766

Samuel Marks @saprmarks

8 months ago

What can AI researchers do *today* that AI developers will find useful for ensuring the safety of future advanced AI systems? To ring in the new year, the Anthropic Alignment Science team is sharing some thoughts on research directions we think are important.

10 67 329 106K 324

Joschka Braun @BraunJoschka

9 months ago

1/ New Blog Post: "A Sober Look at Steering Vectors for LLMs" We identify 3 key challenges: 1. Steering vectors are unreliable for many concepts & tasks 2. Steering harms overall model performance 3. Metrics overestimate steering effectiveness We propose 4 recommendations 🧵👇

3 13 91 10K 62

Joe @joemkwon

10 months ago

I don't "know" one of my passwords in a symbolic sense. But some part of my motor-neuro system unconsciously knows it (w.r.t. QWERTY keyboard). thought this was interesting. I've other examples of bad memory e.g. lapses in recalling the names of restaurants and people I've…

0 0 1 264 0

Joe @joemkwon

11 months ago

let us gather and think about the motion in tail swinging of bovine vs in double pendulums

0 0 0 244 0

xuan (ɕɥɛn / sh-yen) @xuanalogue

a year ago

Should AI be aligned with human preferences, rewards, or utility functions? Excited to finally share a preprint that @MicahCarroll @FranklinMatija @hal_ashton & I have worked on for almost 2 years, arguing that AI alignment has to move beyond the preference-reward-utility nexus!

35 171 999 281K 871

William Fedus @LiamFedus

12 months ago

Happy to release a couple of our reasoning models today (🍓)! At @OpenAI, these new models are becoming a larger contributor to the development of future models. For many of our researchers and engineers, these have replaced a large part of their ChatGPT usage.…