Daniel Murfet @danielmurfet

Mathematician. Head of Research at Timaeus. Working on Singular Learning Theory and AI alignment.
therisingsea.org · Melbourne, Victoria · Joined June 2012

Tweets

4K
Followers

2K
Following

544
Likes

2K

Joshua Batson @thebasepoint

2 days ago

This is a neat approach to attribution! It leaves open a question that we couldn't answer either: how to properly attribute through attention *patterns* to features, in a "relevance"/"influence"-spirited way.

Farnoush Rezaei-Jafari @FarnoushRJ

3 days ago

Daniel Filan @dfrsrchtwts

4 days ago

yearn to contemplate the platonic forms? captivated by the geometry of balls rolling down valleys something something rainbow serpent something something cell biology? apply to work with @danielmurfet and @jesse_hoogland in the Winter MATS cohort by Oct 2.

davidad 🎇 @davidad

5 months ago

At 🇬🇧ARIA, we’re serious about catalysing a new paradigm for AI deployment—techniques to safely *contain* powerful AI (instead of “making it safe”), especially for improving the performance and resilience of critical infrastructure. This needs a new org. Want to be its founder?

ARIA @ARIA_research

5 months ago

Marcus Hutter @mhutter42

2 weeks ago

Reflective-Oracle AIXI solves the Grain of Truth problem for super-intelligent multi-agent systems/societies. Finally the long-awaited more comprehensive treatment building upon earlier work from last decade is out. Slides: hutter1.net/publ/sgot.pdf Paper: arxiv.org/abs/2508.16245

algebraic geometer (derogatory) @d_m_d_m_d_d

2 weeks ago

calculation of global sections of line bundles on projective varieties

florence 🐝 @morallawwithin

2 weeks ago

Tom McGrath @banburismus_

2 weeks ago

post-training is weird, and can have all sorts of surprising side effects - extreme sycophancy, hallucinations, mechahitler... what can we do? we have a great new technique for surfacing unexpected behaviours during finetuning that might help!

Goodfire @GoodfireAI

2 weeks ago

Greg Jefferis @gsxej

2 weeks ago

Neuronal diversity is written in transcriptional codes 🧬. But what is the logic of these codes that define cell types and wiring patterns? To find out we built a #scRNAseq developmental atlas of the Drosophila nerve cord and linked it to the #connectome 🪰🧠 Tweeprint! ⬇️1/8

Goodfire @GoodfireAI

2 weeks ago

(6/7) Of course, a full solution also requires tools to mitigate those behaviors once they've been identified - and we're building those, e.g. via behavior steering. We think interp will be core to this - and more broadly, to debugging training for alignment and reliability!

Jim Halverson @jhhalverson

3 weeks ago

Grateful to @SimonsFdn for their support of the Physics of Learning, and glad to be a part of this collaboration! Excited to see many breakthroughs in the coming years.

Simons Foundation @SimonsFdn

3 weeks ago

Pratyush Maini @pratyushmaini

3 weeks ago

1/ Pretraining is hitting a data wall; scaling raw web data alone leads to diminishing returns. Today @datologyai shares BeyondWeb, our synthetic data approach & all the learnings from scaling it to trillions of tokens🧑🏼‍🍳
- 3B LLMs beat 8B models🚀
- Pareto frontier for performance

Alex Strick van Linschoten @strickvl

3 weeks ago

In parallel I'd been exploring how to make LLMs tangible, i.e. as physical artifacts, not just plots. I started a small project to 'knit' a model in the physical world by mapping token probabilities/attention/layer interactions into a 20×20, three-colour pattern, then render it in…

Chris Olah @ch402

4 weeks ago

Our interpretability team is planning to mentor more fellows this cycle! Applications are due Aug 17.

Anthropic @AnthropicAI

a month ago

Tom Burns @tfburns

4 weeks ago

Could the key to more efficient & robust language models come from computational neuroscience? Our paper demonstrates how brain-inspired architectures can enhance in-context learning in Transformers and LLMs. (1/15)

Christopher Potts @ChrisGPotts

4 weeks ago

For a @GoodfireAI/@AnthropicAI meet-up later this month, I wrote a discussion doc: Assessing skeptical views of interpretability research. Spoiler: it's an incredible moment for interpretability research. The skeptical views sound like a call to action to me. Link just below.

Astera Institute @AsteraInstitute

4 weeks ago

What’s going on inside large AI models? Astera grantees @adamimos and @RiechersPaul are building a new theory of internal structure to better understand intelligence. We sat down with them to learn more about their work as co-founders of Simplex, a research organization:…

Daniel Murfet @danielmurfet

4 weeks ago

Josh Welch @LabWelch

a month ago

Interested in studying cell differentiation at the cellular level but don't trust your UMAP plots? Try visualizing your cell differentiation in space with our TopoVelo tool!

Josh Welch @LabWelch

2 months ago

Interested in studying cell differentiation at the cellular level but don't trust your UMAP plots? Try visualizing your cell differentiation in space with our TopoVelo tool! https://t.co/Kh87tNWQlZ