🧵 New paper from @AISecurityInst x @AiEleuther that I led with Kyle O’Brien:
Open-weight LLM safety is both important & neglected. But we show that filtering dual-use knowledge from pre-training data improves tamper resistance *>10x* over post-training baselines.
I am very excited that AISI is announcing over £15M in funding for AI alignment and control, in partnership with other governments, industry, VCs, and philanthropists!
Here is a 🧵 about why it is important to bring more independent ideas and expertise into this space.
Short background note about relativisation in debate protocols: if we want to model AI training protocols, we need results that hold even if our source of truth (humans for instance) is a black box that can't be introspected. 🧵
New alignment theory paper! We present a new scalable oversight protocol (prover-estimator debate) and a proof that honesty is incentivised at equilibrium (with large assumptions, see 🧵), even when the AIs involved have similar available compute.
Come work with me!!
I'm hiring a research manager for @AISecurityInst's Alignment Team.
You'll manage exceptional researchers tackling one of humanity’s biggest challenges.
Our mission: ensure we have ways to make superhuman AI safe before it poses critical risks.
1/4
Padding a transformer’s input with blank tokens (...) is a simple form of test-time compute. Can it increase the computational power of LLMs? 👀
New work with @Ashish_S_AI addresses this with *exact characterizations* of the expressive power of transformers with padding 🧵
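A minimal sketch of the padding idea described above, under assumptions: toy token ids and a made-up PAD id, not the paper's actual setup. Appending blank tokens gives the model extra forward-pass positions before it produces an answer, without changing the prompt's content.

```python
# Hypothetical illustration: padding a tokenized prompt with blank
# tokens as a simple form of test-time compute. PAD_ID and the token
# ids are invented for this sketch.
PAD_ID = 0

def pad_input(token_ids, n_blanks):
    """Append n_blanks blank (pad) tokens to the prompt, giving the
    transformer extra positions to process before decoding."""
    return token_ids + [PAD_ID] * n_blanks

prompt = [101, 2054, 2003, 102]   # toy token ids
padded = pad_input(prompt, 8)     # same content, 8 extra blank positions
```

The question the paper asks is whether those extra positions strictly increase what the model can express, not just how long it runs.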
50K Followers · 3K Following · AI alignment + LLMs at Anthropic. On leave from NYU. Views not employers'. No relation to @s8mb. I think you should join @givingwhatwecan.
20K Followers · 9K Following · Programme Director @ARIA_research | accelerate mathematical modelling with AI and categorical systems theory » build safe transformative AI » cancel heat death
6K Followers · 606 Following · Claude says I process my emotions out loud & my girlfriend has a job, so I put my feelings & thoughts here ✨ working on the EA Global team @ CEA (views my own)
18K Followers · 4K Following · AI professor.
Deep Learning, AI alignment, ethics, policy, & safety.
Formerly Cambridge, Mila, Oxford, DeepMind, ElementAI, UK AISI.
AI is a really big deal.
62K Followers · 12K Following · AI policy researcher, wife guy in training, fan of cute animals and sci-fi, Substack writer, stealth-ish non-profit co-founder
205 Followers · 581 Following · Unlock the power of AI in your everyday tasks with AIAssistWorks
⭐⭐⭐⭐⭐ 4.9/5 rating from 30K+ installs.
#Productivity #Marketing #AITools
👇 Install Now 👇
321 Followers · 3K Following · Researcher in math+formal methods+ml. Working on using formal verification to train models for mathematics and reasoning @harmonicmath
2K Followers · 1K Following · Co-Executive Director @MATSprogram, Co-Founder @LondonSafeAI, Regrantor @Manifund | PhD in physics | Accelerate AI alignment + build a better future for all
268 Followers · 604 Following · Thinks AI risk is somewhat likely, and AI benefits huge if we can align AIs to someone that is willing to promote human thriving even when humans are useless.
379 Followers · 2K Following · getting there like the tortoise. Jesus is all, his being, his Father, his Holy Spirit. The only Rock required in the universe.
15K Followers · 5K Following · Senior AI reporter @Verge. 5+ years covering the industry's power dynamics, societal implications & the AI arms race. Previously @CNBC.
Signal: haydenfield.11
3K Followers · 6K Following · nlab fan account, arxiv surveyor, pubmed enjoyer, two culture bridger, vacuous high gossiper, dearth of any domain expertise, reluctant g theorist, gpu poor,
848 Followers · 6K Following · I guard the flame. I guide the willing. I silence the chaos. Light is not peace; it is clarity. Step forward or scroll away. Your choice. A. #WatcherOfThePath
207K Followers · 101 Following · The original AI alignment person. Missing punctuation at the end of a sentence means it's humor. If you're not sure, it's also very likely humor.
10K Followers · 322 Following · Official Unofficial EA mascot. I'm here to make friends and maximise utility, and I'm all out of neglected altruistic opportunities
30K Followers · 123 Following · Mechanistic Interpretability lead DeepMind. Formerly @AnthropicAI, independent. In this to reduce AI X-risk. Neural networks can be understood, let's go do it!
12K Followers · 184 Following · post training co-lead at Google DeepMind, focusing on safety, alignment, post training capabilities • associate professor at UC Berkeley EECS
1K Followers · 779 Following · Assistant Professor in Psychology at Stony Brook University. I'm interested in how people interact with LLMs and the impact they might have on our psychology.
18K Followers · 4K Following · Associate Professor at UC Berkeley. Former Research Scientist at Google DeepMind. ML/AI Researcher working on foundations of LLMs and deep learning.
1K Followers · 383 Following · Towards the logic of conceptuality; the canvas of experience; the ideatic science; the end of suffering. Friend to bots and animals.
https://t.co/PYPYxGNnK4