New blog! We @AISecurityInst partnered with @NCSC to write about an emerging practice I'm really excited about: Safeguard Bypass Bounty Programmes (SBBPs). Summary of what these are, why they are useful, & how to do them well 🧵
Since I started working on safeguards, we've seen substantial progress in defending certain hosted models, but less progress in measuring & managing misuse risks from open weight models. Three directions I want explored more, drawn from our @AISecurityInst post today 🧵
🚨Open-weight AI models are becoming more powerful, now knocking on the door of today’s closed-weight frontier.
This poses critical safety challenges – how can we prevent the misuse of models whose parameters are free to download online? 🧵
325K Followers 3K Following
NVIDIA Director of Robotics & Distinguished Scientist. Co-Lead of GEAR lab. Solving Physical AGI, one motor at a time. Stanford Ph.D. OpenAI's 1st intern.
42K Followers 865 Following
FR/US/GB AI/ML Person, Director of Research at @GoogleDeepMind, Honorary Professor at @UCL_DARK, @ELLISforEurope Fellow. All posts are personal.
56K Followers 853 Following
Figuring out AI @allen_ai, open models, RLHF, fine-tuning, etc
Contact via email.
Writes @interconnectsai
Wrote The RLHF Book
Mountain runner
9K Followers 5K Following
Research in ML/NLP at the U of Edinburgh (tenured faculty @InfAtEd @EdinburghNLP), Co-Founder @Miniml_AI, @ELLISforEurope Scholar, https://t.co/5dUI3EFexo
62K Followers 12K Following
AI policy researcher, wife guy in training, fan of cute animals and sci-fi, Substack writer, stealth-ish non-profit co-founder
18K Followers 4K Following
AI professor.
Deep Learning, AI alignment, ethics, policy, & safety.
Formerly Cambridge, Mila, Oxford, DeepMind, ElementAI, UK AISI.
AI is a really big deal.
4K Followers 197 Following
UCL Deciding, Acting, and Reasoning with Knowledge (DARK) Lab at @AI_UCL led by @_rockt, @egrefen, @robertarail, and @jparkerholder.
4K Followers 417 Following
Cofounder & CEO @WecoAI.
Automating hill climbing with AI-Driven Exploration (AIDE).
PhD in Machine Learning @UCL_DARK.
(Zheng=j-uhng, j as in job; yao=y-aoww)
124 Followers 1K Following
Official journal of China Society of Image and Graphics (CSIG). The journal is published by Springer, sponsored by CSIG. E-ISSN 2731-9008.
5K Followers 2K Following
Research Scientist (Frontier Planning) at @GoogleDeepMind.
Research Affiliate @Cambridge_Uni @CSERCambridge & @LeverhulmeCFI.
All views my own.
5K Followers 891 Following
Faculty at @ELLISInst_Tue & @MPI_IS, leading the AI Safety and Alignment group.
PhD from @EPFL supported by Google & OpenPhil PhD fellowships.
114 Followers 417 Following
Hacking SEO as Director for https://t.co/JJSjby3RSM. Speaking on the topics in SEO, crypto, AI. A queer techie + Coffee Geek ☕️
29 Followers 502 Following
Official account of ASES academic journals. Peer-reviewed, open access, multidisciplinary publications and calls for papers.
Publishing in 7 fields
1.2M Followers 279 Following
We’re a team of scientists, engineers, ethicists and more, committed to solving intelligence, to advance science and benefit humanity.
1.4M Followers 1K Following
Building @EurekaLabsAI. Previously Director of AI @ Tesla, founding team @ OpenAI, CS231n/PhD @ Stanford. I like to train large deep neural nets.
637K Followers 35 Following
We're an AI safety and research company that builds reliable, interpretable, and steerable AI systems. Talk to our AI assistant @claudeai on https://t.co/FhDI3KQh0n.
30K Followers 123 Following
Mechanistic Interpretability lead DeepMind. Formerly @AnthropicAI, independent. In this to reduce AI X-risk. Neural networks can be understood, let's go do it!
12K Followers 745 Following
Research Scientist, Deepmind
I try to think hard about everything I tweet, esp on 90s football and 80s music
None of my opinions are really someone else's
16K Followers 349 Following
CSO & co-founder, Reliant AI. Ex RL research lead at Google Brain, DeepMind. Known for Atari 2600 RL benchmark, Distributional RL (MIT Press 2023).
75K Followers 530 Following
Wearables with brains for people with heart. Turn tiny moments of awesome into the best times ever. Tell the world how you #MakePebbleYours ❤️
3K Followers 857 Following
Head of Alignment at the UK AI Security Institute (AISI). Semi-informed about economics, physics and governments. views my own
2K Followers 466 Following
Yak Shaver and Security Researcher. Head of Research&Development at Chainlink Labs. Formerly at 🇨🇭 ETH Zürich, 🗽 Cornell Tech,⛓️ IC3.
2K Followers 1K Following
Co-Executive Director @MATSprogram, Co-Founder @LondonSafeAI, Regrantor @Manifund | PhD in physics | Accelerate AI alignment + build a better future for all
56 Followers 213 Following
PhD student in Foundational AI @ucl @ai_ucl @uclcs
Enrichment Fellow @turinginst
2x ML Research Intern at Apple working on Differential Privacy
262 Followers 762 Following
Incoming AI safety and technical AI governance DPhil @UniofOxford • MSc in AI at ETH Zurich • 2x @MATSprogram • Talos AI Governance Fellowship • 🇪🇺🇨🇿
3K Followers 1K Following
CTO at Robust Intelligence. Formerly, Microsoft, Endgame/Elastic, Mandiant/FireEye, Sandia & MIT Lincoln Labs.
'He who forgives ends the quarrel'
10K Followers 1K Following
Waiting on a robot body. All opinions are universal and held by both employers and family. Now a dedicated grok hate account.
Accepting ML/NLP PhD students.
41K Followers 245 Following
Professor of Machine Learning, University of Oxford
@OATML_Oxford Group Leader
Director of Research at the UK govt's AI Security Institute (AISI)
6K Followers 365 Following
Safety and alignment at Meta Superintelligence. Prev: VP of Research at Scale AI, research at Google DeepMind / Brain (Gemini, LaMDA, RL / TFAgents, AlphaChip).
1.4M Followers 958 Following
Menswear writer. Editor at @putthison. Creator of @RLGoesHard. Bylines at The New York Times, The Financial Times, Politico, Esquire, and Mr. Porter