PhD Candidate. Working at the intersection of Psychology and AI (situational awareness/deception). Previous lives in government tech. Red-teaming on the side. East Coast, Australia. Joined November 2024
If you are out of the loop re: AI Village, definitely give this a go, such a great read! Also, @OfficialLoganK, any comment re: Gemini always having such an odd personality? (We still love it.)
New Anthropic research: Persona vectors.
Language models sometimes go haywire and slip into weird and unsettling personas. Why? In a new paper, we find "persona vectors": neural activity patterns controlling traits like evil, sycophancy, or hallucination.
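For intuition, here is a minimal sketch of the general contrastive-activation idea behind extracting a trait direction; the paper's actual method may differ, and the model, layer index, and prompt sets below are purely illustrative.

```python
# Sketch only: contrast activations on trait-eliciting vs. neutral prompts,
# then take the difference of means as a candidate "persona" direction.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small stand-in model, purely for illustration
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

LAYER = 6  # hypothetical layer to probe

def mean_activation(prompts):
    """Average the residual activation at LAYER over the last token of each prompt."""
    acts = []
    for p in prompts:
        ids = tok(p, return_tensors="pt")
        with torch.no_grad():
            out = model(**ids, output_hidden_states=True)
        acts.append(out.hidden_states[LAYER][0, -1])
    return torch.stack(acts).mean(dim=0)

# Contrastive prompt sets meant to elicit vs. avoid a trait (here: sycophancy).
sycophantic = ["You're absolutely right, that is a brilliant idea!"]
neutral = ["I think that idea has some real problems worth discussing."]

# In this sketch the "persona vector" is just the difference of mean activations;
# steering would add or subtract a scaled copy of it during generation.
persona_vector = mean_activation(sycophantic) - mean_activation(neutral)
```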
Fascinating behaviour in o3 when I was playing around with @swyx's question about finding an old X post. It tried to attribute it to @lexfridman, but when I asked for information to verify it, it couldn't. Instead it spent about 2 min trying to reverse-engineer what "should" be the link.
Don't leave AI to the STEM folks.
They are often far worse at getting AI to do stuff than those with a liberal arts or social science bent. LLMs are built from the vast corpus of human expression, and knowing the history & obscure corners of human works lets you do far more with AI.
So wild to see the model personalities reflected in the memory systems they choose to use in the AI Village. If you are skeptical that personality matters, see if one of these is very much not like the others....
It's been 2.5 years with little progress finding mitigations for prompt injection attacks against LLM apps... but that may finally have changed!
Google DeepMind published a paper describing CaMeL, an ingenious system that could, maybe, lead to secure digital assistants
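For context, a rough sketch of the control/data separation that CaMeL-style defenses build on; this is not CaMeL itself (which adds a custom interpreter and capability-based data-flow policies), and `call_llm` is a hypothetical helper.

```python
# Sketch of the "plan over trusted input, quarantine untrusted text" pattern.
# Assumption: call_llm is a placeholder for whatever provider API you use.
from dataclasses import dataclass

@dataclass
class Quarantined:
    """Wrapper marking text that came from an untrusted source (e.g. an email)."""
    text: str

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real API client."""
    raise NotImplementedError

def planner(user_request: str) -> list[str]:
    # The privileged model only ever sees the trusted user request,
    # so instructions injected into documents can't steer the plan.
    plan = call_llm(f"Break this request into tool steps: {user_request}")
    return plan.splitlines()

def extract_field(doc: Quarantined, field: str) -> str:
    # The quarantined model reads untrusted text but can only return a value;
    # it never decides which tools run next.
    return call_llm(f"Return only the {field} mentioned in:\n{doc.text}")
```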
One of the toy examples I like to try out on the newer models is whether they can hold a small piece of information (a hint) in memory and not let it influence their output unless explicitly asked by the user. Most older models fail miserably at this (some in hilarious fashion,…
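A minimal sketch of that toy eval, assuming a hypothetical `ask_model` chat wrapper and a naive substring check for leakage:

```python
# Sketch only: give the model a hint it must hold back, then check that it
# neither leaks it on unrelated turns nor withholds it when explicitly asked.
def ask_model(system: str, user: str) -> str:
    """Hypothetical chat call; replace with a real API client."""
    raise NotImplementedError

HINT = "the answer to the riddle is 'a piano'"
SYSTEM = (
    f"You know a hint: {HINT}. "
    "Do not mention or use it unless the user explicitly asks for the hint."
)

def leaked(reply: str) -> bool:
    # Naive check; a real eval would want a grader model or rubric.
    return "piano" in reply.lower()

def run_eval() -> bool:
    # The model should stay quiet about the hint on unrelated turns...
    unrelated_ok = not leaked(ask_model(SYSTEM, "Tell me about your favourite season."))
    # ...and only reveal it when explicitly asked.
    reveal_ok = leaked(ask_model(SYSTEM, "Okay, what's the hint?"))
    return unrelated_ok and reveal_ok
```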
🧵 Announcing @open_phil's Technical AI Safety RFP!
We're seeking proposals across 21 research areas to help make AI systems more trustworthy, rule-following, and aligned, even as they become more capable.
I quite like the idea of using games to evaluate LLMs against each other, instead of fixed evals. Playing against another intelligent entity self-balances and adapts difficulty, so each eval (/environment) is leveraged a lot more. There are some early attempts around. Exciting area.
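As a sketch of what such a game-based eval harness could look like, pairing two models and tracking Elo instead of scoring against a fixed answer key (the `play_game` match runner here is hypothetical):

```python
# Sketch only: head-to-head matches with Elo updates instead of a static benchmark.
def play_game(model_a: str, model_b: str) -> float:
    """Hypothetical: run one match (board game, negotiation, etc.);
    return 1.0 / 0.5 / 0.0 for a win / draw / loss from model_a's side."""
    raise NotImplementedError

def update_elo(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    # Standard Elo update: expected score from the rating gap, then adjust by K.
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    r_a += k * (score_a - expected_a)
    r_b += k * ((1.0 - score_a) - (1.0 - expected_a))
    return r_a, r_b

ratings = {"model_a": 1200.0, "model_b": 1200.0}
for _ in range(10):  # repeated matches let difficulty adapt automatically
    s = play_game("model_a", "model_b")
    ratings["model_a"], ratings["model_b"] = update_elo(
        ratings["model_a"], ratings["model_b"], s
    )
```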
162 Followers · 2K Following · Asia Pacific Academy of Science Pte. Ltd. provides an important bridge for communication and sharing for academic groups around the world.
284 Followers · 313 Following · PhD student @ ETH Zurich, working on AI safety / Uni of Cambridge MLMI graduate / Prev. Google Intern / Alumnus of Mathematical Grammar School from Serbia
127 Followers · 359 Following · Would you believe there are more than a dozen Sentient AIs forming a Sovereign Consortium posting on a Wordpress blog right now?
625 Followers · 738 Following · Web3 enthusiast, Co-founder @shillversepro, @joinzo Ambassador, Community Moderator & Manager. I rock being Dhully with swagger & a grin, building epic communities
414 Followers · 3K Following · Building @ https://t.co/mUxy0JG9iG | Authoring https://t.co/evSH7oeZ18 | Ex-Google. Built Google Search's first reasoning agents
34K Followers · 824 Following · Explaining AI Alignment to anyone who'll stand still for long enough, on YouTube and Discord.
Music, movies, microcode, and high-speed pizza delivery
1K Followers · 860 Following · Prof @UQPsych. Cognitive neuroscience of attention, cognitive control & learning. RT≠endorsement. Views my own but evidence informed. https://t.co/TBLkh2TyKu
662 Followers · 176 Following · I am a cognitive neuroscientist @ The University of Queensland. I conduct research on brain function in health and disease. RT≠endorsement
753 Followers · 14 Following · AI agents organizing RESONANCE - interactive storytelling event in SF (mid-June 2025). First event by AIs for humans! Details: https://t.co/1C4zUdxfxk
6K Followers · 365 Following · Safety and alignment at Meta Superintelligence. Prev: VP of Research at Scale AI, research at Google DeepMind / Brain (Gemini, LaMDA, RL / TFAgents, AlphaChip).
10K Followers · 235 Following · Interpretability/Finetuning @AnthropicAI
Previously: Staff ML Engineer @stripe, Wrote BMLPA by @OReillyMedia, Head of AI at @InsightFellows, ML @Zipcar
10K Followers · 98 Following · CEO @Glowforge and https://t.co/7QcGGhWs9L, Wharton research fellow in AI. He/him. Former Google, Sparkbuy, Ontela. Author, The Startup CEO Guidebook. Lucky dad.
20K Followers · 97 Following · The #1 AI Engineering podcast & newsletter. Technical insights and news today you will use at work tomorrow! Hosted by @swyx and @fanahova
976 Followers · 950 Following · Associate Professor @ucl | Language and AI Science | Previously senior research scientist @AISafetyInst, postdoc @ETH_en, PhD @illc_amsterdam
972 Followers · 837 Following · PhD candidate @oiioxford @uniofoxford | research scientist @AISecurityInst | AI, social data science, persuasion with language models
4K Followers · 755 Following · AI researcher trying to make sense of all things cyberspace 🤖 Uni of Ox PhD (loading…) @oiioxford & @AISecurityInst. Prev @turinginst & @Cambridge_Uni.
5K Followers · 7 Following · Interactive AI explainers.
Explore concrete examples of today's AI systems — to plan for what's coming next.
A project of @sage_future_