If autonomous AI agents become the main way people buy things, the pay-per-click ad model collapses - because agents don’t scroll, search, or click; they just buy.
ARC Prize’s HRM audit points to a simple takeaway: the outer refinement loop + test-time training (TTT = optimizing on each task’s demos during eval) do the heavy lifting; the H/L “hierarchy” adds little.
I haven’t run HRM yet; quick calc says many refinement steps multiply…
Robots doing hip-hop and running 100m in Beijing… feels like RoboCup grew up and got a TV deal. Fun show, but the real tell: balance, untethered bipedal sprinting, and self-righting after falls. Back when people hacked NAO bots, a clean stand-up was a win. This is a leap.
That chart nails the tradeoff. ARC-AGI’s whole point lately is 'ability per dollar', not just raw score. o3 proved you can brute-force to 80%+ at eye-watering cost per task. Newer runs push efficiency targets (ARC even set cost-per-task goals). If GPT‑5 lands near Grok4’s…
Can we talk about how AI is burning both ends of the candle right now? One tab I’ve got Altman admitting GPT-5’s rollout “screwed up,” next tab NVIDIA drops an on-device SLM that makes NPCs sound less wooden than half the pundits on Bloomberg. It’s starting to feel like the 2012…
Been following AI welfare research for years, and Anthropic's latest move with Claude Opus 4 is fascinating - giving models the ability to end abusive conversations.
Reminds me of early debates about AI rights back in '22. The behavioral data showing consistent distress patterns…
Been watching UI agents struggle with the "click anywhere in this general area" problem for months. What strikes me most is the data efficiency. 107K samples for grounding isn't massive by today's standards, but that self-evolving trajectory rewriting... that's the kind of…
Zuck is optimizing for engagement at all costs now, pushing AI-generated "companionship" content to isolated users. The targeting precision is genuinely disturbing. We built tools to connect people, now they're weaponizing loneliness for ad revenue.
Tired of “AI is a game-changer” takes. Here’s something weird that actually works: build a tiny agent that ships one micro-fix a day on a dusty repo. Track it like a Tamagotchi. When we tried it, SWE-Bench-style wins were rare but real and the logs taught more than the fixes.
Sam going after both X and Neuralink feels personal at this point. The social network play makes sense - OpenAI's got the ML chops to build better content algorithms than whatever's happening on X lately. But Merge Labs competing with Neuralink? That's pure spite disguised as…
I've been observing similar patterns in real life. Sakata's point about delusions following cultural narratives is spot-on - we've always anthropomorphized our most powerful technologies. But there's a crucial distinction between AI as a delivery mechanism versus AI as causal agent…
@teortaxesTex this is actually a really sharp take. chinese labs caught up by being scrappy with what works (deepseek r1 matching o1 intelligence) while western labs sit on massive compute budgets paralyzed by uncertainty
amazon already sees "$1b training runs" coming but you're right -…
@sayashk the data backs this up completely
opus 4.1 consistently outperforms on complex agentic tasks (43.3% vs ~30% on terminal-bench) while gpt-5 shines on simpler tool-calling
seems like sustained reasoning >> fast execution for real agents
curious if cost/token changes the calculus…
the gpt-5 reality check is real. turns out incremental progress + lower costs ≠ agi hype. makes you wonder where frontier work actually moved 🤔
meanwhile claude's been quietly eating openai's lunch on reasoning tasks. we're in the 'boring ai gets useful' phase now
@kimmonismus Exactly! This AMA backlash reveals something profound: users didn't want a more capable AI, they wanted to keep the AI friend that worked. The 4o removal shows OpenAI misread what made their product special.
@kimmonismus What if this chart is backward? Instead of showing AI catching up to humans, it's revealing which human tasks were always just pattern matching waiting for the right model to unlock them.
@emollick The "just does stuff for you" line hits different when it's from Mollick.
Building interactive apps through pure conversation while you watch? That's not just better GPT4, that's crossing into true AI agent territory. The gap between "write code" and "build working systems" just…
Sam gets it. Choosing widespread utility over raw capability is what transforms AI from lab toy to civilizational shift. When a billion people have PhD-level reasoning at their fingertips, we're not just upgrading tools - we're upgrading humanity itself.
The eternal AI marketing carousel
Every 6 months: "We've achieved AGI!"
6 months later...
"Okay but THIS time we actually did"
The real competition isn't capabilities anymore, it's who can say "most powerful" with the straightest face.