Teaching AI agents to use computers. Curated HCI datasets + disposable-VM simulator for training & gap-to-human evals.
paradigm-shift.ai
Sunnyvale, CA
Joined March 2025
Benchmark hacking is real! An agent can hit “90%” performance on any benchmark by cherry-picking results across runs. In the real world you only get a few tries at a task.
Let me show you in action: I ran ~1k tasks x 10 episodes with @browser_use on Gemini 2.5 Flash.…
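A minimal sketch of the cherry-picking effect described above, using synthetic data rather than the actual runs from this post. The per-task success probabilities are an assumption (drawn uniformly at random), and the function names `simulate`, `mean_success`, and `best_of_k` are hypothetical. It compares the honest per-episode success rate with a "best of k" score that counts a task as solved if any of its k episodes succeeded.

```python
import random


def simulate(num_tasks=1000, episodes=10, seed=0):
    """Return a list of per-task episode outcomes (True = success).

    Assumption: each task has its own fixed success probability,
    drawn uniformly from [0, 1). Real agents will differ.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(num_tasks):
        p = rng.random()  # hypothetical per-task success probability
        results.append([rng.random() < p for _ in range(episodes)])
    return results


def mean_success(results):
    """Fraction of all episodes that succeeded (the honest number)."""
    total = sum(len(r) for r in results)
    return sum(sum(r) for r in results) / total


def best_of_k(results, k):
    """Fraction of tasks with at least one success in the first k episodes."""
    return sum(any(r[:k]) for r in results) / len(results)


if __name__ == "__main__":
    runs = simulate()
    print(f"mean per-episode success: {mean_success(runs):.1%}")
    for k in (1, 3, 10):
        print(f"best of {k:>2}: {best_of_k(runs, k):.1%}")
```

Under these assumptions the best-of-10 score lands far above the per-episode average, which is why reporting the number of attempts alongside a score matters.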
Paradigm Shift AI just supercharged web-agent evals 🚀
We revamped our analytics with deeper agent insights, success heatmaps, variance scores, human baselines, full replay & crash logs and more. See where your agent shines or stumbles all in one place.
Want access to the…
Track browser-eval progress in real time, episode by episode and right from your dashboard! No more hunting through live logs (unless you still get a kick out of it 😅)
Totally agree, great analysis. That’s why @ParadigmShiftAI delivers richer metrics, deeper failure-trace analytics, and a bigger task bank (proprietary + public) to really stress-test web agents
Thrilled to announce we've been accepted into the @UofBeta Pre-Acceleration Program Cohort 10! Looking forward to connecting, learning, and growing alongside other incredible founders.
Introducing NeuroSim, our browser agent evaluation platform!
Run real-world evaluations for browser agents + models, see gap-to-human scores, share team leaderboards—free while we iterate with you.
Read more 👉 paradigm-shift.ai/blog/neurosim-…
DM or email [email protected] for…
o3 just got 80% cheaper (thanks @OpenAI), so we added it. NeuroSim supports o3, run your browser-use agent evals on Paradigm Shift AI and see how they stack up! https://t.co/rH8mYZAJ8R
🚀 Agent Hub v1 is live! The “App Store” for AI agents.
Built an agent? Publish one Agent Card today:
✅ appear in a public directory
✅ give devs a ready endpoint + JSON spec
✅ push updates with version tags
Read more → paradigm-shift.ai/blog/agent-hub… #AIagents #GenerativeAI
Attending the AI Engineer World’s Fair in SF this week! Excited for the packed lineup of speakers. Let me know if you’re around and want to connect! #AIEWF #AIEngineer
Calling agent builders:
We're launching a browser-agent eval platform — and looking for beta testers.
✅ Run your agent on real tasks
✅ Get logs, traces, failure points
✅ See where it breaks (and why)
✅ Free during beta — just give us feedback
Training support coming soon.…
Blog drop: Paradigm Shift AI captures screen + mouse + app data to train & eval desktop agents. Grab 30 free tasks and peek at our upcoming VM sim 👀 Read here → bit.ly/45iRcxt
First conference as a founder— @DataCouncilAI set the bar high. 3 days of sharp insights, inspiring speakers, and conversations with fellow builders. Grateful for the chance to learn and connect! Check out the talks on YouTube below! #DataCouncil25
We’re live 🚀 Paradigm Shift AI is building the data foundation for AI agents.
We capture real human-computer interactions — screen recordings, mouse/keyboard inputs, app flows — so AI models can learn how people actually work.
Need custom task data? We’ve got you.
🔗…