Paradigm Shift AI @ParadigmShiftAI

Teaching AI agents to use computers. Curated HCI datasets + disposable-VM simulator for training & gap-to-human evals.

paradigm-shift.ai
Sunnyvale, CA
Joined March 2025

Tweets: 21
Followers: 53
Following: 208
Likes: 111

Anaïs Howland @AnaisHowland18

a day ago

Benchmark hacking is real! An agent can hit “90%” performance on any benchmark by cherry-picking results across runs. In the real world you only get a few tries at a task. Let me show you in action: I ran ~1k tasks x 10 episodes with @browser_use on Gemini 2.5 Flash.…

1 1 6 22K 0
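A minimal sketch of the cherry-picking effect described in the post above. It is not the author's data or code: the 20% per-episode success probability, the `summarize` helper, and the simulated results are all assumptions chosen for illustration. It compares the mean per-episode success rate with a "best of 10" score, where a task counts as solved if any one of its episodes succeeded.

```python
import random


def summarize(results):
    """Return (mean_rate, best_of_k) for results[t][e] = True if episode e of task t succeeded."""
    n_tasks = len(results)
    # Honest metric: average success rate over every episode of every task.
    mean_rate = sum(sum(eps) / len(eps) for eps in results) / n_tasks
    # Cherry-picked metric: a task counts as solved if ANY episode succeeded.
    best_of_k = sum(any(eps) for eps in results) / n_tasks
    return mean_rate, best_of_k


# Hypothetical simulation (assumption): 1,000 tasks x 10 episodes,
# each episode succeeding independently with probability 0.2.
rng = random.Random(0)
results = [[rng.random() < 0.2 for _ in range(10)] for _ in range(1000)]

mean_rate, best_of_k = summarize(results)
print(f"mean per-episode success: {mean_rate:.2f}")
print(f"best-of-10 success:       {best_of_k:.2f}")
```

Under these assumptions the expected best-of-10 score is 1 - 0.8^10 ≈ 0.89, even though a single attempt succeeds only about 20% of the time, which is why reporting the spread across runs matters.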

Paradigm Shift AI @ParadigmShiftAI

a month ago

Paradigm Shift AI just supercharged web-agent evals 🚀 We revamped our analytics with deeper agent insights, success heatmaps, variance scores, human baselines, full replay & crash logs and more. See where your agent shines or stumbles all in one place. Want access to the…

3 12 45 668K 5

Anaïs Howland @AnaisHowland18

a month ago

Ran @browser_use on @ParadigmShiftAI to pit Claude 4 Sonnet vs Gemini 2.5 Pro on 10x10 WebVoyager vision tasks. Claude: 99% accuracy & 3× faster ⚡️ Gemini: 75% accuracy 😬 @GoogleDeepMind why the lag? #AI #VisionAI

0 3 12 2K 2

Paradigm Shift AI @ParadigmShiftAI

2 months ago

Track browser-eval progress in real time, episode by episode and right from your dashboard! No more hunting through live logs (unless you still get a kick out of it 😅)

0 2 3 227 0

Paradigm Shift AI @ParadigmShiftAI

2 months ago

More news & insights to share soon 🔥

Anaïs Howland @AnaisHowland18

2 months ago

0 2 5 197 0

0 1 4 109 0

Anaïs Howland @AnaisHowland18

2 months ago

Totally agree, great analysis. That’s why @ParadigmShiftAI delivers richer metrics, deeper failure-trace analytics, and a bigger task bank (proprietary + public) to really stress-test web agents

Shayne Longpre @ShayneRedford

2 months ago

3 21 115 14K 85

0 1 3 244 0

Paradigm Shift AI @ParadigmShiftAI

2 months ago

Thrilled to announce we've been accepted into the @UofBeta Pre-Acceleration Program Cohort 10! Looking forward to connecting, learning, and growing alongside other incredible founders.

0 2 3 112 0

Paradigm Shift AI @ParadigmShiftAI

3 months ago

Introducing NeuroSim, our browser agent evaluation platform! Run real-world evaluations for browser agents + models, see gap-to-human scores, share team leaderboards—free while we iterate with you. Read more 👉 paradigm-shift.ai/blog/neurosim-… DM or email [email protected] for…

1 6 33 48K 57

Paradigm Shift AI @ParadigmShiftAI

3 months ago

o3 just got 80% cheaper (thanks @OpenAI), so we added it. NeuroSim supports o3. Run your browser-use agent evals on Paradigm Shift AI and see how they stack up!

Sam Altman @sama

3 months ago

https://t.co/rH8mYZAJ8R

2K 1K 23K 3.5M 2K

0 0 4 1K 0

Paradigm Shift AI @ParadigmShiftAI

3 months ago

🚀 Agent Hub v1 is live! The “App Store” for AI agents. Built an agent? Publish one Agent Card today: ✅ appear in a public directory ✅ give devs a ready endpoint + JSON spec ✅ push updates with version tags Read more → paradigm-shift.ai/blog/agent-hub… #AIagents #GenerativeAI

0 1 16 43K 26

Anaïs Howland @AnaisHowland18

3 months ago

Attending the AI Engineer World’s Fair in SF this week! Excited for the packed lineup of speakers. Let me know if you’re around and want to connect! #AIEWF #AIEngineer

0 1 4 151 0

Paradigm Shift AI @ParadigmShiftAI

3 months ago

Our website just got a facelift, check it out! 👀 paradigm-shift.ai #AgentEval #AI

0 0 3 57 0

Paradigm Shift AI @ParadigmShiftAI

4 months ago

Calling agent builders: We're launching a browser-agent eval platform — and looking for beta testers. ✅ Run your agent on real tasks ✅ Get logs, traces, failure points ✅ See where it breaks (and why) ✅ Free during beta — just give us feedback Training support coming soon.…

0 1 6 10K 0

Paradigm Shift AI @ParadigmShiftAI

4 months ago

Blog drop: Paradigm Shift AI captures screen + mouse + app data to train & eval desktop agents. Grab 30 free tasks and peek at our upcoming VM sim 👀 Read here → bit.ly/45iRcxt

0 1 4 65 0

Anaïs Howland @AnaisHowland18

4 months ago

First conference as a founder— @DataCouncilAI set the bar high. 3 days of sharp insights, inspiring speakers, and conversations with fellow builders. Grateful for the chance to learn and connect! Check out the talks on YouTube below! #DataCouncil25

Data Council @DataCouncilAI

4 months ago

0 3 6 605 0

0 1 3 80 0

Paradigm Shift AI @ParadigmShiftAI

5 months ago

We’re live 🚀 Paradigm Shift AI is building the data foundation for AI agents. We capture real human-computer interactions — screen recordings, mouse/keyboard inputs, app flows — so AI models can learn how people actually work. Need custom task data? We’ve got you. 🔗…