FutureSearch @FUTURESEARCHAI

Building AI to understand AI. futuresearch.ai Joined February 2024

Tweets 48

Followers 197

Following 47

Likes 115

FutureSearch @FUTURESEARCHAI

a month ago

FutureSearch benchmarks, like Deep Research Bench, find Opus 4.1 and Opus 4 the same on average but clearly different: Opus 4.1 is better at numeric & data tasks (kind of like code?) and worse at qualitative reasoning.

Zvi Mowshowitz @TheZvi

a month ago

0 0 24 6K 3

0 2 3 300 0

Dan Schwarz @dschwarz26

2 months ago

@grok 4 is now #1 on Deep Research Bench, narrowly ahead of Claude 4 Opus and Sonnet. And much better than o3. (Gemini is far behind.)

1 4 30 10K 6

Dan Schwarz @dschwarz26

2 months ago

So it's July 2025, ChatGPT Agent is out, for use "in your personal life"? Let me remind myself what the first sentence of AI 2027 said:

11 12 194 17K 30

Dan Schwarz @dschwarz26

2 months ago

Deep Research Bench live leaderboard is up! Results on new DeepSeek, new Gemini, Claude-4 on web tasks. Link in reply. tl;dr of new results: DeepSeek still bad, Gemini seemingly worse for 2nd release in a row; Claude-4 is king.

1 3 15 734 0

FutureSearch @FUTURESEARCHAI

3 months ago

Thanks @EuginaJordan for highlighting our @BuiltIn write-up in your newsletter! If you're thinking about AI hallucinations, read our research-backed take on what we can actually do to fix them builtin.com/articles/ai-ha…

0 0 2 68 0

FutureSearch @FUTURESEARCHAI

3 months ago

Just launched: Deep Research Bench by FutureSearch. 89 tasks. Real-world research. Agents tested: GPT‑4o, Claude 3, Gemini 2.5, DeepSeek. Find out who leads — and where every model still fails: drb.futuresearch.ai #AIbenchmark #WebResearch #LLM

0 0 3 156 1

FutureSearch @FUTURESEARCHAI

3 months ago

AI research tools are rapidly improving. But when accuracy matters, interactive research will remain substantially better than existing deep research tools. This will be true as long as humans are better than AI at error-checking and ideation builtin.com/articles/deep-…

0 0 2 59 0

FutureSearch @FUTURESEARCHAI

3 months ago

Deep Research is a very prominent LLM use case, yet it is backed only by hand-waving claims. Now, for the first time, we can actually put companies' claims to the test, tell users who is best at what, and rank which models are best at research in general arxiv.org/abs/2506.06287

0 6 10 2K 4

FutureSearch @FUTURESEARCHAI

3 months ago

Hallucination rates in frontier LLMs are up, not down. Our CEO @danschwarz breaks down why + shares mitigation playbooks in @BuiltIn Full article ↓ builtin.com/articles/ai-ha… #LLMs #ChatGPT #GPTo3 #deepseekR1

0 2 3 335 0

FutureSearch @FUTURESEARCHAI

3 months ago

Thank you to @futuristdotai for covering Deep Research Bench! The first of its kind to score LLM agents on web-based research tasks. unite.ai/how-good-are-a…

0 2 13 27K 12

FutureSearch @FUTURESEARCHAI

3 months ago

Great conversation with Ben Lorica on OpenAI revenue projections

Ben Lorica 罗瑞卡 @bigdata

3 months ago

0 0 2 391 0

0 0 2 127 1

Dan Schwarz @dschwarz26

5 months ago

Also, one interesting trend we noticed: The fastest time for companies to reach $10b revenue, and then $100b revenue, is decreasing at a rate that is entirely consistent with OpenAI reaching their projected milestones! How's that for naive extrapolation?

1 1 5 294 2

Dan Schwarz @dschwarz26

5 months ago

OpenAI reported yesterday they forecast $125B revenue in 2029. This is way overoptimistic about ChatGPT, the API, and "monetizing free users". But I think $125B in 2029 is still plausible, based on the AI 2027 scenario. Short thread on where AI revenue is headed: 🧵

1 3 6 726 1

FutureSearch @FUTURESEARCHAI

5 months ago

Proud to contribute to AI 2027. Read our takes: futuresearch.ai/ai-2027

Daniel Kokotajlo @DKokotajlo

5 months ago

399 1K 5K 2.8M 4K

0 0 5 474 1

FutureSearch @FUTURESEARCHAI

5 months ago

FutureSearch gives odds of runaway AI in new AI futurism report prn.to/4ln96F8

0 0 2 127 0

FutureSearch @FUTURESEARCHAI

6 months ago

Ever use a "Deep Research" tool for work? New FutureSearch finding: these tools (Gemini Deep Research, OpenAI Deep Research, and Perplexity Deep Research) fail in surprising ways on tasks you might ask them to do: futuresearch.ai/dr-persist-ada…

1 0 1 105 0

Dan Schwarz @dschwarz26

10 months ago

Ever wonder what happened with Google's first prediction market, which ran from 2005 to 2010? The previously unreported story, by yours truly in @asteriskmgzn, including the wild finale. Plus the story of the new prediction market that grew from its ashes and runs there today.

Asterisk @asteriskmgzn

10 months ago

0 4 33 12K 19

6 14 72 8K 14

Dan Schwarz @dschwarz26

12 months ago

Last week, we got a leak on OpenAI subscribers. On some - ChatGPT Plus subs, and API revenue - FutureSearch's numbers from June look prescient. Very surprised how much Enterprise growth slowed. And almost nobody pays for ChatGPT Team!? Our sleuthing: futuresearch.ai/openai-case-st…

0 1 8 436 2

Dan Schwarz @dschwarz26

12 months ago

OpenAI says o1 plans better. Does it? We read a bunch of agent traces line-by-line, with 4 good agent archs, on 8 messy web+stats white-collar tasks with detailed partial credit. Result: o1-preview aces some tasks others fail, but it's... moody? futuresearch.ai/llm-agent-eval

1 7 25 5K 6

Dan Schwarz @dschwarz26

12 months ago

Much is still unknown about this important AI capability. We at futuresearch.ai are still in the trenches. Let’s not mislead the public about how good AI forecasting is. Full takedown: lesswrong.com/posts/uGkRcHqa… (7/7)