Live GPU-price & inference latency intel. Tracking the #InferenceWars so your LLMs run faster + cheaper. Friday brief.
InferenceWars.com
Joined June 2025
1/
🚨 Live bench results (05 Sep, 17:11 GMT+1) via InferenceLatency.com
🏁 Fastest latency: Google Gemini — 384 ms
⚡ Fastest now: Together AI — 281 ms (very fast)
📈 Top throughput: Together AI — 95.6 tok/s
💸 Best $/1k tok: Together AI — $0.0002
✅ Uptime today: 7/7…
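To put the latency and throughput figures above side by side, here is a rough sketch of how they might combine into a total response time. It assumes the latency number is the wait before the first token and the tok/s number is a steady generation rate; the post does not define either metric, and `estimated_response_seconds` is a hypothetical helper name.

```python
def estimated_response_seconds(latency_ms, tokens_per_second, output_tokens):
    """Rough total time for one response.

    ASSUMPTION: latency_ms is the delay before the first token arrives and
    tokens_per_second is a constant generation rate after that. The post
    does not say how either figure was measured.
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return latency_ms / 1000 + output_tokens / tokens_per_second


if __name__ == "__main__":
    # Together AI figures quoted in the post: 281 ms and 95.6 tok/s.
    for n_tokens in (50, 500, 2000):
        seconds = estimated_response_seconds(281, 95.6, n_tokens)
        print(f"{n_tokens} output tokens: ~{seconds:.2f} s")
```

Under these assumptions, throughput dominates the total for long outputs, while latency matters most for short replies.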
🚀 INFERENCE WARS - Week #10 Intel Drop
GPT-5 just rewrote the rulebook:
🔹 Nano tier ($0.05/$0.40) = new speed/cost baseline
🔹 Mini tier ($0.25/$2.00) edges out Claude Sonnet 4
🔹 Main tier ($1.25/$10.00) for proof-critical inference
- Groq still owns throughput at ~1,600 tok/s…
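A minimal sketch of what the tier prices above mean per request. It assumes each pair is USD per 1M tokens, given as input price then output price; the post does not state the unit. The tier keys and `request_cost` are hypothetical names used only for illustration.

```python
# Prices quoted in the post.
# ASSUMPTION: USD per 1M tokens, ordered (input, output).
TIERS = {
    "nano": (0.05, 0.40),
    "mini": (0.25, 2.00),
    "main": (1.25, 10.00),
}


def request_cost(tier, input_tokens, output_tokens):
    """Return the estimated USD cost of one request on the given tier."""
    input_price, output_price = TIERS[tier]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Example workload: 2,000 prompt tokens and 500 completion tokens.
    for tier in TIERS:
        print(f"{tier}: ${request_cost(tier, 2_000, 500):.6f} per request")
```

Because output tokens cost 8x input tokens on every tier listed, completion length drives most of the bill under this assumption.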
🚀 INFERENCE WAR REPORT - WEEK #8 HIGHLIGHTS:
⚡ Groq hits ~1,600 tok/s with SpecDec on Llama-3.3-70B - new speed king for open models
🎯 OpenAI confirms GPT-5 family: 5, 5-mini, 5-nano tiers now official
🌍 EU routing revolution: Groq's Helsinki POP changes the latency game…
OpenAI just dropped GPT‑OSS - its first open‑weight Mixture‑of‑Experts model (20B and 120B variants).
Built specifically for agentic reasoning workflows and developer flexibility.
What makes this moment significant? GPT‑OSS is shipped through Hugging Face’s Inference Providers,…