Introducing ❄️ @snowglobe_so, the simulation engine for AI chatbots.
Magically simulate the behavior of your users to test and improve your chatbots.
Find failures before your users do.
Announcing DeepSWE 🤖: our fully open-sourced, SOTA software engineering agent trained purely with RL on top of Qwen3-32B. DeepSWE achieves 59% on SWEBench-Verified with test-time scaling (and 42.2% Pass@1), topping the SWEBench leaderboard for open-weight models.
Built in…
Text diffusion models might be the most unintuitive architecture around
Like: let's start randomly filling in words in a paragraph and iterate enough times to get something sensible
But now that Google's Gemini Diffusion is near SOTA, I think we need to take them seriously
The Nvidia Tensor Core is the most important evolution of computer architecture in the last decade
We explain why / how it's evolved
Shout out to collaborators @bfspector @tri_dao @colfaxintl @charles_irl @ia_buck Neil Movva Jonah Alben
esp @simonguozirui for the cutest cover pic
This looks super cool. Our own research team was exploring similar ideas for building an internal corpus of context for our content generation tasks. Now we just got a huge head start on it!
Excited to introduce #CollabLLM -- a method to train LLMs to collaborate better w/ humans! Selected as #icml2025 oral (top 1%)🏅
New multi-turn training objective + user simulator👇
An advantage of training a cache/prefix (as opposed to a LoRA adapter) is that we can serve per-user cartridges using the same optimizations and kernels that inference engines already use for per-user KV caches.
@GeoffreyAngus just integrated cartridges into Tokasaurus (a… https://t.co/ZYSRsACRH9
.@togethercompute API has the fastest DeepSeek v3 endpoint (2x faster than next best API endpoint) and almost 5x faster than DeepSeek API. See how to use it directly with @cline to make all your Cline workflows snappier!
When we put lots of text (eg a code repo) into LLM context, cost soars b/c of the KV cache’s size.
What if we trained a smaller KV cache for our documents offline? Using a test-time training recipe we call self-study, we find that this can reduce cache memory on avg 39x…
984 Followers 4K Following: @GalvanizeLLC investor and positive-sum optimist exploring how capital, climate, and geopolitics are converging. NYT Bestseller: Cheaper, Faster, Better
6K Followers 218 Following: Incoming assistant professor at UCSD CSE in MLSys. Currently recruiting students! Also running the kernels team @togethercompute.
224 Followers 306 Following: AI Researcher at Together AI @togethercompute | alumni of @UMich @CMUEngineering and Xi'an Jiao-Tong University, China
Opinions are my own.
5K Followers 8K Following: geek, entrepreneur, 'I strictly color outside the lines!', opinions r my own indeed. @ayirpelle, universal handle at this time
636K Followers 35 Following: We're an AI safety and research company that builds reliable, interpretable, and steerable AI systems. Talk to our AI assistant @claudeai on https://t.co/FhDI3KQh0n.
5K Followers 277 Following: Member of Technical Staff at Anthropic
Co-founder at @CobaltRobotics
Co-founder at Posmetrics (acquired)
GoogleX, @SpaceX, @Harvard EE '15, Forbes 30u30 '18
37K Followers 1K Following: Co-creator of GitHub Copilot, Dropbox Paper, AI Tinkerers, Hackpad, MobileCoin, Minion AI, etc. Working on @PerplexityComet. Survivor 🎗️
108K Followers 1 Following: Claude is an AI assistant built by @anthropicai to be safe, accurate, and secure. Talk to Claude on https://t.co/ZhTwG8dz3D or download the app.
831 Followers 2K Following: CS PhD student @illinoisCDS. Research intern at AWS AI Labs @AmazonScience. Towards building advanced code LLMs with better reasoning and planning.
83K Followers 8K Following: Compiling in real-time, the race towards AGI.
🗞️ Don't miss my daily top 1% AI analysis newsletter directly to your inbox 👉 https://t.co/6LBxO8215l
2K Followers 1K Following: systems that are nervous
postdoc @Stanford w/ @HazyResearch & @scott_linderman.
prev: neuro phd @cu_neurotheory, post training @DbrxMosaicAI
8K Followers 4 Following: ai-powered task management system to regain control over @cursor_ai @lovable_dev @cline @windsurf_ai & others. part of the @usehamster family
56K Followers 853 Following: Figuring out AI @allen_ai, open models, RLHF, fine-tuning, etc
Contact via email.
Writes @interconnectsai
Wrote The RLHF Book
Mountain runner
880K Followers 2K Following: Professor of Political Science, Director of Freeman Spogli Institute & Hoover Senior Fellow all at Stanford University. U.S. Ambassador to Russia, 2012-2014.