Namma Bengaluru has the best talent and the best weather but the worst infrastructure - if we fix garbage, debris, and roads, we can be among the best cities in the world. GBA has a great opportunity to do this. Let’s use collective will to do this @DKShivakumar @BBMPCOMM
Einstein wasted the second half of his life on a fruitless quest. In the second half of his life, von Neumann invented game theory, computer architecture, implosion nuclear weapons, cellular automata and weather prediction, among other things.
This is my lecture from 2 months ago at @Cornell
“How do I increase my output?” One natural answer is "I will just work a few more hours." Working longer can help, but eventually you hit a physical limit.
A better question is, “How do I increase my output without increasing…
Dopamine from information gathering is a dangerous drug. Your entire life will change the moment you stop looking for more information and start acting on the information you already have. Always get your dopamine from action.
This explains why LLaMA 4 failed. The tokens per parameter (TPP) is way off. You can’t defy scaling laws and expect miracles.
===
Llama 4 Maverick was 400B (17B active) and >30T tokens, TPP = 1764
Llama 4 Behemoth was 2T (288B active) and >30T tokens, TPP = 104
DeepSeek v3 is…
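The TPP figures quoted above appear to divide training tokens by active parameters, not total parameters. A minimal Python sketch of that arithmetic, assuming exactly 30T tokens (the post says more than 30T, so these are lower bounds); the function and variable names are illustrative and not from the original post:

```python
# Tokens per parameter (TPP), computed against ACTIVE parameters.
# Assumption: exactly 30T training tokens. The source says ">30T",
# so the values below are lower bounds. All names are illustrative.

TRILLION = 10**12
BILLION = 10**9


def tokens_per_parameter(tokens: int, active_params: int) -> int:
    """Return tokens divided by active parameters, rounded down."""
    return tokens // active_params


maverick_tpp = tokens_per_parameter(30 * TRILLION, 17 * BILLION)
behemoth_tpp = tokens_per_parameter(30 * TRILLION, 288 * BILLION)

print(maverick_tpp)  # 1764
print(behemoth_tpp)  # 104
```

Dividing by total parameters instead (400B and 2T) would give much smaller ratios, so the quoted numbers only match the active-parameter reading.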
We got a call from @xai 24 hours ago
“We want to test Grok 4 on ARC-AGI”
We heard the rumors. We knew it would be good. We didn’t know it would become the #1 public model on ARC-AGI
Here’s the testing story and what the results mean:
Yesterday, we chatted with Jimmy from the…
Just opened a PR yesterday that will reduce the binary size of PyTorch by 40% by adding 1 flag to NVCC
With ~50M monthly downloads of PyTorch, this one change will reduce global internet traffic by ~20PB. High impact changes like this are why I love OSS.
github.com/pytorch/pytorc…
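A quick back-of-the-envelope check of the traffic figure above, as a Python sketch. The post does not state the binary size or the period for the 20PB figure; this assumes the saving is per month, that every download fetches the full binary, and decimal units (1 PB = 10^15 bytes). The names are illustrative:

```python
# Back-solve the binary size implied by the post's numbers.
# Assumptions (not stated in the post): the ~20PB saving is per month,
# every download fetches the full binary, and units are decimal.

PETABYTE = 10**15
GIGABYTE = 10**9

monthly_downloads = 50 * 10**6      # ~50M downloads per month
size_reduction_percent = 40         # binary shrinks by 40%
claimed_savings_bytes = 20 * PETABYTE


def implied_binary_size(savings_bytes: int, downloads: int,
                        reduction_percent: int) -> int:
    """Binary size (bytes) that makes the claimed savings add up."""
    return savings_bytes * 100 // (downloads * reduction_percent)


implied_size = implied_binary_size(
    claimed_savings_bytes, monthly_downloads, size_reduction_percent
)
print(implied_size / GIGABYTE)  # 1.0
```

Under these assumptions the numbers are consistent with a binary of roughly 1 GB per download.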
The best open-source reasoning model will be dropped next Thursday if everything goes well.
OpenAI hasn't open-sourced an LLM since GPT-2 in 2019, so I'm excited.
We’re hosting it on Hyperbolic. Buckle up.
🔍 SEAL and Red Team at @scale_AI present a position paper outlining what we’ve learned from red teaming LLMs so far—what matters, what’s missing, and how model safety fits into broader system safety and monitoring.
🔗 scale.com/research/red_t…
📝 scale.com/blog/rethink-r…
there’s a palpable tension in the air as hundreds of AI researchers (including me!) quietly work nights and weekends trying to figure out the “right way” to scale RL
math & code are not the universe
we will not rest until post-training is as clean and elegant as pre-training
my favorite version of the finetuning argument is Smolin’s theory that universes “reproduce” via black holes, and the conditions that are optimal for black hole production also happen to be near optimal for creating life
unclear whether true but it’s a fun idea https://t.co/2Im5LV12JW
our universe is pretty rare in configuration space - if the strong nuclear force were:
* 1% weaker - stars wouldn't make much carbon, preventing carbon-based life, & less heavy-element production would delay planet formation & leave less time for evolution to occur before stars die
* 1%…
What if I told you that Jane Street made ₹36,500 crores from Indian markets in just 2 years, and ₹4,800 crores of that was allegedly through market manipulation? They turned India's stock market into their personal ATM using a very clever strategy. Here are the complete details 🧵
Inference-time procedures (e.g. Best-of-N, CoT) have been instrumental to recent development of LLMs. The standard RLHF framework focuses only on improving the trained model. This creates a train/inference mismatch.
Can we align our model to better suit a given inference-time…
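For readers unfamiliar with Best-of-N, here is a minimal Python sketch of the inference-time procedure itself: draw N candidate responses and keep the one a reward function scores highest. This is not the alignment method the truncated post goes on to describe; `best_of_n`, `toy_sample`, and `toy_reward` are hypothetical stand-ins for a real model and reward model:

```python
from typing import Callable, Iterator


def best_of_n(prompt: str,
              sample: Callable[[str], str],
              reward: Callable[[str, str], float],
              n: int) -> str:
    """Draw n candidates for the prompt and return the highest-scoring one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [sample(prompt) for _ in range(n)]
    # max() returns the first candidate among ties.
    return max(candidates, key=lambda c: reward(prompt, c))


# Toy stand-ins (hypothetical): a "model" that replays canned responses
# and a "reward model" that prefers responses containing "= 4".
_canned: Iterator[str] = iter(["5", "2 + 2 = 4", "four?", "22"])


def toy_sample(prompt: str) -> str:
    return next(_canned)


def toy_reward(prompt: str, response: str) -> float:
    return 1.0 if "= 4" in response else 0.0


best = best_of_n("What is 2 + 2?", toy_sample, toy_reward, n=4)
print(best)  # 2 + 2 = 4
```

The mismatch the post points at is that the model is trained as if a single sample were returned, while a wrapper like this one decides what the user actually sees.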
I got a cease and desist from DocuSign for my free SaaS.
A couple of months ago, I saw a tweet from @awilkinson: “I just found out how much we pay for DocuSign and my jaw dropped. What's the best alternative?”
Me being naive, I thought “how hard could it actually be to…
The long-term goal of AI is to build models that can handle arbitrary tasks, not just ones they’ve been trained on. We hope our new *benchmark generator* can help measure progress toward this vision https://t.co/hFwiQZdlk6
India's 10-year bond yield is at 6.20% pa, the US at 4.60%. The gap of 1.60% is probably the lowest I can recollect. Will we one day see Indian yields lower than the US? That depends mainly on relative inflation, risk premium, trust, and liquidity, for global and domestic investors in these 2 countries!
255 Followers, 3K Following. Unmatched perspicacity coupled with sheer indefatigability makes me a feared opponent in any realm of human endeavour. Escape Slavery:
15K Followers, 1K Following. Co-founder and CEO @Hyperbolic_Labs. ex-@avax & ex-@citsecurities. Finished Math PhD in 2yrs @UCBerkeley. Math Olympiad Gold Medalist. Highest honor @PKU1898
6K Followers, 365 Following. Safety and alignment at Meta Superintelligence. Prev: VP of Research at Scale AI, research at Google DeepMind / Brain (Gemini, LaMDA, RL / TFAgents, AlphaChip).
13K Followers, 689 Following. Research @Meta Superintelligence Labs, RL/post-training/agents; Previously Research @OpenAI on multimodal and RL; Opinions are my own.