Mirage or method? We re-assess a series of RL observations such as spurious reward, one-shot RL, test-time RL, and negative-sample training.
🧐 These approaches were all originally demonstrated on the Qwen+Math combination, but do they work in other settings? If not, under which…
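To make the "spurious reward" comparison above concrete, here is a minimal sketch (not any paper's actual code) of GRPO-style group-relative advantages computed once from a correctness reward and once from a coin-flip reward; the function name and the 0/1 reward convention are assumptions made purely for illustration.

```python
# Minimal sketch, not any paper's code: GRPO-style group-relative advantages
# computed from (a) a correctness reward and (b) a "spurious" random reward.
import numpy as np

def group_advantages(rewards):
    """Normalize rewards across a group of completions sampled for one prompt."""
    rewards = np.asarray(rewards, dtype=float)
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

rng = np.random.default_rng(0)
correct_reward = np.array([1, 0, 0, 1, 0, 0, 0, 0])   # reward = answer correctness
spurious_reward = rng.integers(0, 2, size=8)           # reward = coin flip, no task signal

print(group_advantages(correct_reward))   # advantages track correctness
print(group_advantages(spurious_reward))  # advantages track noise, yet still update the policy
```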
Feels like everyone is slowly admitting that there's no moat in foundational models, and the only way to build a business out of AI is to build products.
I've always been skeptical about PRMs, but being able to apply RL+reasoning changes the entire story for me. It was a fun ride with @weixiong_1, who has been teaching me a unified way to think about all RL methods. He'll be on the job market! Anyone would be lucky to work with him.
🔮 Introducing Prophet Arena — the AI benchmark for general predictive intelligence.
That is, can AI truly predict the future by connecting today’s dots?
👉 What makes it special?
- It can’t be hacked. Most benchmarks saturate over time, but here models face live, unseen…
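As an aside on how live forecasts can be graded once events resolve, here is a small, self-contained illustration using the Brier score; whether Prophet Arena uses this exact metric is an assumption, and the numbers are invented.

```python
# Illustration only: the Brier score is one standard way to grade probabilistic
# forecasts after events resolve (lower is better). It is an assumption that
# Prophet Arena scores models this way.
def brier_score(forecasts, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# A model assigns 0.7, 0.2 and 0.9 to three events; the first and third occur.
print(brier_score([0.7, 0.2, 0.9], [1, 0, 1]))  # ≈ 0.047
```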
Today we're putting out an update to the JAX TPU book, this time on GPUs. How do GPUs work, especially compared to TPUs? How are they networked? And how does this affect LLM training? 1/n
🚀New dataset release: WildChat-4.8M
4.8M real user-ChatGPT conversations collected from our public chatbots:
- 122K from reasoning models (o1-preview, o1-mini): these represent real use in the wild and are very costly to collect
- 2.5M from GPT-4o
🔗 hf.co/datasets/allen… (1/4)
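A sketch of how the release could be loaded with the Hugging Face datasets library; the repository id below is a placeholder since the link in the announcement is truncated, and the field layout should be checked against the dataset card.

```python
# Sketch only: the dataset id is a placeholder (the announcement link is
# truncated), and the record layout is an assumption — inspect ds.features first.
from datasets import load_dataset

ds = load_dataset("allenai/WildChat-4.8M", split="train")  # placeholder id
print(len(ds))    # expect on the order of 4.8M conversations
print(ds[0])      # one real user–ChatGPT conversation record
```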
i've been thinking lately about how future ai systems will interact with us and how we can make systems that care about people and wanted to put words to it -- hopefully it resonates a bit!
I'll be around the ICML venue this afternoon. Message me if you want to meet! These days, I think about reasoning and RL. Also happy to talk about academia vs. industry (I think the lack of compute in academia is a feature, not a bug) and about faculty and PhD student recruiting at UMass.
AI Research Agents are becoming proficient at machine learning tasks, but how can we help them search the space of candidate solutions and codebases? Read our new paper looking at MLE-Bench: arxiv.org/pdf/2507.02554 #LLM #Agents #MLEBench
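Not the paper's algorithm, but for intuition: a toy best-first search over candidate solutions of the kind an ML-engineering agent might run, where propose_edits and evaluate are hypothetical stand-ins for "ask the LLM for code changes" and "score the candidate on a validation split".

```python
# Toy sketch, not the paper's method: best-first search over candidate solutions.
# `propose_edits` and `evaluate` are hypothetical stand-ins for an LLM proposing
# code changes and a harness scoring them on held-out data.
import heapq

def best_first_search(initial, propose_edits, evaluate, budget=20):
    best, best_score = initial, evaluate(initial)
    frontier = [(-best_score, 0, initial)]        # max-heap via negated scores
    counter = 0                                   # tie-breaker so candidates never compare
    for _ in range(budget):
        if not frontier:
            break
        _, _, parent = heapq.heappop(frontier)    # expand the most promising candidate
        for child in propose_edits(parent):
            score = evaluate(child)
            counter += 1
            heapq.heappush(frontier, (-score, counter, child))
            if score > best_score:
                best, best_score = child, score
    return best, best_score

# Toy usage: "solutions" are numbers, edits perturb them, the score rewards closeness to 10.
best, score = best_first_search(
    0.0,
    propose_edits=lambda x: [x + 1, x - 1, x * 2],
    evaluate=lambda x: -abs(10 - x),
)
print(best, score)
```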
📢 today's scaling laws often don't work for predicting downstream task performance. For some pretraining setups, smooth and predictable scaling is the exception, not the rule.
a quick read about scaling law fails:
📜arxiv.org/abs/2507.00885
🧵1/5👇
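For context, the recipe the thread pushes back on looks roughly like the sketch below: fit a trend to small-scale runs and extrapolate to a larger budget. The numbers are invented, and the log-linear fit is just one simple extrapolation choice.

```python
# Invented numbers, illustration only: fit a simple trend on small-scale runs,
# then extrapolate to a bigger compute budget — the step that often fails for
# downstream task metrics.
import numpy as np

compute = np.array([1e18, 3e18, 1e19, 3e19])            # small-scale training runs
downstream_error = np.array([0.62, 0.61, 0.60, 0.58])   # smooth so far

m, k = np.polyfit(np.log10(compute), downstream_error, deg=1)
predicted = m * np.log10(1e20) + k
print(f"extrapolated error at 1e20 FLOPs: {predicted:.3f}")
# A real 1e20 run may land far from this if the downstream metric jumps or
# plateaus instead of following the fitted trend.
```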
Do language models have algorithmic creativity?
To find out, we built AlgoTune, a benchmark challenging agents to optimize 100+ algorithms like gzip compression, AES encryption and PCA. Frontier models struggle, finding only surface-level wins. Lots of headroom here!🧵⬇️
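To give a feel for the task format (a sketch under assumptions, not the benchmark's actual harness): a candidate solver is accepted only if it matches the reference output, and is then scored by wall-clock speedup.

```python
# Illustrative only: an AlgoTune-style check — verify the candidate matches the
# reference, then score it by wall-clock speedup. The task (pairwise squared
# distances) and the exact scoring rule are assumptions.
import time
import numpy as np

def reference_pdist(X):
    n = len(X)
    out = np.zeros((n, n))
    for i in range(n):                 # straightforward but slow double loop
        for j in range(n):
            out[i, j] = np.sum((X[i] - X[j]) ** 2)
    return out

def candidate_pdist(X):
    sq = np.sum(X ** 2, axis=1)        # vectorized: ||x||^2 + ||y||^2 - 2*x.y
    return sq[:, None] + sq[None, :] - 2 * X @ X.T

X = np.random.default_rng(0).standard_normal((300, 32))
t0 = time.perf_counter(); ref = reference_pdist(X); t_ref = time.perf_counter() - t0
t0 = time.perf_counter(); cand = candidate_pdist(X); t_cand = time.perf_counter() - t0

assert np.allclose(ref, cand)          # correctness first, then speed
print(f"speedup over reference: {t_ref / t_cand:.1f}x")
```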
We don’t have AI that self-improves yet, and when we do it will be a game-changer. With more wisdom now compared to the GPT-4 days, it's obvious that it will not be a “fast takeoff”, but rather extremely gradual across many years, probably a decade.
The first thing to know is that…
✨Release: We upgraded SkyRL into a highly-modular, performant RL framework for training LLMs. We prioritized modularity—easily prototype new algorithms, environments, and training logic with minimal overhead.
🧵👇
Blog: novasky-ai.notion.site/skyrl-v01
Code: github.com/NovaSky-AI/Sky…
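This is not SkyRL's actual API, just a toy illustration of the kind of modularity the release describes, where generation, reward, and advantage estimation are swappable pieces and the loop itself stays unchanged.

```python
# Toy illustration only (not SkyRL's API): a training step whose components —
# generation, reward, advantage estimation — can be swapped independently.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Rollout:
    prompt: str
    completion: str
    reward: float = 0.0

def train_step(prompts: List[str],
               generate: Callable[[str], str],
               reward_fn: Callable[[str, str], float],
               advantage_fn: Callable[[List[float]], List[float]]) -> List[float]:
    rollouts = [Rollout(p, generate(p)) for p in prompts]
    for r in rollouts:
        r.reward = reward_fn(r.prompt, r.completion)
    return advantage_fn([r.reward for r in rollouts])   # would feed the policy update

# Swap any component without touching the loop, e.g. mean-centered advantages:
advantages = train_step(
    ["2+2=?", "3*3=?"],
    generate=lambda p: "4" if "+" in p else "7",        # stand-in for the policy
    reward_fn=lambda p, c: 1.0 if (p, c) == ("2+2=?", "4") else 0.0,
    advantage_fn=lambda rs: [r - sum(rs) / len(rs) for r in rs],
)
print(advantages)   # [0.5, -0.5]
```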
735 Followers 790 FollowingResearch intern @nvidia; Ph.D. student at @Mila_Quebec. Interested in deep generative models, drug discovery and protein science.
44 Followers 1K FollowingDataset and data-annotation sales: selling high-quality annotation solutions for topics such as AI for science, autonomous driving, and Lean4 data.
315 Followers 925 Following* Interests: Comp Neuro, ML, and StatPhys * Postdoc in Xiao-Jing Wang's lab, CNS, NYU * Ph.D. in StatPhys from ENS * BS & MS in Phys from Sapienza
124 Followers 1K FollowingOfficial journal of the China Society of Image and Graphics (CSIG). The journal is published by Springer and sponsored by CSIG. E-ISSN 2731-9008.
2K Followers 297 FollowingAssociate Professor, Computer Science and Engineering, University of Michigan; researcher in natural language processing; directs @launchnlp.
125 Followers 284 FollowingSoftware engineer 👨💻 | IIIT Hyderabad CSE | JEE AIR 379 | Talks about technology and maybe news sometimes | All opinions are my own |
1K Followers 6K Followingscaling speech native LLMs @rimelabs
the future is willed into existence.
bioML, discovering new science, housing, industrial policy, local politics.
342 Followers 297 FollowingPh.D. at @nyuniversity. Visiting researcher at @AIatMeta. Previous Intern @cohere, MCDS @LTIatCMU. Working on ML/NLP. Painting lover🎨.
13K Followers 689 FollowingResearch @Meta Superintelligence Labs, RL/post-training/agents; Previously Research @OpenAI on multimodal and RL; Opinions are my own.
309 Followers 591 FollowingPhD student @jhuclsp, research scientist intern @AIatMeta FAIR. I work on data curation for language models. Previously @nyuniversity.
24K Followers 706 FollowingMember of the technical staff @ Anthropic. Most (in)famous for inventing diffusion models. AI + physics + neuroscience + dynamics.
354K Followers 1K FollowingML/AI researcher & former stats professor turned LLM research engineer. Author of "Build a Large Language Model From Scratch" (https://t.co/O8LAAMRzzW).
4.3M Followers 3 FollowingOpenAI’s mission is to ensure that artificial general intelligence benefits all of humanity. We’re hiring: https://t.co/dJGr6Lg202
2K Followers 935 FollowingPh.D. student @LTIatCMU and intern at @AIatMeta (FAIR) working on (V)LM Evaluation & Systems that Self-Improve | Prev: @kaist_ai @yonsei_u