Wow, compositional generalization across depth! This beautifully resolves the question of whether RL lets the model acquire new skills. At the very least, learning to compose atomic skills is itself a new skill.
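To make "composition across depth" concrete, here is a minimal sketch of the kind of eval this implies: train on shallow chains of atomic string operations, test on deeper chains. The atoms, task format, and depth split are my own illustration, not the paper's benchmark.

```python
import random

# Hypothetical atomic skills: each is a simple, individually learnable transform.
ATOMS = {
    "rev": lambda s: s[::-1],        # reverse
    "dup": lambda s: s + s,          # duplicate
    "rot": lambda s: s[1:] + s[:1],  # rotate left by one
}

def make_example(depth: int, rng: random.Random) -> tuple[str, str]:
    """Compose `depth` random atoms; input is the program + argument, target is the result."""
    ops = [rng.choice(list(ATOMS)) for _ in range(depth)]
    arg = "".join(rng.choice("abcd") for _ in range(4))
    out = arg
    for op in ops:
        out = ATOMS[op](out)
    return " ".join(ops) + " ( " + arg + " )", out

rng = random.Random(0)
train = [make_example(d, rng) for d in (1, 2) for _ in range(100)]  # shallow compositions
test = [make_example(d, rng) for d in (4, 5) for _ in range(100)]   # held-out deeper ones
for prompt, target in test[:2]:
    print(prompt, "->", target)
```

If RL-tuned models solve the held-out deeper chains while the base model does not, that is the depth-generalization claim in miniature.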
I recently tried to implement H-Net. I think it is the most promising tokenizer-free approach. (Though the sequence length is not fixed during training, which is annoying.) And shrinking the vocabulary size below the model dimension is itself a desirable property.
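For reference, a minimal sketch of the dynamic-chunking idea at the heart of H-Net as I understand it: a router turns dissimilarity between adjacent hidden states into a boundary probability. The projection names, the 0.5 threshold, and the shapes are my assumptions, not the official implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BoundaryRouter(nn.Module):
    """Sketch of an H-Net-style chunk-boundary router (assumed details, not the paper's code)."""
    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim, bias=False)
        self.k = nn.Linear(dim, dim, bias=False)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq, dim) byte-level hidden states
        q, k = self.q(h[:, 1:]), self.k(h[:, :-1])
        cos = F.cosine_similarity(q, k, dim=-1)            # similarity of adjacent positions
        p = 0.5 * (1.0 - cos)                              # dissimilar neighbors => likely boundary
        first = torch.ones(h.size(0), 1, device=h.device)  # position 0 always starts a chunk
        return torch.cat([first, p], dim=1)                # (batch, seq) boundary probabilities

h = torch.randn(2, 16, 64)
p = BoundaryRouter(64)(h)
mask = p > 0.5          # chunk starts; the count varies per sample,
print(mask.sum(dim=1))  # which is exactly why sequence length isn't fixed during training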
Though it's a departure from the eval flow, I find this interview with MiniMax's CEO interesting (news.qq.com/rain/a/2025011…):
"Most companies in China are still using the methods for building recommendation systems to create large-model products. With a content product, you can't know…"
In B.1, the authors estimate a scaling law in N and D and, based on it, suggest that Adam pulls ahead as N grows. But what about D? And how does this change with MoE? It would be worthwhile to estimate. (Though estimating a scaling law in N and D can be delicate,…
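For concreteness, here is a sketch of what such an estimate involves, assuming the standard Chinchilla-style parameterization L(N, D) = E + A·N^(−α) + B·D^(−β); the data below is synthetic, and this functional form may not match the paper's B.1 fit.

```python
import numpy as np
from scipy.optimize import curve_fit

# Assumed Chinchilla-style form: L(N, D) = E + A*N^-alpha + B*D^-beta.
def loss(ND, E, A, alpha, B, beta):
    N, D = ND
    return E + A * N**-alpha + B * D**-beta

rng = np.random.default_rng(0)
N = rng.uniform(1e8, 2e9, 40)   # parameter counts (synthetic)
D = rng.uniform(2e9, 1e11, 40)  # training tokens (synthetic)
L = loss((N, D), 1.7, 400.0, 0.34, 4e3, 0.28) + rng.normal(0, 0.01, 40)

# Initial guesses matter a lot for this nonlinear fit -- one reason
# estimating the (N, D) scaling law is delicate in practice.
p0 = (1.5, 100.0, 0.3, 1e3, 0.3)
popt, _ = curve_fit(loss, (N, D), L, p0=p0, maxfev=20000)
print(dict(zip(["E", "A", "alpha", "B", "beta"], np.round(popt, 3))))
```

Comparing the fitted (α, β) across optimizers, or between dense and MoE runs, is one way to answer the "what about D?" question.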
(1/n) Check out our new paper: "Fantastic Pretraining Optimizers and Where to Find Them"! >4000 models to find the fastest optimizer! 2× speedups over AdamW? Unlikely. Beware under-tuned baselines or limited scale! E.g. Muon: ~40% speedup below 0.5B params, but only 10% at 1.2B (8× Chinchilla)!
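One common way to operationalize "speedup" here (my assumption about the protocol, not necessarily the paper's): the ratio of training tokens each optimizer needs to reach the same target loss. A sketch with synthetic loss curves:

```python
import numpy as np

def tokens_to_reach(loss_curve, tokens, target):
    """First token count at which the loss curve crosses `target` (linear interpolation)."""
    idx = np.argmax(loss_curve <= target)
    if loss_curve[idx] > target:
        return np.inf  # never reaches the target
    if idx == 0:
        return tokens[0]
    t0, t1 = tokens[idx - 1], tokens[idx]  # interpolate between bracketing points
    l0, l1 = loss_curve[idx - 1], loss_curve[idx]
    return t0 + (t1 - t0) * (l0 - target) / (l0 - l1)

# Synthetic power-law loss curves; the numbers are illustrative only.
tokens = np.linspace(1e9, 1e11, 500)
adamw = 2.0 + 8.0 * tokens**-0.12
muon = 2.0 + 7.2 * tokens**-0.12  # hypothetically faster optimizer

target = adamw[-1]  # AdamW's final loss is the target
speedup = tokens[-1] / tokens_to_reach(muon, tokens, target)
print(f"data-efficiency speedup: {speedup:.2f}x")
```

On this definition, an under-tuned AdamW baseline inflates the target loss and therefore the reported speedup, which is exactly the failure mode the thread warns about.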
134 Followers · 680 Following · CFD enthusiast turned ML researcher. Senior Researcher at LG CNS AI Lab. Opinions are solely my own and do not express the opinions of my employer.
47 Followers · 73 Following · Research Scientist at ByteDance Seed, specializing in research on Large Language Models, AI for Science, and Natural Language Processing.
7K Followers · 103 Following · Research scientist at @openai working on AI agents and Deep Research. Co-creator of ChatGPT agent. Ex-@Stanford CS PhD. My words do not represent my employer's.
393 Followers · 373 Following · Toward the next level of visual content creation. Ex @runwayml, a foundational contributor to Gen-3 Alpha, Frames, and Gen-4. Opinions are my own.
2K Followers · 11 Following · DatologyAI builds tools to automatically select and optimize the best data on which to train AI models, leading to better, smaller models that train faster.
4K Followers · 20 Following · At Essential AI, we're building an open platform to democratize frontier AI capabilities and accelerate breakthroughs globally through collaborative science.
6K Followers · 1K Following · Research scientist at @GoogleDeepMind, working on generative models, deep learning, RL. PhD from @stanford. Gemini Diffusion lead.
7K Followers · 652 Following · Research Scientist @AIatMeta
Previously Researcher @ Samsung AI
Outstanding Paper Award @icmlconf 2023
Action Editor @TmlrOrg
I tweet about ML papers and math