The Lore of Kalomaze! ⚡️
bringing a great pod with @kalomaze (20yo ML researcher at Prime Intellect) - we talked about training, finetuning, RL (environments and recipes), scaling, working at PI, and a lot of lore!
(link in replies)
Mom, can we have trainable attention sinks at home?
Mom: Combining multiple tricks from the TL over the last week, we can have efficient attention sinks without the trouble of writing custom bwd kernels
In the era of pretraining, what mattered was internet text. You'd primarily want a large, diverse, high-quality collection of internet documents to learn from.
In the era of supervised finetuning, it was conversations. Contract workers are hired to create answers for questions, a bit…
Introducing the Environments Hub
RL environments are the key bottleneck to the next wave of AI progress, but big labs are locking them down
We built a community platform for crowdsourcing open environments, so anyone can contribute to open-source AGI
TIL you can use PyTorch's built-in varlen attention directly. Backward works too
gist.github.com/gau-nernst/1d8…
Say goodbye to flash-attn😂. Thanks @main_horse for pointing this out to me.
i'll confess i do have a very specific mission in mind with this project. the semi-vague private beta rollout is part of it. the set of tasks we're sourcing is part of it. the GPU bounties are part of it. the shitposts are part of it. the podcasts are part of it. mindshare is…
I've finally solved steepest descent on Finsler-structured (matrix) manifolds more generally. This generalizes work by me, @jxbz, and @Jianlin_S on Muon, Orthogonal Muon, & Stiefel Muon.
---
The general solution turned out to be much simpler than I thought. And it should… https://t.co/NWwzMzmcHH