People think about pretraining runs as single long monolithic loss curves but they're not like that even in the centralised case. You run stuff, move it through different data stages, go back and fork off a checkpoint, change some norm somewhere etc. etc.
Obsessing over the SWE-bench chart is one of the most mid-curve things I've ever seen. Take a second and absorb what's actually been achieved here. It's understandable not to like OpenAI since they're destroying your whole identity, but don't pretend it's the chart that's the issue.
pretraining is an elegant science, done by mathematicians who sit in cold rooms writing optimization theory on blackboards, and engineers with total command of distributed systems of titanic scale
posttraining is hair-raising cowboy research where people drinking a lot of diet coke…
Missing the point that it'll be a system whose behaviour is controlled by a few people, in companies that don't have the best track record when it comes to this kind of thing. It's fundamentally a level of power that's never existed before. It's that simple.
That's a wrap for ICML 2025. Incredible to watch the space go from "What are you talking about" to "That's impossible" to "Hmmm, that's very interesting" in just over a year. @tha_ajanthan @hmdolatabadi
Jack Dorsey says AI must be permissionless because constraint kills innovation.
Five CEOs shouldn't dictate what brings humanity forward.
Open source is the answer.
To protect ourselves, we have to race ahead. Eliminating single points of failure before they become…
Totally agree. Flower Labs is another group actively publishing great stuff and is now squarely focused on decentralised training. Should be a major datapoint for everyone still skeptical of the area: the Flower team is as legitimate as it gets and Nic Lane is pretty much top of the…
From my experience, getting a paper on decentralized DL accepted to top-level conferences can be quite tough. The motivation is not familiar to many reviewers, and standard experiment settings don't account for the problems you aim to solve.
Hence, I'm very excited to see…
Feels like Meta closing its models was very predictable. I explicitly said this would happen last year and explained why (from blog.pluralis.ai/p/article-2-pr…).
Hidden in the article: "Furthermore, there have been some billion dollar offers that were not accepted by researcher/engineering leadership at OpenAI." If @dylan522p wrote it, I believe it's true... but how is that possible?
People forget policy-gradient-based RL is the most data-inefficient form of training. There are going to be major algorithmic advances in RL'ing the base models, probably using something like artificial curiosity (arxiv.org/pdf/1705.05363). But the current methods are not there.
For people not familiar with AI publishing: there are 3 main conferences every year, ICML, ICLR and NeurIPS. These are technical conferences and the equivalent of journals in other disciplines; they are the main publishing venue for AI. The competition to have papers at these…
Using beautiful Grafana dashboards for everything internally, so much nicer than TensorBoard. Wandb is still good but doesn't really work with decentralised training. Makes me wonder what the internal vis tooling is like at OpenAI; it must be incredible.