Inspired by today's Genie 3 release? We are open-sourcing 🧞♀️Jasmine🧞♀️, a production-ready JAX-based codebase for world modeling from unlabeled videos. Scale from single hosts to hundreds of xPUs thanks to XLA! 🧵 (1/10)
ByteDance is exploring diffusion LLMs too! 👀
Seed Diffusion Preview: a blazing-fast LLM for code, built on discrete-state diffusion.
With 2,146 tokens/sec inference on H20 GPUs, it outpaces Mercury & Gemini Diffusion, while matching their performance on standard code…
Guys, I'm actively looking for professors/labs to work under. My speciality is diffusion modeling and llms. If you know someone who is looking for interns, please do let me know.
Appreciate any kind of leads 🙏
Thank you!!
Diffusion Beats Autoregressive in Data-Constrained Settings
Comparison of diffusion and autoregressive language models from 7M to 2.5B params and up to 80B training tokens.
Key findings:
1. Diffusion models surpass autoregressive models given sufficient compute. Across a wide…
Kimi K2 tech report is full of gems as always. Here are my notes on it:
> MuonClip: Pretty crazy how after 70k the training stabilizes and the QK-clip is basically inactive. There is also no loss in perf with QK-clip which is not trivial at all (at small scale but with…
on june 15, my prev role ended - i’ve taken some time since then to reflect.
now that i’m back, i am actively looking for my next FT role in AI/LLM.
if you're building in AI and hiring someone who can work on challenging problems, i'd love to chat.
DMs open
RTs appreciated 💛
AdaMuon: Adaptive Muon Optimizer
Builds on Muon, a geometry-preserving optimizer using polar decomposition (Newton–Schulz) for orthogonal 2D updates, and adds per-parameter second-moment scaling and RMS-aligned rescaling to fix Muon’s sensitivity to noisy gradients.
AdaMuon…
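The post above is cut off, so here is only a rough sketch of the recipe it names: orthogonalize a momentum matrix with Newton-Schulz, scale it by a per-parameter second-moment estimate, then rescale to a fixed RMS. Everything specific is an assumption for illustration, not the authors' code: the function names, the hyperparameter defaults, the `target_rms` value, and the use of the classic cubic Newton-Schulz iteration (Muon implementations typically use a tuned higher-order polynomial instead).

```python
import numpy as np


def newton_schulz_orthogonalize(m, steps=30):
    """Approximate the orthogonal polar factor of a 2D matrix.

    Classic cubic iteration X <- 1.5 X - 0.5 X X^T X, swapped in here for
    simplicity. It converges once the singular values lie in (0, sqrt(3)),
    which dividing by the Frobenius norm guarantees for a full-rank input.
    """
    x = m / (np.linalg.norm(m) + 1e-12)
    for _ in range(steps):
        x = 1.5 * x - 0.5 * x @ x.T @ x
    return x


def adamuon_like_step(param, grad, state, lr=0.02, beta1=0.95, beta2=0.95,
                      eps=1e-8, target_rms=0.2):
    """One hypothetical AdaMuon-style update for a single 2D parameter.

    `state` holds two arrays shaped like `param`: "m" (momentum) and
    "v" (second-moment estimate of the orthogonalized update).
    """
    # Momentum accumulation on the raw gradient.
    state["m"] = beta1 * state["m"] + grad
    # Orthogonalize the momentum matrix (the Muon part).
    o = newton_schulz_orthogonalize(state["m"])
    # Per-parameter second-moment scaling (the "adaptive" part).
    state["v"] = beta2 * state["v"] + (1.0 - beta2) * o ** 2
    scaled = o / (np.sqrt(state["v"]) + eps)
    # RMS-aligned rescaling: force the update's RMS to target_rms.
    scaled = scaled * target_rms * np.sqrt(scaled.size) / (np.linalg.norm(scaled) + eps)
    return param - lr * scaled


# Usage on a small, full-rank 4x3 "weight matrix".
grad = np.array([[2.0, -1.0, 0.5],
                 [0.0, 1.5, -0.5],
                 [1.0, 0.0, 3.0],
                 [-0.5, 0.5, 1.0]])
state = {"m": np.zeros_like(grad), "v": np.zeros_like(grad)}
param = adamuon_like_step(np.zeros_like(grad), grad, state)
```

The final rescaling makes the step size independent of the matrix shape and of how the second-moment division changed the update's norm, so `lr` alone sets the per-element update magnitude.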
149 Followers 168 Following
Non-linear dynamics in LLMs, AI cognition, embedding vulnerabilities. Takens' is the map. Safety is the reason. Kevin R. Haylett (PhD): See web link below!
2K Followers 2K Following
Ph.D. Student @PrincetonCS. Prev @Stanford @UW @pika_labs @MSFTResearch @UofIllinois @ZJU_China. I used to work on computer vision, but it's not all I do.
23 Followers 5K Following
Like to try new things you never know; trying to prove all software can be automated 😅 😅 😅
| ML/AI, | C++/Java/Go |
GitHub : Dyl777
525 Followers 7K Following
Founder @Setica —
🌐 https://t.co/k41rINekVX. Alien on planet Earth. Ai researcher and Indie Developer(web & apps).Building Ai models and Ai agents and Saas Apps
10K Followers 4K Following
sth new // ex Gemini RL+Inference @GoogleDeepMind // Chat AI @Meta // RL Agents @EA // ML+Information Theory @MIT+@Harvard+@GeorgiaTech // زن زندگی آزادی
45K Followers 64 Following
Student of mind and nature, libertarian, chess player, cancer survivor. @ Keen, UAlberta, Amii, https://t.co/u8za2Kod54, The Royal Society, Turing Award
3K Followers 206 Following
Multimodal AI for the world's scale.
Proponents of Open Source and Open Intelligence.
https://t.co/1nC6r8hOrE for some of our recent work.
13K Followers 753 Following
Research eng @GoogleDeepMind on Gemini pretrain. Personal acct. Past: swe intern @SpaceX, ugrad researcher in @tserre lab @BrownUniversity. All opinions my own.
6K Followers 478 Following
xAI, pre-train lead for v7, grok2&3&4 mini. ex-OpenAI, sole inventor of GPT4-turbo long-context. Core contributor to (GPT4/o/turbo, DaLLE 3, OAI Embedding v3)
26K Followers 876 Following
Research Scientist Director in Meta FAIR. Reasoning, Optimization and Understanding LLM. Novelist in spare time. PhD in @CMU_Robotics.
6K Followers 1K Following
Research scientist at @GoogleDeepMind, working on generative models, deep learning, RL. PhD from @stanford. Gemini Diffusion lead.