- Tweets: 218
- Followers: 829
- Following: 661
- Likes: 9K
1/ Pretraining is hitting a data wall; scaling raw web data alone leads to diminishing returns. Today @datologyai shares BeyondWeb, our synthetic data approach & all the learnings from scaling it to trillions of tokens 🧑🏼‍🍳 - 3B LLMs beat 8B models 🚀 - Pareto frontier for performance
🚀 Introducing NSA: A Hardware-Aligned and Natively Trainable Sparse Attention mechanism for ultra-fast long-context training & inference! Core components of NSA: • Dynamic hierarchical sparse strategy • Coarse-grained token compression • Fine-grained token selection 💡 With…
We are announcing Open Thoughts, our large-scale open-source effort to curate the best open reasoning datasets! DeepSeek-R1 is amazing but we still don't have access to high-quality open reasoning datasets. These datasets are crucial if you want to build your reasoning models!…
Introducing Bespoke-Stratos-32B, our reasoning model distilled from DeepSeek-R1 using Berkeley NovaSky’s Sky-T1 recipe. The model outperforms Sky-T1 and o1-preview in reasoning (Math and Code) benchmarks and almost reaches the performance of DeepSeek-R1-Distill-Qwen-32B while…
We are happy to announce Curator, an open-source library designed to streamline synthetic data generation! High-quality synthetic data generation is essential in training and evaluating LLMs/agents/RAG pipelines these days, but tooling around this is still entirely lacking! So…
Nice to see my previous work that I led at Google DeepMind covered by VentureBeat (in light of a new work from Meta). Context: We had introduced the novel idea of Generative Retrieval for recommender systems to the world in our NeurIPS 2023 paper called TIGER (Transformer…
It's finally here! Excited to share the project I led with KRAFTON and NVIDIA. The future of gaming is here 🙌
Watch the full conversation: youtu.be/2tlWPgmiX2s?si…
Databricks research scientist @shashank_r12 shares approaches in LLMs: - How RAG enhances accuracy - Evolution of attention mechanisms - Practical applications & trade-offs of Mamba architectures
Soo disappointed that it's just a "department" and not a School, College or an Institute.. gotta get ahead of the curve, @IITKgp!!
I have three Ph.D. student openings in my research group at @RutgersCS starting in Fall 2025. If you are interested in working with me on efficient algorithms and systems for LLMs, foundation models, and AI4Science, please apply at: grad.rutgers.edu/academics/prog… The deadline is…
🧵 Super proud to finally share this work I led last quarter - the @databricks Domain Intelligence Benchmark Suite (DIBS)! TL;DR: Academic benchmarks ≠ real performance and domain intelligence > general capabilities for enterprise tasks. 1/3
i'm somewhat confident that both the following properties will hold of language models in 2027: 1. tokenization will be gone, replaced with byte-level ingestion 2. all tokens that don't need to be read or written by a human will be continuous vectors luckily two interesting…
At NeurIPS early? Like making GPUs go brrr? Join me at a luncheon tomorrow on LLM Scaling x Efficiency, 5 mins from the conference center... Note, folks need to have directly relevant work if not in the field. DM me for more info or for recs! Per the usual, I'll be doing 3…
I'll be at NeurIPS and would love to chat about anything AI. Also, visit the Databricks booth to check out some of the work we've been doing! databricks.com/blog/databrick…
Introducing Llama 3.3 – a new 70B model that delivers the performance of our 405B model but is easier & more cost-efficient to run. By leveraging the latest advancements in post-training techniques including online preference optimization, this model improves core performance at…
🤔 How can we achieve GPT-3 175B-level performance with only 1.3B parameters? 🌟 New from #NVIDIAResearch: HYMBA (HYbrid Multi-head Bi-Attention) combines MLP and attention mechanisms to dramatically boost small language model capabilities. HYMBA could revolutionize NLP…
[1/10] Q: with the awesome dense models available, does it make sense to do upcycling (MoE-ify them)? A: it depends bc: - a lot of additional FLOPs need to be sunk in - MoEs are (out-of-the-box) slower for inference - BUT model quality can improve a lot (given enough flops)!
🎉 Milestone: Our LIFT paper has hit 100+ citations! We introduced a simple method to adapt LLMs to new domains, and researchers are now achieving success with it across predictive chemistry, metamaterial physics & more! Check our work at uw-madison-lee-lab.github.io/LanguageInterf…
this talk is so good, once the public link drops I *highly* recommend checking it out

Shuxun Wang @Saberlve
2 Followers 25 Following
Jasmine Lesner @JasmineLesner
2 Followers 117 Following
Starc Institute @ARC_Guide
21 Followers 897 Following Starc Institute: A boundaryless academy where talent unites to ignite research and shape the future. https://t.co/mu3ZxzqLnW
Weiyang Liu @Besteuler
2K Followers 829 Following Assistant Professor @CUHKofficial. Postdoc @MPI_IS. PhD @Cambridge_Uni & @GeorgiaTech. Previous Intern @Google & @nvidia. All opinions are my own.
FENG Yang @fy2598099
24 Followers 1K Following
Kunal Singh @ikunalsingh7
199 Followers 2K Following AI Research @fractalai @fractalAIR Prev: GSoC @CERN, Alumnus @IITKgp, Intern @AmiiThinks Reasoning@LLM/VLMs
Ari Morcos @arimorcos
7K Followers 2K Following CEO and Co-founder @datologyai working to make it easy for anyone to make the most of their data. Former: RS @AIatMeta (FAIR), RS @DeepMind, PhD @PiN_Harvard.
BM building AI @BMAIengineer
83 Followers 4K Following University student. Trying to build. Networking. Interests in AI Research,startups,software,CS and emerging technologies.
Duy Nguyen @duynwin
117 Followers 1K Following AI/ML Engineer | PhD in statistics, Alum @UWMadison | concert pianist
Jaydev Tonde @JaydevTonde
41 Followers 474 Following Data Scientist II @Wolters_Kluwer , Master's in computer science from Pune University, Visit My Blog : https://t.co/xfjPfXQUrL
Ativ Joshi @ativsc
306 Followers 2K Following This world is exceedingly strange | CS Ph.D. Student @UMassAmherst | Online Optimization, ML Theory | Prev: Research Assistant @TIFRScience, CMI, @ahduniv
utkarsh @utkarsh_2105
656 Followers 2K Following he/him | CS undergrad BITS Pilani | (distributed, large, fast and so on) Systems for ML @MSFTResearch, prev @Inria
Liya_Fuad @Liya_Haiqal
147 Followers 7K Following
Michael Dempsey @mhdempsey
29K Followers 5K Following Dreaming about the future, romanticizing the past | MP @compoundvc investing & researching former science projects & crypto | we are only creativity constrained
We work the talk @seper_sepuluh
246 Followers 5K Following Assume nothing, question everything Talking AI, Enjoy public transport🚃
Prabhjot Singh @prabhjotsdhaura
2 Followers 50 Following
Manoj Acharya @manoja328
630 Followers 7K Following Mostly Interested in safe and aligned (neural inspired) Machine Intelligence ; PhD from Rochester Institute of Technology
Navin @navdev0
40 Followers 166 Following
SamEgwuJr @SamEgwuJr
170 Followers 1K Following Co-founder Edunova || Education || Computer Vision Researcher
Smoda @Smoda9FNQ
8 Followers 144 Following
Joe Mayo @JoeMayo
16K Followers 7K Following Author and Independent Consultant Recent books: - Programming the Microsoft Bot Framework/MSPress - C# Cookbook/O'Reilly Agents, AI, Generative AI, MCP, RAG
Dara @dara_tourt
13 Followers 8K Following
Mahesh @rmd2lv
68 Followers 927 Following
Omeed Tehrani @omeedtehrani
180 Followers 1K Following graduate computer scientist @utaustin | founder @fspodofficial | core infra @capitalone | robotics research | data engineering | developing taste
Anthony Susevski @asusevski
425 Followers 2K Following ml enjoyer. find it from within or be without. recovering Liberty village resident
F**k the EU ☦️ @f__k_the_EU
148 Followers 2K Following
Lars @LarsNW
943 Followers 1K Following
Muru Zhang @zhang_muru
565 Followers 306 Following First-year PhD @nlp_usc | Student Researcher @GoogleDeepmind | bsms @uwcse | Prevs. @togethercompute @AWS
Dan Pechi @danpechi
234 Followers 427 Following senior generative ai product specialist | computational linguist. nyu msds | tufts 19. databricks.
Zachary Huang @ZacharyHuang12
4K Followers 1K Following Researcher @MSFTResearch AI Frontiers. LLM Agents and Systems. | PhD @ColumbiaCompSci | Prev: @GraySystemsLab @databricks| Fellowship: @GoogleAI | New YouTuber
Ivan Zhou @ivanzhouyq
1K Followers 436 Following AI research engineer @Databricks 🧱 Prev @Uber AI, @StanfordCRFM, @LandingAI. I love computer vision in many ways 📸👨🏻💻🌁
Nathan Benaich @nathanbenaich
61K Followers 34K Following solo member of investment staff @airstreet @airstreetpress @stateofaireport @raais
Saaketh @saanarkethayan
186 Followers 180 Following 🦙 llama training @AIatMeta. previously @DbrxMosaicAI
Negin Raoof @NeginRaoof_
876 Followers 400 Following Ph.D. student @Berkeley_EECS advised by @AlexGDimakis Ex: SWE @microsoft, collaborator @PyTorch
Nhan Ho @NhanHo033
0 Followers 246 Following
Tristan Snell @TristanSnell
579K Followers 232K Following Lawyer, commentator, fighter for democracy. Prosecuted Trump University @ NY AG. Substack: https://t.co/9iRTNuAbIH. Host of the Tristan Snell Show on Apple + Spotify.
Marios @Marios_Kele
21 Followers 648 Following A passionate Computer Scientist, interested in solving problems using technology and the principles of algorithmic thought.
Deedy @deedydas
205K Followers 5K Following VC at @MenloVentures. Formerly founding team @glean, @Google Search. @Cornell CS. Tweets about tech, immigration, India, fitness and search.
David Brandfonbrener @brandfonbrener
1K Followers 619 Following research scientist @AIatMeta. Previously: phd from @nyu_courant, research fellow @KempnerInst @Harvard
jianlin.su @Jianlin_S
3K Followers 14 Following Grad is all you need @Kimi_Moonshot Blog: https://t.co/YVxsWylklA , Cool Papers: https://t.co/scS1n1oyaO
Anish Athalye @anishathalye
4K Followers 226 Following cto @cleanlabai • prev phd @mit_csail • research at https://t.co/MdknnUE4C6 • blog at https://t.co/oGOMQyhxv5 • open-source at https://t.co/VawMWMr84F
Sumanth @sumanthd17
4K Followers 2K Following Building Models @sarvamai PhD’ing @iitmadras @AI4Bharat, Google PhD Fellow, Past life - @GoogleAI @Mila_Quebec @IIITSC
Prem Qu Nair @premqnair
5K Followers 910 Following @cognition, previously @nuro @princeton. Pursuing 70mm, 225lb, and $0.10/piece
bagels.ai @bagelsAI
362 Followers 580 Following founder + chief script kiddie @bagels.ai, a 🥯2🥯 (b2b) llm gen ai startup in stealth | cofounder of loxML (acq. 2020) | ex-OpenAI (catering) | 🤖+🥯=🦾
SemiAnalysis @SemiAnalysis_
34K Followers 16 Following
Omar Sanseviero @osanseviero
50K Followers 3K Following Developer Experience Lead at @GoogleDeepMind Building Gemini API, Gemma, AI Studio and more AI products. My views ex-Chief Llama Officer @huggingface 🇵🇪🇲🇽
Alexandr Wang @alexandr_wang
327K Followers 833 Following chief ai officer @meta, founder @scale_ai. rational in the fullness of time
kalomaze @kalomaze
18K Followers 2K Following ML researcher (@primeintellect), speculator • extremely silly jester
MiniMax (official) @MiniMax__AI
18K Followers 11 Following Our mission is to build a world where intelligence thrives with everyone. MiniMax Agent: https://t.co/XzaTmAos0V
Songlin Yang @SonglinYang4
12K Followers 3K Following PhD-ing @MIT_CSAIL. Working on scalable and principled algorithms in #LLM and #MLSys. In open-sourcing I trust 🐳. she/her/hers
vLLM @vllm_project
17K Followers 20 Following A high-throughput and memory-efficient inference and serving engine for LLMs. Join https://t.co/lxJ0SfX5pJ to discuss together with the community!
Alan Ritter @alan_ritter
5K Followers 1K Following Computing professor at Georgia Tech - natural language processing, language models, machine learning, information extraction, dialogue
Zhuang Liu @liuzhuang1234
11K Followers 1K Following Assistant Professor @PrincetonCS. researcher in deep learning, vision, models. previously @MetaAI, @UCBerkeley, @Tsinghua_Uni
typedfemale @typedfemale
38K Followers 532 Following a really exciting new account "advanced pytorch user" - @cHHillee alt: @typedalt
Ioannis Antonoglou @real_ioannis
2K Followers 29 Following Co-Founder, CTO, @reflection_ai DQN, AlphaGo, AlphaZero, MuZero, Gemini RLHF Prev Senior Staff RS and founding eng @GoogleDeepMind AGI one PR at a time
Misha Laskin @MishaLaskin
15K Followers 214 Following Co-founder, CEO at @reflection_ai. Prev: Research @DeepMind. Gemini RL team.
Sholto Douglas @_sholtodouglas
25K Followers 1K Following Scaling RL @AnthropicAI, ex @DeepMind - working towards intelligence too cheap to meter
Hanxiao Liu @Hanxiao_6
2K Followers 103 Following @Microsoft AI, ex-Inflection, Google Brain, DeepMind We are hiring!
kipply @kipperrii
9K Followers 972 Following "uncanny ability to be mentioned in every slack thread about code that's mysteriously breaking" - claude | alt @kipperriiii
Hao Zhang @haozhangml
6K Followers 474 Following Asst. Prof. @HDSIUCSD and @ucsd_cse running @haoailab. Cofounder and runs @lmsysorg. 20% with @Snowflake
Replit ⠕ @Replit
192K Followers 554 Following Idea to app, fast. Create beautiful, modern web applications at the speed of thought with the power of Replit's AI Agent.
Trelis Research @TrelisResearch
1K Followers 491 Following 👷Work for Trelis: https://t.co/tAts18SIfB 🎥 Watch on Youtube: https://t.co/BPo1FyRuz9 💡 Book a Consultation: https://t.co/DqFajF3fV0
You Jiacheng @YouJiacheng
8K Followers 2K Following a big fan of TileLang. Follow TileLang, meow! Follow TileLang, thanks, meow! https://t.co/utshC0jrCO A fan for ten years
Artificial Analysis @ArtificialAnlys
57K Followers 542 Following Independent analysis of AI models and hosting providers - choose the best model and API provider for your use-case
Devendra Chaplot @dchaplot
13K Followers 433 Following Building next-gen AI at @thinkymachines. Past: Founding team @MistralAI, RS at Facebook AI Research. Ph.D. @SCSatCMU, BTech @iitbombay CS.
Dwarkesh Patel @dwarkesh_sp
127K Followers 909 Following Host of @dwarkeshpodcast https://t.co/3SXlu7fy6N https://t.co/4DPAxODFYi https://t.co/hQfIWdM1Un
Jonas Geiping @jonasgeiping
4K Followers 800 Following Machine Learning Research at the ELLIS Institute & Max-Planck for Intelligent Systems// Excited about fundamental questions in Safety & Efficiency of modern ML
Exa @ExaAILabs
43K Followers 29 Following We're an AI research lab building search for the future. Most powerful web search API → https://t.co/M5QuIA5D2A high compute web search → https://t.co/uHn3Ra5yJ2
Jacob Austin @jacobaustin132
7K Followers 918 Following Research at @GoogleDeepMind. Currently making LLMs go fast. I also play piano and climb. NYC. Opinions my own