Yes, both the 8B and 70B are trained way more than is Chinchilla optimal - but we can eat the training cost to save you inference cost! One of the most interesting things to me was how quickly the 8B was improving even at 15T tokens.
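For a rough sense of scale, here is a minimal sketch of the arithmetic. It assumes the commonly cited rule of thumb of about 20 training tokens per parameter for Chinchilla-optimal training, and it assumes both models saw the 15T tokens mentioned above. The constant and the function names are illustrative, not taken from the post.

```python
# Assumption: ~20 tokens per parameter is the usual Chinchilla rule of thumb.
# It is an approximation, not an exact constant.
TOKENS_PER_PARAM = 20

# 15T tokens, the figure mentioned in the post (assumed here for both models).
TOKENS_TRAINED = 15e12


def chinchilla_optimal_tokens(n_params: float) -> float:
    """Approximate compute-optimal token count for a model with n_params parameters."""
    return TOKENS_PER_PARAM * n_params


def overtraining_factor(n_params: float, tokens_trained: float) -> float:
    """How many times more tokens were used than the rule of thumb suggests."""
    return tokens_trained / chinchilla_optimal_tokens(n_params)


if __name__ == "__main__":
    for name, n_params in [("8B", 8e9), ("70B", 70e9)]:
        optimal = chinchilla_optimal_tokens(n_params)
        factor = overtraining_factor(n_params, TOKENS_TRAINED)
        print(f"{name}: ~{optimal / 1e9:.0f}B tokens optimal, trained ~{factor:.1f}x past that")
```

Under these assumptions the 8B comes out roughly two orders of magnitude past the rule-of-thumb budget and the 70B roughly one, which is the training cost being traded for cheaper inference.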