Tri Dao @tri_dao
Incoming Asst. Prof @PrincetonCS, Chief Scientist @togethercompute. Machine learning & systems. tridao.me | Stanford, CA | Joined May 2012
Tweets: 601 | Followers: 19K | Following: 365 | Likes: 1K
We just released Jamba-Instruct! Built from our groundbreaking SSM-Transformer Jamba architecture, Jamba-Instruct brings the same technological innovation to the enterprise via an aligned model. With leading quality benchmarks, a 256K context window, and the most competitive…
With over 20K downloads per month, community engagement with the RedPajama-V2 dataset has been incredible. The 30 trillion tokens of data have been used to train leading models like the recently released @SnowflakeDB Arctic LLM. We've compiled a list of FAQs for using it here:…
❓Wanna host a Llama2-7B-128K (14GB weight + 64GB KV cache) at home🤔 📢 Introducing TriForce! 🚀Lossless Ultra-Fast Long Seq Generation — training-free Spec Dec! 🌟 🔥 TriForce serves with 0.1s/token on 2 RTX4090s + CPU – only 2x slower on an A100 (~55ms on chip), 8x faster…
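The "14GB weight + 64GB KV cache" figures above can be reproduced with a rough back-of-the-envelope calculation. The sketch below assumes the standard Llama2-7B shape (32 layers, 32 key/value heads, head dimension 128) and FP16 storage. None of these values are stated in the tweet, and the function names are illustrative only.

```python
# Rough memory estimate for serving a 7B model at a 128K-token context.
# ASSUMPTIONS (not from the tweet): standard Llama2-7B shape, FP16 storage.
N_LAYERS = 32          # transformer layers
N_KV_HEADS = 32        # key/value heads (no grouped-query attention assumed)
HEAD_DIM = 128         # dimension per head
BYTES_PER_VALUE = 2    # FP16
N_PARAMS = 7e9         # nominal parameter count
CONTEXT_LEN = 128 * 1024


def kv_cache_bytes(seq_len: int) -> int:
    """Bytes needed to cache keys and values for seq_len tokens."""
    # The leading 2 accounts for storing both K and V.
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE * seq_len


def weight_bytes(n_params: float) -> float:
    """Bytes needed to hold the weights at BYTES_PER_VALUE each."""
    return n_params * BYTES_PER_VALUE


kv_gib = kv_cache_bytes(CONTEXT_LEN) / 2**30   # binary gigabytes
weights_gb = weight_bytes(N_PARAMS) / 1e9      # decimal gigabytes

print(f"KV cache per token: {kv_cache_bytes(1) / 2**20} MiB")
print(f"KV cache at 128K tokens: {kv_gib} GiB")
print(f"Weights: {weights_gb} GB")
```

Under these assumptions the cache costs 0.5 MiB per token, so at long context it dwarfs the weights. This is why long-sequence serving tends to be limited by KV cache memory rather than by model size.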
📢 Releasing TRI's open-source Mamba-7B trained on 1.2T tokens of RefinedWeb! Mamba-7B is the largest fully recurrent Mamba model trained and is a state-of-the-art recurrent LLM. 🚀🚀🚀 huggingface.co/TRI-ML/mamba-7…
It's a great week for open source AI! Data is among the highest impact work to push the field forward. Bravo to 🤗
Really proud to have worked hard with my team at @togethercompute over the last few months on our 2nd generation inference engine 🥹 and I can’t wait for everyone to give it a whirl 🔥 Do check out our blog at together.ai/blog/together-…! Also massive thanks to @AIatMeta for the…
We are thrilled to be a launch partner for Meta Llama 3. Experience Llama 3 now at up to 350 tokens per second for Llama 3 8B and up to 150 tokens per second for Llama 3 70B, running in full FP16 precision on the Together API! 🤯 together.ai/blog/together-…
🚀Mixtral-8x22B-Instruct-v0.1 now available on the Together API! 🚀 api.together.xyz/playground/cha… We can't wait to see what you build!
Combining SSM/RNN/EMA with attention is the way to higher quality, longer context, and faster inference! Griffin, Jamba, Zamba, and now Megalodon are great examples
Another Mamba-Attention hybrid that looks very strong! These two layers are complementary: Mamba is great at compressing information, and a few attention layers are enough to retrieve from the context for in-context learning.
✨Excited to finally drop our new paper: SSMs “look like” RNNs, but we show their statefulness is an illusion🪄🐇 Current SSMs cannot express basic state tracking, but a minimal change fixes this! 👀 w/ @jowenpetty, @Ashish_S_AI arxiv.org/abs/2404.08819
📢We're thrilled to announce that Kurt Keutzer will give the keynote speech at the MLSys 2024 Young Professionals Symposium. Join us for exciting invited talks by @Azaliamirh, Xupeng Miao, @jiawzhao, @ying11231, and @tri_dao on cutting-edge MLSys research! The full…
How does Mamba store knowledge? Is it very different from transformers? New pre-print with @diatkinson and @davidbau, where we investigate the mechanisms of factual recall within Mamba.
🚀Excited to be recognized for a second year by @FortuneMagazine in their Top 50 AI Startups list! We have come so far in the past year and a huge thank you to the now over 60,000 developers building on the Together API. Thank you!
We are releasing a 1.6B Mamba model along with the full training recipe for practitioners to be able to build upon our work Best part? With the WSD scheduler, you can *actually* build upon our work and continue the pretraining or redo the decay phase, without cold restart issues
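To illustrate why a WSD (warmup-stable-decay) schedule makes it easier to continue pretraining, here is a minimal sketch of such a learning-rate schedule. The linear warmup, the linear decay, and all parameter names and values are assumptions for illustration, not the recipe from the release.

```python
def wsd_lr(step, peak_lr, warmup_steps, stable_steps, decay_steps, min_lr=0.0):
    """Learning rate at `step` under a warmup-stable-decay schedule.

    ASSUMPTION: linear warmup and linear decay; real recipes may use
    other shapes (for example exponential decay).
    """
    if step < warmup_steps:
        # Warmup: ramp linearly up to the peak learning rate.
        return peak_lr * (step + 1) / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable: hold the peak learning rate constant.
        return peak_lr
    # Decay: move linearly from peak_lr to min_lr, then stay at min_lr.
    progress = min(step - warmup_steps - stable_steps, decay_steps) / decay_steps
    return peak_lr + progress * (min_lr - peak_lr)


# Hypothetical run: 10 warmup steps, 100 stable steps, 20 decay steps.
short_run = [wsd_lr(s, 1.0, 10, 100, 20) for s in range(130)]
# The same run with a longer stable phase: identical up to the end of the
# original stable phase, so a checkpoint taken there can simply keep training.
long_run = [wsd_lr(s, 1.0, 10, 500, 20) for s in range(130)]
print(short_run[:110] == long_run[:110])
```

Because the learning rate is constant during the stable phase, a checkpoint saved before the decay starts does not encode how long the run was meant to be. With a cosine schedule, by contrast, the total length is baked into every step.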
I highly recommend this tutorial on Mamba and related models. Full of insights on model design and hardware-aware implementation!
In the first 90 days after @_albertgu & @tri_dao published Mamba, we saw 30 downstream papers In the latest @CogRev_Podcast episode, @KamaraiCode and I explain it all Here, I summarize what we've learned about how Mamba works, and also how it complements attention https://t.co/ejbroE7V6J
Is Attention all you need? Mamba 🐍, a novel AI model based on State Space Models, emerges as an alternative to the widely used Transformer models 🤖 Read more in our latest article -> thegradient.pub/mamba-explaine…
Jamba just dropped and it is an open source model that combines the best of Mamba and Transformer architectures!! It fits on a single GPU and has a context length of 256k!! The new architecture allows for higher throughput and lower memory while maintaining performance This…
I am incredibly proud that the first author behind the development of the transformative Med-Gemini model is the Arab scientist, Khaled Saab. It's truly inspiring to see Arabs making significant strides in advancing our understanding of science. 💗🙏🏻
On the prominent USMLE exams, Med-Gemini reaches an accuracy of 91.1%. Across seven multimodal medical benchmarks, Med-Gemini surpasses the GPT-4 family by an average relative margin of 44.5%. (4)
This is why you want to use full precision inference on @togethercompute
Llama 3 degrades more than Llama 2 when quantized. Probably because Llama 3, trained on a record 15T tokens, captures extremely nuanced data relationships and fully utilizes even the minutest decimals in BF16 precision, making it more sensitive to quantization degradation.…
If we view Attention and MLP as below, they look drastically similar:
Attention: out = f(Q * K^T) * V
MLP: out = g(X * W_1) * W_2
where f is Softmax and g is whatever nonlinearity. So, why is there a FlashAttention but no FlashMLP? 🤔 As a CUDA enthusiast, I have a theory,…
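The structural similarity described above can be made concrete with a small pure-Python sketch in which both layers are instances of one "matmul, nonlinearity, matmul" helper. The helper names are hypothetical and ReLU stands in for "whatever nonlinearity". As in the tweet, the usual 1/sqrt(d) scaling and masking in attention are omitted.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(row) for row in zip(*m)]


def softmax_rows(m):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(v - mx) for v in row]
        total = sum(exps)
        out.append([v / total for v in exps])
    return out


def relu(m):
    return [[max(0.0, v) for v in row] for row in m]


def two_matmul_block(x, a, b, nonlinearity):
    """Shared skeleton: out = nonlinearity(x @ a) @ b."""
    return matmul(nonlinearity(matmul(x, a)), b)


def attention(q, k, v):
    # out = softmax(Q K^T) V  (scaling omitted, following the tweet)
    return two_matmul_block(q, transpose(k), v, softmax_rows)


def mlp(x, w1, w2):
    # out = g(X W_1) W_2 with g = ReLU (an illustrative choice)
    return two_matmul_block(x, w1, w2, relu)


print(attention([[0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]]))
print(mlp([[1.0, -1.0]], [[1.0, 0.0], [0.0, 1.0]], [[2.0, 3.0], [4.0, 5.0]]))
```

One visible difference is that in attention the two "weight" matrices (K^T and V) are activations that depend on the input and grow with sequence length, whereas W_1 and W_2 are fixed parameters. The sketch does not attempt to answer the tweet's question about kernels.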
Excited to partner w/ @vipulved @percyliang @tri_dao and team on this!
Together AI and Snowflake partner to bring their state-of-the-art Arctic LLM to enterprise customers. Experience Arctic on Together Inference with best in class performance. api.together.xyz/playground/cha…
Thanks for the shoutout! This beta version of UI is still bare-bone but a better version will drop soon. Stay tuned!
The homepage of @komo__ai mirrors the classic search UI we've all grown used to, but it also has all the trending features of AI search: ✅ Relevant answers ✅ Links to resources ✅ Conversational AI chats ✅ Privacy ✅ Ad-free What's your top AI search feature?
PyTorch 2.3 is here 😎🔥 PyTorch 2.3 offers support for user-defined Triton kernels in torch.compile, allowing for users to migrate their own Triton kernels from eager without experiencing performance regressions or graph breaks. Details: hubs.la/Q02tYcYq0
What a year it has been at @augmentcode! Today we have reached a massive milestone on our journey to augment software engineers with AI: We've secured $252M in Series B funding! I am proud to be part of the team and excited about what the future holds. techcrunch.com/2024/04/24/eri…
Our Mamba-7B outperforms Llama2-7B on the Eleuther LM Harness (with MMLU as a notable exception). We trained Mamba-7B as part of a larger study that identifies opportunities and limitations of recurrent LLMs . Stay tuned for our paper and code coming soon! 👀
We have just released 🍷 FineWeb: 15 trillion tokens of high quality web data. We filtered and deduplicated all CommonCrawl between 2013 and 2024. Models trained on FineWeb outperform RefinedWeb, C4, DolmaV1.6, The Pile and SlimPajama!
Llama3 was trained on 15 trillion tokens of public data. But where can you find such datasets and recipes?? Here comes the first release of 🍷Fineweb. A high quality large scale filtered web dataset out-performing all current datasets of its scale. We trained 200+ ablation…
@drjwrae Totally agree. I wish their take had been able to point this out directly instead of being somewhat more adversarial in tone than necessary.
Here are the Top 10 Fastest Growing US Companies founded by Indian immigrants:
1. @togethercompute
2. @NileSecure
3. @NirvanaTechInc
4. @glean
5. @Auradine_Inc
6. @Minio
7. @AmbientPhotonic
8. Electra Steel Inc
9. @ridezum
10. @AsteraLabs