JAYANTH @_jayanth_mohan_
AI undergrad | Researcher | Joined December 2016
Tweets: 252 | Followers: 9 | Following: 248 | Likes: 75
4 advanced attention mechanisms you should know: • Slim attention — 8× less memory, 5× faster generation by storing only K from KV pairs and recomputing V. • XAttention — 13.5× speedup on long sequences via "looking" at the sum of values along diagonal lines in the attention…
The best researchers from Meta, Yale, Stanford, Google DeepMind, and Microsoft laid out all we know about Agents in a 264-page paper [book]. Here are some of their key findings:
Coded Llama 3.2 model from scratch and shared it on the HF Hub. Why? I think 1B & 3B models are great for experimentation, and I wanted to share a clean, readable implementation for learning & research: huggingface.co/rasbt/llama-3.…
QwQ is a fantastic reasoner and is 10x cheaper than the o1 line. We will be combining it with o1-mini and o1-preview as part of our route LLM. AGI will be an ensemble system that combines the best LLMs to maximize performance, speed and cost.
Stanford CS229: Building Large Language Models. This 1.5-hour lecture provides a concise overview of building a ChatGPT-like model, covering both pretraining (language modeling) and post-training (SFT/RLHF). youtu.be/9vM4p9NN0Ts?si…
Fantastic Survey! Autoregressive Models in Vision.
M4 Mac AI Coding Cluster Uses @exolabs to run LLMs (here Qwen 2.5 Coder 32B at 18 tok/sec) distributed across 4 M4 Mac Minis (Thunderbolt 5 80Gbps) and a MacBook Pro M4 Max. Local alternative to @cursor_ai (benchmark comparison soon).
🚨This week’s top AI/ML research papers: - Mixture-of-Transformers - BitNet a4.8 - LoRA vs Full Fine-tuning: An Illusion of Equivalence - Mixtures of In-Context Learners - Emergence of Hidden Capabilities - DimensionX - The Surprising Effectiveness of Test-Time Training for…
Nice collection of LLM papers, blogs, and projects, focussing on OpenAI o1 and reasoning techniques. What it offers: 📌 Curates papers, blogs, talks, and Twitter discussions about OpenAI's o1 and LLM reasoning 📌 Tracks frontier developments in LLM reasoning capabilities and…
If you are looking for something to read/study this weekend, I added lots of LLM-related bonus from-scratch coding resources over the last few months (from implementing Llama 3.2 to preference tuning with DPO): github.com/rasbt/LLMs-fro… I hope you find them useful!
Microsoft just changed the game! 🔥 They've open-sourced bitnet.cpp: a blazing-fast 1-bit LLM inference framework that runs directly on CPUs. Why is this a game-changer❓ You can now run 100B parameter models on local devices with up to 6x speed improvements and 82% less…
You see the length of this prompt? This is what you should have in your instruct dataset if you want to compete with the big players. https://t.co/bUSThgrRf3
The model card has some more interesting info too: github.com/meta-llama/lla… Note that Llama 3 8B is actually somewhere in the territory of Llama 2 70B, depending on where you look. This might seem confusing at first but note that the former was trained for 15T tokens, while the…
Data contamination is a huge problem for LLM evals right now. At Scale, we created a new test set for GSM8k *from scratch* to measure overfitting and found evidence that some models (most notably Mistral and Phi) do substantially worse on this new test set compared to GSM8k.
Nice new read on tokenization! You've heard about the SolidGoldMagikarp token, which breaks GPT-2 because it was present in the training set of the Tokenizer, but not the LLM later. This paper digs in in a lot more depth and detail, on a lot more models, discovering a less…
Big: First BitNet reproduction shows consistent results!
llmlingua - This great lib from Microsoft can compress your prompt massively. 📌 Down to 20% of the prompt's original length (5x reduction), leading to massively reduced cost and latency. 🔥 Speeds up LLM inference and enhances the LLM's perception of key information, compress…
This sets the ground for AGI. Sakana AI just released a new method to combine the 500,000 open-source models to build new ones. Evolutionary Model Merge uses evolutionary techniques to automatically create new foundation models with the desired capabilities. "We find that our…
Thank you, @Thom_Wolf for sharing your slides from the recent lecture at ELLIS Winter School. Despite its modest title, "A Little Guide to Building Large Language Models in 2024," the presentation is anything but 'little'—offering a deep dive into the intricacies of the workflow…

Orsauso @Orsauso833443
30 Followers 1K Following
Sawteegr @SawteegrdPJ
76 Followers 2K Following
Haoyi Qiu @HaoyiQiu
973 Followers 823 Following Research intern @SFResearch ☁️ PhD student @UCLANLP 🧸 BS in CS&Math @UMich 〽️ #NLP #Multimodal #Safety 🌷
Kai-Wei Chang @kaiwei_chang
8K Followers 713 Following Associate Professor @UCLAengineering/@UCLA. Area: #NLProc/#ML/#AI https://t.co/zj1ssZj9ox
Fausto Pedro Garcia M... @faustospain
13K Followers 12K Following Full Professor at @ingenium_rg @uclm_es @Spain,SM at @IEEEorg #ArtificialIntelligence #AI #DataScience #Analytics #RenewableEnergy #Maintenance #Management #IoT
Roxanne @webber_roxanne7
402 Followers 3K Following
Henry Carter @henrycatersmith
5K Followers 6K Following Passionate about #tech, #datastorage, #AI, & #dataprotection services. Always looking to learn something new about #Technology and invent something better for the world.
Po-Nien Kung @P_N_Kung
177 Followers 129 Following Ph.D. Student @ UCLA | Natural Language Processing & Machine Learning
Steffen Röcker @sroecker
2K Followers 6K Following OG local LLaMA shill. Sr. Solution Architect @RedHat, ex particle physicist. Born @ 347 ppm CO₂. Personal account, potentially unaligned.
John Carmack @ID_AA_Carmack
1.1M Followers 273 Following AGI at Keen Technologies, former CTO Oculus VR, Founder Id Software and Armadillo Aerospace
UNC AI @unc_ai_group
3K Followers 416 Following AI Group (NLP/CV/ML etc) at @UNCCS @UNC Faculty: @mohitban47+@gberta227+@snigdhac25+@shsriva+@tianlongchen4+@huaxiuyaoml+@dingmyu+@zhun_deng +@SenguptRoni et al
Mark Chen @markchen90
64K Followers 332 Following Chief Research Officer at @OpenAI. Coach for the USA IOI Team.
Thao Nguyen (Shibe) @thaoshibe
684 Followers 336 Following CS PhD @WisconsinCS 🥑 I'm trying to connect visual information... 🧠🔗👁️
Shiqi Yang @shiqi_yang_147
1K Followers 861 Following Team leader in SB Intuitions @sbintuitions, SoftBank, Tokyo. Ph.D. @CVC_UAB, Barcelona. Multi-modal/visual generation. Opinion own
Vivek Iyer @remorax98
491 Followers 389 Following AI/ML Intern at | PhD @EdinburghNLP 🏴 | Apple AI/ML Scholar 2025 | Interested in Multilingual & Multicultural NLP! 🌍
I-Hung Hsu @IHung_Hsu
459 Followers 328 Following Research Scientist @Google; CS PhD from @USC in NLP; Work on making machines reliable, intelligent, and user-friendly tools for all.
Jannik Kossen @janundnik
2K Followers 692 Following AI Research Scientist at FAIR (@meta) working on LLMs for CodeGen and Reasoning. PhD Student @OATML_Oxford and @oxcsml. Interned @DeepMind and @GoogleAI.
Yujin Tang @yujintang99
2K Followers 4K Following CS PhD Student @dartmouth, previously intern @AlibabaGroup Damo Academy, BS @sjtu1896, Msc @cuhksz, Visiting @ucmerced | Computer Vision
Yu (Bryan) Zhou @yu_bryan_zhou
876 Followers 825 Following PhD @CS_UCLA, Intern @AIatMeta Segment Anything Team prev. @StanfordSVL
dilek hakkani-tur @dilekhakkanitur
598 Followers 257 Following Professor of Computer Science @UofIllinois, @convai_uiuc, @uiuc_nlp, @IllinoisCDS
Nooshin Mojab @nooshin_mojab
44 Followers 53 Following PhD student at University of Illinois-Chicago; Deep Learning, Computer Vision, Medical Imaging; @twitter ML researcher intern
Tiberiu Sosea @SoseaTiberiu
17 Followers 167 Following Ph.D. student at UIC studying NLP. Intern @google
Ashish Vaswani @ashVaswani
25K Followers 2K Following
Shreya Shankar @sh_reya
48K Followers 690 Following on the CS faculty job market | PhD @Berkeley_EECS, building https://t.co/PmuOqAYt6q | teaching https://t.co/CTWJ6z0JEg | formerly ML eng & undergrad @Stanford
Hieu Pham @hyhieu226
34K Followers 24 Following @openai | ex: @xai, @augmentcode, @GoogleBrain, @LTIatCMU, @Stanford, ACM ICPC, IMO🥈 Opinions are my own.
Francesco Pochetti @Fra_Pochetti
2K Followers 207 Following AWS ML Hero. MLE @boltapp. Failed Chemist. Blogging about my ML/DL journey. IceVision core-dev.
Cody Blakeney @code_star
5K Followers 1K Following Data Dawg @datologyai | Formerly Data Research Lead @DbrxMosaicAI | Visiting Researcher @ Facebook | Ph.D | #TXSTFOOTBALL fan | https://t.co/4G6Jf3at5w
Matt Shumer @mattshumer_
95K Followers 1K Following CEO @HyperWriteAI, @OthersideAI, creator of https://t.co/PSUlubx5bb (Github for prompts), investor in @GroqInc @Etched @Rork_App @OpenRouterAI + many more
Amita Kamath @kamath_amita
419 Followers 177 Following PhD student at @RAIVNLab and @UCLANLP Previously, Predoctoral Young Investigator at @Allen_AI and Stanford MSCS student at @StanfordNLP
Hritik Bansal @hbXNov
2K Followers 2K Following CS PhD @UCLA Intern @MetaAI FAIR | Prev: Bachelors @IITDelhi, Intern @GoogleDeepMind @AmazonScience | Multimodal ML, Language models | Cricket🏏
Xueqing Wu @xueqing_w
474 Followers 360 Following NLPer working on vision-language models | PhD student @CS_UCLA | MS @IllinoisCS
NLP nerd @NLP_nerd
161 Followers 305 Following Willing to create a twitter account on NLP | Lead ML Scientist at https://t.co/82XnGhBNJu | Host of https://t.co/Pu94mhyB9s
Umar Jamil @hkproj
15K Followers 1K Following AI @MistralAI - Join the best AI community on Discord: https://t.co/zYH1DlgdbW - Opinions my own
Rowan Cheung @rowancheung
565K Followers 513 Following Founder of the world’s most read daily AI newsletter @therundownai. Sharing the latest developments in the world of artificial intelligence.
Yuriy Yuzifovich @yvyuz
301 Followers 275 Following Transforming enterprises with Gen AI. CTO AI at @GlobalLogic / @Hitachi. Ex-Akamai / Alibaba Cloud.
Austin Huang @austinvhuang
6K Followers 2K Following R&D @answerdotai. Past: @GoogleDeepMind, Google Brain, MIT, Harvard, Berkeley.
Georgi Gerganov @ggerganov
52K Followers 289 Following 24th at the Electrica puzzle challenge | https://t.co/baTQS2bdia
Sidi (Steve) Lu @sidiluthedumbun
188 Followers 143 Following Research Scientist @ Tencent Seattle; former UCLA-Pluslaber; # or _just a dumb bunny._; PhD in CS/ML/NLP; NLG/Generative Models/ML/RL
Itamar Golan 🤓 @ItakGol
16K Followers 486 Following CEO & Co-founder @prompt_security ||| AI Researcher ||| LLM hacker
Yufei Tian @yufei_t
877 Followers 706 Following figuring out @openai | prev: PhD @UCLA | NLP, creativity, unconventional reasoning | undergrad @Tsinghua_Uni
Runway @runwayml
259K Followers 323 Following Building for the next era of art, entertainment and human creativity. We're hiring: https://t.co/Aj11xygZYI
ElevenLabs @elevenlabsio
139K Followers 11 Following Our mission is to make content universally accessible in any language and voice.
Arthur Soroken @asoroken
1K Followers 157 Following was at Songza now @Google via acquisition. focused on AI @Google. Co-founder of the AI Futures Fund @Google - https://t.co/BM349dLqcM
Prakash (Ate-a-Pi) @8teAPi
53K Followers 4K Following FOLLOW ME for AI commentary; tech optimist, future shocked self-aware neuron, once fooled by superconductors;
Zhou Yu @Zhou_Yu_AI
12K Followers 1K Following Founder of https://t.co/9KM4uFScMi, Associate Professor at Columbia. Making ai agent design and deployment easy and fast! Forbes 30 under 30.