Couldn't resist.
Here's a pure PyTorch from-scratch re-implementation of Gemma 3 270M in a Jupyter Notebook (uses about 1.49 GB RAM): github.com/rasbt/LLMs-fro… https://t.co/9vF9U29pWh
what are large language models actually doing?
i read the 2025 textbook "Foundations of Large Language Models" by tong xiao and jingbo zhu and for the first time, i truly understood how they work.
here’s everything you need to know about llms in 3 minutes↓
As promised, my SOP draft is here:
algoroxyolo.github.io/assets/pdf/lrz…
Please lmk if you have any suggestions or you have any recommendations where you think I should apply or what I should do in my future research.
As always RT appreciated!!
#PhDApplication #NLP #HCI
New lecture recordings on RL+LLM! 📺
This spring, I gave a lecture series titled **Reinforcement Learning of Large Language Models**. I have decided to re-record these lectures and share them on YouTube. (1/7)
Despite theoretically handling long contexts, existing recurrent models still fall short: they may fail to generalize past the training length. We show a simple and general fix which enables length generalization in up to 256k sequences, with no need to change the architectures!
Are attention heads the right units to mechanistically understand Transformers' attention behavior? Probably not, due to attention superposition!
We extracted interpretable attention units in LMs and found finer grained versions of many known and novel attention behaviors.
🧵1/N
Excited to share a new project! 🎉🎉
doi.org/10.1101/2024.0…
How do we navigate between brain states when we switch tasks? Are dynamics driven by control, or passive decay of the prev task?
To answer, we compare high-dim linear dynamical systems fit to EEG and RNNs🌀
⏬
Announcing MatMamba - an elastic Mamba2🐍architecture with🪆Matryoshka-style training and adaptive inference.
Train a single elastic model, get 100s of nested submodels for free!
Paper: sca.fo/mmpaper
Code: sca.fo/mmcode
🧵(1/10)
Excited to share a blog series I've been working on, diving deep into CUDA programming! Inspired by the #PMPP book & #CUDA_MODE!!
Check out the links below...
[VAE] by Hand ✍️
A Variational Auto Encoder (VAE) learns the structure (mean and variance) of hidden features and generates new data from the learned structure.
In contrast, GANs only learn to generate new data to fool a discriminator; they may not necessarily know the…
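The "mean and variance" idea in the VAE post can be sketched in a few lines. This is a minimal illustration, not the post's own worked example: the encoder outputs `mean` and `log_var` are hypothetical hand-picked values, and the function names are made up for this sketch. It shows the standard reparameterization step (sampling a latent from the learned mean and variance) and the KL term that pulls the learned distribution toward a standard normal.

```python
import math
import random

def reparameterize(mean, log_var, rng):
    """Sample z = mean + std * eps with eps ~ N(0, 1), one value per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mean, log_var)]

def kl_to_standard_normal(mean, log_var):
    """KL( N(mean, exp(log_var)) || N(0, 1) ), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mean, log_var))

rng = random.Random(0)   # fixed seed so the sketch is repeatable
mean = [0.5, -1.0]       # hypothetical encoder output: per-dimension mean
log_var = [0.0, -2.0]    # hypothetical encoder output: per-dimension log-variance
z = reparameterize(mean, log_var, rng)     # latent sample a decoder would consume
kl = kl_to_standard_normal(mean, log_var)  # regularization term of the VAE loss
```

In a full VAE, `z` would be passed to a decoder and `kl` added to a reconstruction loss; both are omitted here.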
EVLM
An Efficient Vision-Language Model for Visual Understanding
In the field of multi-modal language models, the majority of methods are built on an architecture similar to LLaVA. These models use a single-layer ViT feature as a visual prompt, directly feeding it into the
Just got around to trying ColPali arxiv.org/abs/2407.01449 but for more general extraction tasks than poorly formatted/scanned documents with complicated SEC tables*. Pretty impressive! VLMs for efficient indexing and late interaction matching give a sizeable boost.
1K Followers 2K Following
Research Scientist at @SalesforceAI | Ph.D. from @UCLA | B.S. from @Tsinghua_Uni | Foundation Model, Theory, Reinforcement Learning | Opinions are my own
196 Followers 252 Following
Assistant Professor @HDSIUCSD. Previously Research Assistant Professor @TTIC_Connect and PhD in Statistics & Data Science @Yale.
2K Followers 151 Following
Developing algorithms for real-time reinforcement learning on robots. Research Scientist at Keen, a startup led by John Carmack.
Prev ~ PhD with Richard Sutton
8K Followers 1K Following
Decision-making under uncertainty, machine learning, artificial intelligence, from theory to practice · anti-ideological · Assistant Research Professor @Cornell
83K Followers 8K Following
Compiling in real-time, the race towards AGI.
🗞️ Don't miss my daily top 1% AI analysis newsletter directly to your inbox 👉 https://t.co/6LBxO8215l
1K Followers 946 Following
PhD @UCBerkeley, Incoming Assistant Professor @UTCompSci, Senior Researcher @togethercompute. Working on building cooler things with fewer dollars 😊
37K Followers 565 Following
Assistant professor at Stanford; Co-founder of Voyage AI (https://t.co/wpIITHLgF0);
Working on ML, DL, RL, LLMs, and their theory.
10K Followers 4K Following
sth new // ex Gemini RL+Inference @GoogleDeepMind // Chat AI @Meta // RL Agents @EA // ML+Information Theory @MIT+@Harvard+@GeorgiaTech // زن زندگی آزادی
30K Followers 93 Following
Founded in 1979, AAAI is an international, nonprofit, scientific society devoted to promoting research in, and responsible use of, Artificial Intelligence.
1K Followers 834 Following
Currently interning with Llama @Meta, PhD Candidate @UMDCS. Past @AmazonScience, @IITKgp.
A brick in the creation of Artificial General Intelligence.
4.5M Followers 460 Following
Cutting-edge research, news, commentary, and visuals from the Science family of journals. Follow @NewsfromScience for stories from our News team.
325K Followers 3K Following
NVIDIA Director of Robotics & Distinguished Scientist. Co-Lead of GEAR lab. Solving Physical AGI, one motor at a time. Stanford Ph.D. OpenAI's 1st intern.
949K Followers 764 Following
Professor at NYU. Chief AI Scientist at Meta.
Researcher in AI, Machine Learning, Robotics, etc.
ACM Turing Award Laureate.