Nguyễn Đức Ánh @duc_anh2k2

Joined May 2024

Tweets: 42
Followers: 1
Following: 84
Likes: 31

Sebastian Raschka @rasbt

3 weeks ago

Couldn't resist. Here's a pure PyTorch from-scratch re-implementation of Gemma 3 270M in a Jupyter Notebook (uses about 1.49 GB RAM): github.com/rasbt/LLMs-fro…

26 200 1K 465K 789

61 545 4K 341K 4K

Rohan Paul @rohanpaul_ai

3 weeks ago

🎯 Andrej Karpathy on how to learn.

93 558 5K 405K 4K

Graham Neubig @gneubig

a month ago

Summary of GPT-OSS architectural innovations:
1. sliding window attention (ref: arxiv.org/abs/1901.02860)
2. mixture of experts (ref: arxiv.org/abs/2101.03961)
3. RoPE w/ Yarn (ref: arxiv.org/abs/2309.00071)
4. attention sinks (ref: streaming llm arxiv.org/abs/2309.17453)

11 359 2K 116K 2K

what are large language models actually doing? i read the 2025 textbook "Foundations of Large Language Models" by tong xiao and jingbo zhu and for the first time, i truly understood how they work. here’s everything you need to know about llms in 3 minutes↓

77 939 7K 1.1M 18K

Lorenzo Xiao @lrzneedresearch

2 months ago

As promised, my SOP draft is here: algoroxyolo.github.io/assets/pdf/lrz… Please lmk if you have any suggestions or you have any recommendations where you think I should apply or what I should do in my future research. As always RT appreciated!! #PhDApplication #NLP #HCI

1 4 23 3K 15

Ernest Ryu @ErnestRyu

2 months ago

New lecture recordings on RL+LLM! 📺 This spring, I gave a lecture series titled **Reinforcement Learning of Large Language Models**. I have decided to re-record these lectures and share them on YouTube. (1/7)

11 159 1K 133K 2K

Ricardo Buitrago @rbuit_

2 months ago

Despite theoretically handling long contexts, existing recurrent models still fall short: they may fail to generalize past the training length. We show a simple and general fix which enables length generalization in up to 256k sequences, with no need to change the architectures!

5 34 195 40K 118

Zhengfu He @ZhengfuHe

4 months ago

Are attention heads the right units to mechanistically understand Transformers' attention behavior? Probably not, due to attention superposition! We extracted interpretable attention units in LMs and found finer-grained versions of many known and novel attention behaviors. 🧵1/N

5 80 521 41K 453

Probability and Statistics @probnstat

5 months ago

Statistical Learning Theory by Percy Liang web.stanford.edu/class/cs229t/n…

0 85 595 28K 529

Zhuang Liu @liuzhuang1234

6 months ago

New paper - Transformers, but without normalization layers (1/n)

76 599 4K 1.3M 2K

אגי-e/acc @murage_kibicho

8 months ago

I've been reading this book alongside Deepseek. The math is mathing. The code is coding. The Deepseek is deepseeking! @deepseek_ai you made god!

10 27 424 25K 308

Math Cafe @Riazi_Cafe_en

9 months ago

Stanford “Statistics and Information Theory” lecture notes PDF: web.stanford.edu/class/stats311…

2 103 719 128K 539

1 143 918 85K 1K

Harrison Ritz @harrison_ritz

11 months ago

Excited to share a new project! 🎉🎉 doi.org/10.1101/2024.0… How do we navigate between brain states when we switch tasks? Are dynamics driven by control, or passive decay of the prev task? To answer, we compare high-dim linear dynamical systems fit to EEG and RNNs🌀 ⏬

7 108 520 42K 357

Abhinav Shukla @Abhinav95_

11 months ago

Announcing MatMamba - an elastic Mamba2🐍architecture with🪆Matryoshka-style training and adaptive inference. Train a single elastic model, get 100s of nested submodels for free! Paper: sca.fo/mmpaper Code: sca.fo/mmcode 🧵(1/10)

2 55 231 35K 128

Rohan Paul @rohanpaul_ai

12 months ago

A cool Github repo collecting LLM papers, blogs, and projects, with a focus on OpenAI o1 and reasoning techniques.

16 396 3K 198K 3K

Khushi Agrawal @khushi__411

12 months ago

Excited to share a blog series I've been working on, diving deep into CUDA programming! Inspired by the #PMPP book & #CUDA_MODE!! Check out the links below...

9 64 393 39K 481

Tom Yeh @ProfTomYeh

a year ago

[VAE] by Hand ✍️ A Variational Auto Encoder (VAE) learns the structure (mean and variance) of hidden features and generates new data from the learned structure. In contrast, GANs only learn to generate new data to fool a discriminator; they may not necessarily know the…

10 174 950 48K 603

AK @_akhaliq

a year ago

EVLM: An Efficient Vision-Language Model for Visual Understanding. In the field of multi-modal language models, the majority of methods are built on an architecture similar to LLaVA. These models use a single-layer ViT feature as a visual prompt, directly feeding it into the

3 39 156 15K 64

Shubhendu Trivedi @_onionesque

a year ago

Just got around to trying ColPali arxiv.org/abs/2407.01449 but for more general extraction tasks than poorly formatted/scanned documents with complicated SEC tables*. Pretty impressive! VLMs for efficient indexing and late interaction matching gives a sizeable boost.