LLMs can be programmed by backprop 🔎
In our new preprint, we show they can act as fuzzy program interpreters and databases. After being ‘programmed’ with next-token prediction, they can retrieve, evaluate, and even *compose* programs at test time, without seeing I/O examples.
Wrapped up Stanford CS336 (Language Models from Scratch), taught with an amazing team: @tatsu_hashimoto, @marcelroed, @neilbband, @rckpudi. Researchers are becoming detached from the technical details of how LMs work. In CS336, we try to fix that by having students build everything:
Our interpretability team recently released research that traced the thoughts of a large language model.
Now we’re open-sourcing the method. Researchers can generate “attribution graphs” like those in our study, and explore them interactively.
It was a dream come true to teach the course I wish existed at the start of my PhD. We built up the algorithmic foundations of modern-day RL, imitation learning, and RLHF, going deeper than the usual "grab bag of tricks". All 25 lectures + 150 pages of notes are now public! 🧵
L1 regularization for sparse solutions - as usually taught - is actually terrible in practice! I’m always surprised how few people know this. The penalty doesn’t just zero out coefficients; it also shrinks the surviving ones toward zero, biasing the fit. To get good results, retrain with the sparsity pattern found from the initial L1 run, but without the regularizer (see the sketch below). Works much better.
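A minimal sketch of this two-stage recipe (sometimes called the relaxed or debiased lasso), assuming scikit-learn; the toy data and the alpha value are illustrative placeholders, not tuned settings:

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

# Toy sparse-regression problem (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
true_coef = np.zeros(50)
true_coef[:5] = [3.0, -2.0, 1.5, -1.0, 0.5]
y = X @ true_coef + 0.1 * rng.normal(size=200)

# Stage 1: L1 run, used only to discover the sparsity pattern.
lasso = Lasso(alpha=0.1).fit(X, y)
support = np.flatnonzero(lasso.coef_)

# Stage 2: refit on the selected features with no regularizer,
# removing the shrinkage bias L1 left on the surviving coefficients.
refit = LinearRegression().fit(X[:, support], y)
```

The stage-2 coefficients are unbiased on the selected support, which is usually what you want once the features have been chosen.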
We're missing (at least one) major paradigm for LLM learning. Not sure what to call it - possibly it already has a name - system prompt learning?
Pretraining is for knowledge.
Finetuning (SL/RL) is for habitual behavior.
Both of these involve a change in parameters but a lot of human…
Today, we're announcing our $50M Series A and sharing a preview of Ember - a universal neural programming platform that gives direct, programmable access to any AI model's internal thoughts.
This paper is also recommended for understanding GRPO. TL;DR: the per-response length normalization in DeepSeek's GRPO dilutes the per-token penalty on long, repetitive wrong answers while concentrating the reward on short correct ones, so models under-penalize repetition. Same intuition as the last RL paper I posted (toy illustration below).
Writeup soon.
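A toy sketch of that normalization effect - not the paper's code; the helper and the numbers are made up for illustration:

```python
# GRPO-style training spreads a sequence-level advantage over the
# response's tokens; dividing by the response length |o| means each
# token carries advantage / |o|.

def per_token_signal(advantage: float, length: int, normalize: bool = True) -> float:
    """Per-token learning signal under (optional) length normalization."""
    return advantage / length if normalize else advantage

# Long repetitive wrong answer: the penalty is spread thin.
print(per_token_signal(-1.0, length=1000))  # -0.001 per token
# Short correct answer: the reward is concentrated.
print(per_token_signal(+1.0, length=10))    # +0.1 per token
```

Under the 1/|o| weighting, rambling out to 1000 tokens makes each wrong token a hundred times cheaper than each right token in a 10-token answer - exactly the bias the TL;DR describes.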
Professor @PrincetonEcon, Director of @PrincetonBCF. Research on macro, money, and finance. Author of The Resilient Society and A Crash Course on Crises.
Consciousness accelerationist. AI, non-determinist computing, physics, philosophy… trying to never forget that in our infinite ignorance we are all equal (Popper).
Associate Professor at UC Berkeley. Former Research Scientist at Google DeepMind. ML/AI researcher working on foundations of LLMs and deep learning.
Associate Professor @UofT, Vice President of AI Research @nvidia, founding member of @VectorInst. Computer vision, deep learning, 3D. Opinions are my own.
The Department of Mathematics at Harvard University is one of the world's leading centers for research and education in pure mathematics. #harvardmath
PhD student @berkeley_ai; research @cursor_ai; prev @GoogleDeepMind. My friend told me to tweet more. I stare at my computer a lot and make things.
Associate prof @ucberkeley, co-director @ucbepic, cofounder @ponderdata (acq. @snowflakeDB) | on a mission to make data science effortless at scale | he/him
AGI research @DeepMind. Ex cofounder & CTO @vicariousai (acq'd by Alphabet) and @Numenta. Triply EE (BTech IIT-Mumbai, MS & PhD Stanford). #AGIComics
Trying to understand the emergence of generally intelligent robotic behavior at @berkeley_ai @AIatMeta. Previously @CILVRatNYU, @MIT & @Apple AI/ML fellow.
Senior Research Scientist @nvidia. PhD @Mila_Quebec. BSc @PKU1898. Reasoning, LLMs, ML systems. Photographer. Opinions are my own.
AI for mathematics and theoretical physics. Tomorrow's problems on yesterday's machines. Axiom - École nationale des ponts et chaussées.
Building with AI agents @dair_ai • Prev: Meta AI, Galactica LLM, Elastic, PaperswithCode, PhD • I share insights on how to build with AI agents.
Full of childlike wonder. Teaching robots manners. UT Austin PhD candidate. 🆕 RL Intern @ Apptronik. Past: Boston Dynamics AI Institute, NASA JPL, MIT '20.
Hi, I like reinforcement learning, robots, and video games :) I am an amateur pianist. Assistant Prof at Tsinghua; Postdoc at Stanford; PhD at Berkeley.
Assistant Professor @Cambridge_Eng, working on 3D computer vision and inverse graphics; previously postdoc @StanfordSVL, PhD @Oxford_VGG.