Ricardo Dominguez-Olmedo @rdolmedo_

PhD student at the Max Planck Institute for Intelligent Systems, working with Moritz Hardt and Bernhard Schölkopf.

ricardodominguez.github.io

Tübingen, Germany

Joined January 2014

Tweets

98
Followers

472
Following

301
Likes

209

Ricardo Dominguez-Olmedo @rdolmedo_

4 months ago

My PhD advisor, Moritz Hardt, has just released the first half of his new book, The Emerging Science of Machine Learning Benchmarks. It’s freely available and highly recommended: mlbenchmarks.org

0 4 17 1K 2

Prasanna Mayilvahanan @prasannamayil

7 months ago

New preprint out! 🎉🎉 How does LLM training loss translate to downstream performance? We show that pretraining data and tokenizer shape loss-to-loss scaling laws, while architecture and other factors play a surprisingly minor role! brendel-group.github.io/llm-line/ 🧵1/8

2 28 133 16K 82

Ricardo Dominguez-Olmedo @rdolmedo_

7 months ago

“Aha moments” can be observed at step 0, so we should not fixate on reporting individual instances. Instead, we should seek reliable measures of internal reasoning that can be tracked throughout training. So far, response length appears to be one such (imperfect) measure.

1 1 2 199 0

Ricardo Dominguez-Olmedo @rdolmedo_

7 months ago

gist.github.com/RicardoDomingu…

1 3 11 1K 5

0 0 3 323 1

Ricardo Dominguez-Olmedo @rdolmedo_

7 months ago

gist.github.com/RicardoDomingu…

22 33 391 186K 213

1 4 41 7K 53

Ricardo Dominguez-Olmedo @rdolmedo_

7 months ago

One important caveat is that I cannot get the response length to dramatically increase as in the R1 paper. https://t.co/3VNNkpNEAR

4 8 125 10K 59

2 0 5 396 1

Ricardo Dominguez-Olmedo @rdolmedo_

7 months ago

R1-style GRPO on Llama 3.2 1B Instruct yields +10 accuracy points on GSM8K. It just works! The train data is GSM8K train. Interestingly, supervised fine-tuning yields no performance improvements, since the dataset is tiny compared to all the math reasoning data seen by Llama 3.

4 8 125 10K 59
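
For readers unfamiliar with GRPO, the core step is scoring each sampled response relative to the other responses drawn for the same prompt. Below is a minimal sketch of that group-relative advantage, not the code behind the result above. The function names, the exact-match reward on the final answer, the use of the population standard deviation, and the `eps` value are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def exact_match_reward(predicted: str, reference: str) -> float:
    """Hypothetical reward: 1.0 if the final answer matches the reference, else 0.0."""
    return 1.0 if predicted.strip() == reference.strip() else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses sampled for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    The population standard deviation and eps are assumptions of this sketch.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one GSM8K-style question whose reference answer is "72".
samples = ["72", "68", "70", "72"]
rewards = [exact_match_reward(s, "72") for s in samples]
advantages = group_relative_advantages(rewards)
```

Correct answers in the group receive a positive advantage and incorrect ones a negative advantage. If every response in a group gets the same reward, all advantages are zero and that prompt contributes no learning signal.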

Julius Adebayo @juliusadml

8 months ago

Really cool paper questioning all the 'incredible' progress we've seen recently: "after fine-tuning all models on the same amount of task data, performance per pre-training compute equalizes and newer models are no better than earlier models."