The Ultimate Guide to Fine-Tuning LLMs.
Here’s a 7-step roadmap that makes it simple.
This technical report takes a deep look at the fine-tuning process for LLMs, combining theory and practice.
1. Introduction
2. Seven-Stage Fine-Tuning Pipeline
↳ Stage-1: Data…
btw, it's possible to use mantissa-free weights.
NVIDIA's Chief Scientist gave a keynote that covers it.
nvidia.com/en-us/on-deman… https://t.co/5wIfpJ6Vk8
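To make "mantissa-free" concrete, here is a minimal sketch under one assumption: that it means a sign-plus-exponent format in which every weight is ±2^e, so multiplying an integer activation by a weight reduces to a bit shift. The exponent range and the helper names `encode_pow2`, `decode_pow2` and `mul_shift` are hypothetical, and this is not a description of the format presented in the keynote.

```python
import math
from typing import Optional, Tuple

# Hypothetical mantissa-free code: (sign, exponent) meaning sign * 2**exponent.
Pow2Code = Optional[Tuple[int, int]]  # None encodes zero


def encode_pow2(w: float, e_min: int = -8, e_max: int = 0) -> Pow2Code:
    """Round w to a signed power of two, picking the nearest exponent in log2 space."""
    if w == 0.0:
        return None
    e = round(math.log2(abs(w)))
    if e < e_min:
        return None  # too small to represent: flush to zero
    sign = 1 if w > 0 else -1
    return (sign, min(e, e_max))  # saturate exponents above e_max


def decode_pow2(code: Pow2Code) -> float:
    if code is None:
        return 0.0
    sign, e = code
    return sign * 2.0 ** e


def mul_shift(x: int, code: Pow2Code) -> int:
    """Multiply an integer activation by a power-of-two weight using only a shift
    and a negation. Right shifts floor toward negative infinity (truncation)."""
    if code is None:
        return 0
    sign, e = code
    y = x << e if e >= 0 else x >> -e
    return -y if sign < 0 else y


weights = [0.3, -0.7, 3.0, 0.001]
print([decode_pow2(encode_pow2(w)) for w in weights])
print(mul_shift(12, encode_pow2(0.3)))
```

Rounding in the log domain is a design choice: it keeps relative error bounded, but it is not the same as rounding to the nearest representable value on a linear scale.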
RL Is Neither a Panacea Nor a Mirage: Understanding Supervised vs. Reinforcement Learning Fine-Tuning for LLMs
"RL primarily counteracts SFT-induced directional drift rather than finding new solutions. Our spectrum-aware analysis highlights inexpensive recovery knobs low-rank…
TogetherAI's Chief Scientist @tri_dao announced Flash Attention v4 at the Hot Chips conference; it is up to 22% faster than the attention kernel in NVIDIA's cuDNN library. Tri Dao achieved this with two key algorithmic changes. Firstly, it uses a new online…
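The tweet is cut off before describing the new online algorithm, so the sketch below shows only the standard online softmax that FlashAttention builds on, not the v4 variant: attention for one query is computed block by block, keeping a running max, a running denominator and an unnormalized output, all rescaled whenever the max grows. Function names and the block size are illustrative.

```python
import math
from typing import List, Sequence


def online_softmax_attention(
    scores: Sequence[float], values: Sequence[Sequence[float]], block: int = 2
) -> List[float]:
    """softmax(scores) @ values for one query, streaming over key/value blocks.

    State: running max m, running denominator l, unnormalized output acc.
    """
    m = -math.inf
    l = 0.0
    acc = [0.0] * len(values[0])
    for start in range(0, len(scores), block):
        s_blk = scores[start:start + block]
        v_blk = values[start:start + block]
        m_new = max(m, max(s_blk))
        # Rescale earlier contributions to the new max (exp(-inf) == 0 on the first block).
        scale = math.exp(m - m_new)
        l *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(s_blk, v_blk):
            p = math.exp(s - m_new)
            l += p
            acc = [a + p * vj for a, vj in zip(acc, v)]
        m = m_new
    return [a / l for a in acc]  # normalize once at the end


def reference_attention(scores: Sequence[float], values: Sequence[Sequence[float]]) -> List[float]:
    """Two-pass softmax followed by a weighted sum, for comparison."""
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    return [sum(wi * v[j] for wi, v in zip(w, values)) / z for j in range(len(values[0]))]


scores = [0.5, 2.0, -1.0, 3.0, 1.5]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 0.5], [0.5, -0.5]]
print(online_softmax_attention(scores, values))
print(reference_attention(scores, values))
```

Because the full score row is never materialized, the same loop works when keys and values arrive tile by tile from slower memory, which is what makes fused attention kernels possible.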
I've written the full story of Attention Sinks — a technical deep-dive into how the mechanism was developed and how our research ended up being used in OpenAI's new OSS models.
For those interested in the details:
hanlab.mit.edu/blog/streaming…
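The practical side of attention sinks in StreamingLLM is a cache policy: keep the key/value entries of the first few tokens, which attract a large share of attention, and otherwise keep only a sliding window of recent tokens. The sketch below models that eviction policy in plain Python; the class name and default sizes are hypothetical, and real implementations operate on per-layer K/V tensors rather than Python objects.

```python
from collections import deque
from typing import Any, Deque, List, Tuple

Entry = Tuple[int, Any]  # (token position, key/value payload)


class SinkKVCache:
    """Hypothetical StreamingLLM-style cache: the first `n_sink` tokens are kept
    forever; after that only the `window` most recent tokens are kept."""

    def __init__(self, n_sink: int = 4, window: int = 1024) -> None:
        self.n_sink = n_sink
        self.sinks: List[Entry] = []
        self.recent: Deque[Entry] = deque(maxlen=window)  # evicts the oldest automatically

    def append(self, pos: int, kv: Any) -> None:
        if len(self.sinks) < self.n_sink:
            self.sinks.append((pos, kv))
        else:
            self.recent.append((pos, kv))

    def entries(self) -> List[Entry]:
        """Entries the next query attends to: sinks first, then the recent window."""
        return self.sinks + list(self.recent)


cache = SinkKVCache(n_sink=4, window=4)
for pos in range(12):
    cache.append(pos, f"kv{pos}")
print([pos for pos, _ in cache.entries()])
```

Memory stays bounded at `n_sink + window` entries no matter how long generation runs; the sink entries are what keep a plain sliding window from degrading once the earliest tokens are evicted.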
just published a curated list of amazing blogs/articles on ai. highly selective.
feel free to comment if you find anything interesting, will update it async.
(link in replies)
This paper presents a detailed experimental analysis of NVIDIA’s Blackwell architecture through microbenchmarks, comparing it with the previous-generation Hopper GPUs.
arxiv.org/pdf/2507.10789
This is a solid 29-video playlist on how to build DeepSeek from scratch. It covers theory and code, from the very foundations to advanced topics.
Self-attention, multi-head [latent] attention, GQA, how DeepSeek rewrote quantization, etc.
One video a day and you’ll finish in a month.
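A quick way to see why GQA and multi-head latent attention matter is KV-cache arithmetic per generated token. The sketch below compares standard multi-head attention, GQA with fewer KV heads, and an MLA-style cache that stores one compressed latent plus a small decoupled RoPE key per layer. The layer count, head counts and latent sizes are hypothetical values chosen for illustration, assuming 2-byte (FP16/BF16) elements.

```python
def kv_cache_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int, bytes_per_elem: int = 2) -> int:
    """MHA/GQA/MQA: one K and one V vector per KV head per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def mla_cache_bytes_per_token(n_layers: int, d_latent: int, d_rope: int, bytes_per_elem: int = 2) -> int:
    """MLA-style: one compressed KV latent plus one decoupled RoPE key per layer."""
    return n_layers * (d_latent + d_rope) * bytes_per_elem


# Hypothetical model shape (illustrative, not taken from a specific checkpoint).
N_LAYERS, N_HEADS, HEAD_DIM = 60, 128, 128

mha = kv_cache_bytes_per_token(N_LAYERS, n_kv_heads=N_HEADS, head_dim=HEAD_DIM)
gqa = kv_cache_bytes_per_token(N_LAYERS, n_kv_heads=8, head_dim=HEAD_DIM)
mla = mla_cache_bytes_per_token(N_LAYERS, d_latent=512, d_rope=64)

for name, b in [("MHA", mha), ("GQA (8 KV heads)", gqa), ("MLA", mla)]:
    print(f"{name:>17}: {b:>9,} bytes/token, MHA/this = {mha / b:.1f}")
```

With these assumed numbers, GQA shrinks the per-token cache 16x relative to MHA and the MLA-style cache shrinks it roughly 57x, which translates directly into longer contexts or larger batches for the same memory.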
i wrote a short post on this. i really shouldn't have, but I learned a lot.
i wanted to cover more like "optimal swizzling", but i ran out of time. https://t.co/rHFKJc6WJX
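The tweet doesn't say what "this" refers to; assuming it is shared-memory swizzling on NVIDIA GPUs, the sketch below shows the basic XOR swizzle on a 32x32 tile of 4-byte elements. It assumes 32 banks of 4-byte words and counts the worst-case bank conflict when a warp reads one column; the helper names are hypothetical, and "optimal" swizzles for other tile shapes and element widths are out of scope.

```python
from collections import Counter
from typing import Iterable

N_BANKS = 32  # assumed: 32 shared-memory banks, each 4 bytes wide
TILE_W = 32   # row-major tile of 32 x 32 four-byte elements (e.g. float32)


def bank(row: int, col: int) -> int:
    """Bank holding element (row, col) of the row-major tile."""
    return (row * TILE_W + col) % N_BANKS


def swizzle(row: int, col: int) -> int:
    """XOR swizzle: the physical column where logical (row, col) is stored.
    It permutes columns within a row, so writes and reads must both apply it."""
    return col ^ (row % N_BANKS)


def worst_conflict(banks: Iterable[int]) -> int:
    """Largest number of threads in a warp whose (distinct) addresses share one bank."""
    return max(Counter(banks).values())


COL = 5
# Thread t of a warp reads logical element (t, COL): a column access.
naive = [bank(t, COL) for t in range(32)]
swizzled = [bank(t, swizzle(t, COL)) for t in range(32)]
print("column read, naive layout:   ", worst_conflict(naive), "-way conflict")
print("column read, swizzled layout:", worst_conflict(swizzled), "-way conflict")
```

In the naive layout every row's column 5 lands in bank 5, so the warp serializes 32 ways; XOR-ing the column with the row spreads the accesses over all 32 banks, and because XOR with a fixed row is a permutation, row accesses stay conflict-free too.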