The CFD enthusiast has become an ML researcher. Senior Researcher at LG CNS AI Lab. Opinions are solely my own and do not express the opinions of my employer. liam.kim · Seoul · Joined November 2010
Pretraining with cross entropy = learning the best compressor for all texts ever written
Why?
Minimizing CE
= minimizing the KL divergence between two distributions, e.g., HUMANS and LLM (the entropy of HUMANS is fixed by the data)
= minimizing the expected cost in bits when compressing samples drawn from HUMANS, while using LLM as your best approximation of the true code
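The identity behind this can be checked numerically in a few lines — a toy sketch with a 3-symbol vocabulary standing in for "all texts ever written":

```python
import math

# Toy next-token distributions over a 3-symbol vocabulary:
# p = "HUMANS" (the data distribution), q = "LLM" (the model).
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

entropy = -sum(pi * math.log2(pi) for pi in p)                    # H(p): best possible bits/symbol
cross_entropy = -sum(pi * math.log2(qi) for pi, qi in zip(p, q))  # E_p[-log2 q]: bits/symbol using q's code
kl = sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q))         # extra bits paid for modeling error

# CE = H(p) + KL(p || q): since H(p) is fixed by the data,
# minimizing CE over q is exactly minimizing KL.
assert abs(cross_entropy - (entropy + kl)) < 1e-12
```

So the pretraining loss is literally the (idealized) compressed size of human text under the model's code.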
Some insights from training self-evolving LLM (R-Zero):
1. Larger models -> stronger self-evolution ability
2. Challenger & Solver should NOT share parameters
3. Pseudo-label quality degrades over time (a key drawback to tackle)
In Sec 5.4 and 5.5: arxiv.org/abs/2508.05004
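Point 3 above is easy to see in miniature. A hypothetical sketch of the Solver-side pseudo-labeling step (the function name and majority-vote rule are my assumptions for illustration, not the paper's exact procedure):

```python
from collections import Counter

def pseudo_label(solver_answers):
    """Majority vote over repeated Solver samples for one Challenger question.
    If the Solver is confidently wrong, the majority label is wrong too --
    one mechanism behind pseudo-label quality degrading over iterations."""
    (label, votes), = Counter(solver_answers).most_common(1)
    return label, votes / len(solver_answers)

# e.g. three sampled answers to one Challenger-generated question
label, conf = pseudo_label(["42", "42", "41"])
```

Training on such labels feeds the Solver's own errors back into its data, which is why the degradation compounds.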
“they built an image editing model. It could follow simple instructions well. When the ability to follow simple instructions is used and connected as needed for a task (CoT), visual reasoning problems will start to be solved.”
This is my summary of a podcast interviewing Xiangyu Zhang (youtube.com/watch?v=vWrYHv…). I found this very insightful. As I have used whisper to transcribe and gemini to translate this, it could contain errors. Though, considering the overall flow of the content, I think it would be…
One of Pi0's novel architecture bits is its Flow Matching action head -- prior to this, modern VLAs like OpenVLA leveraged diffusion heads
What is a flow matching head? What makes it easier to use versus other denoising heads?
A short thread!🧵 (1/7)
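In a nutshell: flow matching regresses a velocity field along a straight-line path between noise and the action, then integrates it with a few Euler steps at inference — typically fewer steps than a diffusion head needs. A minimal numpy sketch (function names are mine, not Pi0's):

```python
import numpy as np

def flow_matching_target(x0, x1, t):
    """Rectified-flow-style path: x_t = (1-t)*x0 + t*x1, with constant
    target velocity v = x1 - x0. A network v_theta(x_t, t, obs) is trained
    to regress this target with a simple MSE loss."""
    xt = (1.0 - t) * x0 + t * x1
    v = x1 - x0
    return xt, v

def euler_sample(v_fn, x0, steps=10):
    """At inference, integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (action)."""
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        x = x + dt * v_fn(x, i * dt)
    return x
```

The straight-line target is what makes it "easier": no noise schedule to tune, and the learned field is smooth enough that a handful of Euler steps suffices.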
VLAs offer an avenue for generalist robot policies; however, naively following the action predictions leads to brittle or unsafe behaviours. We introduce VLAPS, which integrates model-based search with pre-trained VLA policies to improve performance without additional training.
Most “robustness” work (adversarial, shift, etc.) is just training on reweighted samples (augmented, model-generated, or mined).
OOD generalization then comes from:
(1) inductive bias
(2) similarity to train data
(3) luck
The 3rd one is the most important of the three.
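The claim can be made concrete: augmentation, adversarial training, and hard-example mining all reduce to minimizing a reweighted empirical loss — only the choice of weights differs. A minimal sketch (helper name is mine):

```python
import numpy as np

def reweighted_loss(losses, weights):
    """Generic form underlying much 'robustness' work: an average of
    per-sample losses under some reweighting. Augmentation frequency,
    adversarial mining, and model-generated data all amount to choosing w."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()          # normalize to a distribution over samples
    return float(np.dot(w, losses))
```

Nothing in this objective sees points outside the (reweighted) training support, hence the dependence on inductive bias, similarity, and luck for true OOD generalization.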
Today we're putting out an update to the JAX TPU book, this time on GPUs. How do GPUs work, especially compared to TPUs? How are they networked? And how does this affect LLM training? 1/n
The best AI researchers zoom at three abstraction levels:
- High: paper-level ideas & math
- Mid: code-level implementation
- Low: GPU/TPU reality (kernels/memory)
Low exposes bottlenecks. High accelerates exploration. Mid makes it real.
The job is to translate between them!