The CFD enthusiast has become an ML researcher. Senior Researcher at LG CNS AI Lab. Opinions are solely my own and do not express the opinions of my employer. liam.kim · Seoul · Joined November 2010
Pretraining with cross entropy = learning the best compressor for all texts ever written
Why?
Minimizing CE
= minimizing the KL divergence between two distributions, e.g., HUMANS and LLM (the entropy of HUMANS is fixed by the data)
= minimizing the expected cost in bits when compressing samples drawn from HUMANS, while using LLM as your best approximation of the true code
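The identity behind this can be checked numerically in a few lines — a toy sketch with a 3-symbol vocabulary standing in for "all texts ever written":

```python
import math

# Toy next-token distributions over a 3-symbol vocabulary:
# p = "HUMANS" (the data distribution), q = "LLM" (the model).
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

entropy = -sum(pi * math.log2(pi) for pi in p)                    # H(p): best possible bits/symbol
cross_entropy = -sum(pi * math.log2(qi) for pi, qi in zip(p, q))  # E_p[-log2 q]: bits/symbol using q's code
kl = sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q))         # extra bits paid for modeling error

# CE = H(p) + KL(p || q): since H(p) is fixed by the data,
# minimizing CE over q is exactly minimizing KL.
assert abs(cross_entropy - (entropy + kl)) < 1e-12
```

So the pretraining loss is literally the (idealized) compressed size of human text under the model's code.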
Some insights from training self-evolving LLM (R-Zero):
1. Larger models -> stronger self-evolution ability
2. Challenger & Solver should NOT share parameters
3. Pseudo-label quality degrades over time (a key drawback to tackle)
In Sec 5.4 and 5.5: arxiv.org/abs/2508.05004
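Point 3 above is easy to see in miniature. A hypothetical sketch of the Solver-side pseudo-labeling step (the function name and majority-vote rule are my assumptions for illustration, not the paper's exact procedure):

```python
from collections import Counter

def pseudo_label(solver_answers):
    """Majority vote over repeated Solver samples for one Challenger question.
    If the Solver is confidently wrong, the majority label is wrong too --
    one mechanism behind pseudo-label quality degrading over iterations."""
    (label, votes), = Counter(solver_answers).most_common(1)
    return label, votes / len(solver_answers)

# e.g. three sampled answers to one Challenger-generated question
label, conf = pseudo_label(["42", "42", "41"])
```

Training on such labels feeds the Solver's own errors back into its data, which is why the degradation compounds.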
“they built an image editing model. It could follow simple instructions well. When the ability to follow simple instructions is used and connected as needed for a task (CoT), visual reasoning problems will start to be solved.”
This is my summary of a podcast interviewing Xiangyu Zhang (youtube.com/watch?v=vWrYHv…). I found this very insightful. As I have used whisper to transcribe and gemini to translate this, it could contain errors. Though, considering the overall flow of the content, I think it would be…
One of Pi0's novel architecture bits is its Flow Matching action head -- prior to this, modern VLAs like OpenVLA leveraged diffusion heads
What is a flow matching head? What makes it easier to use versus other denoising heads?
A short thread!🧵 (1/7)
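In a nutshell: flow matching regresses a velocity field along a straight-line path between noise and the action, then integrates it with a few Euler steps at inference — typically fewer steps than a diffusion head needs. A minimal numpy sketch (function names are mine, not Pi0's):

```python
import numpy as np

def flow_matching_target(x0, x1, t):
    """Rectified-flow-style path: x_t = (1-t)*x0 + t*x1, with constant
    target velocity v = x1 - x0. A network v_theta(x_t, t, obs) is trained
    to regress this target with a simple MSE loss."""
    xt = (1.0 - t) * x0 + t * x1
    v = x1 - x0
    return xt, v

def euler_sample(v_fn, x0, steps=10):
    """At inference, integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (action)."""
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        x = x + dt * v_fn(x, i * dt)
    return x
```

The straight-line target is what makes it "easier": no noise schedule to tune, and the learned field is smooth enough that a handful of Euler steps suffices.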
VLAs offer an avenue for generalist robot policies; however, naively following the action predictions leads to brittle or unsafe behaviours. We introduce VLAPS, which integrates model-based search with pre-trained VLA policies to improve performance without additional training.
Most “robustness” work (adversarial, shift, etc.) is just training on reweighted samples (augmented, model-generated, or mined).
OOD generalization then comes from:
(1) inductive bias
(2) similarity to train data
(3) luck
The 3rd one is the most important of the three.
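The claim can be made concrete: augmentation, adversarial training, and hard-example mining all reduce to minimizing a reweighted empirical loss — only the choice of weights differs. A minimal sketch (helper name is mine):

```python
import numpy as np

def reweighted_loss(losses, weights):
    """Generic form underlying much 'robustness' work: an average of
    per-sample losses under some reweighting. Augmentation frequency,
    adversarial mining, and model-generated data all amount to choosing w."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()          # normalize to a distribution over samples
    return float(np.dot(w, losses))
```

Nothing in this objective sees points outside the (reweighted) training support, hence the dependence on inductive bias, similarity, and luck for true OOD generalization.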
Today we're putting out an update to the JAX TPU book, this time on GPUs. How do GPUs work, especially compared to TPUs? How are they networked? And how does this affect LLM training? 1/n
The best AI researchers zoom at three abstraction levels:
- High: paper-level ideas & math
- Mid: code-level implementation
- Low: GPU/TPU reality (kernels/memory)
Low exposes bottlenecks. High accelerates exploration. Mid makes it real.
The job is to translate between them!