Yu Zhang 🐈🐙 @yzhang_cs
@Kimi_Moonshot; PhD Student @ Soochow University; working on efficient methods for LLMs; disciple of parallel programming; INTP
yzhang.site
Joined February 2023
650 Tweets 418 Followers 656 Following 4K Likes
Announcing GLM Coding Plan for Claude Code! After seeing the amazing adoption of GLM-4.5 over the past month, we're making it more accessible. Get started: z.ai/subscribe Integration guide: docs.z.ai/scenario-examp… What's new: 1/7th the price of original Claude Code…
Why is Adam's Update RMS 0.2? kexue.fm/archives/11267 TLDR: Adam_Update_RMS ≈ sqrt((1 - beta1) / (1 + beta1))
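Yang Zhilin has not appeared in public for a long time; my last interview with him was a year and a half ago. This time we talked about the development of K2 and his latest thinking on technology, as well as his feelings and reflections amid the past year's storm of public opinion and the ups and downs of running a startup. Yang Zhilin said he has repeatedly read British physicist David Deutsch's book "The Beginning of…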
Kimi's founder, Zhilin Yang's interview is out. Again, you can let Kimi translate for you: ) lots of insights there. mp.weixin.qq.com/s/uqUGwJLO30mR… Several takes: 1/ Base Model Focus: K2 aims to be a solid base model. We've found that high-quality data growth is slow, and multi-modal…
Happy #InternationalDogDay!
TogetherAI's Chief Scientist @tri_dao announced Flash Attention v4 at HotChips Conference which is up to 22% faster than the attention kernel implementation from NVIDIA's cuDNN library. Tri Dao was able to achieve this with 2 key algorithmic changes. Firstly, it uses a new online…
Many thanks to the Xiaomi MiMo team for contributing the GDN examples :) github.com/tile-ai/tilela…
Check out this TileLang implementation of Gated DeltaNet—outperforming FlashLinearAttention’s GDN Triton version 😍 TileLang balances flexibility, performance, and coding ease: easier to write than CuTe DSL and faster than Triton.
Developing new LLM architectures is both costly and risky. Our latest project — hanlab.mit.edu/projects/jet-n… — offers an effective strategy to address this challenge. Our first result is Jet-Nemotron, a new family of hybrid-architecture language models that outperform state-of-the-art…
Wow, pretty cool that they also open sourced a FSDP2 compatible Muon and PolyNorm working with @huggingface kernels! https://t.co/Gqw7Hpj1v3
Introducing DeepSeek-V3.1: our first step toward the agent era! 🚀 🧠 Hybrid inference: Think & Non-Think — one model, two modes ⚡️ Faster thinking: DeepSeek-V3.1-Think reaches answers in less time vs. DeepSeek-R1-0528 🛠️ Stronger agent skills: Post-training boosts tool use and…
ModRWKV is accepted by EMNLP’25 as Main Conference Paper! Code and model weights are fully opensourced for the community! 🚀Go Linear MLLM🚀 #EMNLP25 #EMNLP
We are excited to release Nvidia-Nemotron-Nano-V2 model! This is a 9B hybrid SSM model with open base model and training data. This model also supports runtime "thinking" budget control. HF collection with base and post trained models: huggingface.co/collections/nv…
Tilelang now supports SM120 — give it a try if you have RTX 5090 🚀😎
🎉 Excited to share: We’ve open-sourced Triton-distributed MegaKernel! A fresh, powerful take on MegaKernel for LLMs—built entirely on our Triton-distributed framework. github.com/ByteDance-Seed… Why it’s awesome? 🧩 Super programmable ⚡ Blazing performance 📊 Rock-solid precision
Let's talk about the GLM 4.5 models. The latest frontier open weights model out of China (and possibly the best at the moment?) with quite a bit of details in the paper.
SGLang is hosting its first-ever China workshop at GOSIM HANGZHOU 2025! Join us this September in Hangzhou!🤗
a common belief is that Transformers scale well because of less inductive bias, when it actually does have specific inductive biases. we developed H-Nets not to fix tokenization, but because I think that dynamic chunking represents a fundamental primitive that captures a bias…
The release of GPT-OSS-120B & GPT-OSS-20B models today incorporates my Attention Sink work (github.com/mit-han-lab/st…). Exciting to see this come to life! 🎉 Looking forward to more progress in this space. 😁 https://t.co/sPTcDDfGF9
Tiresome whorish winking, I hate his gimmick so much We get it sam. GPT-5. Cool. Just give a date and time.

Blxejay87 @blxejay87
3 Followers 374 Following
Unkonventionell @Awflorfu330661
30 Followers 2K Following 15-30% Monthly | 2 High-Conviction Stocks.Short-Term Gains: 15-20% in Days/Weeks.DM "JOIN" for WhatsApp Alerts. Live Trade Signals • Market Analysis
Fanqing Meng @FanqingMengAI
115 Followers 487 Following Intern at Moonshot AI https://t.co/LDxlIjhSih
Feca @Feca8578
37 Followers 2K Following
Eric Tchirnhausen @tchirnhaus20039
25 Followers 5K Following Like to try new things you never know; trying to prove all software can be automated 😅 😅 😅 | ML/AI, | C++/Java/Go | GitHub : Dyl777
Saber Darabi @SADarabi
303 Followers 7K Following
Yao Fu @Francis_YAO_
20K Followers 2K Following Research Scientist at @GoogleDeepMind I study complex, multimodal, interactive reasoning. Opinions are my own
远山青 @yunshnqng152597
0 Followers 31 Following
Zixin Wen @Zixin_Wen
474 Followers 647 Following PhD student @mldcmu, working on the theory of deep learning.
Kkksnk @Kkksnk13
6 Followers 140 Following
Yiqi Wang @YiqiWang119050
0 Followers 137 Following
Jafar Isbarov @phylo_GENETIC
117 Followers 496 Following CS PhD student @VT_CS. Previously a visiting researcher @NYU_Courant. I work on Structured Generation, LLM Security, and Multilingual LLMs. Sapere aude.
TICK_Trader🇺🇸 @eaplenem51123
35 Followers 2K Following 15-30% Monthly | 2 High-Conviction Stocks.Short-Term Gains: 15-20% in Days/Weeks.DM "JOIN" for WhatsApp Alerts. Live Trade Signals • Market Analysis
saki @ichitandaqqq
17 Followers 330 Following
Jiahao Shao @jiahaoshao1
114 Followers 972 Following B.Eng. @ZJU_China | Generative modeling, Embodied agents
Rudi Alaja @Racotour
40 Followers 1K Following
Saibo-Creator @SaiboGeng
208 Followers 252 Following PhD @EPFL | Reliable Efficient LLM Inference | ex-intern @MSFT
Paul Friend @PaulFriend77043
0 Followers 30 Following
Jingwei Zuo @JingweiZuo
62 Followers 82 Following Lead Researcher @tiiuae, Falcon LLM team https://t.co/JGy0M8d5Mx
bob @pretty14130158
2 Followers 268 Following
Sean X. Han @hugoohann
30 Followers 3K Following
Letian Ruan @s1mplore
4 Followers 62 Following UG @UMichCSE and @SJTU1896 | Efficient and scalable systems for GenAI.
Junli Wang @JunliWang2021
69 Followers 210 Following Undergraduate @Tsinghua_Uni | Prev visiting student @XLangNLP, advised by @taoyds | Intern @Alibaba_Qwen
axypetalum @axypetalum
21 Followers 133 Following from the us of a! interested in programming, tech, and education
Feng Yao @fengyao1909
1K Followers 634 Following Ph.D. student @UCSD_CSE | Intern @Amazon Rufus Foundation Model Ex. @MSFTResearch @TsinghuaNLP
Ben @SolidlySheafy
273 Followers 341 Following Understanding intelligence @tilderesearch // prev math @Penn and @Cambridge_Uni
Dawei Zhu @dwzhu128
397 Followers 234 Following 3rd year PhD Student @PKU1898 | Prev. intern @MSFTResearch (MSRA) | Current student researcher @googlecloud | Focusing on Long Context Modeling & Multimodality
vllbc02 @vllbc2002
8 Followers 111 Following Master student @Soochow University (Su zhou). Interested in agent planning and world model in NLP.
yu @yu88627931
50 Followers 3K Following
Wentao Li @lwtwl23
107 Followers 2K Following
zffl @zffl
14 Followers 827 Following
Benhao Huang @huskydogewoof
86 Followers 669 Following M.S. student @mldcmu, Prev. @UCSanDiego @hseas @UofIllinois @sjtu1896 | Opinions approved by my puppy.
Harry Chong @ChongHarry1
29 Followers 869 Following
Sparsh Jain @Sparshj8287
205 Followers 2K Following Associate Research Fellow @AI4Bharat, IIT Madras || Ex- Data Science Intern @Culinda || Data Science || ML enthusiast
Nikhil Raghuraman @nikraghuraman
251 Followers 989 Following Research @MistralAI | Prev @JaneStreetGroup, @StanfordAILab | DMs open.
Agni @ShaunAgni
22 Followers 1K Following
betterest @betterestli
23 Followers 551 Following MS student (2023-2026) 📖 ; Feel free to contact ✉️; sampling_params = {'temperature': 2.0, 'top_p': 1.0} 🤯; I'm a fool who needs a reasoning model🫠
Eitan Turok @ICML 202... @EitanTurok
914 Followers 2K Following AI researcher sorting in exponential time. Ex @DbrxMosaicAI @Columbia.
Alex Zhang @a1zhang
13K Followers 587 Following phd student @MIT_CSAIL + @SakanaAILabs, ugrad @Princeton, 🫵🏻 go participate in the @GPU_MODE kernel competitions!
Jiashuo Liu @liujiashuo77
1K Followers 555 Following Research Scientist at ByteDance Seed | Advanced & Interesting LLM/Agent Evaluation. Opinions are my own.
Kevin Lu @_kevinlu
9K Followers 216 Following @thinkymachines. formerly: - @openai: RL, synthetic data, efficient models - @berkeley_ai: decision transformer, universal computation
Ying Sheng @ying11231
12K Followers 714 Following @lmsysorg | Prev. @xAI @Stanford | Assist Prof @UCLA. (Fall 2026) | Do it anyway | Live to fight another day
Timor Averbuch @timorchik
29 Followers 27 Following
Liyuan Liu (Lucas) @LiyuanLucas
970 Followers 350 Following Researcher @MSFTResearch | 🎓 @UofIllinois Working on deep learning heuristics (aka tricks) He/him
Chris Lu @_chris_lu_
4K Followers 615 Following Research @OpenAI Prev: DPhil Student @UniofOxford, RS Intern @SakanaAILabs @DeepMind and RS @CovariantAI
Xuechen Li @lxuechen
16K Followers 944 Following Previously @xai. Interested in the engineering and science for scaling. Opinions are my own. @Stanford PhD.
Junli Wang @JunliWang2021
69 Followers 210 Following Undergraduate @Tsinghua_Uni | Prev visiting student @XLangNLP, advised by @taoyds | Intern @Alibaba_Qwen
World of Engineering @engineers_feed
3.5M Followers 69 Following The most fun way to learn something new everyday. Brother page of @stats_feed YouTube https://t.co/rAUIl0V1xC
Artificial Analysis @ArtificialAnlys
57K Followers 542 Following Independent analysis of AI models and hosting providers - choose the best model and API provider for your use-case
Matt Deitke @mattdeitke
13K Followers 299 Following AI Researcher @ Meta Superintelligence Lab Ph.D. dropout at @uwcse
Dylan X. Hou @XinmingHou
530 Followers 2K Following undergrad studying AI at Renmin Univ. of China, NLP researcher, intelligence explorer&trainer, interned@Tencent AI Lab. Carpe Diem🍀
張小珺 Xiaojùn @zhang_benita
17K Followers 78 Following Financial writer producing in-depth reporting on Chinese business, including AI, tech giants, venture capital, and profiles; also the producer and host of "Zhang Xiaojun Podcast."
Z.ai @Zai_org
15K Followers 142 Following The AI lab behind GLM models, dedicated to inspiring the development of AGI to benefit humanity. https://t.co/b6zGxJvzzS
Zhihu Frontier @ZhihuFrontier
750 Followers 73 Following 🚀Bringing China's AI & tech trends, voices, and perspectives to the global stage. ⚡️Powered by Zhihu/知乎, China's leading knowledge community.
OpenRouter @OpenRouterAI
52K Followers 304 Following Discover and use the latest LLMs. 400+ models (incl. 50+ free), explorable data, private chat, & a unified API. https://t.co/qJG5mKrigL
Crystal @crystalsssup
11K Followers 597 Following Staff @Kimi_Moonshot prev. co-maker of ModelizeAI & gemsouls "Personality goes a long way" @UCSanDiego
Xingcheng Yao @StuartYao22139
250 Followers 268 Following Member of technical staff at @Kimi_Moonshot, Prev @uclanlp, @Tsinghua_IIIS, @princeton_nlp.
Lechao Xiao @Locchiu
1K Followers 596 Following Research Scientist @GoogleDeepMind / Google Brain. Tackle scaling, along the path to AGI.
Zonghan Yang @yang_zonghan
2K Followers 2K Following PhD student at Tsinghua NLP & AIR, studying agents that automate tasks ranging from daily activities to creative endeavors. Two drifters with the world to see.
M-A-P @MM_Art_Project
152 Followers 11 Following An open-source AI research community, known as SuperGPQA, YuE, MERT, OpenCodeInterpreter https://t.co/wiC7aNBZhU
Nathan Lambert @natolambert
56K Followers 853 Following Figuring out AI @allen_ai, open models, RLHF, fine-tuning, etc Contact via email. Writes @interconnectsai Wrote The RLHF Book Mountain runner
Wentao Guo @WentaoGuo7
313 Followers 161 Following CS PhD student @PrincetonCS, Previously CS MEng + BS @CornellCIS
Nathan Chen @nathancgy4
656 Followers 562 Following @tilderesearch trying to (pragmatically) understand my friend, ml & open-source, 16
Bitdeer AI @Bitdeer_AI
739 Followers 7 Following One-Stop Neocloud with AI solutions: Empower AI Growth Innovation NASDAQ: $BTDR
World of Statistics @stats_feed
4.3M Followers 444 Following There are three kinds of lies: lies, damned lies, and statistics. Sister page of @engineers_feed
Xinyu Zhou @zxytim
1K Followers 1K Following
Dinghuai Zhang 张鼎... @zdhnarsil
4K Followers 2K Following Researcher at @MSFTResearch. Prev: PhD at @Mila_Quebec, intern at @Apple MLR and FAIR Labs @MetaAI, math undergraduate at @PKU1898.
Johannes Oswald @oswaldjoh
1K Followers 641 Following Research Scientist, Paradigms of Intelligence Team, Google Zurich
chen zhuoming @chenzhuoming911
422 Followers 82 Following Ph.D. @SCSATCMU; undergraduate @Tsinghua_Uni
Lucas Beyer (bl16) @giffmana
108K Followers 519 Following Researcher (now: Meta. ex: OpenAI, DeepMind, Brain, RWTH Aachen), Gamer, Hacker, Belgian. Anon feedback: https://t.co/xe2XUqkKit ✗DMs → email
Tianyuan Zhang @tianyuanzhang99
2K Followers 920 Following PhDing in@MIT, towards general intelligence and lifelong machine M.S. in CMU, B.S. in PKU.
Xidulu @xidulu
342 Followers 510 Following Xi Wang, Full-stack Bayesian, ECNU, UMass CICS, JHU CS, Fan of U-Shape
Ben Anson @benaibean
116 Followers 51 Following
Ember @ember_energy
38K Followers 3K Following Global energy think tank accelerating the energy transition with data and policy