yashwanth @yashwanth__e

tech & cats, undergrad researcher, deepl for life, a lil too employed, gpu poor @ hostel room
yashwanth-dl.netlify.app
Convergence
Joined June 2024

Tweets

2K
Followers

171
Following

1K
Likes

9K

Xun Huang @xunhuang1995

3 weeks ago

The 3rd world model in the past 3 days?

Hunyuan @TencentHunyuan

3 weeks ago

44 294 1K 164K 791

7 22 195 14K 51

Sathwik @VishnuSathvik1

4 weeks ago

If I were smarter, I would have pursued physics

0 3 10 377 0

Inspired by today's Genie 3 release? We are open-sourcing 🧞‍♀️Jasmine🧞‍♀️, a production-ready JAX-based codebase for world modeling from unlabeled videos. Scale from single hosts to hundreds of xPUs thanks to XLA! 🧵 (1/10)

7 67 426 55K 218

Philip J. Ball @philipjohnball

a month ago

Good Knight 🏰⚔️🛡️

Google DeepMind @GoogleDeepMind

a month ago

Good Knight 🏰⚔️🛡️ https://t.co/KMQsfE7nau

846 3K 14K 3.6M 4K

164 267 4K 1.2M 1K

yashwanth @yashwanth__e

a month ago

Diffusion is the way of life and we must converge to it

机器之心 JIQIZHIXIN @jiqizhixin

a month ago

5 77 542 45K 220

0 0 0 44 0

yashwanth @yashwanth__e

a month ago

I was being sarcastic 😭😭 ofc ik LoRA see my auto correct puts it in the correct way also 😭 LoRA not lora

yashwanth @yashwanth__e

a month ago

5 0 3 302 0

0 0 1 29 0

yashwanth @yashwanth__e

a month ago

What is lora⁉️‼️

5 0 3 302 0

yashwanth @yashwanth__e

a month ago

Lol

0 0 1 39 1

yashwanth @yashwanth__e

a month ago

Read all papers shared by wenhao!! They are real gold and join his discord for more gold 🗣️ 🗣️

Wenhao Chai @wenhaocha1

a month ago

1 8 58 4K 30

0 0 1 30 0

机器之心 JIQIZHIXIN @jiqizhixin

a month ago

ByteDance is exploring diffusion LLMs too! 👀 Seed Diffusion Preview: a blazing-fast LLM for code, built on discrete-state diffusion. With 2,146 tokens/sec inference on H20 GPUs, it outpaces Mercury & Gemini Diffusion, while matching their performance on standard code…

5 77 542 45K 220

Francielle Dellamora @francidellamora

a month ago

you don't need more hours. you lack discipline, get distracted easily and don't know how to manage your time (self criticism)

2 13 97 3K 11

yashwanth @yashwanth__e

a month ago

Muon ftw (check out my blog to understand how it works in detail! 😁)

elie @eliebakouch

a month ago

3 26 232 25K 113

0 0 2 77 0

Simo Ryu @cloneofsimo

a month ago

Good paper btw

4 15 194 14K 159

yashwanth @yashwanth__e

2 months ago

Guys, I'm actively looking for professors/labs to work under. My speciality is diffusion modeling and LLMs. If you know someone who is looking for interns, please do let me know. Appreciate any kind of leads 🙏 Thank you!!

0 1 10 383 1

Tanishq Mathew Abraham, Ph.D. @iScienceLuvr

2 months ago

Diffusion Beats Autoregressive in Data-Constrained Settings Comparison of diffusion and autoregressive language models from 7M to 2.5B params and up to 80B training tokens. Key findings: 1. Diffusion models surpass autoregressive models given sufficient compute. Across a wide…

13 120 682 47K 491

elie @eliebakouch

2 months ago

Kimi K2 tech report is full of gems as always. Here are my notes on it: > MuonClip: Pretty crazy how after 70k the training stabilizes and the QK-clip is basically inactive. There is also no loss in perf with QK-clip which is not trivial at all (at small scale but with…

7 51 340 25K 277

Yashovardhan Srivastava @Yaaaaaashhh

2 months ago

on june 15, my prev role ended - i’ve taken some time since then to reflect. now that i’m back, i am actively looking for my next FT role in AI/LLM. if you're building in AI and hiring someone who can work on challenging problems, i'd love to chat. DMs open RTs appreciated 💛

5 8 53 5K 8

𝚐𝔪𝟾𝚡𝚡𝟾 @gm8xx8

2 months ago

AdaMuon: Adaptive Muon Optimizer Builds on Muon, a geometry-preserving optimizer using polar decomposition (Newton–Schulz) for orthogonal 2D updates, and adds per-parameter second-moment scaling and RMS-aligned rescaling to fix Muon’s sensitivity to noisy gradients. AdaMuon…
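
For readers unfamiliar with the Newton-Schulz step mentioned in the post above, here is a rough sketch of how an iteration of that kind can push a 2D gradient matrix toward its orthogonal polar factor. It uses the textbook cubic iteration as a stand-in; the exact coefficients, step count, and the second-moment and RMS rescaling used by Muon or AdaMuon are not given in the post and are not reproduced here. The function name `newton_schulz_orthogonalize` and the `steps` parameter are hypothetical.

```python
import numpy as np


def newton_schulz_orthogonalize(grad, steps=20):
    """Approximate the orthogonal polar factor of a 2D matrix.

    Illustrative only: this is the classic cubic Newton-Schulz iteration
    X <- 1.5 * X - 0.5 * X @ X.T @ X, not the tuned variant of any
    particular optimizer.
    """
    x = np.asarray(grad, dtype=np.float64)
    # Divide by the Frobenius norm so every singular value is at most 1,
    # which keeps the iteration inside its convergence region.
    x = x / (np.linalg.norm(x) + 1e-12)
    for _ in range(steps):
        # Each step moves every nonzero singular value toward 1
        # while leaving the singular vectors unchanged.
        x = 1.5 * x - 0.5 * (x @ x.T @ x)
    return x


# Example: a scaled rotation; the result should be close to a pure rotation.
g = np.array([[0.0, -3.0], [1.0, 0.0]])
q = newton_schulz_orthogonalize(g)
print(np.allclose(q @ q.T, np.eye(2), atol=1e-6))
```

The iteration needs only matrix multiplications, which is presumably why this family of methods is attractive on accelerators compared with computing a full SVD.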