Vijay @__tensorcore__
MLIR, CUTLASS, Tensor Core arch @NVIDIA. Mechanic @hpcgarage. Exercise of any 1st amendment rights are for none other than myself. thakkarv.dev Joined July 2015
Tweets 1K · Followers 2K · Following 515 · Likes 8K
“TogetherAI’s chief scientist @tri_dao announced Flash Attention v4 … uses CUTLASS CuTe Python DSL” As always, thanks for being the tip of the spear and pushing us along too 💚
Using CUTLASS CuTe-DSL, TogetherAI's Chief Scientist @tri_dao announced that he has written kernels that are 50% faster than NVIDIA's latest cuBLAS 13.0 library for small K reduction dim shapes on Blackwell during today's Hot Chips conference. His kernels beat cuBLAS by using 2…
Cute-DSL is basically perfect (for me). thank you nvidia and cutlass team. i no longer need to wait for long compile times because i underspecified a template param. i hope everyone involved gets an extra chicken nugget in their happy meal
On Sep 6 in NYC, this won't be your typical hackathon where you do your own thing in a corner and then present at the end of the day. You'll deploy real models to the market, trades will happen, chaos should be expected. The fastest model is great but time to market matters more.
arXiv gpu kernel researcher be like: • liquid nitrogen cooling their benchmark GPU • overclock their H200 to 1000W "Custom Thermal Solution CTS" • nvidia-smi boost-slider --vboost 1 • nvidia-smi -i 0 --lock-gpu-clocks=1830,1830 • use specially binned GPUs where the number…
Part 2: developer.nvidia.com/blog/cutlass-3… Covers the design of CUTLASS 3.x itself and how it builds a 2 layer GPU microkernel abstraction using CuTe as the foundation.
CUTLASS 4.1 is now available, which adds support for ARM systems (GB200) and block scaled MMAs
Hierarchical layout is super elegant. Feels like the right abstraction for high performance GPU kernels. FlashAttention 2 actually started bc we wanted to rewrite FA1 in CuTe
CuTe is such an elegant library that we stopped working on our own system and wholeheartedly adopted CUTLASS for vLLM in the beginning of 2024. I can happily report that was a very wise investment! Vijay and co should be so proud of the many strong OSS projects built on top 🥳
This is what the internet was made for 🥹
Cosmos-Predict2 meets NATTEN. We just released variants of Cosmos-Predict2 where we replace most self attentions with neighborhood attention, bringing up to 2.6X end-to-end speedup, with minimal effect on quality! github.com/nvidia-cosmos/… (1/5)
Getting mem-bound kernels to speed-of-light isn't a dark art, it's just about getting a couple of details right. We wrote a tutorial on how to do this, with code you can directly use. Thanks to the new CuTe-DSL, we can hit speed-of-light without a single line of CUDA C++.
🦆🚀QuACK🦆🚀: new SOL mem-bound kernel library without a single line of CUDA C++ all straight in Python thanks to CuTe-DSL. On H100 with 3TB/s, it performs 33%-50% faster than highly optimized libraries like PyTorch's torch.compile and Liger. 🤯 With @tedzadouri and @tri_dao
Another 🔥 blog about CUTLASS from @colfaxintl, this time focusing on the gory details of block-scaled MXFP and NVFP data types and Blackwell kernels for them. research.colfax-intl.com/cutlass-tutori…
We've been thinking about what the "ideal" architecture should look like in the era where inference is driving AI progress. GTA & GLA are steps in this direction: attention variants tailored for inference: high arithmetic intensity (make GPUs go brr even during decoding), easy to…

Jon Masters 🏴... @jonmasters
15K Followers 7K Following Troublemaker | Computer Architect | @Arm Servers Architect @Google | Previously @RedHat, @Nuvia_Inc | Runner | Author | All views my own | #ArmServers
Longhorn @never_released
14K Followers 143 Following Kernel/hypervisor engineer @awscloud EC2. Hobby @checkra1n. Mastodon: https://t.co/DsXP8PFgL0 Bluesky: https://t.co/dAOfFSSqY4
Dylan Patel @dylan522p
94K Followers 941 Following SemiAnalysis Boutique AI & Semiconductor Research and Consulting DMs are open for consulting, quotes, or to talk shop
Stacy Rasgon @Srasgon
12K Followers 4K Following Semiconductors, stocks, scifi, and smallfry, from a serendipitous sell-sider settled in sunny SoCal. Apparently 65 in a 45 zone. Also banana tweets.
Josiah Draper (aka Al... @coolingreviews
2K Followers 2K Following PC Cooling Reviewer for @tomshardware
Dayman @Dayman58
2K Followers 614 Following
Moshe Dolejsi @lasserith
1K Followers 282 Following Making smol things. All tweets (and mistakes) my own. (He/Him)
Fabricated Knowledge @_fabknowledge_
24K Followers 712 Following Simplifying the world of semiconductor investing in the age of AI. Part of the @semianalysis_ gang.
Intel Graphics @IntelGraphics
65K Followers 780 Following Intel Arc Graphics: our High-Performance Graphics Brand for gamers and creators.
GaTech CSE @GTCSE
3K Followers 794 Following School of Computational Science and Engineering at Georgia Tech
Nicholas Malaya @nicholasmalaya
1K Followers 960 Following Computational Scientist, AMD. To Exascale, and beyond!
matt godbolt is mostl... @mattgodbolt
15K Followers 2K Following Husband, father, coder, sometime verb, real person. Fond of old hardware. Co-host @twoscp. #BlackLivesMatter. @matt.godbolt.org on bsky He/him
Todd Gamblin / @tgamb... @tgamblin
4K Followers 5K Following Dev tools, open source, HPC, systems, parallel computing @Livermore_Lab. @spackpm guy. Setting up https://t.co/MHhbvakyFO. Opinions mine. he/him.
noone @windsofchng
19 Followers 103 Following AI developer, Remote Viewer, Global Day Trader, Turkish Patriot. Sci-Fi and Military Literature Enthusiast.
Bert Maher @tensorbert
3K Followers 342 Following I’m a software engineer building high-performance kernels and compilers at Anthropic! Previously at Facebook/Meta (PyTorch, HHVM, ReDex)
Aman Swar @AmanSwar_
2 Followers 141 Following MLSys. Hacking on CUDA kernels, compilers, and LLM infra. Pushing performance
云创兽Ai @Uprauougear213
2 Followers 68 Following 📊 wealth goddess all in on clearly tracking market trends! curious for market views. DM me for EV stocks! ⚡ #MacroTrends
Anish Malik @anishmalikk
1 Followers 25 Following
Murali Nandan @muralinandann
3 Followers 179 Following
Ian @t894883711
11 Followers 1K Following
Joe Sanchez @JoeSanchez1213
79 Followers 4K Following
TonyStock @TonyStock966952
147 Followers 2K Following
Tianqi Chen @tqchenml
18K Followers 1K Following AssistProf @CarnegieMellon. Distinguished Eng @NVIDIA. Creator of @XGBoostProject, @ApacheTVM. Member https://t.co/QYyfjQNp4p, @TheASF. Views are on my own
Kit @KITTpatel
27 Followers 82 Following
TheValueist @TheValueist
2K Followers 4K Following L/S equity in tech and energy. ISO convexity. Path dependence matters. Sizing via Kelly Criterion. Results never lie. Not financial advice.
Earth @booomeee
400 Followers 3K Following
Ofir Press @OfirPress
15K Followers 6K Following I build tough benchmarks for LMs and then I get the LMs to solve them. SWE-bench & SWE-agent. Postdoc @Princeton. PhD @nlpnoah @UW.
Stephen Oates @stephenjaoates
810 Followers 7K Following
AG @navierisstoked
17 Followers 1K Following
dxtiitle @yesprimemi
0 Followers 66 Following
Ruixiang Ma @HOPPMOHUO
1 Followers 261 Following LLM inference, Previously at Aliyun | SenseTime GitHub: https://t.co/WjyRFc16DN Zhihu: https://t.co/D5j8VkPc6B
seacret @seacret1337
0 Followers 18 Following
SHREYA GUPTA @DrShre_
54 Followers 201 Following Machine Learning Researcher 📚 Passionate about empowering women to lead in Tech #ai #machinelearning @UTAustin @ut_orie @SparkCognition | Opinions are my own
Victor Hugo @VictorHugo45995
0 Followers 7K Following
Brendan Graham @brendanigraham
436 Followers 1K Following
Brian D. Colwell @briandcolwell
72K Followers 2K Following The future is being written in atoms and algorithms. My role is to help ensure we're reading that story accurately & positioning ourselves wisely. Quantum Nerd.
Madeline @HBJ0vpmQRWl18
11 Followers 1K Following
michael.scarn.eth @scarn_eth
428 Followers 3K Following cybersec • sysowner • keyboardist • permaculture • doomposter • data
Tianjiao Huang @tjhu_cook
2 Followers 115 Following
Saber Darabi @SADarabi
302 Followers 7K Following
Abdi M. @abdimoalim_
49 Followers 51 Following Interested in GPU programming & heterogeneous computing.
Gregory Stoner @angstroms
811 Followers 592 Following Love building out apps for Science, Data Science/Deep Learning, and VFX and CG production. Music is a passion via an electric guitar. All opinions are my own
Balaji S @4014_balaji
1 Followers 182 Following
flaging @flaging_
2 Followers 171 Following
Jon Masters 🏴... @jonmasters
15K Followers 7K Following Troublemaker | Computer Architect | @Arm Servers Architect @Google | Previously @RedHat, @Nuvia_Inc | Runner | Author | All views my own | #ArmServers
𝐷𝑟. 𝐼𝑎... @IanCutress
49K Followers 1K Following Consultant, Chief Analyst, Influencer @TechTechPotato - @MoreThanMoore2x
HPC Guru @HPC_Guru
28K Followers 89 Following "It takes a lot of knowledge to know what one does not know" 😎Tweets on things related to High Performance Computing -- systems, interconnects, storage, 🥭 ...
Longhorn @never_released
14K Followers 143 Following Kernel/hypervisor engineer @awscloud EC2. Hobby @checkra1n. Mastodon: https://t.co/DsXP8PFgL0 Bluesky: https://t.co/dAOfFSSqY4
Dylan Patel @dylan522p
94K Followers 941 Following SemiAnalysis Boutique AI & Semiconductor Research and Consulting DMs are open for consulting, quotes, or to talk shop
STH @ServeTheHome
20K Followers 227 Following ServeTheHome provides insights and analysis delivered to you since 2009. We specialize in the data center industry with servers, storage, and networking.
Satoshi Matsuoka @ProfMatsuoka
25K Followers 921 Following 理研計算科学研究センター長 Director RIKEN R-CCS, 東科大特定教授 Prof. Inst. Sci.. ACM/ISC/JSSST/IPSJ Fellows, IEEE Fernbach(2014)&Cray(2022) Awards, 令4紫綬褒章 Purple Ribbon Medal 2022
Fritzchens Fritz @FritzchensFritz
5K Followers 82 Following Watch neat Infrared photos or siliconpr0n on Flickr: https://t.co/vD6gNHVn8k
Underfox @Underfox3
9K Followers 128 Following Physicist, Telecom Engineering lover, HPC Enthusiast. Prog Rock/Metal fan.
Dayman @Dayman58
2K Followers 614 Following
Tom Forsyth (TODO: fi... @tom_forsyth
18K Followers 306 Following Gfx coder and chip designer. 3Dlabs/Muckyfoot/RAD/Valve/Oculus/Intel/Rec Room. https://t.co/Y6hyjycmgo @tomforsyth.bsky.social
siliconmemes @realmemes6
6K Followers 340 Following The best AI models have found this account to be incredibly brilliant, every tweet having rare but reliable ideas about tech and energy stocks and military.
François Chollet @fchollet
572K Followers 813 Following Co-founder @ndea. Co-founder @arcprize. Creator of Keras and ARC-AGI. Author of 'Deep Learning with Python'.
InstLatX64 @InstLatX64
4K Followers 0 Following x86/x64, SIMD, #AVX512, "Aha!" moments. I have been writing code since 1986.
SC25 @Supercomputing
18K Followers 544 Following Official Twitter for the SC Conference Series • SC25 • Nov 16–21, 2025 • America’s Center, St. Louis, MO
Glenn K. Lockwood @glennklockwood
6K Followers 323 Following #HPC and supercomputing enthusiast. Employed by @VAST_Data. My posts go to Bluesky these days.
Bert Maher @tensorbert
3K Followers 342 Following I’m a software engineer building high-performance kernels and compilers at Anthropic! Previously at Facebook/Meta (PyTorch, HHVM, ReDex)
TBPN @tbpn
102K Followers 921 Following Technology's daily show. Hosted by @johncoogan and @jordihays. Streaming live 11AM-2PM PT every weekday and available on Apple, Spotify, and YouTube.
Fei Hu @Fei__Hu
373 Followers 1K Following
Kimbo @kimbochen
554 Followers 622 Following
Ferdinand Mom @FerdinandMom
3K Followers 1K Following Distributed & Decentralized training @HuggingFace
RocketPoweredMohawk @RocketPMohawk
75K Followers 1 Following Spreading love and light in the F1 online community — Abu Dhabi 2021 survivor — Patreon: https://t.co/ANcrq3VWvz
Shannon Yang @shannonyangsky
1K Followers 4K Following 25. Building talent & community in AI safety. Currently @AISecurityInst, prev. @AnthropicAI. Philosophy, Politics, and Economics alumna @UniofOxford.
Vitaliy Chiley @vitaliychiley
3K Followers 1K Following LLM Research @ Meta. ex @DataBricks (@DBRXMosaicAI), @CerebrasSystems
Elliot Arledge @elliotarledge
18K Followers 2K Following 21 | instructor @freecodecamp probably timelapsing my life away
Fung XIE @fengxie83
3 Followers 13 Following
Charles 🎉 Frye @charles_irl
14K Followers 3K Following gpu enjoyer at @modal. he/him. ex @full_stack_dl, @weights_biases (acq. @CoreWeave), phd Berkeley @Redwood_Neuro. try https://t.co/SYWVMCazZ3
dePaul Miller @depaulmillz
11 Followers 80 Following
Luke Melas-Kyriazi @lukemelas
1K Followers 3 Following Building @cursor_ai | Rhodes Scholar, Oxford University PhD (Visual Geometry Group) | Prev. Meta Research
John Schulman @johnschulman2
65K Followers 1K Following Recently started @thinkymachines. Interested in reinforcement learning, alignment, birds, jazz music
Barret Zoph @barret_zoph
21K Followers 1K Following CTO & Co-Founder Thinking Machines Lab (@thinkymachines) Past: - VP Research (Post-Training) @openai - Research Scientist at Google Brain
Scott McCrae @scottymccrae
176 Followers 1K Following superintelligence @Meta. helping machines learn :). former founder, @Dropbox, @berkeley_ai
Zion @BlasianHokage
1K Followers 3K Following Creating AI brains @onairosapp . Who should have the power to read your mind?
Fan Donald J. Trump P... @TrumpDailyPosts
2.5M Followers 23K Following Reposting Trump’s Truth Social posts (with date/time) on X + news/commentary. Unofficial. Profile Artist: @ElenaRuseva1 Not affiliated with @realdonaldtrump.
Adam Beyer @realAdamBeyer
198K Followers 494 Following DJ / Producer and label boss of @drumcoderecords ‘Explorer Vol. 1’ Out Now
Charlie Marsh @charliermarsh
28K Followers 830 Following Building @astral_sh: Ruff, uv, and other high-performance Python tools. Prev: Staff engineer @SpringDiscovery, @KhanAcademy, BSE @PrincetonCS.
Alex Zhang @a1zhang
13K Followers 587 Following phd student @MIT_CSAIL + @SakanaAILabs, ugrad @Princeton, 🫵🏻 go participate in the @GPU_MODE kernel competitions!
Simon Guo @simonguozirui
3K Followers 5K Following CS PhD student @Stanford | 🎓 @Berkeley_EECS | prev pre-training @cohere & built things at @anyscalecompute @nvidia
Tanisha @tbanaszczyk
277 Followers 363 Following
Anshumaan Gandhi @AnshumaanGandhi
34 Followers 218 Following Quant | Trader | addicted to desserts | The other Gandhi
unusual_whales @unusual_whales
2.5M Followers 2K Following Stocks/Options/Crypto/Market News + Tools. Not advice Get a bonus opening a new tastytrade account: https://t.co/wGf2ZdlXpw Discord: https://t.co/0xJ9e0ZYYG More: https://t.co/nsxZlPV0pC
Mira Murati @miramurati
365K Followers 573 Following Now building @thinkymachines. Previously CTO @OpenAI
Together AI @togethercompute
50K Followers 387 Following AI pioneers train, fine-tune, and run frontier models on our GPU cloud platform.
Gautam Jain @GautamJ18826702
7 Followers 53 Following
vLLM @vllm_project
17K Followers 20 Following A high-throughput and memory-efficient inference and serving engine for LLMs. Join https://t.co/lxJ0SfX5pJ to discuss together with the community!
Zihao Ye @ye_combinator
2K Followers 537 Following Proud to be an engineer. I'm building flashinfer (https://t.co/PabCM3ksjN) at @NVIDIA Opinions are my own.
Tristan @CubeSmiling
197 Followers 166 Following @trishume's non-programming alt. I like making things, weird ideas and solid tungsten objects. Check out my dating page! https://t.co/e7NezzsCR2
Justin Fargnoli @justin_fargnoli
115 Followers 269 Following On @Twitter to learn about GPU, AI, and compiler stuff. LLVM compiler engineer @NVIDIA (opinions are my own).
Jin Wang @jinwang_jw
7 Followers 11 Following
Manish Gupta @BigManniM9
540 Followers 643 Following Software Engineer, Compiler Lover, Fortune Cookie Writer