Hello you fine Internet folks,
Today's article covers Nvidia's RTX PRO 6000 Blackwell, diving into the Blackwell architecture in general and, more specifically, the GB202 GPU die in the RTX PRO 6000.
Hope y'all enjoy!
chipsandcheese.com/p/blackwell-nv…
old.chipsandcheese.com/2025/06/28/bla…
PyTorch + ROCm on Windows: community powered. Thanks to the great work by @adyaman and Scott Tsai on the port, and to TheRock team for building a community-first build and CI system for ROCm.
Radix Sort is a fast, non-comparison-based algorithm ideal for sorting large datasets especially on GPUs.
💡 Check out our research that introduces a memory-efficient GPU Radix Sort that improves on Onesweep:
gpuopen.com/learn/boosting…
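For readers new to the idea, here is a minimal CPU sketch of a least-significant-digit radix sort. It is not the GPU algorithm from the linked research and not Onesweep; the function name `radix_sort` and the `bits_per_pass` parameter are made up for illustration. It only shows the histogram, prefix-sum, and stable-scatter passes that let radix sort avoid key comparisons.

```python
def radix_sort(keys, bits_per_pass=8):
    """LSD radix sort for non-negative integers; returns a new sorted list.

    Illustrative only: a sequential sketch, not a GPU implementation.
    """
    if not keys:
        return []
    if min(keys) < 0:
        raise ValueError("this sketch only handles non-negative integers")

    radix = 1 << bits_per_pass      # number of buckets per pass
    mask = radix - 1
    max_key = max(keys)
    out = list(keys)
    shift = 0

    # One pass per digit, starting from the least significant digit.
    while (max_key >> shift) > 0:
        # 1. Histogram: count how many keys fall into each bucket.
        counts = [0] * radix
        for k in out:
            counts[(k >> shift) & mask] += 1

        # 2. Exclusive prefix sum: first output index of each bucket.
        offsets = [0] * radix
        total = 0
        for d in range(radix):
            offsets[d] = total
            total += counts[d]

        # 3. Stable scatter: keys keep their relative order within a bucket.
        buf = [0] * len(out)
        for k in out:
            d = (k >> shift) & mask
            buf[offsets[d]] = k
            offsets[d] += 1

        out = buf
        shift += bits_per_pass

    return out
```

Each pass is linear in the number of keys, and the number of passes depends only on the key width, which is why this family of sorts maps well onto wide parallel hardware.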
RenderDoc v1.38 is now available!
This version contains a number of new bugfixes and some usability improvements.
Full release notes: github.com/baldurk/render…
Binary builds: renderdoc.org/builds
I thought I'd share this, it might be useful to some... I have a small collection of stb-style single-file header-only C libraries, most with a dual MIT/public domain license here: github.com/mattiasgustavs…
🚀 Day 3 of #OpenSourceWeek: DeepGEMM
Introducing DeepGEMM - an FP8 GEMM library that supports both dense and MoE GEMMs, powering V3/R1 training and inference.
⚡ Up to 1350+ FP8 TFLOPS on Hopper GPUs
✅ No heavy dependency, as clean as a tutorial
✅ Fully Just-In-Time compiled…
🚀 Day 2 of #OpenSourceWeek: DeepEP
Excited to introduce DeepEP - the first open-source EP communication library for MoE model training and inference.
✅ Efficient and optimized all-to-all communication
✅ Both intranode and internode support with NVLink and RDMA
✅…
The core of this work is now upstream in Linus's tree, and the per-fs bits have been rebased on top of that:
git.kernel.dk/cgit/linux/log…
xfs/btrfs/nfs should be fine to include, ext4 would ideally need an iomap conversion first.
@realGeorgeHotz @ajmedick @TensorWaveCloud I've been running AMD GPUs for many years now. 150k of them, so not some small operation. Even though every single GPU was a snowflake, the hardware was generally fantastic. 20k of those were the PS5 APU chips, which I got to know very well.
We automated tuning the living…
We've got another exciting technical report to share with you today:
🎓 A Numerically Stable Implementation of the von Mises–Fisher Distribution on S^2
Read the full report via GPUOpen (direct link - 1MB PDF): gpuopen.com/download/publi…
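As background on why stability matters here, the sketch below evaluates the von Mises–Fisher density on S^2 using a well-known rearrangement of the textbook formula κ / (4π sinh κ) · exp(κ μ·x). This is an assumption-laden illustration and not necessarily the formulation used in the report; the function name `vmf_pdf_s2` is hypothetical. The textbook form overflows for large κ, while the rearranged form κ / (2π (1 − e^(−2κ))) · exp(κ (μ·x − 1)) only ever exponentiates non-positive numbers.

```python
import math


def vmf_pdf_s2(mu, x, kappa):
    """von Mises-Fisher density on the unit sphere S^2.

    mu and x are unit-length 3-vectors (not checked here); kappa >= 0 is
    the concentration. Illustrative sketch, not the report's code.
    """
    if kappa < 0.0:
        raise ValueError("kappa must be non-negative")
    if kappa == 0.0:
        # Uniform distribution on the sphere.
        return 1.0 / (4.0 * math.pi)

    dot = sum(m * c for m, c in zip(mu, x))

    # 1 - exp(-2*kappa), computed with expm1 so small kappa stays accurate.
    one_minus_exp = -math.expm1(-2.0 * kappa)
    norm = kappa / (2.0 * math.pi * one_minus_exp)

    # dot <= 1 for unit vectors, so the exponent is never positive.
    return norm * math.exp(kappa * (dot - 1.0))
```

For example, `vmf_pdf_s2((0, 0, 1), (0, 0, 1), 1e4)` stays finite, whereas evaluating `math.sinh(1e4)` directly raises an `OverflowError` in Python.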
So what have I been up to this year? Hardware raytracing, Vulkan, deformable voxels, engine architecture, physics and lots of multithreading. blog.voxagon.se/2024/12/29/yea…
Posted v8 of the uncached buffered IO patchset. This should be fine for merging at this point, let's hope we can make the 6.14 kernel release.
lore.kernel.org/linux-fsdevel/…
Separate branches exist for the core support and fs support, see the cover letter for details.
Posted v5 now, various fixes and cleanups. Should be good enough for folks to apply and test at this point, would love to hear how it works for folks. XFS and ext4 are fully supported for both uncached reads and writes.
lore.kernel.org/lkml/202411101…
V3 below. Did a bunch of testing and fixed a few corner cases, should be solid now in terms of not over-caching. Writes run consistently at 180GB/sec uncached.
And while I test on a big box, many practical uses for this on smaller iron. dd(1) needs this.
git.kernel.dk/cgit/linux/log…
Uncached buffered IO is back, after a 5 year hiatus. Simpler and cleaner now. Up to 65-75% improvement, at half the CPU usage on my system. And none of the nonsense of the unpredictability of the page cache. See commits 1 and 3 for read/write perf data.
git.kernel.dk/cgit/linux/log…
Just like I did for 6.10, I wrote up a "what's new with io_uring" but for the 6.11/12 kernels. 6.11 wasn't super exciting in terms of features, so bundled these into a single page.
github.com/axboe/liburing…
484 Followers 947 Following
ML @ AMD
Former ML+3D Engineer @ Stability AI
Ex. AMD Research Engineer, RT & Neural Rendering
2021 Graduate, Computer Graphics Group @ University of Tokyo.
3K Followers 646 Following
Senior Graphics Programmer @rockstargames. Past @hangar13games, @weareplayground, @ttgames
Dei/Deum
All views and opinions are my own
8K Followers 2K Following
Creator, Founder and CEO of @TigerBeetleDB — the financial transactions database designed to power the next 30 years of transaction processing.
51 Followers 265 Following
Senior Graphics Programmer @AMD | Game developer
Hey! I am always looking for new cool things to learn about computer graphics and tech.
All opinions are my own
477 Followers 1K Following
Building @Zenduty - minimizing downtime at companies and institutionalizing reliability and modern incident response best practices, one incident at a time.
166 Followers 845 Following
graphics/rendering/gpu performance @Xbox ATG // prev: Final Fantasy XV and ATD @SquareEnix, NHL and CTG @EA // Opinions are mine //🥤in 🇯🇵 // 日本語OK
1K Followers 2K Following
Part solo gamedev, part graphics programmer on https://t.co/ysbQgHf716. Used to work on bigger games (& ex demoscene), now trying to find my own way. ❤️ @Katcooti42
142 Followers 414 Following
Founder, Managing Director at Odyssey3D
https://t.co/hgSkCl1gJx
Hobbyist weightlifter
GPUs - 3D Graphics - HPC
Views are my own
54K Followers 979 Following
Teaches math to engineers: https://t.co/TJ5i3Pg678
Professor @UW researching #MachineLearning for #Dynamics and #Control, especially for #FluidDynamics.
472 Followers 397 Following
building GPUs in 🇺🇸. tired of no GPU competition? we figured out how to make them faster, consume less power, and affordable @boltgraphicsinc DMs open
16K Followers 529 Following
Tech Journalism with zero ads, & zero Big Tech influence. We cover the Big Tech stories that other publications are afraid to touch.
5K Followers 668 Following
Incoming Assistant Prof, Toyota Technical Institute at Chicago @TTIC_Connect
Recruiting PhD students (start 2026) 👀
Will irl - TC0 enthusiast
12K Followers 3K Following
PhD-ing @MIT_CSAIL. Working on scalable and principled algorithms in #LLM and #MLSys. In open-sourcing I trust 🐳. she/her/hers
1K Followers 273 Following
Fellow at Tenstorrent; believes in dynamic typing, first-class functions, the immortal essence of the human soul and tea. Tweets are my own.
42K Followers 187 Following
News from https://t.co/enurGFxpcS, a free distribution service and an open archive for scholarly articles.
For help with arXiv, see https://t.co/LcWuhM0BOl
386K Followers 622 Following
Love Linux/Unix, open source, and programming? Into Sysadmin & DevOps? Follow us! Boost your IT career with daily new tools, apps, and humor ⤵️
13K Followers 2K Following
Engineer and Technology Communication. On a mission to make ASICs more accessible. YosysHQ & Tiny Tapeout founder member.
@mattvenn.net on blue sky
198K Followers 38 Following
The Gemini app turns research into reality, bringing frontier AI experiences like Veo 3, Deep Think, and more to hundreds of millions of people.