incomingpain an hour ago

I made the silly decision to go RDNA4 for my AI box. ROCm is awful.

I am optimistic about April 2026: a new Ubuntu LTS plus ROCm 7. I guess technically I don't know if they even plan to support this, but I am hoping they do.

This might be game-changing for me; but until then, 30-50% load on my GPUs on Vulkan.

erulabs a day ago

Lower power consumption on a desktop monitor is an interesting technical challenge, but I do wonder "Cui bono?" - obviously I'd want my gaming machine to consume less power, but I'm not sure I've ever considered mouse-idle, monitor-on power consumption when choosing e.g. AMD versus Nvidia for my gaming machine.

Don’t get me wrong this is very interesting and AMD does great engineering and I loath to throw shade on an engineering focused company but… Is this going to convert to even a single net gain purchase for AMD?

I’m a relatively (to myself) a large AMD shareholder (colloquially: fanboy) and damn I’d love to see more focus on hardware matmul acceleration rather than idle monitor power draw.

  • Luker88 a day ago

    Some people appreciate leaving the PC on for light tasks even at night, and wasting too much power doing nothing is... well, wasteful. Imagine a home server that has the GPU for AI or multimedia stuff.

    The same architecture will also be used in mobile, so depending on where this comes from (architecturally) it could mean more power savings there, too.

    Besides, lower power also means lower cooling/noise on idle, and shorter cooldown times after a burst of work.

    And since AMD is slowly moving toward the (perpetually next-time) unified architecture, any gains there will also mean less idle power draw in other environments, like servers.

    Nothing groundbreaking, sure, but I won't say no to all of that.

    • masfuerte a day ago

      They have optimized idle power draw for when the display is still on. Which is nice, but configuring the screen to switch off when idle will save far more power.

      • AnthonyMouse 19 hours ago

        You're confusing idle (the user is not present) with idle (the screen content is not changing). The latter is extremely common when someone is e.g. reading a document. Nothing changes for a minute or two, the user scrolls down and the content changes in an instant, then nothing changes for a minute or two again.

        You don't want the hardware in a high-power state during the time it's not doing anything, even when the user is actively looking at the screen.

    • delusional a day ago

      > Imagine a home server that has the GPU for AI or multimedia stuff.

      I imagine you wouldn't attach a display to your home server. Would the display engine draw any power in that case?

      • vid a day ago

        For the past decade my home server has also been my desktop workstation. According to canon people shouldn't do this because the workstation might crash, be updated more frequently, or create conflicting resource usage, but with containers it's never been a problem for me, and it lets me have one more capable computer (with a dGPU) serving both purposes, rather than managing two systems, one of which I'd only use for ~20 hours a week (I usually use my laptop). However, I definitely would like it to use as little power as possible in low-processing states.

      • ThatPlayer a day ago

        Is a PiKVM considered a display? I've got one attached to my home server. Alongside the dedicated graphics card, it probably uses more power than the usual server-motherboard KVMs, but it's still cheaper and more accessible for home servers.

      • mrheosuper 10 hours ago

        i attached a "virtual display" to my windows VM box, to do game streaming

  • WhyNotHugo a day ago

    I definitely value lower power usage when idle. My desktop PC uses ~150W when idle. Sometimes I leave it on overnight simply to download a remote file or do some other extremely light operation.

    It doesn't make sense that it would draw this much power. A laptop can do the same thing with ~10W.

    This sort of improvement might not increase sales with one generation, but it'll make a difference if they keep focusing on this year after year. It also makes their design easier to translate into mobile platforms.

    • AnthonyMouse 19 hours ago

      > I definitely value lower power usage when idle. My desktop PC uses ~150W when idle.

      These two statements appear to be in conflict. 150W is a high idle power consumption for a modern PC. Unless you have something like an internal RAID or have reached for the bottom of the barrel when choosing a power supply, 40W is on the high side for idle power consumption and many exist that will actually do ~10W.

      • port11 4 hours ago

        There's a couple dozen of us stuck with older hardware because we have less money to throw around.

    • randomNumber7 a day ago

      Energy is so cheap it's not really worth the effort (economically).

      And even if it were more expensive, things like cooking or washing clothes would still hurt more than downloading a file with a big PC.

      • yndoendo 21 hours ago

        It is cheap until it is not. The cost of energy depends on where in the world you live, how it is sourced, the time of year, and whether there is a natural disaster. Some places in the world, like parts of rural India, only have access to solar.

        Expect energy costs to also go up in the USA, with the administration pushing to replace renewable energy with fossil fuels. Fossil fuel prices never go down; they may seem to go up and down, but over time they always increase.

        As we keep adding more and more computers to the grid it will require more and more energy.

        Second-hand computers with this energy efficiency will benefit the poor and countries where energy is still a costly commodity. I don't mind paying the initial cost.

      • port11 4 hours ago

        Energy is cheap in some places of the world. Something that saves me, say, a constant 50 W nets me about 150€ at the end of a year. "Don't make things more optimised because my electricity is cheap" would keep us all stuck with big energy bills.
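
        A rough back-of-the-envelope, assuming the saving applies around the clock and a tariff of about 0.35 €/kWh (my numbers, not universal):

            /* 50 W saved continuously, at an assumed 0.35 EUR/kWh tariff */
            #include <stdio.h>

            int main(void) {
                double watts = 50.0;                             /* constant saving */
                double eur_per_kwh = 0.35;                       /* assumed tariff  */
                double kwh_per_year = watts * 24 * 365 / 1000.0; /* ~438 kWh/year   */
                printf("%.0f kWh/year -> about %.0f EUR/year\n",
                       kwh_per_year, kwh_per_year * eur_per_kwh); /* ~153 EUR/year  */
                return 0;
            }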

  • jayd16 a day ago

    Rumors have been floating around about some kind of PS6 portable or next gen steam deck with RDNA4 where power consumption matters.

    There's also simply laptop longevity that would be nice.

  • kokada a day ago

    I am not saying that this was the reason I bought it, but I recently purchased a Radeon 9070 and I was surprised by how little power this card uses at idle. I was seeing figures between 4W and 10W on Windows (sadly, slightly more on Linux).

    In general this generation of Radeon GPUs seems highly efficient. Radeon 9070 is a beast of a GPU.

    • dontlaugh a day ago

      They benchmark so well that I’m considering replacing my 3070 with a 9070xt.

  • Telaneo 15 hours ago

    Might be helpful to get some perspective on this. Most cards idle in the 5-10 watt range, but there have been outliers, like the Intel A770 before the drivers were mostly fixed, which ran at 45 watts.[1] I believe there have been even more stupid incidents with cards combined with early drivers, where, if you just happened to have a multi-monitor setup running at two different resolutions, it would go up to 70+ watts. Obviously, these are outlier situations, probably caused by some optimisations being disabled while the drivers were developed and never re-enabled before release. 20 watts and below should be easily achievable with pretty much any card, but it's easy to forget that this is still work that has to be done, and shouldn't be left to chance and happenstance.

    [1] https://thinkcomputers.org/here-is-the-solution-for-high-idl...

  • dragontamer a day ago

    If the tech was developed, then they might as well deploy it to both laptops (likely its original intent) and desktops.

  • makeitdouble a day ago

    To hazard a guess, would that optimization also help push the envelope when one application needs all the power it can get while another monitor is just sitting idle?

    Another angle I'm wondering about is the longevity of the card. Not sure if AMD would particularly care in the first place, but as a user, if the card didn't have to grind as hard on the idle parts and thus lasted a year or two longer, that would be pretty valuable.

    • formerly_proven a day ago

      Recent Nvidia generations also roughly doubled their idle power consumption. Those increases are probably genuine baseline increases (i.e. they reduce the compute power budget), while prior RDNA generations would idle at around 80-100 W doing video playback or driving more than one monitor, which is more indicative of problematic power management.

  • sjnonweb a day ago

    Power-efficient chips will result in more overall performance for the same total power drawn. It's all about performance/watt.

  • reactordev 19 hours ago

    Even if you save only 0.01 kW (10 W) per machine, multiplied across tens of millions of computers that works out to something on the order of 100 MW. Even small improvements have macro-level implications.

  • sylware a day ago

    Another area of AMD GPU R&D is the _userland_ _hardware_ [ring] buffers for near-direct userland programming of the hardware.

    They have started to experiment with that in Mesa and Linux ("user queues", as in "user hardware queues").

    I don't know how they will work around the scarce VM IDs, but here we are talking about nearly zero driver. Obviously, they will have to simplify/clean up a lot of the 3D pipeline programming and be very sure of its robustness, basically to have it ready for "default" rendering/usage right away.

    Userland will get from the kernel something along these lines: command/event hardware ring buffers, data DMA buffers, a memory page with the read/write pointers & doorbells for those ring buffers, and an event file descriptor for an event ring buffer. Basically, what the kernel currently has.

    I wonder if it will provide any significant simplification over the current way, which is handing indirect command buffers to the kernel and dealing with 'sync objects'/barriers.
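
    As a purely conceptual sketch of what such a userland submission path could look like (made-up names and layout, not AMD's actual user-queue ABI): copy a packet into the mapped ring, publish the new write pointer, write the doorbell, and never enter the kernel on the hot path.

        /* Conceptual sketch only -- not AMD's real "user queue" ABI.
           Queue setup (done once, via the kernel) maps the ring, a shared
           read/write-pointer page and a doorbell page into the process. */
        #include <stdatomic.h>
        #include <stdint.h>

        struct user_queue {
            uint32_t          *ring;       /* command ring, ring_size dwords  */
            uint32_t           ring_size;  /* power of two                    */
            _Atomic uint64_t  *wptr;       /* written by userland, read by HW */
            _Atomic uint64_t  *rptr;       /* written back by the hardware    */
            volatile uint32_t *doorbell;   /* MMIO page: "new work" kick      */
        };

        /* Copy a command packet into the ring and ring the doorbell -- no ioctl. */
        static int submit(struct user_queue *q, const uint32_t *pkt, uint32_t ndw)
        {
            uint64_t w = atomic_load_explicit(q->wptr, memory_order_relaxed);
            uint64_t r = atomic_load_explicit(q->rptr, memory_order_acquire);

            if (w + ndw - r > q->ring_size)
                return -1;                          /* ring full: caller must wait */
            for (uint32_t i = 0; i < ndw; i++)      /* ring wraps at ring_size     */
                q->ring[(w + i) & (q->ring_size - 1)] = pkt[i];
            /* make the packet visible before publishing the new write pointer */
            atomic_store_explicit(q->wptr, w + ndw, memory_order_release);
            *q->doorbell = (uint32_t)(w + ndw);     /* kick the hardware           */
            return 0;
        }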

    • averne_ a day ago

      The NVidia driver also has userland submission (in fact, it does not support kernel-mode submission at all). I don't think it leads to a significant simplification of the userland code either way; basically, a driver has to keep track of the same things it would otherwise have submitted through an ioctl. If anything, there are some subtleties that require careful consideration.

      The major upside is removing the context switch on a submission. The idea is that an application only talks to the kernel for queue setup/teardown, everything else happens in userland.

      • sylware 5 hours ago

        Yep. The future of GPU hardware programming? The one we will have to "standard"-ize à la RISC-V for CPUs?

        The tricky part is the Vulkan "fences", namely the GPU-to-CPU notifications. Probably hardware interrupts, which the kernel will have to forward to userland via an event ring buffer (probably a specific event file descriptor). There are alternatives though: we could think of userland polling/spinning on some CPU-mapped device memory content for the notification, or we could go one "expensive" step further, which would "efficiently" remove the kernel for good here but would lock a CPU core (should be fine nowadays with our many-core CPUs): something along the lines of a MONITOR machine instruction, where a CPU core halts until some memory content is written, with the possibility for another CPU core to un-halt it (namely, spurious un-halting is expected).

        Does nvidia handle their GPU to CPU notifications without the kernel too?
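
        The polling option could be as simple as spinning on a fence value the GPU writes into mapped memory; a sketch with made-up names (not any vendor's real API):

            /* Sketch only (made-up names): wait in userland on a GPU-written
               fence value mapped into the process, instead of a kernel event. */
            #include <stdint.h>
            #include <stdatomic.h>
            #include <immintrin.h>   /* _mm_pause() */

            static void wait_fence(_Atomic uint64_t *fence, uint64_t target)
            {
                /* the GPU bumps *fence when the corresponding work completes */
                while (atomic_load_explicit(fence, memory_order_acquire) < target)
                    _mm_pause();     /* spin politely; a real implementation would
                                        fall back to the kernel (eventfd/interrupt)
                                        after a timeout instead of pinning a core */
            }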

        • sylware an hour ago

          eewww... my bad, we would need a timeout on the CPU core locking to go back to the kernel.

          Well, polling? erk... I guess an event file descriptor is in order, and nvidia is probably doing the same.

  • adgjlsfhk1 a day ago

    The architecture is shared between desktop and mobile. This sounds 100% like something they did to give some dual-display laptop or handheld three extra hours of battery life by fixing something dumb.

  • formerly_proven a day ago

    In terms of heat output the difference between an idling gaming PC from 10 years ago (~30-40 W) and one today (100+ W) is very noticeable in a room. Besides, even gaming PCs are likely idle or nearly idle a significant amount of time, and that's just power wasted. There are also commercial users of desktop GPUs, and there they are idle an even bigger percentage of the time.

    • mmis1000 18 hours ago

      I think the power efficiency of AMD graphics has improved a lot in the past 10 years. Compare the RX 580 and the Radeon 890M: they are 7 years apart, with almost the same performance, and a 12x difference in power usage (the new one is so low it can be put into a mini PC and used as an iGPU). It would have been unimaginable to say this 7 years ago.

    • DiabloD3 a day ago

      Idling "gaming PCs" idle about 30-40w.

      Your monitor configuration has always controlled idle power of a GPU (for about the past 15 years), and you need to be aware of what is "too much" for your GPU.

      RDNA4 and Series 50 is anything more than the equivalent of a single 4k 120hz kicks it out of super-idle, and it sits at around ~75W.

      • daneel_w a day ago

        > Idling "gaming PCs" idle about 30-40w.

        Hm, do they? I don't think any stationary PC I've had the past 15 years have idled that low. They have all had modest(ish) specs, and the setups were tuned for balanced power consumption rather than performance. My current one idles at 50-55W. There's a Ryzen 5 5600G and an Nvidia GTX 1650 in there. The rest of the components are unassuming in terms of wattage: a single NVMe SSD, a single 120mm fan running at half RPM, and 16 GiB of RAM (of course without RGB LED nonsense).

        • DiabloD3 a day ago

          Series 16 cards have weird idle problems. Mine also exhibited that. They're literally Series 20s with no RTX cores at all, and their otherwise identical 20-series counterparts didn't seem to have the same issue.

          So, I assume it's Nvidia incompetence. It's my first and last Nvidia card in years; AMD treats users better.

          • daneel_w a day ago

            Are the 10- and 40- series similar? Before the 1650 I had the 1060 in the same PC, and for a while I had an RTX 4060 in it as well (which I bailed on because the model emitted terrible coil noise). Neither really made any mentionable difference in idle power. I'm personally convinced that besides "NUCs" and running on only a 25-35W AMD APU with no discrete graphics card, the days of low-power stationary PCs are long over.

            • DiabloD3 a day ago

              The 10 series was probably the last truly inefficient one, but you shouldn't be having problems with a 40.

              Shame you don't have it around anymore, because I'd say: set your desktop to native res, 60Hz, only one monitor installed, 8-bit SDR not 10-bit SDR, and see if the power usage goes away.

              Like, on a 9800X3D w/ 7900XTX with my 4th monitor unplugged (to get under the maximum super-idle load for the GPU), I'm sub-50W idle.

              • daneel_w a day ago

                I run at 1080p60 in 32-bit true color mode, aka 8-bit SDR in Windows' various settings panels. My plan is to revisit the RTX series in the future when there might be a decently performing model at or below 100 watts TDP.

                • DiabloD3 19 hours ago

                  Wonder what's going on in your system. Hard to give any more suggestions without having it in front of me, but as much as I shit on Nvidia, you should absolutely not be drawing that much at idle.

                  There are settings in the BIOS that can affect this, and sometimes certain manufacturers (Asus, mainly) screw with them because they're morons, so I wonder if you might be affected by that.

      • rubatuga 19 hours ago

        Just checked my gaming PC - 5700xt + rx9060 and it's idling comfortably at 39W with a single 120hz 1080p display. Dual monitors will probably cause higher idling wattage.

syntaxing a day ago

I'm more curious: does RDNA4 have native FP8 support?

  • krasin a day ago

    I refer to the RDNA4 instruction set manual ([1]), page 90, Table 41. WMMA Instructions.

    They support FP8/BF8 with F32 accumulate and also IU4 with I32 accumulate. The max matrix size is 16x16. For comparison, NVIDIA Blackwell GB200 supports matrices up to 256x32 for FP8 and 256x96 for NVFP4.

    This matters for overall throughput: feeding a bigger matrix unit is actually cheaper in terms of memory bandwidth, because the number of FLOPs grows as O(n^2) when you increase the size of a systolic array, while the number of inputs/outputs grows only as O(n).

    1. https://www.amd.com/content/dam/amd/en/documents/radeon-tech...

    2. https://semianalysis.com/2025/06/23/nvidia-tensor-core-evolu...
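
    A rough per-clock illustration of that scaling, modelling the unit as an idealised output-stationary n x n systolic array fed one row of A and one column of B per cycle (a simplification; real tensor-core shapes like 256x32 aren't square):

        /* Idealised n x n systolic array: n*n MACs/clock vs 2*n operands/clock */
        #include <stdio.h>

        int main(void) {
            const int sizes[] = { 16, 64, 256 };
            for (int i = 0; i < 3; i++) {
                int n = sizes[i];
                double flops = 2.0 * n * n;   /* one MAC = 2 FLOPs          */
                double operands = 2.0 * n;    /* a row of A + a column of B */
                printf("n=%3d: %6.0f FLOP/clk, %4.0f operands/clk, ratio %5.1f\n",
                       n, flops, operands, flops / operands);
            }
            return 0;
        }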

    • atq2119 a day ago

      It's misleading to compare a desktop GPU against a data center GPU on these metrics. Blackwell data center tensor cores are different from Blackwell consumer tensor cores, and the same goes for the AMD side.

      Also, the native/atomic matrix fragment size isn't relevant for memory bandwidth, because you can always build larger matrices out of multiple fragments in the register file. A single matrix fragment is read from memory once and used in multiple matmul instructions, which has the same effect on memory bandwidth as using a single larger matmul instruction.
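
      For instance (a conceptual sketch with made-up helpers, not real WMMA intrinsics): a 64x64 output tile built from 16x16 fragments loads each A and B fragment once per k-step and reuses it in four matmul ops, so the per-instruction fragment size doesn't change the bytes moved per FLOP.

          /* Conceptual sketch: made-up helpers, not a real intrinsics API.
             A 64x64 output tile is built from 16x16 fragments held in
             "registers"; each loaded fragment feeds several matmul ops. */
          #define FRAG 16
          #define NB   4                        /* 64 / 16 fragments per edge */

          typedef struct { float v[FRAG][FRAG]; } frag_t;  /* register stand-in */

          static void frag_load(frag_t *d, const float *src, int ld, int br, int bc) {
              for (int i = 0; i < FRAG; i++)
                  for (int j = 0; j < FRAG; j++)
                      d->v[i][j] = src[(br * FRAG + i) * ld + bc * FRAG + j];
          }

          static void frag_mma(frag_t *acc, const frag_t *a, const frag_t *b) {
              for (int i = 0; i < FRAG; i++)    /* acc += a * b */
                  for (int j = 0; j < FRAG; j++)
                      for (int k = 0; k < FRAG; k++)
                          acc->v[i][j] += a->v[i][k] * b->v[k][j];
          }

          /* C(64x64) += A(64x64) * B(64x64): each fragment is read once per
             k-step and then reused in NB matmul ops. */
          static void tile_matmul(frag_t acc[NB][NB],
                                  const float *A, const float *B, int ld) {
              for (int k = 0; k < NB; k++) {
                  frag_t a[NB], b[NB];
                  for (int i = 0; i < NB; i++) frag_load(&a[i], A, ld, i, k);
                  for (int j = 0; j < NB; j++) frag_load(&b[j], B, ld, k, j);
                  for (int i = 0; i < NB; i++)
                      for (int j = 0; j < NB; j++)
                          frag_mma(&acc[i][j], &a[i], &b[j]);
              }
          }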

brcmthrowaway 19 hours ago

When will the gaming madness end? How much of sales can gamers really drive?