CDNA4 also brings native support for FP6 and FP4 data types to AMD’s accelerators for the first time. FP6 and FP4 are among the marquee features of rival NVIDIA’s Blackwell architecture, and they have become a new target for AI inference as developers look to wring every last TOP/FLOP of performance out of these expensive and power-hungry GPUs. Aiming to one-up NVIDIA at its own game, AMD has also beefed up FP6 throughput so that it runs at twice the rate of FP8. On NVIDIA’s architecture, FP6 runs at the same rate as FP8. In essence, AMD built a more capable FP4 unit that can also process FP6, rather than reusing its FP8 unit to handle FP6. This carries a die-area penalty, but the upshot is double the FP6 throughput.
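To make these formats concrete, here is a minimal Python sketch. It assumes the OCP Microscaling (MX) v1.0 element encodings: FP4 as E2M1, FP6 as either E2M3 or E3M2, and no Inf/NaN codes. The article doesn’t say which FP6 variant is meant. The helper names `decode_minifloat` and `positive_values` are hypothetical. The rate table restates the article’s FP6-vs-FP8 claim. Its FP4 entries (2x FP8 on both architectures) are an added assumption, not taken from the article.

```python
def decode_minifloat(bits: int, exp_bits: int, man_bits: int) -> float:
    """Decode a sign/exponent/mantissa minifloat code to a Python float.

    Assumption: OCP MX-style FP4/FP6 elements, i.e. bias = 2**(exp_bits-1) - 1,
    subnormals when the exponent field is 0, and no Inf/NaN encodings.
    """
    sign = (bits >> (exp_bits + man_bits)) & 1
    exp = (bits >> man_bits) & ((1 << exp_bits) - 1)
    man = bits & ((1 << man_bits) - 1)
    bias = (1 << (exp_bits - 1)) - 1
    if exp == 0:
        # Subnormal: no implicit leading 1, exponent fixed at 1 - bias.
        magnitude = 2.0 ** (1 - bias) * (man / (1 << man_bits))
    else:
        # Normal: implicit leading 1 in front of the mantissa bits.
        magnitude = 2.0 ** (exp - bias) * (1 + man / (1 << man_bits))
    return -magnitude if sign else magnitude


def positive_values(exp_bits: int, man_bits: int) -> list:
    """All values reachable with the sign bit clear (includes 0.0), sorted."""
    codes = 1 << (exp_bits + man_bits)  # half of the code space: sign bit = 0
    return sorted({decode_minifloat(c, exp_bits, man_bits) for c in range(codes)})


# (exponent bits, mantissa bits) for each assumed element format.
FORMATS = {"FP4 (E2M1)": (2, 1), "FP6 (E2M3)": (2, 3), "FP6 (E3M2)": (3, 2)}

# Peak rate normalized to FP8 = 1.0. FP6 figures follow the article;
# FP4 = 2x FP8 on both architectures is an assumption added for context.
RELATIVE_RATE = {
    "AMD CDNA4": {"FP8": 1.0, "FP6": 2.0, "FP4": 2.0},
    "NVIDIA Blackwell": {"FP8": 1.0, "FP6": 1.0, "FP4": 2.0},
}

if __name__ == "__main__":
    for name, (e, m) in FORMATS.items():
        vals = positive_values(e, m)
        print(f"{name}: {len(vals)} non-negative values, max {vals[-1]}")
    for arch, rates in RELATIVE_RATE.items():
        print(f"{arch}: FP6 runs at {rates['FP6'] / rates['FP8']:.0f}x the FP8 rate")
```

One detail the decoder makes visible: FP6 E2M3 carries a 4-bit significand (3 stored bits plus the implicit 1), the same as FP8 E4M3, while FP4 E2M1 carries only 2 bits. That is one plausible reason FP6 maps naturally onto FP8 hardware, and why running it at FP4 rates means widening the FP4 datapath, with the die-area cost the article mentions.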