AMD has big plans for the future of the consumer GPU segment, and they aren’t ordinary ones: based on rumors and new patents, the firm is expected to adopt multi-chiplet GPUs soon.
AMD’s Latest Multi-Chiplet GPU Patent Might Have Solved the Latency Problem Through a “Smart Switch”
The concept of the MCM (Multi-Chip Module) isn’t entirely new for the graphics segment, but with the limitations of monolithic designs, the industry’s inclination toward MCMs is surely growing. AMD looks to be one of the firms best experienced with multi-chiplet designs, since its Instinct MI200 lineup of AI accelerators was the first to feature an MCM design with multiple chiplets on a single package, such as the GCDs (Graphics Compute Dies), the HBM stacks, and the I/O die. With the Instinct MI350 lineup, AMD took a rather new approach, which could be the foundation of chiplet-based consumer GPUs, according to coreteks.
Now, the biggest limitation in adopting a chiplet design for gaming GPUs is higher latency, as frame rendering is intolerant of long-distance data hops. To sort this issue out, AMD has to come up with a solution that brings data and computation as close together as possible. Based on a new patent filing disclosed in the video, AMD might have cracked the code for multi-chiplet gaming GPUs. Interestingly, the patent discloses details about CPUs rather than GPUs, but the text and mechanism indicate that it is targeted at a graphics use case.

So, how exactly will AMD use a multi-chiplet design with GPUs? The main driver in the patent is said to be a “data-fabric circuit with a smart switch” that bridges communication between the compute chiplets and the memory controllers. It’s basically AMD’s Infinity Fabric, but scaled down for consumer GPUs, since Team Red cannot employ HBM memory dies there. The switch is designed to optimize memory access by first evaluating whether a request for a graphics task requires task migration or data replication, making that decision with nanosecond-scale latency.
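To make the migrate-versus-replicate choice concrete, here is a minimal sketch of the kind of decision such a switch could make. The patent does not publish any algorithm, so the function name, thresholds, and heuristics below are purely illustrative assumptions, not AMD's actual logic:

```python
# Hypothetical sketch of a "smart switch" routing decision; the names and
# heuristics are illustrative assumptions, not taken from AMD's filing.

def route_request(task_state_bytes: int, data_bytes: int,
                  data_is_read_mostly: bool) -> str:
    """Decide how the fabric switch services a remote memory request.

    - "replicate": copy the data into the requester's local cache
      (attractive for small, read-mostly data shared by many chiplets).
    - "migrate": move the task to the chiplet that owns the data
      (cheap when the task's state is small relative to the data).
    - "remote": neither pays off; fall back to an ordinary remote access.
    """
    if data_is_read_mostly and data_bytes <= 64 * 1024:
        return "replicate"  # small read-shared data: duplicate it locally
    if task_state_bytes < data_bytes:
        return "migrate"    # moving the task is cheaper than moving the data
    return "remote"         # plain remote access through the fabric

# A shader touching a large, frequently written buffer on another chiplet:
print(route_request(task_state_bytes=4_096,
                    data_bytes=8 * 1024 * 1024,
                    data_is_read_mostly=False))  # prints "migrate"
```

The key idea the patent hints at is exactly this kind of cost comparison: either the work travels to the data or the data travels to the work, chosen per request rather than fixed at design time.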
With data access addressed, the patent proposes GCDs with private L1 and L2 caches, similar to what is happening with AI accelerators. On top of that, an extra shared L3 (or stacked SRAM), connected to all the GCDs through the switch, can reduce the need to access global memory and, more importantly, act as a shared staging zone between chiplets. This is similar to what AMD does with its 3D V-Cache, except that V-Cache is mainly for CPUs. Then there’s also stacked DRAM involved, which is essentially the foundation for an MCM design.
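The hierarchy described above can be modeled in a few lines: private L1/L2 per GCD, a switch-attached shared L3, and stacked DRAM behind it. All class names, structure, and behavior here are assumptions for demonstration only; the point is how a fill into the shared L3 lets a *different* chiplet hit without touching global memory:

```python
# Illustrative toy model of the hierarchy the patent hints at: private L1/L2
# per GCD, a shared L3 reachable through the switch, then stacked DRAM.
# Everything here is an assumption for demonstration, not AMD's design.
from typing import Optional


class GCD:
    """One graphics compute die with private L1/L2 caches."""
    def __init__(self) -> None:
        self.l1: dict[int, bytes] = {}
        self.l2: dict[int, bytes] = {}


class ChipletGPU:
    def __init__(self, num_gcds: int) -> None:
        self.gcds = [GCD() for _ in range(num_gcds)]
        self.shared_l3: dict[int, bytes] = {}  # staging zone shared by all GCDs
        self.dram: dict[int, bytes] = {}       # stacked DRAM / global memory

    def load(self, gcd_id: int, addr: int) -> tuple[Optional[bytes], str]:
        """Walk the hierarchy and report which level served the request."""
        gcd = self.gcds[gcd_id]
        if addr in gcd.l1:
            return gcd.l1[addr], "L1"
        if addr in gcd.l2:
            return gcd.l2[addr], "L2"
        if addr in self.shared_l3:             # hit in the switch-attached L3
            return self.shared_l3[addr], "L3"
        if addr in self.dram:
            self.shared_l3[addr] = self.dram[addr]  # fill L3 so peers can reuse it
            return self.dram[addr], "DRAM"
        return None, "miss"


gpu = ChipletGPU(num_gcds=2)
gpu.dram[0x100] = b"texels"
print(gpu.load(0, 0x100)[1])  # "DRAM" — first touch, fills the shared L3
print(gpu.load(1, 0x100)[1])  # "L3"   — a different GCD now hits without DRAM
```

That second access is the whole argument for the shared level: chiplet-to-chiplet reuse stays on-package instead of paying a trip to global memory.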

What makes the emergence of multi-chiplet patents different this time is that AMD is essentially ecosystem-ready. The company could use TSMC’s InFO-RDL bridges for packaging and a specific version of Infinity Fabric between the dies. What makes this implementation even more compelling is that it is a scaled-down version of AMD’s AI accelerators, and if you remember, AMD plans to merge its gaming and AI architectures under a single umbrella, the UDNA architecture. Team Red has also converged the software ecosystem here, which means AMD can amortize driver and compiler work across both segments.
With the limitations of monolithic designs, the industry needs a change, and AMD might have one of the best opportunities to take the lead over competitors here. However, there are complications with chiplet designs, one of which AMD experienced with RDNA 3, where performance was held back by the latency introduced by the chiplet interconnect. With the innovative switch approach, combined with the extra shared L3, Team Red hopes to address those latency issues, but this is a big architectural leap. As an enthusiast, I would love to see the innovation hit the market, but we will likely have to wait and watch with UDNA 5.