"match-head sized chiplets" sort of falls into the chiplet mythos, I think. Chiplets aren't magic: they actually increase power usage vs an equivalent monolithic chip, because data movement is expensive, and the more data you move between dies the more expensive it gets. People just think chiplets are efficient because AMD made a huge node leap at the same time (GF 12nm to TSMC 7nm is more than a full node, probably at least 1.5 if not 2), but chiplets have their own costs.
The smaller you split the chiplets, the more data has to move between them, and the more power you'll burn. It's not desirable to go super small; you want a reasonably-sized chiplet to minimize data movement.
Even if you keep the chiplets "medium-sized" and just use a lot of them... there is still an efficiency limit where data movement power starts to overwhelm your savings from clocking the chips lower, etc. There's copper-copper bonding to try and fix that, but that makes thermal density even worse (and boy is Zen4 hot already... 95C under pretty much any heavy load). Like everything else, it's just kicking the can down the road; it doesn't solve all the problems forever.
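To put rough shape on the argument, here's a toy back-of-envelope model. Every number in it is made up for illustration (the compute power, the traffic figure, the energy per bit), and it assumes traffic is spread uniformly so that splitting into n chiplets sends a fraction 1 - 1/n of it across a die boundary. It also treats on-die movement as free, which it isn't. It's a sketch of the trend, not real silicon data.

```python
# Toy model of die-to-die data movement power.
# All constants are illustrative assumptions, not measurements.
COMPUTE_POWER_W = 100.0       # assumed logic power of the equivalent monolithic chip
TOTAL_TRAFFIC_GBPS = 4000.0   # assumed total internal traffic, in gigabits per second
PJ_PER_BIT_OFF_DIE = 2.0      # assumed energy cost of moving one bit between dies


def off_die_fraction(n_chiplets: int) -> float:
    """Fraction of traffic that crosses a die boundary, assuming uniform traffic."""
    return 1.0 - 1.0 / n_chiplets


def link_power_w(n_chiplets: int) -> float:
    """Power spent on die-to-die links, in watts."""
    bits_per_second = TOTAL_TRAFFIC_GBPS * 1e9 * off_die_fraction(n_chiplets)
    return bits_per_second * PJ_PER_BIT_OFF_DIE * 1e-12


def total_power_w(n_chiplets: int) -> float:
    """Compute power plus link power; on-die movement is ignored here."""
    return COMPUTE_POWER_W + link_power_w(n_chiplets)


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16, 64):
        print(f"{n:3d} chiplets: link {link_power_w(n):5.2f} W, "
              f"total {total_power_w(n):6.2f} W")
```

With one die the link cost is zero, and it only goes up as you split finer. This simple version ignores hop counts and longer routes, which would make many-chiplet layouts look worse still.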