ARM code is a fair bit bigger than 16-bit 8086 code, so there is that to consider. It makes total cost of ownership much higher when we're talking about systems with RAM measured in hundreds of kilobytes or single-digit megabytes.
But basically what you noticed is exactly why RISC swept everything else away in the late 80s to early 90s. If you have a 50,000 or 100,000 transistor budget and RAM is relatively cheap and fast, then complex microcoded designs really are a bad idea. You can get so, so much more performance out of a design like MIPS or ARM than out of an 80186, etc.
Hindsight is 20/20. If you sent me back to 1976–77 in a time machine, I would propose not something like the 68000 or 8086, but something very much like MIPS, or Berkeley RISC (minus the register windows). It could have been done in the late 70s. It would have been easily 3–5x as fast per MHz, and could probably be clocked faster. Doing so just wasn't obvious until later. Everyone was trying to pack as much sophistication as possible into the instruction set and architecture, to ease assembly-language programming.
How critical would a load–store architecture be to making that late 70s/early 80s "pre-RISC" better than the CPUs Intel and Motorola were cranking out?
E.g. imagine a simpler 16-bit extension of the 8080 than the 8086 was: basically just more registers and a focus on making instructions execute in fewer cycles, while keeping a register–memory architecture (potentially also removing or simplifying some instructions, though the 8080 ISA was already pretty minimal)?
> 29000 transistors? But it's the same as an ARM2 which apparently was full 32bit and had an integer multiplier.
It's a very good point. I think it's worth asking a few questions in return.
How many years separated the two designs?
Does ARM2 support an equivalent ISA, in terms of features (not encoding)?
For instance, the 8086 has support for BCD integers, specialized instructions for loops, and the ability to use its registers as 16-bit or 8-bit (doubling the register count in the latter case).
Conversely, ARM2 had, for example, a barrel shifter, immediate constants, every instruction being conditional, etc., all of which are absent from the 8086.
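As an aside, ARM's immediate constants are a good illustration of how different the two ISAs are: an ARM data-processing immediate is an 8-bit value rotated right by an even amount, so only some 32-bit constants can be expressed directly. A quick Python sketch of that check (the helper name is mine):

```python
def is_arm_immediate(value):
    """Return True if a 32-bit constant fits ARM's immediate encoding:
    an 8-bit value rotated right by an even amount (0..30)."""
    value &= 0xFFFFFFFF
    for rot in range(0, 32, 2):
        # Rotating left by `rot` undoes a right-rotation of `rot`.
        v = ((value << rot) | (value >> (32 - rot))) & 0xFFFFFFFF
        if v < 256:
            return True
    return False
```

So 0xFF000000 is a legal immediate (0xFF rotated right by 8), but 0x1FF is not, which is why ARM compilers sometimes synthesize constants from two instructions or load them from a literal pool.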
My point exactly. They have different ISAs, and as such cannot be compared on transistor count alone.
They each correspond to a different era, with different needs, as the BCD support clearly shows.
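To make the BCD point concrete: the 8086's DAA instruction fixes up a binary ADD so that packed-BCD bytes (two decimal digits per byte) add correctly. A rough Python sketch of that adjustment (simplified; it ignores the 8086's flag semantics):

```python
def bcd_add(a, b):
    """Add two packed-BCD bytes and return (result_byte, carry).
    The +6 / +0x60 corrections are what DAA applies in hardware
    after a plain binary ADD."""
    s = a + b
    # Fix the low decimal digit if it overflowed past 9.
    if (s & 0x0F) > 9 or ((a & 0x0F) + (b & 0x0F)) > 9:
        s += 0x06
    # Fix the high decimal digit the same way.
    if (s >> 4) > 9:
        s += 0x60
    carry = 1 if s > 0xFF else 0
    return s & 0xFF, carry
```

For example, 0x19 + 0x28 (decimal 19 + 28) comes out as 0x47 with no carry, which a raw binary add (0x41) would get wrong.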
If today's HW engineers had a chance to implement a small CPU core with the same transistor count, would they come up with the same ISA as the ARM1 or 8086? Would they choose to implement integer division (DIV and IDIV in x86), or leave it to software (ARM)? Would they pick CISC, RISC, VLIW, or something else?