
Here’s also a big misconception. When a CPU runs, it has access to the entire memory (or more specifically, a process can chew up almost all available memory if allowed). The cost of copying from RAM into the L1/L2/L3 caches is a lot lower than the cost of copying to a GPU.

The GPU, on the other hand, is slightly more complex. It behaves like lots of small CPUs, each with its own local memory. They can access the full swath of memory, but only in parts, and copying between the two is much more expensive. If the problem can be boiled down to a map on the GPU followed by a reduce, the GPU excels. If the problem is serial, or can be parallelized with SIMD instructions, the CPU will run circles around the GPU.
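To illustrate the map-then-reduce shape described above, here is a minimal sketch in plain Python (the function names and the dot-product example are illustrative, not from the comment). The map step is embarrassingly parallel, so on a GPU each element could run on its own thread; the reduce step combines partial results in a tree, the way GPU reductions typically merge per-block sums:

```python
def map_step(a, b):
    # Elementwise multiply: each product is independent, so on a GPU
    # every multiplication could run on its own thread.
    return [x * y for x, y in zip(a, b)]

def reduce_step(partials):
    # Tree reduction: each pass halves the number of values, roughly
    # how GPU reductions combine per-block partial sums.
    while len(partials) > 1:
        if len(partials) % 2:
            partials.append(0)  # pad to an even length
        partials = [partials[i] + partials[i + 1]
                    for i in range(0, len(partials), 2)]
    return partials[0]

# A dot product expressed as map + reduce:
print(reduce_step(map_step([1, 2, 3, 4], [5, 6, 7, 8])))  # 5+12+21+32 = 70
```

A problem with a long chain of steps that each depend on the previous result has no such map phase to hand the GPU, which is why serial workloads stay on the CPU.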

SIMD has come pretty far.


