The problem is that we're more or less stuck with this class of problem unless we end up with something that looks like a Xeon Phi without shared resources and runs calculations on many, many truly independent cores, or we accept that worst-case and best-case performance are identical (which I don't foresee anyone really agreeing to).
Or, framed differently, if Intel or AMD announced a new gamer CPU tomorrow that was 3x faster in most games but utterly unsafe against all Meltdown/Spectre-class vulns, how fast do you think they'd sell out?
Larrabee was fun to program, but I think it'd have an even worse time hardening against memory side-channel effects: the barrel processor (which was necessary to get anything like reasonable performance) was humorously easy to use for cross-process exfiltration. Like... it was so easy, we actually used it as an IPC mechanism.
Now you're asking me for technical details from more than a decade ago. My recollection is that you could map one of the caches between cores; there were uncached write-through instructions. By reverse-engineering the cache's hash, you could write to a specific cache line; the UC write would push it up into the correct line, and the "other core" could snoop that line from its side with a lazy read-and-clear. The whole thing was janky-AF, but way the hell faster than sending a message around the ring. (My recollection was that the three interlocking rings could make the longest-range message take hundreds of cycles.)
Sure, absolutely, there are a large number of additional classes of side effects you would need to harden against if you wanted to eliminate everything; I was mostly thinking of something with an enormous number of cores, without the 4-way SMT, as a high-level description.
I was always morbidly curious about programming those, but never to the point of actually buying one, and back in a past life, when we had a few of the cards in my office, I always had more things to do than hours in the day.
"if Intel or AMD announced a new gamer CPU tomorrow that was 3x faster in most games but utterly unsafe against all Meltdown/Spectre-class vulns, how fast do you think they'd sell out"
Well, many people have gaming computers they won't use for anything serious, so I would also buy it. And on restricted gaming consoles, I suppose the risk is not too high?
Also, many games today outright install rootkits to monitor your memory (see [1]) - some Heartbleed-style leak is so far down the list of credible threats on a gaming machine that it's outright ludicrous to trade off performance for it.
They're a pain in the ass all around. Spectre allowed you to read everything paged in (including kernel memory) from JS in the browser.
To mitigate it, browsers did a bunch of hacks, including nerfing the precision of all timer APIs and disabling shared memory, because you need an accurate timer for the exploit - to this day performance.now() rounds to 1 ms on Firefox and 0.1 ms on Chrome.
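You can see the coarsening for yourself with something like the TypeScript sketch below, pasted into a browser console. It reports the smallest non-zero step performance.now() will hand out; observeTimerGranularity is just an illustrative name, and the exact step depends on the browser and on whether the page is cross-origin isolated.

    // Rough sketch: report the smallest non-zero step performance.now() exposes.
    // The exact value depends on the browser and on cross-origin isolation.
    function observeTimerGranularity(samples: number = 100_000): number {
      const steps = new Set<number>();
      let last = performance.now();
      for (let i = 0; i < samples; i++) {
        const now = performance.now();
        if (now !== last) {
          steps.add(Number((now - last).toFixed(4)));
          last = now;
        }
      }
      return Math.min(...steps); // smallest observed tick, in milliseconds
    }

    console.log(`timer step: ~${observeTimerGranularity()} ms`);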
Funnily enough, this 1 ms rounding is a headache for me right now. On, say, a 240 Hz monitor, a game needs to render a frame every ~4.17 ms -- 1 ms precision is not enough for an accurate ticker -- even if you render your frames on time, the result can't be perfectly smooth, because the browser doesn't give you an accurate enough timer by which to advance your physics every frame.
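For the curious, here's roughly where it bites in a standard fixed-timestep loop (names and structure are illustrative, not my actual code): the requestAnimationFrame timestamp only moves in whole milliseconds, so a ~4.17 ms frame period shows up as 4 or 5, and the physics steps land unevenly.

    // Illustrative fixed-timestep loop. With timestamps rounded to 1 ms, the
    // per-frame delta on a 240 Hz display reads as 4 or 5 instead of ~4.1667,
    // so the accumulator drifts and physics steps bunch up.
    const PHYSICS_STEP_MS = 1000 / 240;

    let accumulatorMs = 0;
    let lastTimestamp: number | undefined;

    function stepPhysics(dtMs: number): void {
      // advance the simulation by a fixed amount (placeholder)
    }

    function render(): void {
      // draw the current state (placeholder)
    }

    function frame(timestamp: DOMHighResTimeStamp): void {
      if (lastTimestamp !== undefined) {
        accumulatorMs += timestamp - lastTimestamp; // quantized to whole ms
      }
      lastTimestamp = timestamp;

      while (accumulatorMs >= PHYSICS_STEP_MS) {
        stepPhysics(PHYSICS_STEP_MS);
        accumulatorMs -= PHYSICS_STEP_MS;
      }
      render();
      requestAnimationFrame(frame);
    }

    requestAnimationFrame(frame);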
Isn't it rather about data leaks between any two processes? Whether those two processes belong to different users is a detail of the threat model and the OS's security model. On a console it could well be about data leaks between a game with a code-injection vulnerability and the OS or DRM system.
We already have heterogeneous cores these days, with E-cores and P-cores, and we have a ton of them since they take up little die space relative to cache. The solution, it seems to me, is to have most cores go brrrrrr and a few that are secure.
Given that we have effectively two browser platforms (Chromium and Firefox) and two operating systems to contend with (Linux and Windows), it seems entirely tractable to get the security sensitive threads scheduled to the "S cores".
Also all the TLS, SSH, WireGuard and other encryption - anything with long-persisted secret information. Everything else, even if secret (like displayed OTP codes), is likely too fleeting for a snooping attack to find and exfiltrate, even if an exfiltration channel remains. Until a better exfiltration method is found, of course :-(
I think we're headed towards the future of many highly insulated computing nodes that share little if anything. Maybe they'd have a faster way to communicate, e.g. by remapping fast cache-like memory between cores, but that memory would never be uncontrollably shared the way cache lines are now.
That's a secure enclave aka secure element aka TPM. Once you start wanting security you usually think up enough other features (voltage glitching prevention, memory encryption) that it's worth moving it off the CPU.
Eh, the TPM is a hell of a lot less functional than the security processor on a modern ARM board. You can seal and unseal based on system state, but once things are unsealed, the secret is just sitting in memory.
I agree at a gut / instinct level with that thought.
A SINGLE thread's best-case and worst-case timing have to be the same to avoid leaking through speculation...
However, threads from completely unrelated security domains could be run instead, if ready: most likely the 'next' thread on the same unit, with free slots repacked the next time the scheduler runs.
++ Added ++
It might be possible for operations that don't cross security boundaries - i.e., that stay within a program's own space - to have different performance than operations that do.
An 'enhanced' level of protection might also be offered for threads running VM-like guest code (such as browsers), avoiding the more aggressive speculation.
Any operation that looks like a segmentation fault relative to that thread's allowed memory accesses could result in forfeiture of its timeslice. That would only leak what the thread should already know anyway - what memory it's allowed to access - not the contents of other memory segments.
Itanium allegedly was free from branch prediction issues but I suspect cache behavior still might have been an issue. Unfortunately it's also dead as a doornail.
>if Intel or AMD announced a new gamer CPU tomorrow that was 3x faster in most games but utterly unsafe against all Meltdown/Spectre-class vulns, how fast do you think they'd sell out?
I do realize that gamers aren't the most logical bunch, but aren't most games GPU-bound nowadays?
Not a gamer but I would guess it depends on the graphics settings. At lower resolutions, and with less lighting features, etc. one can probably turn a GPU bound game into a CPU bound game.
Also, a good chunk of these vulnerabilities (Retbleed, Downfall, Rowhammer, there are probably a few I'm forgetting) are either theoretical, lab-only, or targeted exploits that require a lot of setup. And the info leaked by something like Retbleed mostly matters on shared machines, like cloud infrastructure.
Which makes it kind of terrible that the kernel has these mitigations turned on by default, costing somewhere in the neighborhood of 20-60% of performance on older-gen hardware, just because the kernel has to ship with one-size-fits-all defaults.
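For what it's worth, on Linux you can at least inspect which mitigations your kernel applied; the sysfs path below is standard, the script around it is just an illustrative Node/TypeScript sketch.

    // Print the kernel's view of each CPU vulnerability and its mitigation.
    // Linux only; each file contains a line like "Mitigation: ..." or "Not affected".
    import { readdirSync, readFileSync } from "node:fs";
    import { join } from "node:path";

    const dir = "/sys/devices/system/cpu/vulnerabilities";
    for (const name of readdirSync(dir)) {
      const status = readFileSync(join(dir, name), "utf8").trim();
      console.log(`${name}: ${status}`);
    }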
I don’t think you are thinking of this right. One bit of leakage makes it half as hard to break encryption via brute force. It’s a serious problem. The defaults are justified.
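Spelling out the arithmetic behind "half as hard" (plain brute-force counting, nothing specific to these attacks):

    % Each leaked key bit halves the remaining brute-force search space:
    \[
      2^{n} \;\longrightarrow\; 2^{n-1},
      \qquad \text{e.g. } 2^{128} \rightarrow 2^{127} \text{ candidate keys.}
    \]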
I think things will only shift once we have systems that ship with full sandboxes that are minimally optimized and fully isolated. Until then we're forced to assume the worst.
> I don’t think you are thinking of this right. One bit of leakage makes it half as hard to break encryption via brute force.
The problem is that you need to execute on the system, then need to know which application you’re targeting, then figure out the timings, and even then you’re not certain you are getting the bits you want.
Enabling mitigations for servers? Sure. Cloud servers? Definitely. High-profile targets? Go for it.
The current defaults are like foisting iOS's "Lockdown Mode" on all users by default and then expecting them to figure out how to turn it off, except you have to do it by connecting the phone to your Mac/PC and punching in a bunch of terminal commands.
Then again, almost all kernel settings are server-optimal (and even then, 90s-server optimal). There honestly should be some serious effort to modernize the defaults for reasonably modern servers, and then also a separate kernel flavor for desktops (akin to CachyOS, just more upstream).
Maybe so, but I think most users are more likely to under-estimate their security sensitivity than to over-estimate it. On top of that, security profiles can change, and people may not remember to update their settings to match their current security needs.
These defaults are needed and if the loss is so massive we should be willing to embrace less programmable but more secure options.