Considering the vast number of programs that Wine works extremely well with, I'm not so sure they spent too much effort optimizing the no-congestion case. You are just doing something extremely quirky in your program.
I've looked at the code in a debugger. Wine has futexes three deep in "malloc". The innermost one is a pure spinlock. The problem with "realloc" is that, when it can't grow an array in place, it has to copy the contents. The Wine implementation does that copy with the main allocation lock still held. So, if you have Rust code with a lot of multithreaded vector "push" operations, and more threads than CPUs, you get futex congestion. It's possible to write applications that don't hit this bug, but the bug is Wine-only and doesn't occur on Windows, so working around it isn't worth it.
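For anyone who wants to picture the pattern, here is a minimal sketch of that kind of workload. It is not the actual program: `push_workload` and its parameters are names made up for illustration. Each thread grows its own `Vec` by repeated `push`, so the allocator sees a stream of concurrent reallocations whenever a vector outgrows its capacity. The `preallocate` flag shows the obvious workaround, `Vec::with_capacity`, which in principle skips the realloc-and-copy path; whether that is enough to avoid the congestion under Wine is an assumption, not something measured here.

```rust
use std::thread;

/// Hypothetical reproduction of the workload described above.
/// Spawns `threads` threads; each pushes `pushes_per_thread` items
/// into its own Vec and returns the final length. The return value
/// is the total number of elements pushed across all threads.
fn push_workload(threads: usize, pushes_per_thread: usize, preallocate: bool) -> usize {
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            thread::spawn(move || {
                // Without preallocation the Vec starts empty and must
                // reallocate (and possibly copy) each time it fills up.
                let mut v: Vec<u64> = if preallocate {
                    Vec::with_capacity(pushes_per_thread)
                } else {
                    Vec::new()
                };
                for i in 0..pushes_per_thread {
                    v.push(i as u64);
                }
                v.len()
            })
        })
        .collect();

    // Wait for every thread and add up how many elements were pushed.
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

To get the "more threads than CPUs" condition, call it with something like twice the value reported by `std::thread::available_parallelism()`, build for a Windows target, and compare run times with `preallocate` set to `false` and `true`.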
What's "quirky" is trying to use all the CPUs with lower-priority threads.