> I do wonder why there isn't an API for "lazy munmap()"... it would behave exactly like munmap(), except that the pages might remain accessible in other threads until the end of their timeslices, when the kernel can apply queued TLB flushes.
You would effectively need to run different address spaces in different threads of the "same" process, which might interoperate badly with whatever guarantees the kernel provides or relies on elsewhere based on the assumption of having a unified address space for the whole process. Though I absolutely agree that this is worthwhile, it probably needs to be designed carefully and can't just transparently replace any and all uses of munmap().
Hmm - both maps and unmaps are already non-atomic though; there's always going to be some span of time, however minor, where different CPUs have different state for a page that is being updated. IPIs don't grind the entire system to a halt to guarantee that everyone gets the same update at the same time.
I.e. you can already have a syscall on thread A racing against the incoming IPI triggered by thread B's mmap/munmap...
Lazy munmap() is definitely possible; there just needs to be more kernel-to-userspace bookkeeping. The weirdness you reference would only be visible in a use-after-free scenario, which is a bug, and therefore the behavior is undefined anyway.
But reallocation is simpler with a synchronous munmap(): the memory allocator knows (through shared memory communication channels) that the virtual pages can be safely reallocated once the call returns. With a lazy approach, some other mechanism needs to inform the allocator when all cores have been flushed (and thus it’s safe to make a new mapping), or else mmap() has to do a shootdown itself. I think Solaris might have done something similar.
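To make the extra bookkeeping concrete, here is a minimal userspace simulation of how an allocator could learn that a lazily unmapped range is safe to reuse. This is only a sketch under assumptions: `LazyUnmapTracker`, `lazy_munmap`, `cpu_flushed` and `reclaim` are hypothetical names, no such kernel API exists, and the epoch scheme is just one way it could work.

```python
from dataclasses import dataclass, field


@dataclass
class LazyUnmapTracker:
    """Hypothetical bookkeeping for a lazy munmap(); not a real kernel API."""
    ncpus: int
    epoch: int = 0                                # bumped by every lazy unmap
    flushed: list = field(default_factory=list)   # per CPU: last epoch it has flushed
    pending: list = field(default_factory=list)   # (epoch, start, length) awaiting flushes

    def __post_init__(self):
        self.flushed = [0] * self.ncpus

    def lazy_munmap(self, start, length):
        # Queue the unmap; other CPUs may still hold stale TLB entries for it.
        self.epoch += 1
        self.pending.append((self.epoch, start, length))
        return self.epoch

    def cpu_flushed(self, cpu):
        # Models a CPU applying its queued TLB flushes, e.g. at the end of a timeslice.
        self.flushed[cpu] = self.epoch

    def reclaim(self):
        # A range is reusable only once every CPU has flushed past its unmap epoch.
        safe = min(self.flushed)
        ready = [(s, n) for e, s, n in self.pending if e <= safe]
        self.pending = [p for p in self.pending if p[0] > safe]
        return ready
```

The allocator would call `reclaim()` before handing out virtual addresses again. Until the slowest CPU has caught up, the range simply stays off the free list, which is affordable if virtual address space is plentiful.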
It’s safe to take a TLB miss on a remapping, but it’s not safe to reallocate memory and then inadvertently use an old, cached mapping. The synchronous design assumes that allocations are in the critical path and should be fast, while deallocations are not and can be slow. I think the original logic also assumed that virtual address space was scarce. These days virtual address space is plentiful, so I think a lazy unmap is probably worth it as a way of encouraging more efficient reuse of physical memory. Note that, for security, reusing a physical page might still require a synchronous shootdown if another process needs it quickly enough.
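The difference between a harmless TLB miss and a dangerous stale entry can be shown with a toy model. This is a sketch, not how any real MMU is programmed: `ToyCPU`, `stale_mapping_demo` and the frame names are made up for illustration, and the "page table" is just a shared dict.

```python
class ToyCPU:
    """Toy CPU with a private TLB in front of a shared page table (illustrative only)."""

    def __init__(self, page_table):
        self.page_table = page_table   # shared dict: virtual page -> physical frame
        self.tlb = {}                  # private cache of translations

    def translate(self, vpage):
        if vpage not in self.tlb:      # TLB miss: walk the shared page table
            self.tlb[vpage] = self.page_table[vpage]
        return self.tlb[vpage]

    def flush(self, vpage):
        self.tlb.pop(vpage, None)      # what a shootdown would force on this CPU


def stale_mapping_demo():
    page_table = {7: "frame_A"}
    cpu0, cpu1 = ToyCPU(page_table), ToyCPU(page_table)

    cpu1.translate(7)                  # cpu1 caches 7 -> frame_A
    del page_table[7]                  # unmap without any shootdown
    page_table[7] = "frame_B"          # the same virtual page is reallocated

    miss = cpu0.translate(7)           # cpu0 had nothing cached: safe, sees frame_B
    stale = cpu1.translate(7)          # cpu1 still uses its cached entry: frame_A
    cpu1.flush(7)
    fixed = cpu1.translate(7)          # after the flush it refills from the page table
    return miss, stale, fixed
```

`stale_mapping_demo()` returns `("frame_B", "frame_A", "frame_B")`: the CPU that misses is fine, while the CPU with the cached entry reads the old frame until it is flushed. That is why the virtual range must not be reused before every core has flushed.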