> I do wonder why there isn't an API for "lazy munmap()"... it would behave exac...

eqvinox · on May 17, 2020

Hmm - both maps and unmaps are already non-atomic though; there's always some span of time, however minor, where different CPUs have different state for a page that is being updated. IPIs don't grind the entire system to a halt to guarantee everyone gets the same update at the same time.

I.e. you can already have a syscall on thread A racing against the incoming IPI on thread B's mmap/munmap...

new_realist · on May 18, 2020

Sure, but the munmap() won’t return to the caller until the effects are visible in all processors.

kentonv · on May 18, 2020

Imagine a program that, instead of calling munmap() directly, spins up a separate thread to call it, with the main thread proceeding on its way.

Doesn't such a program today already observe all the weirdness of the page "disappearing" from different threads at different points in time?

What difference would there be with a lazy munmap(), except that the window during which these inconsistencies are observable would be longer?

new_realist · on May 18, 2020

Lazy munmap() is definitely possible; there just needs to be more kernel-to-userspace bookkeeping. The weirdness you reference would only be visible in a use-after-free scenario, which is a bug, and therefore the behavior is undefined anyway.

But reallocation is simplified with a synchronous munmap(): the memory allocator knows (through shared memory communication channels) that the virtual pages can be safely reallocated once the call returns. With a lazy approach, some other mechanism needs to inform the allocator when all cores have been flushed (and thus it’s safe to make a new mapping), or else mmap() has to do a shootdown itself. I think Solaris might have done something similar.

It’s safe to take a TLB miss on a remapping, but it’s not safe to reallocate memory and then inadvertently use an old, cached mapping. The synchronous design assumes that allocations are in the critical path and should be fast, but deallocations are not and can be slow. I think the original logic also assumed that virtual address space was scarce. These days virtual address space is plentiful, so I think a lazy unmap is probably worth it as a way of encouraging more efficient reuse of physical memory. Note that, for security, a physical page might still trigger a synchronous shootdown if another process needs it quickly enough.
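
To make the bookkeeping concrete, here is a rough userspace model in C of one way an allocator could learn that a lazily unmapped range is safe to reuse. Everything in it is an assumption for illustration: the generation counter, the fixed `NUM_CORES`, and the names `lazy_state`, `lazy_unmap`, `core_flushed` and `safe_to_reuse` are hypothetical and do not correspond to any real kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_CORES 4  /* hypothetical fixed core count */

struct lazy_state {
    uint64_t unmap_gen;               /* bumped on every lazy unmap */
    uint64_t flushed_gen[NUM_CORES];  /* generation each core has flushed through */
};

struct retired_range {
    uintptr_t addr;
    size_t    len;
    uint64_t  gen;  /* generation assigned when the range was unmapped */
};

/* Record a lazy unmap: tag the range and return immediately, no shootdown. */
struct retired_range lazy_unmap(struct lazy_state *s, uintptr_t addr, size_t len)
{
    struct retired_range r = { addr, len, ++s->unmap_gen };
    return r;
}

/* Model one core flushing its TLB (say, on a context switch): it can no
 * longer hold a stale entry for anything unmapped up to this point. */
void core_flushed(struct lazy_state *s, int core)
{
    assert(core >= 0 && core < NUM_CORES);
    s->flushed_gen[core] = s->unmap_gen;
}

/* The range may be handed out again only once every core has flushed
 * at least once since the range was unmapped. */
bool safe_to_reuse(const struct lazy_state *s, const struct retired_range *r)
{
    for (int i = 0; i < NUM_CORES; i++) {
        if (s->flushed_gen[i] < r->gen)
            return false;
    }
    return true;
}
```

With a synchronous munmap() none of this state is needed, because the call returning is itself the signal. In this model the alternative mentioned above, a shootdown inside mmap(), would amount to forcing `core_flushed()` on every core whenever `safe_to_reuse()` says no.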

new_realist · on May 18, 2020

By the way, I love your new bandwidth-delay-product-based flow control logic in Cap’n Proto!

kentonv · on May 18, 2020

Thanks! (Though it'll be more exciting when it does its own BDP calculation rather than stealing the kernel's socket buffer size. :) )