"Memory safety without garbage collection." is now advertised as a "key value proposition" (this isn't quite the same as saying "we're not thinking of ever adding garbage collection", of course).
« Recently we (Felix, Niko, and I) have been working on getting compiler-level GC support for Rust. The plan is to provide a base set of APIs and intrinsics on which GCs can be built, without including an actual GC itself. »
This is the "Unfinished design for a builtin Rust GC" described in TFA.
The OP does not seem to mention CactusRef https://crates.io/crates/cactusref which provides deterministic deallocation (on par with simple refcounting) of reference cycles, with minimal tracing requirements and minimal memory overhead.
Wouldn't it make more sense to have garbage collectors at the level of LLVM?
That way, you could have direct access to all the pointers in a program without jumping through the language's hoops. And other languages could benefit from it too.
Yes, it makes total sense for the garbage collector to be tightly coupled to the memory allocation system, which is also tightly coupled to code generation (as is done in the case of various high-performance managed languages: Java, C#, Haskell, …). I remember hearing a talk a few years ago about a system of the kind you’re talking about - a kind of “managed language” micro-core that ran alongside LLVM so that it could hook into code generation, and was intended to be a kind of toolkit for making new managed languages. Can’t remember the name…
The performance is still too bad for anything but Java on real applications. The other killer feature, ahead-of-time compilation, is really clunky, and its most innovative change, static initialization at build time by default, was reverted to runtime initialization.
LLVM has built-in hooks for GC, used by Azul's Zing JVM JIT compiler and perhaps Safari's LLVM JS JIT.
I'm not sure whether the GC integration is tied to JIT support or not. I think it's mostly related to the insertion of safepoints, which could be useful for a Rust implementation.
The concept of what a pointer "is" is somewhat language-dependent, particularly as people finally attempt to mitigate the waste of moving to 64-bit by using tagged pointers (which maybe could try to be modelled in LLVM also, but I think you will rapidly realize that LLVM just isn't modeling a lot of things that affect scanning).
> particularly as people finally attempt to mitigate the waste of moving to 64-bit by using tagged pointers
It's very fraught to try to "mitigate the waste" at the top of the pointer. Everybody involved is clear that what you're doing there is saving up pain for yourself, because periodically more of those "wasted" bits become significant, and each time that happens, if you've been using them, you've incurred a maintenance burden.
The waste at the bottom of the pointer is a far more sensible place to hide information, because it won't get taken away when somebody ships a new CPU; it was already there in 32-bit architectures.
LLVM itself makes use of the latter trick extensively, as I understand it, but doesn't use the former. If they've got 64-byte structures, they can align them, then steal six bits at the bottom of each pointer to those structures for bit flags, and when Intel ships a new CPU their code still works. Whereas if I steal ten bits at the top of the pointer, when Intel ships Ice Lake CPUs my compiler now crashes. Good luck debugging that!
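To make the low-bit trick concrete, here is a minimal Rust sketch (illustrative names, not taken from LLVM or any particular codebase): with 8-byte alignment the bottom three bits of every pointer are guaranteed to be zero, so they can carry flags and be masked off before the pointer is dereferenced, and no future CPU can take those bits away.

// Minimal sketch of low-bit pointer tagging: an 8-byte-aligned allocation
// always has its three low bits clear, so they can carry flags without
// touching the address bits the CPU actually interprets.

const TAG_MASK: usize = 0b111;

#[derive(Debug, Clone, Copy)]
struct TaggedPtr {
    raw: usize,
}

impl TaggedPtr {
    fn new(ptr: *const u64, tag: usize) -> Self {
        assert_eq!(ptr as usize & TAG_MASK, 0, "pointer must be 8-byte aligned");
        assert!(tag <= TAG_MASK);
        TaggedPtr { raw: ptr as usize | tag }
    }

    fn tag(self) -> usize {
        self.raw & TAG_MASK
    }

    fn ptr(self) -> *const u64 {
        // Masking the low bits restores the original address; no high bits
        // were ever touched, so this keeps working on any future CPU.
        (self.raw & !TAG_MASK) as *const u64
    }
}

fn main() {
    let value = Box::new(42u64);
    let tagged = TaggedPtr::new(&*value, 0b101);
    assert_eq!(tagged.tag(), 0b101);
    // SAFETY: the untagged pointer still refers to the live Box above.
    assert_eq!(unsafe { *tagged.ptr() }, 42);
}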
> Whereas if I steal ten bits at the top of the pointer, when Intel ships Ice Lake CPUs my compiler now crashes. Good luck debugging that!
That's not how it works. The x86-64 ISA requires these "top" bits to be properly sign extended when actually accessing addresses, so any binary code that's using this "trick" is forward compatible with future CPUs that might enable a bigger virtual address space. The only concern regarding compatibility is purely ABI-related, but that can be managed at the OS and system-software layer.
> That's not how it works. The x86-64 ISA requires these "top" bits to be properly sign extended when actually accessing addresses.
On Coffee Lake no more than 48 bits of the virtual address are "real". If I steal ten of the 64 bits from the top, all the actual address information is preserved and I've lost nothing. My compiler works just fine.
On Ice Lake up to 57 bits of the virtual address are "real". Since I'm stealing ten of the 64 bits from the top, I only have 54 bits left, so 3 significant bits were dropped and I don't have them any more. Sign-extending the wrong value won't magically make it into the right value.
Yes—and my concern is that old binaries will suddenly stop working on newer hardware. This has happened before!
The whole reason the top bits are sign-extended in the first place is to discourage these kinds of shenanigans. I remember the pain of switching 24-bit to 32-bit on M68K Macs.
Until the virtual address space gets so big that there are no bits left at the top that are "free". As soon as you have a 64 bit virtual address space, pointer tagging the top bits no longer works. Your program is now broken on the new CPU.
Of course competent programmers using the top K bits of the pointer would make sure that the kernel and memory allocator operate so that they don't allocate memory outside a (64 - K)-bit address space, regardless of what the CPU supports (e.g. Linux mmap as one would expect won't allocate beyond 2^48 unless explicitly requested).
And obviously any intelligent CPU architect, if they make a CPU that masks the upper bits, they would make the number of bits that gets masked configurable per-process and not dependent on the maximum virtual address space supported by the CPU.
> And obviously any intelligent CPU architect, if they make a CPU that masks the upper bits, they would make the number of bits that gets masked configurable per-process
x86-64 does this by requiring all accesses to be correctly sign-extended, which means that valid virtual addresses are unique and there is full forward compatibility even with an extended virtual address space. The "number of bits that get masked" is a pure system/ABI concern, the CPU support only defines a maximum.
> Of course competent programmers ... And obviously any intelligent CPU architect
Interesting to see the future is so "obvious" to you. You are correct, however, that this is not a problem at present, because no architecture uses more than 48 bits of address space in practice. And since this is such a common practice now, there will have to be allowances made in the future when the address space is expanded.
As a sibling to you noted, Ice Lake has bumped this from 48 bits to 57 bits to support 5-level paging [1]. So the "in practice" appears to already be out of date. This was known publicly as early as 2017 (albeit this is the first I've heard of it).
That being said, I don't know why the increase from 256 TB to 128 PB is at all useful, but it's enabled in my Arch Linux kernel on a 32 GB laptop, so it has clearly been adopted by the broader ecosystem even outside whatever niche use case it's for.
That being said, the kernel maintains backward compatibility by restricting the allocation address space it hands out by default. Userspace needs to explicitly ask for the higher bits [2], so it's probably OK to just continue as-is, assuming the 48-bit limit in practice.
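For what it's worth, here is a hedged Rust sketch of that opt-in on Linux x86-64 (assumes the libc crate and a kernel with 5-level paging enabled): mmap with a null hint stays inside the legacy 48-bit user space, and only a hint address above it asks for the extended range, per the kernel documentation referenced in [2].

use std::ptr;

fn main() {
    unsafe {
        // Default behaviour: the kernel picks an address below the legacy
        // 48-bit limit, so existing pointer-tagging code keeps working.
        let low = libc::mmap(
            ptr::null_mut(),
            4096,
            libc::PROT_READ | libc::PROT_WRITE,
            libc::MAP_PRIVATE | libc::MAP_ANONYMOUS,
            -1,
            0,
        );

        // Explicit opt-in: a hint above the legacy limit asks for the
        // extended address space (only honoured with 5-level paging).
        let high_hint = 1usize << 48;
        let high = libc::mmap(
            high_hint as *mut libc::c_void,
            4096,
            libc::PROT_READ | libc::PROT_WRITE,
            libc::MAP_PRIVATE | libc::MAP_ANONYMOUS,
            -1,
            0,
        );

        println!("default mapping: {:p}, hinted mapping: {:p}", low, high);
    }
}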
The longer I stare at your comment, the more it feels like you are hung up on a semantic issue. While the intermediate states I went through were maybe a bit more interestingly explanatory with respect to specific techniques, the final state I landed on is simply: "if you only used the bottom single bit to tag that the entire value is a value instead of a pointer, that alone mitigates the waste of moving to 64 bits by having fewer indirect allocations".
The 32-bit platform can do this same trick, so, your 64-bit pointers are still wider than theirs, and you didn't gain anything unless you also steal high bits.
This isn't entirely true: for them, address space is very tight, so they daren't just align everything and risk wasting too much address space, whereas on a 64-bit platform you can afford to be very generous with address space. This is akin to how it's fine to have randomised privacy addressing in IPv6, but that'd be a problem in IPv4. But this gain is hard to give a specific value to, whereas the simple fact is that 64-bit pointers are exactly twice as big as 32-bit pointers.
This can give you the incentive to build a different data structure; it's surprising how often "just use a vector" is correct these days, and Chandler Carruth has talked about that at length (in C++). If you have at least some idea how many Foozles you need, and Foozles aren't some crazy multi-page data structure, chances are you should put all the Foozles in a vector, and not carry around actual pointers to Foozles at all (a sketch follows below).
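As a hedged illustration of the "just use a vector" point (names like Foozle are made up here), you can keep the objects contiguous and hand out small integer indices instead of pointers; the indices are both smaller than 64-bit pointers and friendlier to the cache.

// Illustrative sketch: store the objects contiguously and refer to them by
// index instead of by pointer. A u32 index is half the size of a pointer.

struct Foozle {
    weight: u32,
}

struct FoozleStore {
    items: Vec<Foozle>,
}

impl FoozleStore {
    fn add(&mut self, f: Foozle) -> u32 {
        self.items.push(f);
        (self.items.len() - 1) as u32
    }

    fn get(&self, id: u32) -> &Foozle {
        &self.items[id as usize]
    }
}

fn main() {
    let mut store = FoozleStore { items: Vec::new() };
    let id = store.add(Foozle { weight: 7 });
    println!("foozle {} weighs {}", id, store.get(id).weight);
}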
Isn't using the bottom bit how most runtimes represent their ints, taking advantage of the fact that GC pointers are aligned? This is common practice AFAIU.
Sure, but for the bottom bit, 64-bit addressing doesn't come into it. If you only want to steal bits at the bottom and have pointers with greater than single byte alignment this worked just fine on a 32-bit CPU already.
I was reacting to the claim that this mitigates the overhead (larger pointer size) of 64-bit addressing. You can steal bits at the top but it's probably a bad idea.
>In Go, the compiler needs to automatically insert code that checks the “pulse” of the heap every now and then, and potentially runs garbage collection. It also needs to automatically insert code that can tell the scheduler “hey now is a safe time to interrupt me if a different goroutine wishes to run”
Is it still the case after Go 1.14 (February 2020) when asynchronous preemption was implemented? As far as I know, goroutines are now preempted using signals instead of inserting checks at every function call.
The signal-based mechanism is only used for stragglers.
Go still uses the function call check. The runtime devs found a way to combine it with a check to see if the goroutine's stack needs to be resized, so it doesn't actually cost anything.
For me, the most interesting one is Josephine[1]. This allows tracking of stack roots, doesn't limit the time when GC can run, and takes advantage of rust's affine type system. However I have to say the API feels fairly clunky, with the need to root everything before mutation and the need to pass context objects to every method.
One day, Rust needs a GC. Reference counting is just a crappy GC. Modern GC can perform better than this, so Rust is actually hurting its own performance by not having one.
A good GC would make heavily concurrent apps much easier to build with Rust, and would have better performance than the typical Arc<Mutex> objects passed around right now.
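For reference, this is the pattern the comment above refers to, as a minimal sketch: shared state in concurrent Rust today is typically an Arc<Mutex<T>> cloned into each thread, i.e. atomic reference counting plus a lock on every access.

// Minimal sketch of the "Arc<Mutex<T>> passed around" pattern.
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let counter = Arc::new(Mutex::new(0u64));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let counter = Arc::clone(&counter); // atomic refcount bump per thread
            thread::spawn(move || {
                *counter.lock().unwrap() += 1; // lock taken on every access
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
    assert_eq!(*counter.lock().unwrap(), 4);
}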
Tracing GC is troublesome for any non-memory resource, such as network connection or file handle, due to its untimely release, but otherwise I actually agree: reference counting is a GC mechanism—not a very good one, but it's the only one I'm aware of that works both for memory and resources.
I would enjoy seeing someone test a model where the type system guarantees (or at least lets you detect the situation) that you cannot store such non-memory objects behind a traced GC node (these would include plain memory objects that need to be registered/unregistered precisely).
It might be that it would be needlessly annoying to use compared to just RC. Or maybe it would be the best of both worlds?
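A hedged sketch of what that model could look like, with entirely hypothetical names (nothing like this exists in Rust today): a marker trait that a tracing Gc<T> handle requires, which resource-owning types simply don't implement, so parking a file or socket behind the GC becomes a compile error.

use std::marker::PhantomData;

/// Hypothetical marker: "safe to have its destruction delayed by the GC".
trait GcManaged {}

// Plain data is fine to trace.
impl GcManaged for u64 {}
impl<T: GcManaged> GcManaged for Vec<T> {}

// Deliberately *no* impl for std::fs::File or a Socket type.

/// Stand-in for a traced GC pointer; the trait bound is the whole point.
struct Gc<T: GcManaged> {
    _marker: PhantomData<T>,
}

impl<T: GcManaged> Gc<T> {
    fn new(_value: T) -> Self {
        Gc { _marker: PhantomData }
    }
}

fn main() {
    let _numbers = Gc::new(vec![1u64, 2, 3]); // fine

    // let _file = Gc::new(std::fs::File::open("/etc/hosts").unwrap());
    // ^ error[E0277]: the trait bound `File: GcManaged` is not satisfied
}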
Not when the language also supports value types and region allocators (e.g. IDisposable in .NET).
You can even turn it into RAII proper, by making it a compilation error not to handle those interfaces properly.
Again with .NET, there are SafeHandles as well, alongside the MarshalInterop APIs.
This is nothing new actually: Mesa/Cedar at Xerox PARC used reference counting with a cycle collector, while other descendant languages down to Modula-3 and Active Oberon always combined value types, tracing GC and C++-like resource handling capabilities.
Oh, Common Lisp also has similar capabilities, especially the ZetaLisp predecessor from the Lisp Machines.
Then Eiffel not only had this, it was also probably the first Algol-like language to support non-nullable references.
Sadly they decided to ignore all of this in Java, and then its world domination kind of made everyone else ignore it as well.
Thankfully even Java is improving its story in this regard, while languages like D, Nim and yes, .NET kind of show what was already available for several decades.
I must be missing something. How is it possible to precisely collect a resource with a tracing GC? If you need to update counters when you make duplicates of object references, you are not using a tracing GC, whose benefits are cheap duplication of object references, cheap allocations and cheap (batched) releases, and whose downside is not being able to release precisely and automatically as soon as the value becomes available for collection.
It seems to me that it is impossible to have both automatic, precise release of resources and collection-based GC?
> The primary use of this interface is to release unmanaged resources. The garbage collector automatically releases the memory allocated to a managed object when that object is no longer used. However, it is not possible to predict when garbage collection will occur. Furthermore, the garbage collector has no knowledge of unmanaged resources such as window handles, or open files and streams.
> Use the Dispose method of this interface to explicitly release unmanaged resources in conjunction with the garbage collector. The consumer of an object can call this method when the object is no longer needed.
So this is the interface you can use to explicitly release a resource, because the GC gets around to it only later at some unspecified time.
> The SafeHandle class provides critical finalization of handle resources, preventing handles from being reclaimed prematurely by garbage collection and from being recycled by Windows to reference unintended unmanaged objects.
Doesn't seem it's at all helpful for automatic precise release of resources.
> the benefits are the cheap duplication of object references, cheap allocations and cheap (batched) releases, but the downside is not being able to precisely and automatically do it when the value is available for collection.
Note that you don't need GC to reap these benefits, if desired. You can allocate an arena and do secondary allocations inside it, then deallocate everything in a single operation. Arena deallocation is not timely or precise, but it does happen deterministically.
True, but GC gives those benefits automatically, compared to a naive program doing e.g. RC-based memory management.
And there is of course the question of safety; should you release an arena too early, you may have introduced a bug. Worse: it might not crash immediately.
There is actually some work for doing arena management automatically, called region inference: http://www.mlton.org/Regions
But the way I see it, it's just a way to make memory management even more efficient; it's not about precise release of resources, and indeed not all programs can be expressed so that releases can happen only in batches of an arena (assuming those arenas themselves aren't dynamically managed, which certainly is a valid strategy as well, but manual).
> should you release an arena too early, you may have introduced a bug.
A memory safe programming language will detect any such bugs and reject the program. This is not hard; it's a clean application of existing lifetime checks.
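A minimal Rust sketch of that point (a toy arena, not a real crate): because every allocation borrows from the arena, dropping the arena while any allocation is still in use is rejected at compile time.

use std::cell::RefCell;

// Toy arena for illustration: allocations live exactly as long as the arena.
struct Arena<T> {
    items: RefCell<Vec<Box<T>>>,
}

impl<T> Arena<T> {
    fn new() -> Self {
        Arena { items: RefCell::new(Vec::new()) }
    }

    // The returned reference borrows from `self`, so the borrow checker
    // refuses any use of it after the arena has been dropped or moved.
    fn alloc(&self, value: T) -> &T {
        let boxed = Box::new(value);
        let ptr: *const T = &*boxed;
        self.items.borrow_mut().push(boxed);
        // SAFETY: the Box keeps the value at a stable heap address, and it
        // is only freed when the Arena itself is dropped.
        unsafe { &*ptr }
    }
}

fn main() {
    let arena = Arena::new();
    let x = arena.alloc(41);
    println!("{}", x + 1);

    drop(arena);
    // println!("{}", x);
    // ^ error[E0505]: cannot move out of `arena` because it is borrowed
}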
You aren't reading it properly: the documentation you are reading covers the case where you leave the work to the GC. You can take care of it yourself, C++ RAII style:
{
    // C# 8 using declaration: Dispose() runs when the scope is exited.
    using var my_socket = new NetworkSocket();
}
// my_socket no longer exists when code arrives here
Or even better, if NetworkSocket is a struct, it gets stack allocated: zero GC.
{
    using var my_socket = new NetworkSocket();
    my_socket.write("Started");
    // The lambda captures my_socket and escapes the scope, but Dispose()
    // still runs at the closing brace, so the callback writes to a closed socket.
    register_callback(() => my_socket.write("Finished"));
}
This is the case that RC solves well and tracing GC doesn't solve at all, regardless of the number of interfaces you implement. It is easy to find yourself in this situation given how much callbacks are used in modern codebases.
And what did you try to prove here? There is no use-after-free and no runtime error in this Rust code. The socket stays valid from the moment of its creation and for the whole lifetime of the network component. It gets moved out of the nested scope properly and gets closed after leaving the outer scope, after dropping the NetworkComponent struct.
The "oops" comment is invalid in your Rust example because the socket is still valid at that point.
Which is totally different from what would happen in C#, where you'd get a use-after-free bug (actually use-after-close).
Try-with-resources is not RAII. It is a lot weaker.
The playground link confirms the socket is closed after dropping networkComponent.
Last two lines of the output:
Dropping NetworkComponent
Dropping NetworkSocket
Btw: you probably fooled yourself by accidentally creating 2 sockets, and indeed the first one gets dropped immediately when you lose (overwrite) the reference to it. Use Option to avoid that.
By replacing the socket, the port number is now a different one, and all processes that had open connections to that port will now crash, or have messages dropped without knowing why.
I can also fabricate plenty of error situations with Rust if you feel so inclined.
No I didn't. That line is totally irrelevant and does not apply to the socket that was passed to the NetworkComponent. It applies to the initial socket you added, which was not even present in the original example. You should have used Option to make your code equivalent.
Anyway, your example failed to show use-after-close in Rust.
> Remember, Rust isn't perfect, and only fixes 70% of existing error patterns
Sure, no-one here debated that. But it fixes/protects from more error patterns than C#, and use-after-close is one of them.
You stated that try-with-resources + struct types are functionally equivalent to RAII. My code proved they are not, because you can trivially create a use-after-free, and it is even really easy to do that by accident. There is nothing in the language that protects you from leaking a closeable reference out of the `using` scope and then using that reference after the scope gets closed. And that leak can happen 10 layers below, where it is not as easily seen as in the trivial example I posted. In Rust you can do it only with explicit `unsafe`; otherwise the type system tracks that for you.
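For completeness, a hedged Rust sketch of the same shape as the C# example above (not the playground code from the thread; NetworkSocket and register_callback are stand-ins): ownership moves into the callback, so the socket is closed when the callback is done with it, and any later use of the moved value is a compile error rather than a use-after-close.

struct NetworkSocket;

impl NetworkSocket {
    fn write(&self, msg: &str) {
        println!("socket <- {}", msg);
    }
}

impl Drop for NetworkSocket {
    fn drop(&mut self) {
        println!("Dropping NetworkSocket");
    }
}

fn register_callback(callback: Box<dyn FnOnce()>) {
    // Stand-in for a real event loop: run the callback "later".
    callback();
}

fn main() {
    let socket = NetworkSocket;
    socket.write("Started");

    // The closure takes ownership, so the socket lives as long as the
    // callback does; it is closed when the callback is dropped, not when
    // this scope ends.
    register_callback(Box::new(move || socket.write("Finished")));

    // socket.write("oops");
    // ^ error[E0382]: borrow of moved value: `socket`
}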
I would need some data on that, but I have to say that it always makes me laugh when people only talk about the GC in threads about D. It's so good for productivity. I don't really like it, but I can't describe just how much of a non-issue it is for us (the company I work for).
Tracing GC has poor memory performance because it has to access rarely used or swapped out pages to scan them for pointers. And of course, the peak memory use is much higher since it doesn't free everything as soon as possible.
There may be advantages if you can use it to add compaction, but I don't think you need a GC to do that necessarily.
No wonder the M1 has architecture-specific optimizations that help streamline ARC boilerplate code, while Swift 5.5 will bring more aggressive optimizations (disabled by default, because the application can crash if weak/unowned references are annotated improperly => WWDC 2021 talk).
> Yeah, that is the usual answer when benchmarks prove how much urban myth reference counting performance is actually like.
CPU/wall time benchmarks are not that relevant to system performance (seriously!) because second-order effects matter more. But if you had peak memory and page demand graphs that would matter.
For a network driver I don't know if it'd really look any different though. That's mostly arena allocations.
Thanks for sharing these links! Super interesting. I do have a question though. The ixy benchmarks seem to imply that RC is generally slower than GC (Go and C# are much faster than Swift and are only outdone by languages with manual memory management).
However in the tweet thread you shared, the poster said
> all that reference counting overhead (already more efficient than garbage collection) gets dropped in half.
Implying that reference counting is actually more efficient. I don't know how to reconcile these two observations. Do you have any insights?
The observation is made from the point of view of Swift developers.
The only reason Swift has reference counting is historical.
The Objective-C GC implementation failed, because it was very hard to mix frameworks compiled with and without GC enabled, alongside the usual issues of C memory semantics.
So Apple made the right design decision: instead of trying to fix tracing GC in such an environment, just like Microsoft does with COM, they looked at the Cocoa [retain/release] pattern, automated it, and in a marketing swoop sold that solution as ARC.
Swift, as an Objective-C replacement, naturally had to build on top of ARC as a means to keep compatibility with the Objective-C runtime without additional overhead (see RCW/CCW for how the .NET GC deals with COM).
> As shown in the figure, performing RC operations takes on average 42% of the execution time in client programs, and 15% in server programs. The average across all programs can be shown to be 32%. The Swift compiler does implement optimization techniques to reduce the number of RC operations similar to those described in Section 2.2.2. Without them, the overhead would be higher. The RC overhead is lower in server programs than in client programs. This is because server programs spend relatively less time in Swift code and RC operations; they spend relatively more time in runtime functions for networking and I/O written in C++.
It makes technical sense that Swift uses reference counting, as explained above, but it isn't due to performance; that story just sells better than explaining it was due to the C memory model inherited from Objective-C, which, besides the memory corruption problems, doesn't allow for anything better than a conservative garbage collector, with very bad performance.
2014-09. RFC 256 https://rust-lang.github.io/rfcs/0256-remove-refcounting-gc-...
« Remove the reference-counting based Gc<T> type from the standard library and its associated support infrastructure from rustc.
Doing so lays a cleaner foundation upon which to prototype a proper tracing GC »
Contemporary with that: https://news.ycombinator.com/item?id=8312327 nikomatsakis:
« I wouldn't be so quick to give up on GC support yet! "We have a plan!" But I don't think we'll have it in for Rust 1.0. »
2015-04:
https://blog.rust-lang.org/2015/04/10/Fearless-Concurrency.h...
"Memory safety without garbage collection." is now advertised as a "key value proposition" (this isn't quite the same as saying "we're not thinking of ever adding garbage collection", of course).
2016-08: https://manishearth.github.io/blog/2016/08/18/gc-support-in-...
« Recently we (Felix, Niko, and I) have been working on getting compiler-level GC support for Rust. The plan is to provide a base set of APIs and intrinsics on which GCs can be built, without including an actual GC itself. »
This is the "Unfinished design for a builtin Rust GC" described in TFA.
2018-10:
the introductory post for the shifgrethor library (described in TFA): https://boats.gitlab.io/blog/post/shifgrethor-i/
includes a bolded « I do not expect we will ever “add GC to Rust” ».