I have a hard time believing that a TLS lookup and a couple of writes per freed object cost more than a Cheney scan of the objects in the nursery.
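To make the two costs concrete, here is a minimal sketch in C, assuming a per-thread free list for explicit freeing and a bump-pointer nursery on the GC side. The names `Obj`, `obj_free`, `obj_alloc`, `Nursery`, `nursery_alloc`, and `nursery_reset` are hypothetical and not taken from any particular allocator or collector.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-size object; its first word doubles as the free-list link. */
typedef struct Obj { struct Obj *next; long payload; } Obj;

/* Per-thread free list: reaching this variable is the "TLS lookup". */
static _Thread_local Obj *free_list = NULL;

/* Explicit free: one write to link the object, one write to update the head. */
static void obj_free(Obj *o) {
    o->next = free_list;
    free_list = o;
}

/* Reuse path: pop the most recently freed object, or NULL if the list is empty. */
static Obj *obj_alloc(void) {
    Obj *o = free_list;
    if (o) free_list = o->next;
    return o;
}

/* Bump-pointer nursery (assumption: callers pass suitably aligned sizes). */
typedef struct { char *top; char *limit; } Nursery;

static void *nursery_alloc(Nursery *n, size_t size) {
    if ((size_t)(n->limit - n->top) < size)
        return NULL;            /* a real collector would run a minor GC here */
    void *p = n->top;
    n->top += size;
    return p;
}

/* Once survivors have been copied out, every dead object is reclaimed by one
   pointer reset; none of them is visited individually. */
static void nursery_reset(Nursery *n, char *base) {
    assert(base <= n->limit);
    n->top = base;
}
```

The point of contention is that `obj_free` does its work once per dead object, while the nursery pays only for survivors (the Cheney scan) plus a single reset.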
As with many late-'80s memory management papers, cache effects have made this one's conclusions obsolete. In fact, the compiler's ability to coalesce all the stack allocations in a single procedure call into a single pair of stack manipulation instructions makes the conclusions almost meaningless in practice: with inlining optimizations and modern CPUs, that cost is essentially zero nowadays. I'd almost argue that the methodology in this paper, combined with 2015 assumptions, offers an effective argument against GC. Appel's memory management papers really need to be taken with a grain of salt :)
This paper describes how it is possible if you are happy to throw a lot of memory at the problem:
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.49....
Section 3 is titled "Explicit freeing is more expensive".
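The "throw a lot of memory at the problem" trade-off can be sketched with a toy cost model. Assume a semispace copying collector whose work per collection is proportional to the live data, so each collection costs `c * live` and frees `mem / 2 - live` words. The function name and parameters below are hypothetical, and the model is an illustration under those assumptions rather than a figure quoted from the paper.

```c
#include <assert.h>

/* Amortized collector work per allocated word under the toy model:
   (c * live) work per collection, spread over (mem / 2 - live) freed words. */
static double gc_cost_per_word(double c, double live, double mem) {
    assert(mem / 2.0 > live);   /* each semispace must hold all live data */
    return c * live / (mem / 2.0 - live);
}
```

Under this model the per-word cost shrinks toward zero as `mem` grows with `live` held fixed, whereas an explicit free costs some fixed amount per object no matter how much memory is available.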