
The problem seems to be that no matter how you tweak GC, you will always have a class of program that it performs terribly for

For casual use, most programs can treat GC like magic, but if you are doing serious work in a language with GC, then you should learn about the GC's characteristics. That bit of due diligence and up front design effort is still often going to be tons cheaper than doing the manual memory management.

Trading throughput for lower latency is the right decision for the vast majority of programs that will be written in Go. It was already a very attractive language for writing a multiplayer game server, so long as I didn't have very large heaps. (Even so, I can still support 150-250 players and tens of thousands of entities.) With the "tweak," that limitation is much relaxed.



> often going to be tons cheaper than doing the manual memory management.

And on top of that, manual memory management is not free. I maintain a simple but high-throughput C++ server at Google, and tcmalloc is never less than 10-15% of our profiles.

Don't get me wrong, I'm not saying that Go is faster than C++ or ever will be. I'm just trying to counter the notion that "GC is expensive, manual memory management is near zero runtime cost."


I bet that if someone who knew what they were doing decided to optimize that, you'd get the cost WAY down, possibly almost to zero. (If you are using std::string, that is your problem right there).

But the very important difference here is that in your case you have a choice: it is possible to optimize the cost away and to otherwise control when and where you pay it. In GC systems it is never possible to do this completely. You can only sort of kind of try to prevent GC. It's not just a difference in magnitude, it's a categorical difference.


Perhaps. The team is a group of seasoned veterans of high performance server engineering. But perhaps there are others who could improve on our efforts by a significant margin.

Of course we do not use std::string.


If you really really want to, you can allocate a buffer for all your data.


This solves little. What do you think the system allocator is doing under the covers?


It's doing a lot less, if you're allocating one buffer for your data instead of many.
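To make the "one buffer" idea concrete, here is a minimal sketch of a hypothetical bump-pointer arena (not from the thread): grab one buffer up front, hand out aligned slices by bumping an offset, and release everything at once when the arena is destroyed. No per-allocation bookkeeping, no free lists.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bump-pointer arena: one upfront buffer, allocation is just
// an aligned pointer bump, and everything is freed together at destruction.
class Arena {
public:
    explicit Arena(std::size_t capacity) : buf_(capacity), used_(0) {}

    // Returns nullptr when the arena is exhausted; no individual frees.
    void* Allocate(std::size_t n,
                   std::size_t align = alignof(std::max_align_t)) {
        std::size_t start = (used_ + align - 1) & ~(align - 1);
        if (start + n > buf_.size()) return nullptr;  // out of arena space
        used_ = start + n;
        return buf_.data() + start;
    }

    std::size_t used() const { return used_; }

private:
    std::vector<std::uint8_t> buf_;  // the single buffer
    std::size_t used_;               // bump offset
};
```

The trade-off is that nothing can be freed individually; this fits request- or frame-scoped data, not long-lived objects with varied lifetimes.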


Just curious: Have you tried jemalloc, and what numbers did you get?


We haven't. Google infrastructure uses tcmalloc. Is there a reason to believe it offers a significant win?


I'd expect similar performance but less fragmentation, and less memory held by the process if you aren't regularly calling MallocExtension::instance()->ReleaseFreeMemory() as a tcmalloc user.

The first answer at https://www.quora.com/Is-tcmalloc-stable-enough-for-producti... (by Keith Adams) is completely consistent with what I've seen. Rust went with jemalloc for some reason too.


IIRC jemalloc is somewhat better about releasing memory in a timely fashion, at least by default.


"That bit of due diligence and up front design effort is still often going to be tons cheaper than doing the manual memory management"

That's just a pipe dream. I say this having spent inordinate amounts of time trying to tune myriad parameters in JVM GC for large heap systems without ultimate success. What it always comes down to is how much extra physical RAM you're willing to burn to get some sort of predictable and acceptable GC pauses. It's usually an unacceptable amount.


That's just a pipe dream. I say this having spent inordinate amounts of time trying to tune myriad parameters in JVM GC for large heap systems without ultimate success.

Patient: Doctor, it hurts when I do this!

Doctor: Don't do that!

Possibly, divide your heap into smaller pieces with their own GC? Restructure your system, such that most of your heap is persistent and exempt from GC? I don't know the details of the system you're trying to build, of course. It sounds interesting and challenging.


"Possibly, divide your heap into smaller pieces with their own GC? Restructure your system"

That's the common recommendation (resisting the urge to call it a "pat answer"). Suffice it to say, this is not always possible. Apart from all the business-related issues with rewriting a complex system from scratch, breaking up a large shared-memory system into smaller, communicating processes multiplies both the software complexity (roughly by O(N^2), where N is the number of new components created) and the hardware requirements in its own right -- think of all the overhead of marshalling/demarshalling, communication latencies, thread management, and increased cache misses from fragmenting that nice giant cache you were hosting in that big JVM heap.


I'm curious how much physical ram is an unacceptable expense to you, given how cheap it is.


Even the amount of RAM parceled out for virtual servers is an embarrassment of riches, provided you pay for something other than the bottom tier!

In the context of games, and other domains as well, I think there's too much attention paid to pushing the envelope and not enough to how much awesome can be had with what is readily available.


> That bit of due diligence and up front design effort is still often going to be tons cheaper than doing the manual memory management.

Calling shenanigans. No it's not, unless the person doing the manual solution is a novice.


Despite the drastic page limit in the category I was submitting in, I made sure to include a paragraph about how GC enables sharing, and how the only reasonable alternative when implementing a similar system in a non-GC language is a lot of gratuitous copying to solve ownership issues, in http://frama-c.com/u3cat/download/CuoqICFP09.pdf

(The page limit was 4. Organizers only raised it to 6 after seeing submitted papers.)

I can also confirm the "bit of due diligence" part, and the fact that it's cheaper than the aggravation of not having automatic memory management at all. In the example that I can contribute to the discussion, the due diligence amounted to two more short articles: http://cristal.inria.fr/~doligez/publications/cuoq-doligez-m... and http://blog.frama-c.com/public/unmarshal.pdf


> GC enable sharing and how the only reasonable alternative when implementing a similar system in a non-GC language is a lot of gratuitous copying to solve ownership issues

The solution to unclear or shared ownership is generally reference counting. There's a reason why shared_ptr is called that.
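As a small illustration of that point (my own sketch, not from the thread): two lists can share one tail through shared_ptr, with no tracing GC and no copying; the tail is destroyed exactly when its last owner goes away.

```cpp
#include <memory>
#include <string>

struct Node {
    std::string value;
    std::shared_ptr<Node> next;  // shared, not copied
};

// Two lists share the same tail node. Reference counting frees it
// precisely when the last reference disappears -- no tracing GC needed.
long shared_tail_use_count() {
    auto tail  = std::make_shared<Node>(Node{"tail", nullptr});
    auto list1 = std::make_shared<Node>(Node{"a", tail});
    auto list2 = std::make_shared<Node>(Node{"b", tail});
    return tail.use_count();  // tail + list1->next + list2->next
}
```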


With the usual set of locks, cache contention and pauses on cascade deletions of deep datastructures it brings.


You don't need locks to RC immutable structures, just atomic reference counts (and not even those if the system is single-threaded).


Reference counting is a garbage-collection system like the others (and if you are going to use a garbage-collection system, you can for many usecases do better than reference counting).


> Reference counting is a garbage-collection system like the others

Reference counting is a form of automated memory management which can easily be integrated into a manually-managed system and applied to a specific subset of the in-memory structures (again, see shared_ptr). Not so for more complex garbage collection systems, which tend to interact badly with manual or ownership-based memory management. That puts the lie to your assertion that the only way to implement sharing in a non-GC language is "gratuitous copying".


Yes, it's a shame that you were not a reviewer, mid-2009, of my article published in September 2009.


It's not the writing of manual memory management in the usual case/happy path that's the problem. It's the very occasional mistake and the debugging time involved. (Though to be fair, automated static analysis tools have taken great strides, and this is not as big a problem as it used to be.)

What GC often gets you is a program that doesn't crash but instead has performance problems, but these are often more easily profiled and found and less severe than a crash. (Manual memory management isn't immune from the same performance problems in any case.)

In other words, GC gets you to "Step 1 -- Get it Correct" faster so you can play with running code faster. The cost/benefit may not fit your situation. In that case, use a different tool.



