
The problem seems to be that no matter how you tweak GC, you will always have a class of program that it performs terribly for

For casual use, most programs can treat GC like magic, but if you are doing serious work in a language with GC, then you should learn about the GC's characteristics. That bit of due diligence and up front design effort is still often going to be tons cheaper than doing the manual memory management.

Trading throughput for lower latency is the right decision for the vast majority of programs that will be written in Go. It was already a very attractive language for writing a multiplayer game server, so long as I didn't have very large heaps. (Even so, I can still support 150-250 players and tens of thousands of entities.) With the "tweak," that limitation is much relaxed.



> often going to be tons cheaper than doing the manual memory management.

And on top of that, manual memory management is not free. I maintain a simple but high-throughput C++ server at Google, and tcmalloc is never less than 10-15% of our profiles.

Don't get me wrong, I'm not saying that Go is faster than C++ or ever will be. I'm just trying to counter the notion that "GC is expensive, manual memory management is near zero runtime cost."


I bet that if someone who knew what they were doing decided to optimize that, you'd get the cost WAY down, possibly almost to zero. (If you are using std::string, that is your problem right there).

But the very important difference here is that in your case you have a choice: it is possible to optimize the cost away and to otherwise control when and where you pay it. In GC systems it is never possible to do this completely. You can only sort of kind of try to prevent GC. It's not just a difference in magnitude, it's a categorical difference.


Perhaps. The team is a group of seasoned veterans of high performance server engineering. But perhaps there are others who could improve on our efforts by a significant margin.

Of course we do not use std::string.


If you really really want to, you can allocate a buffer for all your data.


This solves little. What do you think the system allocator is doing under the covers?


It's doing a lot less, if you're allocating one buffer for your data instead of many.
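To make the "one buffer" idea concrete, here is a minimal sketch of a hypothetical bump-pointer arena (not from the thread): grab one buffer up front, hand out aligned slices by bumping an offset, and release everything at once when the arena is destroyed. No per-allocation bookkeeping, no free lists.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bump-pointer arena: one upfront buffer, allocation is just
// an aligned pointer bump, and everything is freed together at destruction.
class Arena {
public:
    explicit Arena(std::size_t capacity) : buf_(capacity), used_(0) {}

    // Returns nullptr when the arena is exhausted; no individual frees.
    void* Allocate(std::size_t n,
                   std::size_t align = alignof(std::max_align_t)) {
        std::size_t start = (used_ + align - 1) & ~(align - 1);
        if (start + n > buf_.size()) return nullptr;  // out of arena space
        used_ = start + n;
        return buf_.data() + start;
    }

    std::size_t used() const { return used_; }

private:
    std::vector<std::uint8_t> buf_;  // the single buffer
    std::size_t used_;               // bump offset
};
```

The trade-off is that nothing can be freed individually; this fits request- or frame-scoped data, not long-lived objects with varied lifetimes.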


Just curious: Have you tried jemalloc, and what numbers did you get?


We haven't. Google infrastructure uses tcmalloc. Is there a reason to believe it offers a significant win?


I'd expect similar performance but less fragmentation, and less memory held by the process if you aren't regularly calling MallocExtension::instance()->ReleaseFreeMemory() as a tcmalloc user.

The first answer at https://www.quora.com/Is-tcmalloc-stable-enough-for-producti... (by Keith Adams) is completely consistent with what I've seen. Rust went with jemalloc for some reason too.


IIRC jemalloc is somewhat better about releasing memory in a timely fashion, at least by default.


"That bit of due diligence and up front design effort is still often going to be tons cheaper than doing the manual memory management"

That's just a pipe dream. I say this having spent inordinate amounts of time trying to tune myriad parameters in JVM GC for large heap systems without ultimate success. What it always comes down to is how much extra physical RAM you're willing to burn to get some sort of predictable and acceptable GC pauses. It's usually an unacceptable amount.


That's just a pipe dream. I say this having spent inordinate amounts of time trying to tune myriad parameters in JVM GC for large heap systems without ultimate success.

Patient: Doctor, it hurts when I do this!

Doctor: Don't do that!

Possibly, divide your heap into smaller pieces with their own GC? Restructure your system, such that most of your heap is persistent and exempt from GC? I don't know the details of the system you're trying to build, of course. It sounds interesting and challenging.


"Possibly, divide your heap into smaller pieces with their own GC? Restructure your system"

That's the common recommendation (resisting the urge to call it a "pat answer"). Suffice it to say, this is not always possible. Apart from all the business-related issues with rewriting a complex system from scratch, breaking up a large shared-memory system into smaller, communicating processes multiplies both the software complexity (roughly by O(N^2), where N is the number of new components created) and the hardware requirements in its own right -- think of all the overhead of marshalling/demarshalling, communication latencies, thread management, and increased cache misses from fragmenting that nice giant cache you were hosting in that big JVM heap.


I'm curious how much physical ram is an unacceptable expense to you, given how cheap it is.


Even the amount of RAM parceled out for virtual servers is an embarrassment of riches, provided you pay for something other than the bottom tier!

In the context of games, and other domains as well, I think there's too much attention paid to pushing the envelope and not enough to how much awesome can be had with what is readily available.


> That bit of due diligence and up front design effort is still often going to be tons cheaper than doing the manual memory management.

Calling shenanigans. No it's not, unless the person doing the manual solution is a novice.


Despite the drastic page limit in the category I was submitting in, I made sure to include a paragraph about how GC enables sharing, and how the only reasonable alternative when implementing a similar system in a non-GC language is a lot of gratuitous copying to solve ownership issues, in http://frama-c.com/u3cat/download/CuoqICFP09.pdf

(The page limit was 4. Organizers only raised it to 6 after seeing submitted papers.)

I can also confirm the "bit of due diligence" part, and the fact that it's cheaper than the aggravation of not having automatic memory management at all. In the example that I can contribute to the discussion, the due diligence amounted to two more short articles: http://cristal.inria.fr/~doligez/publications/cuoq-doligez-m... and http://blog.frama-c.com/public/unmarshal.pdf


> GC enable sharing and how the only reasonable alternative when implementing a similar system in a non-GC language is a lot of gratuitous copying to solve ownership issues

The solution to unclear or shared ownership is generally reference counting. There's a reason why shared_ptr is called that.
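As a small illustration of that point (my own sketch, not from the thread): two lists can share one tail through shared_ptr, with no tracing GC and no copying; the tail is destroyed exactly when its last owner goes away.

```cpp
#include <memory>
#include <string>

struct Node {
    std::string value;
    std::shared_ptr<Node> next;  // shared, not copied
};

// Two lists share the same tail node. Reference counting frees it
// precisely when the last reference disappears -- no tracing GC needed.
long shared_tail_use_count() {
    auto tail  = std::make_shared<Node>(Node{"tail", nullptr});
    auto list1 = std::make_shared<Node>(Node{"a", tail});
    auto list2 = std::make_shared<Node>(Node{"b", tail});
    return tail.use_count();  // tail + list1->next + list2->next
}
```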


With the usual set of locks, cache contention and pauses on cascade deletions of deep datastructures it brings.


You don't need locks to RC immutable structures, just atomic reference counts (and not even those if the system is single-threaded).


Reference counting is a garbage-collection system like the others (and if you are going to use a garbage-collection system, you can for many usecases do better than reference counting).


> Reference counting is a garbage-collection system like the others

Reference counting is a form of automated memory management which can easily be integrated into a manually-managed system and applied to a specific subset of the in-memory structures (again, see shared_ptr). Not so for more complex garbage collection systems, which tend to interact badly with manual or ownership-based memory management. That puts the lie to your assertion that the only way to implement sharing in a non-GC language is "gratuitous copying".


Yes, it's a shame that you were not a reviewer, mid-2009, of my article published in September 2009.


It's not the writing of manual memory management in the usual case/happy path that's the problem. It's the very occasional mistake and the debugging time involved. (Though to be fair, automated static analysis tools have taken great strides, and this is not as big a problem as it used to be.)

What GC often gets you is a program that doesn't crash but instead has performance problems, but these are often more easily profiled and found and less severe than a crash. (Manual memory management isn't immune from the same performance problems in any case.)

In other words, GC gets you to "Step 1 -- Get it Correct" faster so you can play with running code faster. The cost/benefit may not fit your situation. In that case, use a different tool.



