Not sure why you are being downvoted. I agree with you, the whole thing seems ar...

DCKing · on May 6, 2014

Garbage collection is not acceptable in some scenarios, because garbage collection implies unpredictable performance, slower performance, and comparatively heavy runtime dependencies. This is all fine for many applications, but not for stuff like operating systems, JITs, layout engines and other stuff generally called 'systems programming' where performance and lack of runtime dependencies is key. Those languages generally do not require garbage collection (although Rust does support it optionally) and therefore must provide memory management semantics.

Rust is a language that caters to the oft-neglected field of systems programming. It does so by solving many problems of C++ which is mostly used in that area, and introducing/combining many features newish to systems programming languages and programming languages in general. ExpiredLink is being downvoted for missing that point.

zackmorris · on May 6, 2014

If someone who knows a lot about garbage collection stumbles onto this, I've always wondered if stop-the-world garbage collection is only a necessity for multithreaded code.

For example, imagine a single-threaded language like Go where the only way to communicate with any other threads/processes was through sockets, with no shared memory whatsoever. Would it still be necessary to stop the world? Or would it be possible to stop only the current thread periodically, and collect garbage with a known or fixed overhead?

Bonus points: if so, then would it be possible to abstract shared memory with something like a software transactional memory (STM) that uses only sockets and copy-on-write somehow to completely avoid the stop-the-world issue?

It just seems to me that if stop-the-world was tackled once and for all, so that we could always limit collection times, then so much of the complexity of low level programming would go away, and we wouldn't have to worry about reference counting, refcount loops, strong/weak references, etc etc etc..

nanofortnight · on May 7, 2014

Go (by itself without modifications) is a poor example because it has shared memory.

Though Rust would be a good example of a language where a garbage collection could be added, because the only way to data is transferred between other tasks is through channels wherein only PODs or owned pointers may pass.

A linear type system really helps with coordinating multiple garbage collectors across many threads.

Peaker · on May 6, 2014

If you avoid stop-the-world, you solve one of the major GC problems, but not all of them.

The need to repeatedly traverse all reachable pointers (even if relatively rarely) is still noticeably more expensive than never having to.

pkolaczk · on May 7, 2014

Repeatedly traversing a linked-list-like structure to find a free block for malloc is noticeably more expensive than never having to. Also repeatedly putting freed block back into allocator's structure for future use is noticeable more expensive then not having to. See? You can't get away with simple "even if relatively rarely". For some values of rarely, the cost would be lower.

Jweb_Guru · on May 7, 2014

Absolutely agreed :) That's why there are arena allocators for programs for which allocation performance is particularly sensitive (and have other nice properties). Not having a mandatory garbage collector frees you up to try other approaches where they make sense. Nobody in this thread is disputing (I think?) that for many problems garbage collection is a good choice, because it clearly is, only that the one-size-fits-all nature of GC is appropriate in all situations.

Peaker · on May 7, 2014

malloc/free suck. They are strawmen for manual memory management though, because the vast majority of manual memory management (when done carefully) avoid malloc/free.

Instead, you embed allocations within each other when possible. i.e: Every time you have a bunch of objects with (near) identical lifetimes, you allocate a single struct that embeds all of the allocations within it (and amortize the allocation cost).

More-over, you use a slub allocator for that particular struct size. Often, you have hard limits on the allocation due to external reasons, meaning you can use a static array with a simple free-list to manage it.

Then, doing a singly-linked list insert/delete is extremely cheap. It is slightly more expensive than a GC pointer bump to allocate, but vastly cheaper to free. Also, it has far better cache locality.

Additionally, the avoidance of indirections and pointer-chasing incurred by doing lots of tiny allocations that indirectly refer to each other also helps cache locality significantly.

There are some rare cases where you want GC-style memory management, simply because the lifetimes are too complex, but in my experience, it is quite a rare situation. Even when you do, you still want to make use of allocations' aggregations which is not possible in most GC'd languages.

For the vast majority of cases, manual MM via coarse-grained refcounting (where large objects pay for a refcount), and slub allocators are used with allocation embedding, you'll have far cheaper MM overhead as well as better cache behavior.

pcwalton · on May 6, 2014

> Or would it be possible to stop only the current thread periodically, and collect garbage with a known or fixed overhead?

You can stop only the current thread. This is what Erlang does.

pcwalton · on May 6, 2014

If you mean "do you have numbers as to why you might not want GC in all scenarios", yes. http://people.cs.umass.edu/~emery/pubs/gcvsmalloc.pdf

Peaker · on May 6, 2014

Lots of GC vs Manual MM do a disservice to manual MM, by comparing GC to a strawman with lots of malloc/free.

IME, manual MM often embeds allocations within each other and/or places allocation on the stack. Both of these options make manual MM far cheaper than GC, rather than slightly cheaper as shown in these papers.

pkolaczk · on May 7, 2014

There is nothing that prevents a GCed language to allocate things on the stack. Also I'm afraid using std::string in C++ causes more allocs/frees than typical use of Strings in Java/C#, because the former can't be safely shared and must be copied - on the heap.

Peaker · on May 7, 2014

A GCed language requires good stack escape analysis. Unless you expose it to the programmer for at least semi-manual MM, it is bound to have false negatives, where you pay for the allocation on the heap unnecessarily.

I totally agree about std::string.

I think C++ with the conventional libraries is actually an example of how to do manual MM badly.

pkolaczk · on May 6, 2014

Generally I agree that you don't want GC in all scenarios, but the paper you cite has several flaws which makes its conclusions very far from truth.

1. It is based on non-production JVM and does not use the same algorithms that are used in production level VMs. There is a huge performance difference between any GC from 1990s and modern generational GC used in production-level VMs.

2. Some part of the experiment involved simulation. You must be very careful with extrapolating simulation results onto real world. Actually someone on the Internet rerun one of the same experiment using non-simulated VM and got totally different results, but now I can't find that blog post :(

3. It compares GC to "ideal" manual memory management, ignoring the fact that manual memory management is not free either. Things like heap fragmentation or computational cost of running allocation/deallocation code do exist and may cause also some real problems. The costs lie elsewhere, but that doesn't mean they don't exist.

My experience with large applications is completely different than the conclusion of the paper. Unless you're doing something completely crazy like generating several GBs of garbage per second, modern generational GCs overhead is typically very small (<5%) with only 20-50% more memory than the live set size, not 3-5x as the paper claims.

The biggest pain not solved completely are pauses, but researchers are actively working on it and things are getting much better.

pcwalton · on May 6, 2014

> 3. It compares GC to "ideal" manual memory management, ignoring the fact that manual memory management is not free either. Things like heap fragmentation or computational cost of running allocation/deallocation code do exist and may cause also some real problems. The costs lie elsewhere, but that doesn't mean they don't exist.

No, it doesn't; it uses malloc and free, counting the costs of allocation and deallocation using a traditional memory allocator. (In fact, if anything, that's unfair to traditional memory allocators, as high-performance code will often use things like bump-allocating arenas which significantly outperform malloc and free. It also uses dlmalloc, which is outpaced by jemalloc/tcmalloc these days, although if the benchmarks are not multithreaded the difference will be small.)

Heap fragmentation exists in GC systems as well, and fragmentation in modern allocators like jemalloc is very small.

pkolaczk · on May 6, 2014

Ok, point taken, however their cost analysis is based on "simulated cycles", which is extremely simplified. With modern CPUs doing caching, prefetching, out-of-order execution I seriously doubt its accurate. malloc/free have typically a tendency to scatter objects around the whole heap, while compacting GCs allocate from a contiguous region - so a properly designed research experiment would take that into account. Hans Boehm did experiments on real programs and found that using compacting GCs actually speeded up some programs because of better cache friendliness.

As for heap fragmentation - it does not exist in some GC systems, like G1 or C4. And fragmentation is also extremely workload dependent - it might be "very small" for most cases and for some might be as much as 5x (Firefox struggled a lot from this).

jasode · on May 6, 2014

> , modern generational GCs overhead is typically very small (<5%) with only 20-50% more memory than the live set size, not 3-5x as the paper claims.

Another set of GC tests mentioned by the famous Drew Crawford article [1] said 4x to 6x was the sweet spot. A followup commenter wanted to clarify that the "best GC algorithms" worked within 2.5x. Whether its 2.5x or 4x, it's a counterpoint to the claims of only 50% more memory. Perhaps there are drastically different workloads skewing the tests. (I didn't thoroughly read both cited papers.)

[1] http://sealedabstract.com/rants/why-mobile-web-apps-are-slow...

pkolaczk · on May 6, 2014

This rant is about GCs in JS engines on mobile devices, which are nowhere near state-of-the art generational GCs used for JVM or CLR.

This is all very much dependent on the garbage production rate. If the application is priducing garbage like crazy, then it is possible to require even 2.5x more memory, but this is a rare edge case, just as 2.5x more memory required due to fragmentation. Any performance aware programmer will avoid dynamic allocation in tight loops, regardless of the type of memory management.

jasode · on May 6, 2014

The web blog topic talked about Javascript GC but the cited paper about GC behavior was measuring desktop Java/JVM. The 2.5x to 6x memory sweet spot was talking about state-of-the-art JVM not Javascript. However, he's extrapolating that the difficulties unsolved by the best GC algorithms in Java to be the same technical challenges for GC research and progress in Javascript.

Those GC challenges point back to why Rust's focus on memory management via different types of pointers is valid. The best GC algorithms haven't solved many of the performance problems that Rust's approach can address.

This is the link embedded within the middle of that blog: http://www-cs.canisius.edu/~hertzm/gcmalloc-oopsla-2005.pdf

pkolaczk · on May 7, 2014

This is the same paper I mentioned above, not "another" research paper. It is based on simulation, non-production JVM, doesn't use state of the art GC algorithms and rerunning the same benchmarks with Oracle JVM gives completely different results.

Also stating that GC CPU/memory overhead is X is generally incorrect because it depends on far too many things, like garbage production rate, eden space size, survivor rates, GC parallelism level etc.

I do have at least two coputational heavy applications running on JVM (one is doing lot of sparse LU-decompositions, another one is evolutionary optimization; both doing some allocations in the tight loop) and none of them requires 2x memory to run at top speed; they require much, much less. Everything depends on how it is coded. If you keep garbage production rates sane, and most objects die with first minor GC, the overhead of GC is extremely small, both CPU and memory-wise. I don't know what they did to get 6x overhead - this looks extremely fishy. Even memory heavy apps like databases don't reserve 6x more heap than the live set size.

Of course, there are some use cases, when GC definitely sucks (pauses, huge heaps) and therefore Rust is a nice thing to have. I hope some of those techniques get adapted in the future by GCed languages in order to reduce memory pressure.

ExpiredLink · on May 6, 2014

The GC argument is a straw man. You can have automatic resource management without GC (actually, GC-languages like Java and C# are at odds with automatic resource management) and without manual ownership semantics a la C++ and Rust.

andreasvc · on May 6, 2014

Can you give an example of automatic resource management without GC?

pohl · on May 7, 2014

Would Objective-C's ARC (where the compiler inserts retain/release calls automatically where needed) be such an example?

pcwalton · on May 7, 2014

No, reference counting is just a form of GC. Rust really is unique here among industry languages.

Kutta · on May 6, 2014

Rust is pretty much the first mainstream-ish language ever to have substructural typing as a core feature. I can't see how this is archaic.

pcwalton · on May 6, 2014

To be fair, C++ has an approximation of it with move semantics/rvalue references—and I'm really glad that it does, since it makes unique pointers much more mainstream of a concept than it would have been otherwise. (Rust tries to avoid unfamiliar concepts with no analogues in other languages.) It's just not safe in C++; there are lots of ways to get undefined behavior via use-after-move.

Jweb_Guru · on May 6, 2014

While the data and anecdotal experience certainly support that manual memory management is faster, I think focusing on that specifically misses the point a bit, because you don't just get performance by taking out the garbage collector.

For example, in systems programming memory is often not the only limited resource that needs to be cleaned up. In such situations, C++-style RAII is extremely valuable. While finalizers can be added to languages with garbage collectors, and indeed have been [1][2], it's not possible to predict when the resource destruction will occur, meaning that they end up requiring explicit (de)allocation of resources in pratice [3][4]. And if you still don't think it's a hard problem, take a look at some of the discussion the D community is currently engaged in over the same issue [5].

Another problem with GC that's already been brought up is the difficulty of using it in hard realtime systems. IBM has done a considerable amount of work in this area, including the development of a garbage collector designed specifically for realtime Java programming [6]. Some things to note about their solution include the heavy restrictions on the hard-realtime threads [7] (they are essentially disallowed from interacting with garbage collected objects, and in any case are not totally free of garbage collection), and that what are called realtime threads are still subject to nondeterministic pauses [8].

There are many more potential issues than this (e.g. signal handling [9]) but I hope you will agree that there is a clear need for languages without mandatory garbage collection. Given that such languages are necessary, it follows that it would be nice for there to be safe languages without mandatory garbage collection that can perform these tasks. While that doesn't necessarily mean you should care, it probably does, because even if you don't personally write programs with such requirements, you almost certainly use them.

[1] C# Destructors, http://msdn.microsoft.com/en-us/library/66x5fx1b.aspx

[2] Java's Object#finalize, http://docs.oracle.com/javase/8/docs/api/java/lang/Object.ht...

[3] .NET's IDisposable (often used with using), http://msdn.microsoft.com/en-us/library/system.idisposable.a... ; see also c# - Proper use of the IDisposable interface, http://stackoverflow.com/questions/538060/proper-use-of-the-...

[4] Java's AutoCloseable (designed for use with try-with-finally), http://docs.oracle.com/javase/8/docs/api/java/lang/AutoClose...

[5] A few considerations on garbage collection, http://forum.dlang.org/thread/ljr549$1kti$1@digitalmars.com ; More radical ideas about GC and reference counting, http://forum.dlang.org/thread/ljrm0d$28vf$1@digitalmars.com ; GC vs Resource Management, http://forum.dlang.org/thread/hhpldnysndikzbkfbiqm@forum.dla...

[6] Introduction to the Metronome Garbage Collector, http://publib.boulder.ibm.com/infocenter/realtime/v1r0/index... ; Real-time Java, part 4: Real-time garbage collection, http://www.ibm.com/developerworks/library/j-rtj4/

[7] Using no-heap real-time threads, http://publib.boulder.ibm.com/infocenter/realtime/v1r0/topic... ; in particular, Constraints on Java threads when running with NHRTs, http://publib.boulder.ibm.com/infocenter/realtime/v1r0/topic....

[8] Metronome Garbage Collector limitations, http://publib.boulder.ibm.com/infocenter/realtime/v1r0/topic... ; Metronome behavior in out of memory conditions, http://publib.boulder.ibm.com/infocenter/realtime/v1r0/topic...

[9] See, for example, Asynchronous Signals in Standard ML, http://www.smlnj.org/compiler-notes/90-tr-reppy.ps ; section 5.2 gives a good overview of why truly preemptive signal handling, often a necessary part of systems programming, is not in general possible in a system with garbage collection.