Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

If someone who knows a lot about garbage collection stumbles onto this, I've always wondered if stop-the-world garbage collection is only a necessity for multithreaded code.

For example, imagine a single-threaded language like Go where the only way to communicate with any other threads/processes was through sockets, with no shared memory whatsoever. Would it still be necessary to stop the world? Or would it be possible to stop only the current thread periodically, and collect garbage with a known or fixed overhead?

Bonus points: if so, then would it be possible to abstract shared memory with something like a software transactional memory (STM) that uses only sockets and copy-on-write somehow to completely avoid the stop-the-world issue?

It just seems to me that if stop-the-world was tackled once and for all, so that we could always limit collection times, then so much of the complexity of low level programming would go away, and we wouldn't have to worry about reference counting, refcount loops, strong/weak references, etc etc etc..



Go (by itself without modifications) is a poor example because it has shared memory.

Though Rust would be a good example of a language where a garbage collection could be added, because the only way to data is transferred between other tasks is through channels wherein only PODs or owned pointers may pass.

A linear type system really helps with coordinating multiple garbage collectors across many threads.


If you avoid stop-the-world, you solve one of the major GC problems, but not all of them.

The need to repeatedly traverse all reachable pointers (even if relatively rarely) is still noticeably more expensive than never having to.


Repeatedly traversing a linked-list-like structure to find a free block for malloc is noticeably more expensive than never having to. Also repeatedly putting freed block back into allocator's structure for future use is noticeable more expensive then not having to. See? You can't get away with simple "even if relatively rarely". For some values of rarely, the cost would be lower.


Absolutely agreed :) That's why there are arena allocators for programs for which allocation performance is particularly sensitive (and have other nice properties). Not having a mandatory garbage collector frees you up to try other approaches where they make sense. Nobody in this thread is disputing (I think?) that for many problems garbage collection is a good choice, because it clearly is, only that the one-size-fits-all nature of GC is appropriate in all situations.


malloc/free suck. They are strawmen for manual memory management though, because the vast majority of manual memory management (when done carefully) avoid malloc/free.

Instead, you embed allocations within each other when possible. i.e: Every time you have a bunch of objects with (near) identical lifetimes, you allocate a single struct that embeds all of the allocations within it (and amortize the allocation cost).

More-over, you use a slub allocator for that particular struct size. Often, you have hard limits on the allocation due to external reasons, meaning you can use a static array with a simple free-list to manage it.

Then, doing a singly-linked list insert/delete is extremely cheap. It is slightly more expensive than a GC pointer bump to allocate, but vastly cheaper to free. Also, it has far better cache locality.

Additionally, the avoidance of indirections and pointer-chasing incurred by doing lots of tiny allocations that indirectly refer to each other also helps cache locality significantly.

There are some rare cases where you want GC-style memory management, simply because the lifetimes are too complex, but in my experience, it is quite a rare situation. Even when you do, you still want to make use of allocations' aggregations which is not possible in most GC'd languages.

For the vast majority of cases, manual MM via coarse-grained refcounting (where large objects pay for a refcount), and slub allocators are used with allocation embedding, you'll have far cheaper MM overhead as well as better cache behavior.


> Or would it be possible to stop only the current thread periodically, and collect garbage with a known or fixed overhead?

You can stop only the current thread. This is what Erlang does.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: