Hacker News

Presumably because it becomes a shared cost. 100 JVMs in 100 containers have to pay that cost individually. Now 1 or 2 central instances have the resources for compiling, and the 100 worker bees don't need them. They also might be able to optimize faster because they have runtime statistics from other instances, skipping warmup.


100 JVMs in 100 containers (on one machine) also have to pay the costs of 100x individual garbage collections, instead of centralizing those garbage collections and merging the memories of everyone together.

Hmmm... maybe just having a monolith at that point is better? If you can run 100 containers on one box, why not just run one environment on one box?


I know you're joking but there is an important distinction that enables this. The optimization choices you're going to make when lowering bytecode to native code are going to be largely the same across all your machines - the minute differences in profiles aren't going to make significant differences unless you have the same codebase doing drastically different things. Thus you can easily share the cost: accumulate execution profiles across your cluster and then lower to native code efficiently.

The garbage collection choices are going to be unique to the workload running on that server and have little information worth sharing that will speed up operations.


> The garbage collection choices are going to be unique to the workload running on that server and have little information worth sharing that will speed up operations.

Java's GC is generational though. Which means that Java-style GC benefits from excessive memory (ie: the more free memory you have, the less the GC is run, and the more that any such object is "dead" before the GC is run).

With more "dead objects" left in the heap, a "bigger heap" benefits all processes.

--------

Consider this simple case: 1MB of live data in a 6MB heap. The garbage collector could run 10 times in some given time period (the heap drops down to 1MB each time the GC is run, so 10 runs means ~51MB was allocated in this timeframe: 6MB before the first garbage collection, and then 5MB for each of the remaining 9 collections).

Now what happens if you had an 11MB heap instead? Well, the garbage collector won't run until 11MB has been allocated (1st run), and with 1MB of live data surviving each collection, each remaining run would only happen after another 10MB of allocations. That is to say: for the same ~51MB of allocations, only 5 collections would happen.

So with the same algorithm, the 11MB heap with 1MB of live data is twice-as-efficient (ie: half the collections) as the 6MB heap (with 1MB of live data).
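The arithmetic above can be sketched as a back-of-the-envelope model (this is illustrative Python, not JVM code; it assumes a constant live set and a collection triggered whenever the heap fills):

```python
# Hypothetical model: count how many times a stop-the-world collector runs
# while a program allocates `allocated_mb` total, given a heap of
# `heap_mb` and a constant live set of `live_mb` surviving each collection.
def collections_needed(heap_mb, live_mb, allocated_mb):
    """Return the number of GC runs needed to allocate `allocated_mb`."""
    runs = 0
    free = heap_mb               # heap starts empty
    remaining = allocated_mb
    while remaining >= free:
        remaining -= free        # fill the remaining free space...
        runs += 1                # ...then collect, keeping only live data
        free = heap_mb - live_mb
    return runs

print(collections_needed(6, 1, 51))   # 6MB heap, 1MB live -> 10 collections
print(collections_needed(11, 1, 51))  # 11MB heap, 1MB live -> 5 collections
```

Doubling the free space (5MB vs 10MB reclaimed per cycle) halves the number of collections for the same allocation volume, which is the claimed effect.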

-------

Java and its garbage collectors are multithreaded these days, so the analysis is more difficult. But that doesn't change the fact that generational collectors have this property: the more RAM they can use between collections, the "better they perform".

So just "de facto sharing more RAM" means all of the individual Java threads will be more efficient at garbage collection. Well-written heaps get more efficient the more memory you throw at them.


Oh. I misread your suggestion. Yes, if you can somehow have all the services living within the same address space, a shared heap could probably be better. In the model described here we're talking about a multi-computer distributed system though, not a bunch of containers on one machine. Also, by merging everything into one address space you're now potentially making the blast radius of a security vulnerability much larger.

Also, it's not instantly clear things actually get better because while your heap space is larger, so is your object graph and the rate at which your objects are getting created is larger. You'll get some compression out of it by sharing the class information across services but not much more than that.


> Also, it's not instantly clear things actually get better because while your heap space is larger, so is your object graph and the rate at which your objects are getting created is larger.

In practice, the amount of RAM that any application uses is constant. Front-end, back-end, doesn't matter.

If this were not true, all of our apps would "crash" with out-of-memory errors all the time. In effect, our programs are written in such a way that they MUST use (at worst) O(1) memory.

This is especially true of long-running programs. A memory-leak (ie: using more-and-more live data at any given time) will kill any long-running program given enough time, as all of our computers have a constant amount of RAM in them.

----------

Garbage collection, especially generational garbage collection that Java uses, scales with respect to this live-data. If you only have 10MB of live data, it doesn't matter if your heap size is 300MB. The GC-phase will only analyze (at worst) the 10MB of live data and ignore the other 290MB.

In practice, that 10MB will be maybe 5MB of "old" data (generational), and therefore assumed to be part of a data-set that won't need to be collected, so maybe only 5MB will get actively scanned by the garbage collector in that collect.
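The "GC work scales with live data, not heap size" point can be illustrated with a toy tracing collector (an assumption-laden sketch in Python, not how HotSpot is implemented): the mark phase only visits objects reachable from the roots, so garbage never gets touched.

```python
# Toy mark phase of a tracing collector: work is proportional to the
# number of reachable (live) objects, independent of how much garbage
# sits elsewhere in the heap.
def mark(roots, edges):
    """Return the set of live objects reachable from `roots`.

    `edges` maps each object id to the ids it references.
    """
    live, stack = set(), list(roots)
    while stack:
        obj = stack.pop()
        if obj not in live:
            live.add(obj)
            stack.extend(edges.get(obj, []))
    return live

heap = set(range(1000))          # 1000 objects sitting in the heap
edges = {0: [1, 2], 1: [3]}      # only a tiny graph is actually reachable
live = mark(roots=[0], edges=edges)
print(len(live))                 # 4 -- the other 996 objects are never visited
```

Scanning cost here is 4 objects regardless of whether the heap holds 1,000 or 1,000,000 dead objects, which is why a mostly-empty big heap is cheap to collect.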


We could in theory get a lot of this benefit by supporting profile guided optimization and building a simple distribution system. You could even instrument a fraction of your servers and distribute to the rest.
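The distribution part of that idea can be sketched very simply (hypothetical Python; the method names and counter format are invented for illustration, not any real PGO tooling): instrument a fraction of servers, merge their hot-method counters, and ship the merged profile to the rest of the fleet.

```python
# Hypothetical sketch: combine per-server method-invocation counts,
# sampled from a fraction of the fleet, into one shared profile for
# profile-guided compilation on the remaining servers.
from collections import Counter

def merge_profiles(profiles):
    """Sum per-server method-invocation counts into one fleet profile."""
    merged = Counter()
    for profile in profiles:
        merged.update(profile)   # Counter.update adds counts together
    return merged

# Profiles collected from, say, 2 of 100 servers:
server_a = {"Parser.parse": 9000, "Cache.get": 120}
server_b = {"Parser.parse": 8500, "Cache.get": 300}
fleet_profile = merge_profiles([server_a, server_b])
print(fleet_profile.most_common(1))  # [('Parser.parse', 17500)]
```

Because profiles across machines running the same codebase are largely similar (the point made upthread), a merged sample from a few instrumented servers is a reasonable stand-in for instrumenting everyone.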



