Azul introduces remote compilation for Java

mgaudet · on Dec 17, 2021

Interesting. OpenJ9 has something similar in preview: https://blog.openj9.org/tag/jitserver/

gavinray · on Dec 17, 2021

So I thought this whole thing was bullshit, then I read the blogpost from OpenJ9

I changed my mind, this image is really what they ought to be showing. It gets the point across:

https://i0.wp.com/blog.openj9.org/wp-content/uploads/2021/09...

What you're looking at is allocation of multiple apps inside of nodes in a cluster. With a JIT Server in each node, the memory required for each instance of the app is reduced, such that the effect is more instances can be fit in the same node size than before.

It reminds me of "bonuses" on equipment in RPGs. You lose 1 equipment slot to have it taken up by the item, but in return the rest of your equipment gets a bonus that more than makes up for the slot you can't use now.

kazinator · on Dec 17, 2021

I suppose ahead-of-time compiling everything so there is no compiler in any application node is a technique that is still some years away.

kjeetgill · on Dec 18, 2021

The article addresses this actually. For a language like Java, the jit is the source of a ton of it's performance.

Think of it more like continuous PGO, but the delta perf improvement is much higher.

pjmlp · on Dec 18, 2021

The problem is that requires a closed world approach, or constrained usage of reflection.

Yes there have been AOT compilers since around 2000, however you will notice that they target specific deployment cases, and also offer JIT caches in alternative.

Actually that is also how Android rebooted their AOT efforts in version 7.

You can see it in GraalVM and native image as well.

coldcode · on Dec 17, 2021

Remote inside your environment might make sense, but sending your source to an external entity to compile sounds like a security nightmare.

cogman10 · on Dec 17, 2021

Reading the article, they are only talking about "remote in your environment" because this is somewhat latency sensitive. Sending something like this over the internet would have far too many issues.

orev · on Dec 18, 2021

Developers have been fine with moving everyone else’s stuff to the cloud: Ops, data, security, networking, etc., but now when the cloud comes for your source code that’s a step too far? That horse left the barn a long time ago.

aredplug · on Dec 18, 2021

Think of this as an extension of the build pipeline, which already involves shipping source and build artifacts between multiple machines.

jedberg · on Dec 17, 2021

Back around 2004, Azul created a special machine to run Java. It had 792 small CPUs. You would put it in your datacenter and then replace the JVM on each machine with one that could use this box. There were some constraints on which calls and libraries you could use, but most Java code just worked.

It would use the network to do the JIT on this custom Java box, and it was super fast as long as your machine shared a fast switch with the box.

I wonder if this is basically the resurrection of that with some extra magic to make it work over the internet.

stickfigure · on Dec 17, 2021

Consider me extremely skeptical. One more network service to potentially fail or have security issues. And a whole new thundering herd problem on system recovery. I can only imagine this is tailored for some peculiar workload that I've never seen.

cogman10 · on Dec 17, 2021

The 2 usecases I see this working well for.

* Anything involving autoscaling

* My usecase of 1000s of servers doing ETL type work with java processes getting created and destroyed often.

I don't think this is a great boon for long lived services.

aredplug · on Dec 18, 2021

Azul is popular in low latency financial services. A usecase might be to reduce the variance that JIT compilation introduces to transaction latency, especially at the high percentiles.

ggfgg · on Dec 17, 2021

I can see the benefit of this. We have similar problem with JIT executing on a whole bunch of containers during a roll out.

However I’d rather scrap the software and rewrite it in something with a full AOT compiler and rapid startup at this point. And without more dependencies which will fuck up mid deployment whenever cloud provider X falls over.

rightbyte · on Dec 17, 2021

Seems like a rather unnecessarily complex dependency to do remote JIT compiles.

"it can reduce compute resources by up to 50 per cent"

How ... Do they not include their cloud service in the computations required?

hyperman1 · on Dec 17, 2021

Presumably because it becomes a shared cost. 100 JVMs in 100 containers have to pay that cost individually. Now 1 or 2 central instances have the resources for compiling, the 100 worker bees don't need them. They also might becable to optimize faster as they have runtime statistics from other instances, skipping warmup.

dragontamer · on Dec 17, 2021

100 JVMs in 100 containers (on one machine) also have to pay the costs of 100x individual garbage collections, instead of centralizing those garbage collections and merging the memories of everyone together.

Hmmm... maybe just having a monolith at that point is better? If you can run 100 containers on one box, why not just run one environment on one box?

vlovich123 · on Dec 17, 2021

I know you're joking but there is an important distinction that enables this. The optimization choices you're going to make of how to lower bytecode to native are going to be largely the same across all your machines - the minute differences in profiles aren't going to make significant differences unless you have the same codebase doing drastically different things. Thus you can share cost easily that so you accumulate execution profiles across your cluster and then lower to native code efficiently.

The garbage collection choices are going to be unique to the workload running on that server and have little information worth sharing that will speed up operations.

dragontamer · on Dec 17, 2021

> The garbage collection choices are going to be unique to the workload running on that server and have little information worth sharing that will speed up operations.

Java's GC is generational though. Which means that Java-style GC benefits from excessive memory (ie: the more free memory you have, the less the GC is run, and the more that any such object is "dead" before the GC is run).

With more "dead objects" left in the heap, a "bigger heap" benefits all processes.

--------

Consider this simple case: 1MB of live data in a 6MB heap. The garbage collector could run 10 times in some given time period (the heap drops down to 1MB each time the GC is run, and 10-runs means that ~51MBs were allocated in this timeframe 6MBs before the first garbage collection, and then 5MB for each of the remaining 9 collections).

Now what happens if you had a 11MB heap instead? Well, the garbage collector won't run until 11MBs were allocated (1st run), and with 1MB of live data after each collection, each remaining run would only be done after 10MBs of allocations. That is to say: only 5-collections would happen.

So with the same algorithm, the 11MB heap with 1MB of live data is twice-as-efficient (ie: half the collections) as the 6MB heap (with 1MB of live data).

-------

Java and its garbage collector is multithreaded these days and the analysis is more difficult. But that doesn't change the fact that generational collectors have this property, where the more RAM they use between collections, the "better they perform".

So just "defacto-sharing more RAM" means all of the individual Java-threads will be more efficient at garbage collection. Well-written heaps get more efficient the more you throw at them.

vlovich123 · on Dec 17, 2021

Oh. I misread your suggestion. Yes, if you can somehow have all the services living within the same address space, a shared heap could probably be better. In the model described here we're talking about a multi-computer distributed system though, not a bunch of containers on one machine. Also, by merging everything into one address space you're now potentially making the blast radius of a security vulnerability much larger.

Also, it's not instantly clear things actually get better because while your heap space is larger, so is your object graph and the rate at which your objects are getting created is larger. You'll get some compression out of it by sharing the class information across services but not much more than that.

dragontamer · on Dec 20, 2021

> Also, it's not instantly clear things actually get better because while your heap space is larger, so is your object graph and the rate at which your objects are getting created is larger.

In practice, the amount of RAM that any application uses is constant. Front-end, back-end, doesn't matter.

If this were not true, all of our apps would "crash" with out-of-memory errors all the time. In effect, our programs are written in such a way that MUST use (at worst) O(1) memory usage.

This is especially true of long-running programs. A memory-leak (ie: using more-and-more live data at any given time) will kill any long-running program given enough time, as all of our computers have a constant amount of RAM in them.

----------

Garbage collection, especially generational garbage collection that Java uses, scales with respect to this live-data. If you only have 10MB of live data, it doesn't matter if your heap size is 300MB. The GC-phase will only analyze (at worst) the 10MB of live data and ignore the other 290MB.

In practice, that 10MB will be maybe 5MB of "old" data (generational), and therefore assumed to be part of a data-set that won't need to be collected, so maybe only 5MB will get actively scanned by the garbage collector in that collect.

hinkley · on Dec 17, 2021

We could in theory get a lot of this benefit by supporting profile guided optimization and building a simple distribution system. You could even instrument a fraction of your servers and distribute to the rest.

chrisseaton · on Dec 17, 2021

> How ...

Not sure why you're so confused? Instead of compiling one method a thousand times, you compile it once and share the result. You still count the computation needed for the one compilation, but strictly reduces compute resources as less work is being done overall.

usrusr · on Dec 17, 2021

A remake of those old p2p file-sharing networks but instead of advertising file availability nodes would advertise availability of particularly well-optimized versions of the basic code unit every node already has?

Sounds ridiculous and even a bit pointless because you'd still need the capacity to run lots of compilations along not-yet-optimized code during startup (e.g. after redeploy), but with cluster size being highly variable and usage cost more and more replacing capacity cost as the primary concern this might actually work out. For very large clusters and certain workloads. And the more you modulate cluster size depending on demand, the bigger the benefit from reusing across nodes. Sure, it would be more straight-forward to just clone a fully baked node, but there's a nice peace of mind in not having to decide between cold start and hot clone, just leave it to the self-organization mechanism.

chrisseaton · on Dec 17, 2021

I think you're a bit confused about the basic idea.

> every node already has?

No every node doesn't already have it. That's the point. One node computes on behalf of everyone.

> you'd still need the capacity to run lots of compilations along not-yet-optimized code during startup

Well no - nodes also have interpreters so they don't wait. And it's not any slower for someone else to run the compilation on your behalf - the result comes in the same time. And most code doesn't change between redeploys so doesn't need to be recompiled.

usrusr · on Dec 17, 2021

Every node already has the bytecode and the interpreter, or more likely fast, superficial JIT without expensive optimisation. But that's slow, so when you need to serve a given load you'll need more nodes before they have settled into a hot optimization state.

Regarding unchanged code after redeploy, usage patterns might change a lot even for code that hasn't changed at all. With profile guided optimizations (which I'd expect to be more norm than exception for something like azul), this would mean that nominally unchanged code could suffer hard from running in a version optimized for a different calling environment (the previous version of the application).

In the grand scheme of things, I'd assume that this would be the main benefit and goal of a JIT outcome sharing mechanism like this: allowing nodes to cooperate by each profiling a different part of their shared code, kind of like becoming experts in a tiny specialization, and then sharing that expertise.

drunkenmagician · on Dec 18, 2021

Seems more like a marketing ’feature’, to justify jvm per node pricing, more than a feature with technical merit

anonymousDan · on Dec 17, 2021

Interesting. I wonder would it permit more advanced optimization passes that might have too much impact on performance otherwise?

nradov · on Dec 17, 2021

Any additional gains at the compilation stage are going to be marginal. The real potential is with further dynamic run time optimization within the JVM.

anonymousDan · on Dec 17, 2021

Not sure I understand what your disagreement is... as I said you can do more advanced compilation, which by implication should give you better performance. Is that what you meant?

nradov · on Dec 17, 2021

The gains from more advanced compilation are going to be so small as to not be worth the hassle for most code bases. It's better to focus effort on other areas.

heisenbit · on Dec 17, 2021

Somehow the link of the OP does not work. This one does:

https://www.theregister.com/2021/12/15/azul_introduces_remot...

dang · on Dec 17, 2021

Changed from https://www.theregister.com/2021/12/15/azul_introduces_remot... above. Thanks!

Zababa · on Dec 17, 2021

Thank you!

ksec · on Dec 17, 2021

Sorry, this page doesn't exist!

Edit: Could @dang change the link to https://www.theregister.com/2021/12/15/azul_introduces_remot...

dang · on Dec 17, 2021

Fixed now. Thanks!