Jitted LLVM IR on the JVM [pdf] (llvm.org)
114 points by ianopolous on March 20, 2016 | hide | past | favorite | 17 comments


"Graal is a dynamic compiler written in Java that integrates with the HotSpot JVM. It has a focus on high performance and extensibility. In addition, it provides optimized performance for Truffle based languages running on the JVM."

"Truffle is a framework for implementing languages as simple interpreters. Together with the Graal compiler, Truffle interpreters are automatically just-in-time compiled and programs running on top of them can reach performance of normal Java."


This is something I'm looking forward to in Java 9. There is already a big effort to bring the R language to the JVM (search for project FastR on Bitbucket), and it might be just the beginning. It could finally bring interoperability between more specialized languages and let them take advantage of the vast number of libraries available for Java and other JVM languages.

Edit: link to FastR: https://bitbucket.org/allr/fastr. It's open source, and they encourage people to participate.



BTW, here's the list of CRAN packages they claim they can already run (I suppose install and run; maybe not fully tested yet). Anyway, this is impressive:

https://bitbucket.org/allr/fastr/src/dcae74ecaaf3291f93ae2c0...


What's happening in 9?


So far Graal has been an experimental/research fork of the HotSpot JVM. JDK 9 builds include JVMCI [1] as an experimental feature, which can be used to run Graal without JVM modifications.

[1] http://openjdk.java.net/jeps/243


Does the memory created by the LLVM IR get managed by the JVM's garbage collector, or is it manually managed à la C++? If the latter, I feel there's an awesome opportunity here for creating a hybrid memory-managed/unmanaged language like Terra (http://terralang.org/) that takes advantage of the massive Java ecosystem while letting people write low-level code when they need performance and full control over memory.


My understanding is they can either use allocation in the JVM heap, or use the Graal foreign function interface to call native malloc, which is useful for interoperating with native libraries outside the JVM.


It says this in the presentation. It uses native malloc/free and it can allocate memory on the stack. So: not garbage collected.

However, they are interested in changing this so that programs compiled to LLVM can benefit from the additional safety of bounds checking and GC. They've already done this when running C (source code) on Graal/Truffle and the perf hit was manageable.

Note that you can already write Java that does manual memory allocation and other non-safe things, and quite a lot of popular Java libraries do exactly that to boost performance.
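As a minimal sketch of what that kind of manual memory management looks like from Java: the snippet below uses sun.misc.Unsafe, a real but officially unsupported JDK API that many of those performance-oriented libraries rely on (the class name OffHeapDemo is made up for this example):

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class OffHeapDemo {
    public static void main(String[] args) throws Exception {
        // Grab the Unsafe singleton via reflection (the usual back door,
        // since Unsafe's constructor and getUnsafe() are restricted).
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        Unsafe unsafe = (Unsafe) f.get(null);

        // malloc-style allocation: this memory lives outside the Java heap
        // and is completely invisible to the garbage collector.
        long addr = unsafe.allocateMemory(Long.BYTES);
        unsafe.putLong(addr, 42L);
        System.out.println(unsafe.getLong(addr));

        // Forgetting this line is a genuine leak, exactly as in C.
        unsafe.freeMemory(addr);
    }
}
```

There are no bounds checks and no GC here, which is exactly the trade-off the comment above describes.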



I assume the type speculation stuff is for dealing with bitcasts in a strongly typed environment like the JVM? (And, if so, presumably the bailout reconstructs the C heap and performs the bitcast manually?)

Also, interesting use of PICs for function pointers. What are the advantages of that approach over just using Java interfaces and letting HotSpot write the ICs?


Graal/Truffle do not have to obey the vast bulk of the Java type system.

Together they are fairly mind-bending projects, so it's important to understand precisely how they work. I've been studying them for about six months now, so hopefully this explanation isn't too garbled.

The typical code flow in a JVM looks like this:

1. Bytecode loading and verification. This is where the type system is enforced.

2. Interpreting the byte code

3. C1 compiler (very fast, low quality output) compiles bytecode to native code, inserts into the code cache.

4. Very hot code gets recompiled using C2 (slower, high quality output).
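The tiered flow above can be observed on a stock HotSpot by running a hot loop under the standard diagnostic flag -XX:+PrintCompilation, which logs each method as it moves through the compilation tiers (the class name HotLoop is invented for this sketch; exact tier labels vary by JVM version):

```java
public class HotLoop {
    // A small method called often enough to be picked up by C1,
    // and then recompiled at the top tier by C2 once it gets hot.
    static long sum(int n) {
        long s = 0;
        for (int i = 0; i < n; i++) s += i;
        return s;
    }

    public static void main(String[] args) {
        long total = 0;
        for (int i = 0; i < 20_000; i++) total += sum(1_000);
        System.out.println(total);
        // Run as: java -XX:+PrintCompilation HotLoop
        // and watch HotLoop::sum appear in the compilation log.
    }
}
```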

Graal inserts itself into and replaces step 4. Once bytecode verification is done, the JVM itself imposes very little in the way of typing. You can actually disable verification with a command line flag, and then bad bytecode will segfault the JVM. The core VM sees code as being composed of methods which are contained in classes, and it wants pointers to be distinct from integers, but otherwise the JIT compilers can produce code that hardly matches the Java worldview at all.

Graal is a Java JIT compiler that is, itself, written in Java. Thus it is a module which is passed some sort of input, which can be verified bytecode but can also be something else (e.g. textual source code, or LLVM bytecode), and the result of that is Graal inserting compiled code and generated metadata into the code cache. The core HotSpot engine then takes care of swapping it into the program when it's safe to do so.

So whilst Graal can compile Java bytecode (and do cutting-edge optimisations whilst it does), it isn't required to do so, and thus execution of LLVM bytecode like in this example doesn't work by translation to Java bytecode; it just bypasses the bytecode layers of the VM entirely.

And that's where Truffle comes in. Because Graal is written in Java it's quite easy to expose a clean OO API, and it does so. So you can write programs that manipulate or generate Graal's IR (which is a graph-based IR) and thus compile code to use the rest of the HotSpot runtime services.

However, a graph IR is a very low-level way of thinking about a program. So there is this higher level, Truffle, which allows you to write an AST interpreter in Java. It comes with what they call the Truffle domain-specific language, which isn't a language at all but rather a set of annotations, and you can annotate your interpreter to indicate how it should be optimised.

Then Graal takes the interpreter and the interpreter's input together, and does some incredibly aggressive optimisations on them, exploiting the fact that the compiler knows what the input to the program is. And like magic the interpreter gets loop-unrolled and the overhead is boiled away until you have a JIT-compiled program.
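To make the AST-interpreter idea concrete, here is the shape of such an interpreter in plain Java. This is not the real Truffle API, just an illustration; all names here are invented. In Truffle, node classes like these would carry annotations telling Graal how to specialize and partially evaluate them against a known input tree:

```java
// The interpreter is just a tree of nodes, each knowing how to execute itself.
interface Node {
    long execute();
}

class Lit implements Node {
    final long value;
    Lit(long value) { this.value = value; }
    public long execute() { return value; }
}

class Add implements Node {
    final Node left, right;
    Add(Node left, Node right) { this.left = left; this.right = right; }
    public long execute() { return left.execute() + right.execute(); }
}

public class AstDemo {
    public static void main(String[] args) {
        // The "program" is data: (2 + (3 + 4)).
        Node program = new Add(new Lit(2), new Add(new Lit(3), new Lit(4)));
        System.out.println(program.execute());
    }
}
```

Because the tree is constant once built, a partial evaluator that knows `program` can inline all the virtual `execute()` calls and fold the whole thing down to a constant; that collapsing of interpreter plus input into compiled code is the essence of what Graal does to a Truffle interpreter.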

Thus for the low, low price of an AST interpreter, you get a full-blown optimising JIT compiler, a high-end garbage collector, a language interop framework, etc., all 'for free'. It's really quite radical, and the comparatively trivial effort of writing AST interpreters means that they now have Graal/Truffle engines for Ruby, Python 3, R, JavaScript, and C in both raw/unsafe and managed varieties, and of course treating LLVM bytecode as a "language" is the next step.

So to answer your questions: no, type speculation is not required to work around the Java type system. It's just that profile-guided speculative optimisations can often improve all kinds of programs, and Graal is an aggressively speculating compiler; speculation is fundamental to its design. And your last question doesn't really make sense, because Graal integrates so tightly that it isn't so much using HotSpot as actually becoming HotSpot. And then of course it can write its PICs however it likes.


I would emphasize that in general Graal isn't a bytecode compiler at all, but a general purpose compiler (somewhat like an LLVM backend) that compiles IR to machine code, and that Graal has two frontends: Java bytecode and Truffle.

Also, within HotSpot, Graal isn't necessarily inserted at stage 4; it could also be used in stage 3 (or even 2, instead of interpreting), it's just not recommended because Graal is a slow compiler.

Finally, Graal doesn't need to run in HotSpot at all. Another option of running it is in SubstrateVM, which takes your language's Truffle interpreter, Graal itself and additional Java code (we'll get to that), and AOT-compiles it all to a native binary (which then serves as a JITting VM for your language). If your language requires a GC, that additional code will contain a GC written in Java. I believe Substrate includes a simple GC, but you can write and use your own.


This was a great explanation. Thanks for taking the time to write this all out. Very exciting stuff.


Type speculation isn't just needed for running C; dynamic languages like Ruby need it just as much, or maybe even more, because types are not known at compile time. At least C provides types most of the time.


I wonder if this has the potential of making Swift another alternative for JVM development?


It does, yes.

At the rate new Truffle languages and frameworks are being developed, it'll eventually be possible to run almost every language on the JVM, albeit with varying levels of interop.



