Saturday 7 August 2021

GraalVM

I've been reading about GraalVM recently, and it's an interesting piece of technology. The thing is that I had to read several articles to get a clear understanding of what GraalVM is. At first sight you could think that it's a completely new Java VM, but that's wrong. Only some specific pieces of the Java VM are new, not the whole thing (the Garbage Collectors, the Heap Memory structure, the Class Loader, the bytecode format... remain the same). GraalVM is a high-performance JDK distribution designed to accelerate the execution of applications written in Java and other JVM languages... When you download the Community GraalVM what you get is basically HotSpot/OpenJDK with some extra (very important) pieces. Mainly:

  • The GraalVM Compiler. This is the essential part, and the one that matters for 95% of people. The key idea is that this is not a new Java source code compiler (like javac) but a new JIT compiler, one that compiles Java bytecode to native code.
  • The GraalVM Native Image. This technology compiles Java applications ahead of time to native code. The resulting native applications do not run on a JVM (there's no need for an interpreter, a JIT or, I think, even class loaders), but they do need some runtime components (GC, thread scheduling...) that are called the "Substrate VM" (though I don't think this can really be considered a VM).
  • The Truffle Language Implementation Framework. This is a beast on its own and maybe I'll write a separate post about it. It allows "easily" writing programming language implementations as interpreters (written in Java) for self-modifying Abstract Syntax Trees (what?!). It has allowed the creation of very efficient Ruby, Python and JavaScript implementations that run on GraalVM. As these implementations are written in Java you can run them on a standard JVM, but they'll be slow; it's the GraalVM JIT compiler that allows them to run efficiently, and that's why Truffle is so tied to GraalVM (see the small polyglot sketch after this list). This is a really good (and thick) read.
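To make the Truffle part a bit more tangible, here's a minimal sketch of calling a Truffle language (JavaScript) from Java through the GraalVM Polyglot API (org.graalvm.polyglot). It assumes you're running on a GraalVM distribution that has the JavaScript runtime available; on a stock JDK this will fail when the Context is created. The class name is just mine for the example.

    import org.graalvm.polyglot.Context;
    import org.graalvm.polyglot.Value;

    // Minimal sketch: run a JavaScript snippet from Java via the Polyglot API.
    // Assumes a GraalVM distribution with the JavaScript language available.
    public class PolyglotDemo {
        public static void main(String[] args) {
            try (Context context = Context.create("js")) {
                // The Truffle JS interpreter runs on the JVM and is itself
                // JIT-compiled by the GraalVM Compiler.
                Value result = context.eval("js", "[1, 2, 3, 4].reduce((a, b) => a + b)");
                System.out.println("JS says: " + result.asInt()); // prints: JS says: 10
            }
        }
    }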

GraalVM Compiler
The Java HotSpot VM uses a hybrid model to run the bytecode that javac generates and puts in your .class files. It first runs your code with an interpreter, then uses a fast but not very optimizing JIT compiler, C1, to transform to native code those methods that are used frequently (hot methods). Then those methods that are called very frequently (very hot methods) are JIT compiled by C2, an optimizing but slower JIT compiler. You can read a good explanation here. This is awesome, but it's not specific to the HotSpot JVM, as I explained here and here. Even though the C2 JIT compiler seems particularly advanced with its use of Profile Guided Optimization, some smart folks thought they could replace it with an even (much) better beast, and that's what the GraalVM Compiler is: a speculative, profile-guided JIT compiler that acts as a replacement for C2.
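If you want to see this tiered dance yourself, a quick and rough way is to run a tiny program with the -XX:+PrintCompilation flag and watch the compilation log. The class and method names below are just mine for the example, and the exact log format varies between JVM builds.

    // Sketch to watch tiered compilation: run with
    //   java -XX:+PrintCompilation HotLoop
    // After enough calls you should see HotLoop::square appear in the log at the
    // lower tiers (C1) first and later at tier 4 (C2, or the GraalVM Compiler
    // when it's enabled).
    public class HotLoop {
        static long square(long x) {
            return x * x; // trivial method that becomes "hot" after millions of calls
        }

        public static void main(String[] args) {
            long sum = 0;
            for (int i = 0; i < 10_000_000; i++) {
                sum += square(i);
            }
            System.out.println(sum); // keep the result alive so the work isn't optimized away
        }
    }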

The HotSpot JVM features the JVMCI (Java Virtual Machine Compiler Interface). Code can use this interface to pass bytecode to a JIT compiler, get it compiled and then install the resulting native code into the VM. Through JVMCI you can plug a JIT compiler implementing that interface into the VM, and that's how the GraalVM (JIT) Compiler is added to the VM, replacing the C2 JIT.
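You can't see much of JVMCI from application code, but the standard java.lang.management API at least tells you which JIT setup the running VM reports. This is just an informational peek (the exact strings are VM-specific), not a way to interact with JVMCI itself; the class name is mine for the example.

    import java.lang.management.CompilationMXBean;
    import java.lang.management.ManagementFactory;

    // Prints the JIT compiler name and VM identity the running VM reports.
    // On stock HotSpot the compiler name is typically something like
    // "HotSpot 64-Bit Tiered Compilers"; on a GraalVM distribution the VM
    // name/version mention GraalVM.
    public class WhichJit {
        public static void main(String[] args) {
            CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
            System.out.println("JIT compiler: " + jit.getName());
            System.out.println("VM name     : " + System.getProperty("java.vm.name"));
            System.out.println("VM version  : " + System.getProperty("java.vm.version"));
        }
    }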

The GraalVM Compiler is included in your normal OpenJDK (and Oracle JDK) distribution. To use it rather than the standard C2 JIT compiler, you have to start the VM with these options: -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+UseJVMCICompiler
The other, more common option is installing the GraalVM JDK distribution, which comes with the GraalVM Compiler already switched on, along with the other related technologies that I've previously mentioned: Native Image, Truffle, Substrate...
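Whichever of the two routes you take, a small sanity check is to list the -XX options the VM was actually started with. Note that only flags passed explicitly on the command line show up this way; defaults baked into a GraalVM build won't. Again, the class name is just for the example.

    import java.lang.management.ManagementFactory;
    import java.util.List;

    // Lists the VM arguments and checks whether the JVMCI compiler flag was
    // passed explicitly. Run it with the three -XX options mentioned above to
    // see them reported back.
    public class JvmciCheck {
        public static void main(String[] args) {
            List<String> vmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
            vmArgs.forEach(System.out::println);
            boolean explicitJvmci = vmArgs.contains("-XX:+UseJVMCICompiler");
            System.out.println("JVMCI compiler flag passed explicitly: " + explicitJvmci);
        }
    }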

The GraalVM Compiler is written in Java, so its own bytecode has to be converted to native code at some point; this is what is called bootstrapping. From what I've understood here, and what I can envision on my own, I think there are two options: either compile it ahead of time with Native Image, or use C1 to initially compile it to unoptimized native code and then let it optimize itself as its methods become hot.

It's mentioned here that the GraalVM Compiler uses an Intermediate Representation (IR). I was a bit confused about what this means with regard to standard Java bytecode, and this article came to the rescue. Basically: Java bytecode -> IR -> machine code.

In short, the just-in-time compiler converts Java bytecode into an SSA IR. To be more precise, it is an IR graph containing both control flow and data flow. Each bytecode corresponds to several nodes (note that some bytecodes do not have corresponding IR nodes). The just-in-time compiler then performs its optimizations on this IR graph.
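To make the SSA part a bit more concrete, here's a hand-waved illustration of what SSA form looks like for a trivial Java method. The comments are my own simplification, not actual Graal IR (which is a "sea of nodes" graph with many more node kinds).

    // Illustration only: a tiny method annotated with a simplified SSA view.
    // In SSA every value is assigned exactly once, and values coming from
    // different branches are merged with a "phi" node.
    public class SsaSketch {
        static int absSum(int a, int b) {
            int s = a + b;      // SSA: s1 = a + b
            if (s < 0) {        // SSA: branch on (s1 < 0)
                s = -s;         // SSA: s2 = -s1
            }
            return s;           // SSA: s3 = phi(s1, s2); return s3
        }

        public static void main(String[] args) {
            System.out.println(absSum(3, -10)); // prints 7
        }
    }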

Throughout this post I've been saying that the GraalVM Compiler is a replacement for the C2 JIT compiler, but I've found this article saying that it replaces both C1 and C2; I think that's wrong. Most places I've read, like here, clearly say that it only replaces C2.