Sunday 26 November 2023

Graalpy

A couple of years ago I wrote about my fascination with GraalVM and the Truffle interpreter framework, but somehow I got the impression that the different truffle implementations (for Ruby, for Python...) would end up mainly as an amazing intellectual achievement with little real use, save for a few Java applications wanting to get some scripting power or to have access to python AI and science libraries. I think I was pretty wrong. I've recently looked into the truffle python implementation, graalpy, and the project is really alive and well (same as truffle ruby). Graalpy provides a python3.10 implementation that you can use as a replacement for the standard CPython implementation as long as your application only uses the standard library modules or external modules written in pure python. If your application uses external extension modules, you'll have to look into the compatibility table.
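When swapping interpreters it's handy to confirm which implementation is actually running your script. This is a minimal sketch using only the standard library; the helper name describe_interpreter is mine, and I'd expect the name field to read graalpy under graalpy and cpython under CPython, though I'm only relying on sys.implementation being available on both.

```python
import platform
import sys


def describe_interpreter():
    """Return a small dict describing the running Python implementation."""
    impl = sys.implementation
    # impl.version is the implementation's own version (major, minor, micro, ...),
    # which can differ from the Python language version it implements.
    impl_version = ".".join(str(part) for part in impl.version[:3])
    return {
        "name": impl.name,
        "implementation_version": impl_version,
        "language_version": platform.python_version(),
    }


if __name__ == "__main__":
    print(describe_interpreter())
```

Running the same file with each launcher should be enough to tell them apart.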

They claim that graalpy can run certain applications up to 3 or 4 times faster than CPython, which is amazing. I'm not going to enter into that discussion, I just want to review how graalpy works.
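If you do want to check the claim on your own code, something like the following could be a starting point. It's only a sketch: the fib workload and the best_time helper are made up for illustration, and a JIT-based runtime needs warm-up, so a single short run will not be representative. That's why the helper repeats the measurement and keeps the best time.

```python
import timeit


def fib(n):
    """Deliberately naive recursive Fibonacci, used as a CPU-bound workload."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def best_time(func, repeat=5, number=1):
    """Run func several times and return the fastest wall-clock time in seconds.

    Repeating gives a JIT the chance to warm up before the best run is taken.
    """
    timings = timeit.repeat(func, repeat=repeat, number=number)
    return min(timings)


if __name__ == "__main__":
    seconds = best_time(lambda: fib(24), repeat=10)
    print(f"best of 10 runs: {seconds:.4f}s")
```

Run the same script with both interpreters and compare the numbers, keeping in mind that a micro-benchmark like this says little about a real application.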

So graalpy is a python interpreter built with the truffle framework. That means that (as you can read in my aforementioned post) it's actually not just an interpreter, but an interpreter (written in Java) plus a high performance JIT (the GraalVM compiler). When graalpy executes a python source file it compiles (as CPython does) that file to a .pyc file, containing python bytecodes. Then the graal interpreter creates an AST for those python bytecodes and interprets them. Over time, we have the specialization/partial evaluation of the interpreter code (Java bytecodes) that interprets that AST, and the result is sent to the Graal Compiler to generate high performance native code. Then you go through all the crazy optimizations and deoptimizations that the ultra-smart people that build these amazing JITs have developed. Et voilà, you have a high performance interpreter!
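The specialization idea is easier to picture with a toy. The sketch below is plain python and has nothing to do with the real Truffle API (the Const, Var and Add classes and the state names are invented for illustration): an addition node starts uninitialized, specializes itself to the int case after seeing ints, and falls back to a generic state when its guard fails. In python both paths end up doing a + b, so there is no speed gain here; the point is only the state transitions that a real JIT would exploit.

```python
class Const:
    """Leaf node holding a constant value."""

    def __init__(self, value):
        self.value = value

    def execute(self, env):
        return self.value


class Var:
    """Leaf node reading a variable from the environment dict."""

    def __init__(self, name):
        self.name = name

    def execute(self, env):
        return env[self.name]


class Add:
    """Addition node that records the operand types it has observed."""

    def __init__(self, left, right):
        self.left = left
        self.right = right
        self.state = "uninitialized"

    def execute(self, env):
        a = self.left.execute(env)
        b = self.right.execute(env)
        both_ints = type(a) is int and type(b) is int
        if self.state == "int":
            if both_ints:
                return a + b  # specialized fast path, protected by the guard above
            self.state = "generic"  # guard failed: give up the specialization
        elif self.state == "uninitialized":
            self.state = "int" if both_ints else "generic"
        return a + b  # generic path


if __name__ == "__main__":
    tree = Add(Var("x"), Const(1))
    print(tree.execute({"x": 2}), tree.state)
    print(tree.execute({"x": 1.5}), tree.state)
```

A compiler that trusts the "int" state can emit a plain machine addition plus a cheap type check, and deoptimize back to the interpreter if the check ever fails.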

That's more or less what I had managed to understand 2 years ago, but graalpy comes with an extra: it makes good use of the Native Image technology. First, we have to bear in mind that the Graal JIT compiler, which is written in Java, can itself be compiled to native code. From here:

There are two operating modes of the Graal compiler when used as the HotSpot JIT compiler: as pre-compiled machine code (“libgraal”), or as dynamically executed Java bytecode (“jargraal”).

libgraal: the Graal compiler is compiled ahead-of-time into a native shared library. In this operating mode, the shared library is loaded by the HotSpot VM. The compiler uses memory separate from the HotSpot heap. It runs fast from the start since it does not need to warm up. This is the default and recommended mode of operation.

jargraal: the Graal compiler goes through the same warm-up phase that the rest of the Java application does. That is, it is first interpreted before its hot methods are compiled. This mode is selected with the -XX:-UseJVMCINativeLibrary command line option.

That's pretty interesting. As we know, the Graal compiler can be used not just as a JIT compiler but also as an AOT compiler, and it's written in Java. To use it as the JIT in your JVM process you'll load it as a library, either as a native library (.so, .dll...) or as a Java jar. To load it as a native library it has to be compiled AOT beforehand. If used as a Java jar, it will first be interpreted and then (for its hot methods) JIT compiled by the other JIT compiler, the less optimizing C1. From here:

The primary benefit of libgraal is that compilations are fast from the start. This is because the compiler is running compiled from the get-go, by-passing the HotSpot interpreter altogether. Furthermore, it’s compiled by itself. By contrast, jargraal is compiled by C1. The result is that the compiled code of the compiler is more optimized with libgraal than with jargraal.

When you download graalpy (a .tar.gz that you just uncompress and that's all, no installation process as such) you can choose between a native version and a "normal" java version. The native version does not only mean using the native libgraal that I've previously mentioned, but that the java code for the truffle interpreter (and all the java standard libraries it depends on) has been compiled to native code. From here:

You can download GraalPy as a standalone distribution for Oracle GraalVM or GraalVM Community Edition. There are two standalone types to choose from:

Native Standalone: This contains a Native Image compiled launcher
JVM Standalone: This contains Python in the JVM configuration

If you want to use graalpy as a drop-in replacement for cpython to see if you get better performance, you want the Native Standalone version, as explained in this interesting quick reference:

The native runtime is most compatible with CPython. It runs as an ahead-of-time compiled Python launcher, which starts up faster and uses less memory.

The JVM runtime can interoperate with Java and other GraalVM languages can be added to it. You can identify it by the -jvm suffix in its name.

After installing (just uncompressing the tar.gz) that "Native Standalone" version on Ubuntu, you mainly have the launcher app (graalpy), a folder with the python3.10 modules, and a massive 350 MB libpythonvm.so shared library, which I guess contains the SubstrateVM and the result of compiling to native code the Java code of the Truffle interpreter and of all the Java standard library packages used by the interpreter.

And now comes the big question. If Truffle's power comes from using the Graal JIT to optimize the Java bytecodes of specialized parts of the interpreter, and now we are using a version where that interpreter has already been AOT compiled to native code... does this make any sense? Well, others have gone through the same doubts, and there's a very interesting explanation on Stack Overflow.

- Question
My understanding is that AOT compilation with native-image will result in methods compiled to native code that are run in the special-purpose SubstrateVM. Also, that the Truffle framework relies on dynamically gathered profiling information to determine which trees of nodes to partially evaluate. And that PE works by taking the JVM bytecode of the nodes in question and analyzing it with the help of the Graal JIT compiler. And here's where I'm confused. If we pass a Truffle interpreter through native-image, the code for each node's methods will be native code. How can PE proceed, then? In fact, is Graal even available in SubstrateVM?

- Answer
Besides the native code of the interpreter, Substrate VM also stores in the image a representation of the interpreter (a group of methods that conform the interpreter) for partial evaluation. The format of this representation is not JVM bytecodes, but the graphs already parsed into Graal IR form. PE runs on these graphs producing even smaller, optimized graphs which are then fed to the Graal compiler, so yes SVM ships the Graal compiler as well in the native image. Why the Graal graphs and not the bytecodes? Bytecodes were used in the past, but storing the graphs directly saves the (bytecodes to Graal IR) parsing step.

As I've said, that answer is really, really interesting. I thought that the Substrate VM used in Native Image applications provided little more than memory management (the JVM memory model) and Garbage Collection, but at least in this case it also includes the Graal Compiler. Additionally you can invoke the Graal Compiler (through the JVM Compiler Interface - JVMCI) passing over not just Java bytecodes, but directly Graal IR graphs.
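To make the partial evaluation idea from that answer a bit more concrete, here is a toy in python. It is in no way how Graal does it (Graal works on its own IR graphs, not on source text), and every name in it is invented: interpret is a tiny tree-walking interpreter, and partially_evaluate specializes it for one fixed tree, so that all the dispatching on node kinds is done once, up front, and only the work that depends on the runtime environment remains in the residual function.

```python
def interpret(node, env):
    """Generic interpreter: dispatches on the node kind at every step."""
    kind = node[0]
    if kind == "const":
        return node[1]
    if kind == "var":
        return env[node[1]]
    if kind == "add":
        return interpret(node[1], env) + interpret(node[2], env)
    if kind == "mul":
        return interpret(node[1], env) * interpret(node[2], env)
    raise ValueError(f"unknown node kind: {kind!r}")


def residual_source(node):
    """Specialize interpret() for a known tree, returning a python expression."""
    kind = node[0]
    if kind == "const":
        return repr(node[1])
    if kind == "var":
        return f"env[{node[1]!r}]"
    if kind in ("add", "mul"):
        operator = "+" if kind == "add" else "*"
        left = residual_source(node[1])
        right = residual_source(node[2])
        return f"({left} {operator} {right})"
    raise ValueError(f"unknown node kind: {kind!r}")


def partially_evaluate(node):
    """Return (function, source) for the residual program of a fixed tree."""
    source = f"lambda env: {residual_source(node)}"
    # eval is acceptable here because the source is generated by us, not by a user.
    return eval(source), source


if __name__ == "__main__":
    tree = ("add", ("mul", ("const", 2), ("var", "x")), ("const", 3))
    specialized, source = partially_evaluate(tree)
    print(source)
    print(interpret(tree, {"x": 5}), specialized({"x": 5}))
```

The residual function computes the same result as the interpreter but no longer contains any trace of the interpreter's dispatch logic, which is roughly the kind of smaller, optimized graph that the answer says is then handed to the compiler.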

Out of excitement I was thinking that graalpy would also bring an end to that huge CPython problem, the GIL (Global Interpreter Lock). Unfortunately that's not the case so far. Some of the graalpy guys have taken part in the discussions about the different approaches to removing the GIL from CPython, and have explained that so far they also use a GIL, mainly to avoid problems with C extensions. It's quite a pity, given that in the ruby world, while the current main ruby interpreter YARV (sort of the equivalent of CPython), built in C, also has a GIL, Truffle Ruby has got rid of it.
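The practical effect of a GIL is easy to observe with a CPU-bound loop. This is just a sketch with made-up helper names (count_down, run_sequential, run_threaded): on an interpreter with a GIL I'd expect the threaded version to take about as long as the sequential one, since only one thread executes python code at a time, but the exact numbers will depend on your machine and interpreter.

```python
import threading
import time


def count_down(n):
    """Pure python CPU-bound loop; never releases the interpreter for I/O."""
    while n > 0:
        n -= 1


def run_sequential(n, jobs):
    """Run the workload `jobs` times in the current thread, return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(jobs):
        count_down(n)
    return time.perf_counter() - start


def run_threaded(n, jobs):
    """Run the workload in `jobs` threads at once, return elapsed seconds."""
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(jobs)]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n, jobs = 5_000_000, 4
    print(f"sequential: {run_sequential(n, jobs):.2f}s")
    print(f"threaded:   {run_threaded(n, jobs):.2f}s")
```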

On several occasions while checking some "academic" stuff about Truffle interpreters (not just graalpy) I've seen references to PyPy (a python implementation using a JIT), comparing its Tracing JIT approach to the Partial Evaluation approach taken by Truffle. It seems that the guys from graalpy and PyPy are now working together on HPy, a better API for C extensions for python.
