Tuesday 14 September 2021

Truffle Interpreters

In my previous post about GraalVM I mentioned that I might write an additional post about Truffle. This is how I summarized there what Truffle is supposed to be:

The Truffle Language Implementation Framework. It allows "easily" writing programming language implementations as interpreters (written in Java) for self-modifying Abstract Syntax Trees (what?!). It has allowed the creation of very efficient Ruby, Python and JavaScript implementations that run on GraalVM. As these implementations are written in Java you can run them on a standard JVM, but they'll be slow, as it's the GraalVM JIT compiler that allows these implementations to run efficiently; that's why Truffle is so tied to GraalVM. This is a really good (and thick) read.

This idea of writing high-performance interpreters seems odd to me. I know very little about compilers, interpreters and VMs, but it seems to go against the last decades of common Computer Science knowledge. Indeed, the Java guys spent a great deal of time and effort adapting the JVM, by means of the Da Vinci project and the new invokedynamic bytecode instruction, to strongly improve its performance with dynamic languages. After all that, suddenly Truffle comes along, you can write interpreters for languages that compile to "self-modifying ASTs" (not to Java bytecodes), and those implementations of JavaScript, Ruby... seem to be faster than the implementations that compiled those languages to Java bytecodes and leveraged that recent and magical invokedynamic instruction! Indeed, Oracle has discontinued Nashorn (which was developed along with the invokedynamic/Da Vinci effort and compiled JavaScript to Java bytecodes containing invokedynamic) and replaced it with Graal.js, which is based on Truffle.

Truffle is intended to run with the GraalVM Compiler. So OK: as this is an optimized JIT, your interpreter, written in Java and compiled to bytecodes, will end up compiled into very optimized machine code thanks to all the optimizations this cool JIT performs over time. But still, we are optimizing an interpreter... and there cannot be such a difference between the GraalVM Compiler and the HotSpot JIT to account for the big change that Truffle seems to represent.

Well, I think I've finally understood what the magic with Truffle is. This article has been quite essential (and this other one has also helped). First, the AST that gets constructed for the program you are interpreting will evolve (based on runtime profiling) from dynamically typed nodes into statically typed nodes. And then comes the real magic, the partial evaluation technique: your interpreter code (the Java bytecodes, not the code being interpreted), which was written to work with any types, will be specialized to work with the types actually used in these executions. These are the Futamura projections: your interpreter gets specialized to run a specific source code!
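To make that first part (nodes specializing on observed types) more concrete, here is a minimal toy sketch of a self-specializing "add" node. The classes and the state machine are entirely made up for illustration — this is not the real Truffle API (which uses node rewriting and a `@Specialization` DSL) — but the idea is the same: start uninitialized, specialize for the types seen at run time, and fall back to a generic version if the type assumption is later broken.

```java
// Toy sketch of AST node specialization (hypothetical classes, NOT the Truffle API).
abstract class Node {
    abstract Object execute();
}

class Constant extends Node {
    final Object value;
    Constant(Object value) { this.value = value; }
    Object execute() { return value; }
}

class Add extends Node {
    enum State { UNINITIALIZED, LONG, GENERIC }
    final Node left, right;
    State state = State.UNINITIALIZED;

    Add(Node left, Node right) { this.left = left; this.right = right; }

    Object execute() {
        Object l = left.execute(), r = right.execute();
        if (state == State.UNINITIALIZED) {
            // First execution: specialize on the types we actually observe.
            state = (l instanceof Long && r instanceof Long) ? State.LONG : State.GENERIC;
        }
        switch (state) {
            case LONG:
                if (l instanceof Long && r instanceof Long) {
                    return (Long) l + (Long) r;   // fast, statically typed path
                }
                state = State.GENERIC;            // assumption broken: "deoptimize"
                // fall through
            default:
                return genericAdd(l, r);          // slow path handling any types
        }
    }

    static Object genericAdd(Object l, Object r) {
        if (l instanceof String || r instanceof String) return String.valueOf(l) + r;
        return ((Number) l).doubleValue() + ((Number) r).doubleValue();
    }
}

public class SpecializeDemo {
    public static void main(String[] args) {
        Add add = new Add(new Constant(2L), new Constant(3L));
        System.out.println(add.execute()); // 5 — the node is now LONG-specialized
        System.out.println(add.state);     // LONG
    }
}
```

Once a node like this is sitting in the stable LONG state, the JIT only has to compile the fast path plus a cheap guard, which is what makes the second step (partial evaluation) so effective.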

So in the end your interpreter will ask Graal to compile to native code specialized versions of the different classes that make up the interpreter. From one of the articles:

Partial evaluation is a very interesting topic in theoretical computer science – see the Wikipedia article on Futamura projections for some pretty mind-bending ideas. But for our case, it means compiling specific instances of a class, as opposed to the entire class, with aggressive optimizations, mainly inlining, constant folding and dead code elimination.
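A hand-worked toy of that idea (this is what I worked out by hand, not real Graal output, and the `Object[]`-based AST encoding is my own invention): take the generic interpreter method with the program held constant, inline the recursive calls, fold the constants, and drop the dead branches. What remains is just the residual arithmetic.

```java
public class PartialEvalDemo {
    // Generic interpreter: walks a tiny AST encoded as Object[] nodes,
    // where a node is {"add", a, b}, {"const", n} or {"var"}.
    static long interpret(Object[] node, long x) {
        switch ((String) node[0]) {
            case "const": return (Long) node[1];
            case "var":   return x;
            case "add":   return interpret((Object[]) node[1], x)
                               + interpret((Object[]) node[2], x);
            default:      throw new IllegalArgumentException();
        }
    }

    // What a partial evaluator could produce for the FIXED program
    // add(var, const 3): the tree walk, the switch dispatch and the casts
    // are all gone — only the residual computation survives.
    static long specialized(long x) {
        return x + 3;
    }

    public static void main(String[] args) {
        Object[] program = {"add", new Object[]{"var"}, new Object[]{"const", 3L}};
        System.out.println(interpret(program, 4));  // 7
        System.out.println(specialized(4));         // 7
    }
}
```

That is the first Futamura projection in miniature: interpreter + fixed source program, partially evaluated, yields something that looks like compiled code for that program.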

All the above seems terribly complex (and magical) to me, and at some point I still wonder if I'm really understanding it correctly. The TruffleRuby documentation here has a couple of paragraphs that seem to confirm that my understanding is correct.

The Truffle framework gets the bytecode representation of all of the AST interpreter methods involved in running your Ruby method, combines them into something like a single Java method, optimizes them together, and emits a single machine code function.

TruffleRuby doesn’t use invokedynamic, as it doesn’t emit bytecode. However it does have an optimizing method dispatch mechanism that achieves a similar result.
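If I understand that dispatch mechanism correctly, it is in the spirit of a polymorphic inline cache: the call site itself remembers which receiver classes it has seen and which method each one resolved to, so the full lookup only happens once per class. A toy sketch (all names made up, nothing to do with TruffleRuby's actual internals):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Toy inline-cache-style call site (hypothetical, NOT TruffleRuby's mechanism).
class CachedCallSite {
    // Receiver class -> resolved target, kept per call site like an inline cache.
    final Map<Class<?>, Function<Object, String>> cache = new LinkedHashMap<>();
    int slowLookups = 0; // how many full method lookups we had to perform

    String call(Object receiver) {
        Function<Object, String> target = cache.get(receiver.getClass());
        if (target == null) {
            slowLookups++;                        // slow path: full lookup
            target = lookup(receiver.getClass());
            cache.put(receiver.getClass(), target);
        }
        return target.apply(receiver);            // fast path on repeated classes
    }

    // Stand-in for a real method lookup in the language's class hierarchy.
    static Function<Object, String> lookup(Class<?> cls) {
        return obj -> "to_s of " + cls.getSimpleName();
    }

    public static void main(String[] args) {
        CachedCallSite site = new CachedCallSite();
        site.call("a");
        site.call(1);
        site.call("b");                            // cache hit for String
        System.out.println(site.slowLookups);      // 2
    }
}
```

The point, I guess, is that once the cache is stable, partial evaluation can treat its contents as constants and compile the dispatch down to a couple of class checks, achieving the same effect invokedynamic is after — without emitting any bytecode.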

Just as Ruby now has both the "classic" JRuby (which, as far as I know, continues to use a strange mix of a Ruby interpreter written in Java and a runtime compiler to Java bytecodes, which are then either interpreted or JIT-compiled by the JVM) and the new Truffle interpreter, it would be interesting to see whether other languages plan a similar move. For the moment Groovy has no such plans. After the performance improvements they achieved when moving from call-site caching to indy (invokedynamic), they don't seem to see any reason to create a Groovy Truffle interpreter. Bear in mind also that, as with all Java code, they get additional performance improvements "for free" just by running with the GraalVM JIT rather than the C2 HotSpot JIT.
