Sunday 3 December 2023

Truffle vs Invokedynamic

Reading (and writing) about Truffle and graalpy lately has had me thinking (again) about languages and interpreters. When I first heard about Truffle I wrote in the corresponding post:

Indeed, the Java guys spent much time and effort to adapt the JVM, by means of the Da Vinci project and the new invokedynamic bytecode instruction, to strongly increase its performance with dynamic languages. After doing that, all of a sudden comes Truffle and you can write interpreters for languages that you compile to "self-modifying ASTs" (not to Java bytecodes), and those implementations of JavaScript, Ruby... seem to be faster than the implementations that compiled those languages to Java bytecodes and leveraged that recent and magical invokedynamic instruction!

After that initial astonishment I managed to more or less understand how Truffle, Partial Evaluation and so on work and make code fly. It seems like the future of dynamic languages on the JVM will be based on Truffle rather than invokedynamic. Oracle discontinued Nashorn (a JavaScript engine based on invokedynamic) while promoting its Truffle equivalent. For Python on the JVM, given that Jython remains stuck in Python 2.7, it's clear that graalpy has no other contender. For Ruby, however, both a "traditional" version, JRuby, which years ago was updated to leverage invokedynamic, and a Truffle version, TruffleRuby, are alive and well.
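The core idea behind invokedynamic can be sketched in plain Python: a call site starts unlinked, a bootstrap method runs once to resolve the target, and every later call goes straight to the cached target. This is just an illustrative sketch; the `CallSite`/`bootstrap` names only mimic the real `java.lang.invoke` machinery.

```python
# Illustrative sketch of the invokedynamic idea: a call site that is
# bootstrapped (linked) lazily on its first use, and afterwards calls
# its cached target directly, skipping any dynamic resolution.

class CallSite:
    def __init__(self, bootstrap):
        self._bootstrap = bootstrap
        self._target = None            # unlinked until the first call

    def __call__(self, *args):
        if self._target is None:       # first call: run the bootstrap once
            self._target = self._bootstrap(*args)
        return self._target(*args)     # later calls hit the cached target

def bootstrap_add(*args):
    # Pretend the language runtime resolves "+" for these operand types.
    return lambda a, b: a + b

site = CallSite(bootstrap_add)
print(site(1, 2))    # links on the first call -> 3
print(site(3, 4))    # reuses the linked target -> 7
```

A real JVM call site can also be invalidated and relinked when assumptions break; this sketch only shows the happy path.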

In case you wonder whether TruffleRuby could be making some use of invokedynamic (in the code of the interpreter methods for the different AST nodes...), the answer is NO:

- How is this related to invokedynamic?
TruffleRuby doesn't use invokedynamic, as it doesn't emit bytecode. However it does have an optimizing method dispatch mechanism that achieves a similar result.
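That kind of "optimizing method dispatch" can be sketched without emitting any bytecode: an inline cache, where each call-site node in the AST remembers the receiver's class and the resolved method, and skips the lookup while the class stays the same. A minimal monomorphic version (my own illustrative sketch, not TruffleRuby's actual code):

```python
# A minimal monomorphic inline cache: each call-site node caches the
# receiver's class and the method it resolved to. While the class at the
# site stays the same, dynamic lookup is skipped entirely.

class CallNode:
    def __init__(self, method_name):
        self.method_name = method_name
        self.cached_class = None
        self.cached_method = None

    def execute(self, receiver, *args):
        cls = type(receiver)
        if cls is not self.cached_class:             # miss: full lookup
            self.cached_class = cls
            self.cached_method = getattr(cls, self.method_name)
        return self.cached_method(receiver, *args)   # hit: direct call

call = CallNode("upper")
print(call.execute("hello"))   # first call caches str.upper -> 'HELLO'
print(call.execute("world"))   # cache hit, no lookup -> 'WORLD'
```

Real implementations use polymorphic caches (a small chain of class/method pairs) and invalidate them on redefinition, but the principle is the same.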

Already in my post from 2021 I had written something that I had since forgotten and that pretty much surprised me at the time:

the "classic" JRuby (that continues to use, as far as I know, a strange mix of a Ruby interpreter written in Java and a runtime compiler to Java bytecodes, which will then be either interpreted or JIT compiled by the JVM)

I tend to forget that interpreters do not need a bytecode stage: they can parse the source into an AST and directly interpret that AST, and indeed this used to be the common behaviour of early interpreters. The CPython interpreter compiles the Python sources to bytecodes for a stack-based VM (and saves them to a .pyc file) and then interprets those bytecodes (as we saw in my last post, graalpy also uses those .pyc bytecode files as an intermediate step for building its ASTs). The modern Ruby C runtime, YARV (which includes both an interpreter and a JIT), also compiles Ruby source code to bytecodes for a stack-based VM, but the old C runtime, MRI, just built an AST from the Ruby source and directly interpreted that complex AST (I think this is what is called a pure AST-walking interpreter).
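A pure AST-walking interpreter is easy to sketch in a few lines of Python, reusing CPython's `ast` module for the parsing step only. There is no bytecode stage at all, just a recursive walk over the tree; this toy only handles a few arithmetic operators:

```python
# A tiny pure AST-walking interpreter for arithmetic expressions:
# no bytecode stage, just parse to an AST and recursively evaluate it.
import ast

def evaluate(node):
    if isinstance(node, ast.Expression):
        return evaluate(node.body)
    if isinstance(node, ast.Constant):        # leaf: a literal value
        return node.value
    if isinstance(node, ast.BinOp):           # inner node: recurse on children
        left, right = evaluate(node.left), evaluate(node.right)
        if isinstance(node.op, ast.Add):
            return left + right
        if isinstance(node.op, ast.Sub):
            return left - right
        if isinstance(node.op, ast.Mult):
            return left * right
    raise NotImplementedError(ast.dump(node))

tree = ast.parse("1 + 2 * 3", mode="eval")
print(evaluate(tree))   # -> 7
```

Every evaluation re-walks the tree, which is exactly why such interpreters are slow without tricks like node specialization, and why most runtimes moved to bytecode.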

JRuby is a complex beast that is well explained here. It does not use the YARV Ruby bytecode format or a specific one of its own: it creates an AST from the Ruby source and starts to interpret it (AST-walking interpreter). When a method becomes "hot", JRuby will JIT compile it to Java bytecodes that will be run by the Java interpreter, which in turn can decide that this method has become "hot" and compile it to native code (and reoptimize via PGO and so on...).

JRuby’s JIT compiler is similar in nature, the primary difference being the source and destination formats. In Java the source format is Java bytecode, and the destination format is native machine instructions. Conversely, in JRuby, the source format is the JRuby AST, and the destination format is Java bytecode. Perhaps the most fascinating aspect of the JRuby JIT compiler is that it benefits from both the initial compilation into Java bytecode, and then later, when Java may attempt to translate the JRuby-generated bytecode into native machine instructions. So effectively, it is possible to get a double-JIT on your executing code.

That's amazing: you end up with 2 interpreters (for the AST built from the Ruby source and for the Java bytecodes) and 2 JITs (for turning the Ruby AST into Java bytecodes and for turning those bytecodes into native code, which indeed goes through the 2 Java JITs: C1 and C2, or C1 and the Graal compiler). Regardless of how cool that looks, it makes pretty good sense to me that TruffleRuby performs better with its more straightforward approach: interpreting the AST, specializing/Partial Evaluation, and JIT compilation to native code.
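The AST-to-bytecode step in that pipeline has a loose analogue in CPython itself, which can be shown end to end: `compile()` accepts an AST and produces a code object (bytecode) that the VM then runs. JRuby does its AST-to-Java-bytecode step lazily, only for hot methods; this sketch just does it eagerly to illustrate the two stages:

```python
# Loose analogy to JRuby's AST -> bytecode stage, using CPython:
# parse source into an AST, compile that AST into a code object
# (bytecode), and hand the bytecode to the VM to execute.
import ast
import dis

tree = ast.parse("def double(x):\n    return x * 2")   # source -> AST
code = compile(tree, "<sketch>", "exec")               # AST -> bytecode

namespace = {}
exec(code, namespace)                  # let the VM run the bytecode
print(namespace["double"](21))         # -> 42

dis.dis(namespace["double"])           # peek at the stack-based bytecode
```

What CPython does not do, and JRuby does, is go one level further down from that bytecode to native machine code.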

What is not clear to me is whether TruffleRuby creates its AST directly from ruby source code or whether it has an intermediate step and leverages YARV bytecodes (or another bytecode format).

As a final note, it's interesting that, as explained here, Jython goes a different route from JRuby: it does not have an interpreter phase, it directly compiles to Java bytecodes and feeds them to the JVM. This is the same behaviour as IronPython on .NET.

Jython uses an ANTLRv3 generated parser to generate an AST that conforms to the spec for the one you can get from Pythons built in compile function (if you ask for AST Jython will just return after this stage). This is then fed through a custom compiler that uses the ASM Java bytecode generation library to generate Java bytecode that is then loaded and executed. Jython does not (in contrast to JRuby) have an initial interpreted stage, but compiles directly to Java bytecode and lets the JVM handle it from there. Because of this I've never liked it when people describe Jython as a Python interpreter written in Java, I'd much rather call it a Python implementation for Java.

By the way, while reading different sources about all this stuff, I've found a couple of references to a free ebook about programming interpreters that looks pretty interesting.
