I have a fascination with Graal and the Truffle interpreter framework, though it's all from a theoretical standpoint, as I've never built an interpreter myself. The thing is that I recently found out about a new addition to Truffle, the Bytecode DSL. This means that Truffle now supports the two main types of interpreters: tree-walking (AST) interpreters and bytecode interpreters.
I found this a bit odd at first, as it was not clear to me how to reconcile it with what I understood to be Truffle's main superpowers. The "traditional" approach in Truffle is writing tree-walking (AST) interpreters. Summarizing what I explained in some of my previous posts: the nodes in this tree correspond to Java methods (really, their Java bytecode) that the interpreter invokes. Thanks to profiling, these nodes can get rewritten into more specialized nodes, and when a guest-language method is hot, the Java methods for the nodes making up that method are sent to Graal to be compiled to native code (this is the Partial Evaluation part). The equivalent of specializing AST nodes also exists in the bytecode case: bytecodes can be specialized/quickened in a way very similar to what CPython's adaptive specializing interpreter (PEP 659) does. But for the compilation part, if with the Bytecode DSL we no longer have a tree made up of nodes, are we missing the Partial Evaluation magic?
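To make the AST side more concrete, here is a minimal sketch of a specializing node written with the Truffle DSL. The AddNode class and its operations are invented for illustration; real nodes usually also deal with frames, child nodes, and overflow handling.

```java
import com.oracle.truffle.api.dsl.Specialization;
import com.oracle.truffle.api.nodes.Node;

// Sketch of a Truffle AST node for a guest-language "+" operation.
// The DSL generates a concrete subclass (conventionally AddNodeGen) that
// starts uninitialized and rewrites itself to whichever specialization
// matches the operand types it actually observes at run time.
public abstract class AddNode extends Node {

    public abstract Object execute(Object left, Object right);

    @Specialization
    int addInts(int left, int right) {
        return left + right;        // fast path once both operands are ints
    }

    @Specialization
    double addDoubles(double left, double right) {
        return left + right;
    }

    @Specialization
    String addStrings(String left, String right) {
        return left + right;        // string concatenation case
    }
}
```

Each specialization body is an ordinary Java method, and it's the (Java bytecode of the) currently active specializations that Graal partially evaluates.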
No, we are not missing anything. For each guest-language bytecode of the program we are executing, there is a Java method that executes it. When a guest-language method is hot, the Java methods for the bytecodes making up that method are sent to Graal for compilation, which is just the same thing we do with AST nodes.
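To picture what that means, here is a toy, hand-written dispatch loop. The opcodes and methods are invented and this is not what the Bytecode DSL actually generates; the point is only that each opcode's handler is plain Java code, which Graal can partially evaluate just like an AST node's execute method.

```java
// Toy stack-based bytecode interpreter (invented opcodes, for illustration):
// each opcode is executed by ordinary Java code, and it is that code that
// Graal partially evaluates when a guest-language method gets hot.
final class MiniInterpreter {
    static final byte OP_PUSH = 0, OP_ADD = 1, OP_RETURN = 2; // hypothetical opcodes

    static int run(byte[] bc, int[] constants) {
        int[] stack = new int[16];
        int sp = 0, pc = 0;
        while (true) {
            switch (bc[pc++]) {
                case OP_PUSH:   stack[sp++] = constants[bc[pc++]]; break;
                case OP_ADD:    sp = add(stack, sp);               break;
                case OP_RETURN: return stack[--sp];
                default: throw new IllegalStateException("unknown opcode");
            }
        }
    }

    // the Java method behind the ADD bytecode
    static int add(int[] stack, int sp) {
        int right = stack[--sp];
        stack[sp - 1] += right;
        return sp;
    }
}
```

For example, `run(new byte[]{OP_PUSH, 0, OP_PUSH, 1, OP_ADD, OP_RETURN}, new int[]{2, 3})` pushes 2 and 3, adds them, and returns 5.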
Confirming this with a GPT gave me a better understanding of Partial Evaluation and its optimizations. When a method is hot and its nodes (or guest-language bytecodes) are sent for compilation, Truffle can decide to send only part of the method rather than the whole thing (path-specific compilation). During Partial Evaluation, Truffle follows the execution paths that were actually taken during profiling. This means that if we have an if-else and profiling shows that the condition has always been true, only the "if" part is sent for compilation. Of course, guards are added so that if an assumption becomes false the code can deoptimize (transfer back to the interpreter).
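The standard Truffle pattern for such a guard is a compilation-final flag plus an explicit deoptimization point. A sketch, with a hypothetical IfNode but real Truffle directives:

```java
import com.oracle.truffle.api.CompilerDirectives;
import com.oracle.truffle.api.nodes.Node;

// Sketch: exclude a never-taken branch from compiled code. While "seenElse"
// stays false, Partial Evaluation treats it as a constant, so the else
// branch compiles down to a single deoptimization point instead of code.
public final class IfNode extends Node {

    @CompilerDirectives.CompilationFinal
    private boolean seenElse = false;

    public void execute(boolean condition) {
        if (condition) {
            // then branch: the only part that gets compiled so far
        } else {
            if (!seenElse) {
                // the assumption just broke: throw away the compiled code
                // and continue in the interpreter
                CompilerDirectives.transferToInterpreterAndInvalidate();
                seenElse = true;
            }
            // else branch
        }
    }
}
```

On the next compilation the flag is true, so both branches are included.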
There's an additional element in how Truffle achieves such excellent performance: inlining (in both AST and bytecode interpreters). When Truffle sends the Java methods for the nodes or bytecodes of a method (or of part of a method, as per the above optimization) for compilation, it also sends the methods called from that method and inlines them into the generated native code.
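A conceptual before/after with hypothetical methods: once methodA is compiled, the call to methodB disappears and its body is stitched directly into methodA's native code.

```java
// Conceptual illustration of inlining (hypothetical methods).
final class InliningExample {
    static int methodA(int x) {
        return methodB(x) + 1;   // call site that gets inlined
    }

    static int methodB(int x) {
        return x * 2;
    }

    // what the compiled code for methodA effectively becomes after inlining
    static int flattened(int x) {
        return (x * 2) + 1;
    }
}
```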
A common taxonomy of JIT compilers distinguishes method-based JITs (Graal, HotSpot, .NET...) from tracing JITs (LuaJIT, the older TraceMonkey).
Method-Based JITs (like Graal/Truffle, HotSpot, .NET)
- Compilation unit: entire method
- When a method becomes hot, compile it
- Can still do path specialization within that method
- Inlining: pulls called methods into the caller during compilation
Tracing JITs (like LuaJIT, older TraceMonkey)
- Compilation unit: hot trace across multiple methods
- Traces execution through method calls, loops, returns
- The "trace" might start in methodA(), call into methodB(), and return - all one compiled unit
- More aggressive cross-method optimization
The interesting thing is that while Graal is primarily method-based, with very aggressive inlining it can achieve trace-like behavior.
Pure tracing JITs can cross method boundaries more naturally, but modern method-based JITs like Graal blur this distinction through aggressive inlining. The end result can be quite similar, just with different conceptual models!
Example (see the code sketch after these lists):

Tracing JIT: hot loop detected spanning multiple methods
- Record the exact execution path
- Compile: loop_header → methodA → methodB → loop_back
- One flat piece of native code

Graal: methodA is hot
- Compile methodA
- Inline the methodB call
- Inline the methodC call
- The result looks similar, but structured around methodA
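Putting hypothetical code behind that example: both JITs end up producing roughly the same flat native code for the hot loop, they just pick different starting units.

```java
// Hypothetical methods for the example above. A tracing JIT records the path
// loop() -> methodA() -> methodB()/methodC() and compiles it as one flat
// trace; Graal starts from the hot method and reaches similar flat code by
// inlining methodB and methodC into methodA.
final class HotLoop {
    static long sum = 0;

    static void loop(int n) {
        for (int i = 0; i < n; i++) {  // hot loop header
            sum += methodA(i);
        }
    }

    static int methodA(int i) {
        return methodB(i) + methodC(i);
    }

    static int methodB(int i) { return i * 2; }
    static int methodC(int i) { return i + 1; }
}
```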