Sunday, 26 November 2023

Graalpy

A couple of years ago I wrote about my fascination with GraalVM and the Truffle Interpreters framework, but somehow I got the impression that the different truffle implementations (for Ruby, for Python...) would end up mainly as an amazing intellectual achievement but with litte real use save for a few Java applications wanting to get some scripting power or to have access to python AI-science libraries. I think was pretty wrong. I've recently looked into the truffle python implementation, graalpy, and the project is really alive and well (same as truffle ruby). Graalpy provides a python3.10 implementation that you can use as a replacement of the standard CPython implementation as long as it's only using the standard library modules or external modules written in pure python. If your application uses external extension modules, you'll have to look into the compability table.

They claim that graalpy can run certain applications up to 3 or 4 times faster than CPython, which is amazing. I'm not going to enter into that discussion, I just want to review how graalpy works.

So graalpy is a python interpreter built with the truffle framework. That means that (as you can read in my aforementioned post) indeed it's not just an interpreter, but an interpreter (written in Java) and a high performance JIT (the graalVM compiler). When graalpy executes a python souce file it compiles (as CPython does) that file to a .pyc file, containing python bytecodes. Then the graal interpreter creates an AST for those python bytecodes and interprets them. Overtime, we have the especialization/partial evaluation of the interpreter code (Java bytecodes) that interprets that AST, that is sent to the Graal Compiler to generate high performance native code. Then you go through all the crazy optimizations and deoptimizations that the ultra-smart people that build these amazing JIT's have developed. Et voilá, you have a high performance interpreter!

That's more or less what I had managed to understand 2 years ago, but graalpy comes with an extra, it makes good use of the Native Image technology. First, we have to bear in mind that the Graal JIT compiler, that is written in Java, can be compiled to native code. From here

There are two operating modes of the Graal compiler when used as the HotSpot JIT compiler: as pre-compiled machine code (“libgraal”), or as dynamically executed Java bytecode (“jargraal”).

libgraal: the Graal compiler is compiled ahead-of-time into a native shared library. In this operating mode, the shared library is loaded by the HotSpot VM. The compiler uses memory separate from the HotSpot heap. It runs fast from the start since it does not need to warm up. This is the default and recommended mode of operation.

jargraal: the Graal compiler goes through the same warm-up phase that the rest of the Java application does. That is, it is first interpreted before its hot methods are compiled. This mode is selected with the -XX:-UseJVMCINativeLibrary command line option.

That's pretty interesting. As we know the Graal compiler can be used not just as a JIT compiler, but also as an AOT compiler, and it's written in Java. For using it as JIT in your JVM process you'll import it as a library, either as native library (.so, .dll...) or as a Java jar. For importing it as a native library you'll have to previously compile it AOT. If used as Java jar, it will be first interpreted and then (for its hot methods) JIT compiled by the other JIT compiler, the less optimized C1. From here:

The primary benefit of libgraal is that compilations are fast from the start. This is because the compiler is running compiled from the get-go, by-passing the HotSpot interpreter altogether. Furthermore, it’s compiled by itself. By contrast, jargraal is compiled by C1. The result is that the compiled code of the compiler is more optimized with libgraal than with jargraal.

When you download graalpy (a .tar.gz that you just uncompress and that's all, no installation process as such) you can choose between a native version and a "normal" java version. The native version does not only mean using the native libgraal that I've previously mentioned, but that the java code for the truffle interpreter (and all the java standard libraries it depends on) has been compiled to native code. From here:

You can download GraalPy as a standalone distribution for Oracle GraalVM or GraalVM Community Edition. There are two standalone types to choose from:

Native Standalone: This contains a Native Image compiled launcher
JVM Standalone: This contains Python in the JVM configuration

If you want to use graalpy as a drop-in replacement for cpython to see if you get better performance, you want the Native Standalone version, as explained in this interesting quick reference:

The native runtime is most compatible with CPython. It runs as an ahead-of-time compiled Python launcher, which starts up faster and uses less memory.

The JVM runtime can interoperate with Java and other GraalVM languages can be added to it. You can identify it by the -jvm suffix in its name.

After installing (just unzipping the tar.gz) that "Native Standalone" version on Ubuntu, you mainly have the launcher app (graalpy), a folder with the python3.10 python modules, and a massive 350 MBs libpythonvm.so shared library, that I guesss contains the SubstrateVM and the result of compiling to native code the Java code of the Truffle interpreter and of all the Java Standard library packages used by the interpreter.

And now comes the big question. If Truffle power comes from optimizing with the Graal JIT the Java bytecodes of especialized parts of the interpreter, and now we are using a version where that interpreter has already been AOT compiled to Native code... does this make any sense? Well, others have gone through the same doubts, and there's a very interesting explanation in stackoverflow.

- Question
My understanding is that AOT compilation with native-image will result in methods compiled to native code that are run in the special-purpose SubstrateVM. Also, that the Truffle framework relies on dynamically gathered profiling information to determine which trees of nodes to partially evaluate. And that PE works by taking the JVM bytecode of the nodes in question and analyzing it with the help of the Graal JIT compiler. And here's where I'm confused. If we pass a Truffle interpreter through native-image, the code for each node's methods will be native code. How can PE proceed, then? In fact, is Graal even available in SubstrateVM?

- Answer
Besides the native code of the interpreter, Substrate VM also stores in the image a representation of the interpreter (a group of methods that conform the interpreter) for partial evaluation. The format of this representation is not JVM bytecodes, but the graphs already parsed into Graal IR form. PE runs on these graphs producing even smaller, optimized graphs which are then fed to the Graal compiler, so yes SVM ships the Graal compiler as well in the native image. Why the Graal graphs and not the bytecodes? Bytecodes were used in the past, but storing the graphs directly saves the (bytecodes to Graal IR) parsing step.

As I've said, that answer is really, really interesting. I thought that the Substrate VM used in Native Image applications provided little more than memory management (the JVM memory model) and Garbage Collection, but at least in this case it also includes the Graal Compiler. Additionally you can invoke the Graal Compiler (through the JVM Compiler Interface - JVMCI) passing over not just Java bytecodes, but directly Graal IR graphs.

Out of excitement I was thinking that graalpy would also bring an end to that huge CPython problem, the GIL (Global Interpreter Lock). Unfortunately that's not the case so far. Some of the graalpy guys have taken part in the discussions about the diffeent approaches to removing the GIL from CPython, and have explained that so far they also use a GIL, mainly to avoid problems with C-Extensions. It's quite a pity, seen that in the ruby world, while the current main ruby interpreter YARV (sort of the equivalent to CPython) built in C also has a GIL, Truffle Ruby has got rid of it.

On several occasions while checking some "academic" stuff about Truffle interpreters (not just graalpy) I've seen some references to PyPy (a python implementation using a JIT), comparing its Tracing JIT approach to the Partial Evaluation approach taken by Truffle. It seems now that the guys from graalpy and PyPy are working together on HPy a better API for C-extensions for python.

Thursday, 16 November 2023

Python Lambdas to the Limit

The limited support for lambdas in python (the body of a lambda must be a single expression) is something that felt really painful to me when I went back into python programming last year. Overtime I've realised that this is not so horrible, as this limitation forces you to use a local function (that if) declared with a meaningful name, that can make your code more clear than using a relativelly long lambda declared in place. Anyway, I continue to find this limitation annoying, as in many cases I would prefer to use a lambda than a local function

It's important to make clear that the lambda limitation is not about having just a single line. You can have a lambda with multiple lines, as long as those lines make up an expression. What you can not have in a lambda is statements (not even one).

Related to this, I've recently come across a trick that allows us to overcome this limitation in many use cases. We can declare the body of our lambda as a sequence or a list with expressions as items. That sequence/list is an expression with nested expressions (its items). I mean, this is perfectly valid:


f1 = lambda x: (
    print(f"{x} received"),
    x+2
)

OK, that's good, but a normal use case involves assigning values to variables in the lambda and doing something with those variables. Assigning a value to a variable is a statement... well, not necessarily, as in python we have Assignment Expressions (aka walrus operator)! Add to the mix Conditional Expressions (aka ternary operator), and we are pretty well covered. We can write things like this:


fn = lambda x, y: (
    print("started"),
    x := x.upper(),
    y := y.upper(),
    print(f"result: {x} - {y}")
)

fn("aaa", "bbb")
#started
#result: AAA - BBB


Notice that the lambda returns the expression that makes up its body, so in these cases it's returning the sequence (with the different evaluated expressions), e.g. (None, 'AAA', 'BBB', None). That's fine in cases like the above where we are not interested in the return, but in the common case where the lambda is intended to "calculate" something and return it (that would be the last expression in the sequence), we'll have to invoke the lambda and access to its last item, I mean: my_fn("a", "b")[-1] which reads a bit strange. To avoid this, we can write a factory function (I'll call it "multi" for "multi-lambda") that wraps the lambda in a function that takes care of invoking it and returning the last value. I mean:



def multi(fn: Callable) -> Callable:
    return lambda *args: fn(*args)[-1]
    

That we can use in a rather complete example like this:



cities = ["Paris", "Xixon"]

cities2 = map(multi(lambda x: (
    print(f"formatting: {x}"),
    y := x.upper() if x.startswith("P") else x.lower(),
    f"||{x}-{y}||"
)), cities)

print(list(cities2))

#formatting: Paris
#formatting: Xixon
#['||Paris-PARIS||', '||Xixon-xixon||']

  

I would have not thought we could write code like this in python!

I can't close this post without mentioning coconut python, a functional superset of python that among many other cool features comes with (multi) statement lambdas.

Monday, 6 November 2023

JavaScript LazyProperty and Lookup

In my previous post I talked about lazy/cached properties in Python. We can use similar techniques in JavaScript, as explained here and here. The second post leverages decorators, available only in (very) recent versions.

I have nothing to add to those articles, and this post is indeed about the basis: properties lookup and getters-setters. While in python we call attributes to the fields in one object, in JavaScript we call them properties. Each property has a Property Descriptor object associated to it, that defines the behaviour of the property. We have 2 types of descriptors (which means that we have 2 types of properties), data descriptors (that correspond with a normal data field in other languages) and accessor descriptors (that correspond with getters/setters in other languages).

I have not found any complete reference about property look-up in JavaScript, so what I'll write below is based on experience and some recent tests. Notice that accessor descriptors in JavaScript correspond to Python descriptors, but as we saw last week Python differentiates between Data and Non-Data Descriptors (depending on the combination of __get__, __set__, __delete__ that they feature).

When looking up a property in an object JavaScript will check first in that object, and if not there it'll check in its prototype chain. When getting the property there's no extra logic. However, when setting it, it's a bit more complex.
- If the property exists in the object, it's used/updated.
- If the property does not exist in the object:
--- if it exists in the prototype chain:
------ if what it finds in the prototype chain is an accessor it'll use that accessor. This means that if the accessor has a set(), it'll use that setter, but if it has no set() it'll do nothing.
------ If the property exists in the prototype chain as a data property, it'll set it in the object, not in the prototype.
--- if it does not exist in the prototype chain it'll set it in the instance

Let's see some code:


class Book {
    constructor(name) {
        this.name = name;
    }

    get isbn() {
        console.log("isbn property getter");
        return `${this.name}-ISBN`;
    }
}

So my Book class has an isbn getter property (it's defined in the class, so it's in Book.prototype, reachable from a Book instance through the prototype chain), that automatically "calculates" the isbn, but let's say that for a specific instace I want to set a particular value. This will do nothing:


// this line has no effect, no crash, but it will set nothing
// as the getter only was defined in the class, not in the instance, trying to set the value in the instance does nothing
console.log("- try to set it to aaaa")
b1.isbn = "aaaa";
console.log(b1.isbn);
// Atomka-ISBN

I have to define a property in the instance (I'll just use a data property) using Object.defineProperty


// I have to do it like this:
console.log("- define isbn property in the instance and set to 'aaaa'")
Object.defineProperty(b1, "isbn", {
    value: "aaaa",
    writable: true,
    enumerable: true,
    configurable: true,
  });
console.log(b1.isbn);
//aaaaa

Now that the property exists in the instance, trying to set it again will work fine, as it will just use the data property in the instance, and not the accessor property in the prototype chain:


//now that it's been set in the instance I can update it
console.log("- update it to 'bbbb'")
b1.isbn = "bbbb"
console.log(b1.isbn);

An now an example of how getting a data property from an object will search it in the prototype chain if it's not directly in the object, but setting it will set it in the object.


Book.prototype.canBeRead = true
let b1 = new Book("Atomka");

console.log(`b1 can be read: ${b1.canBeRead}`); //it gets it from the prototype chain
// true

b1.canBeRead = false;
console.log(`b1 can be read: ${b1.canBeRead}`); //but it sets it in the instance
// false

b2 = new Book("Ange Rouge")
console.log(`b2 can be read: ${b2.canBeRead}`); //it remains as true in the prototype chain
// true


Additionally, notice that the delete operator only works in direct properties (own properties) of an object, it does not go through the prototype chain at all.