Tuesday 17 September 2024

Function Currying Revisited (fixed)

For whatever reason, a few days ago I was looking into adding a curry function to a Python micro-library with some utility functions that I use in some small projects. I took a look at this post that I wrote some years ago, and I realized that my JavaScript implementation was wrong. I'll start by writing it again with the same logic, but supporting calls with several parameters at a time, which was not supported in that old version.



//This one is WRONG
function buggyCurry(originalFunc) {
    let providedArgs = [];
    return function curriedFn(...args) {
        providedArgs.push(...args);
        return providedArgs.length >= originalFunc.length
            ? originalFunc(...providedArgs)
            : curriedFn;
    };
}


Well, the idea is that to create a curried version of a provided function we have to create a function with state, where that state is the original function and the list of already provided parameters. As we invoke the curried function these parameters are appended to that list and the same curried function is returned. Once we have provided all the parameters, the original function is invoked and its return value is returned. OK, but the above implementation is wrong. Keeping the list of provided parameters as state of the curried function means that when we start another round of calls to the curried function (a "curried-flow"), the parameters from the previous round are still there, and it fails. I mean:


function formatMessages(msg1, msg2, msg3) {
    console.log(`${msg1}-${msg2}-${msg3}`);
}

let curriedFormat = buggyCurry(formatMessages);

try {
    curriedFormat("a")("b")("c");
    curriedFormat("d")("e")("f");
}
catch (ex) {
    console.log(`Error: ${ex}`);
}
// a-b-c
// a-b-c
// c:\@MyProjects\MyJavaScriptPlayground\currying\curry.js:35
// curriedFormat("d")("e")("f");
// TypeError: curriedFormat(...) is not a function

The first invocation is fine. But the second invocation fails, because rather than starting again with an empty list of parameters it starts with those of the previous execution: the original function gets invoked again with the old arguments (printing "a-b-c" a second time), and as it returns undefined, the next call in the chain throws a TypeError.

To fix that, each time we start a new invocation of the curried function (a "curried-flow") we have to start with a fresh parameters list. We have to separate the curried function from the function that stores the state and returns itself until all parameters are provided (the curry logic). The function that starts the "curried-flow" is a sort of bootstrapper or factory. This is what I mean:



//This one is better, but still not fully correct
function curry(originalFunc) {
    // no difference between using an arrow function or a named function expression, save that the named expression makes it clear to me that it's a bootstrapper (or factory)
    //return (...args) => {
    return function curryBootStrap(...args) {
        let providedArgs = [];
        // II(named)FE: an immediately invoked (named) function expression
        return (function curriedFn(...args) {
            providedArgs.push(...args);
            return providedArgs.length >= originalFunc.length
                ? originalFunc(...providedArgs)
                : curriedFn;
        })(...args);
    };
}


function formatMessages(msg1, msg2, msg3) {
    console.log(`${msg1}-${msg2}-${msg3}`);
}

let curriedFormat = curry(formatMessages);
console.log(curriedFormat.name);

curriedFormat("a")("b")("c");
curriedFormat("d")("e")("f");
curriedFormat("g", "h")("i");
curriedFormat("j", "k", "l");

//curryBootStrap
//a-b-c
//d-e-f
//g-h-i
//j-k-l

That (apparently) works! But indeed I've just realised that it's not fully correct. It works fine in the use case where we only invoke the initial function multiple times, but we might want to store a reference to one of the intermediate functions and then invoke it multiple times. In that case we would face the same problem as with the first version. For example (a minimal sketch of the failure, reusing the formatMessages function from above):
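
let curriedFormat = curry(formatMessages);
let withA = curriedFormat("a"); // keep a reference to an intermediate function

withA("b")("c"); // a-b-c, fine
withA("d")("e"); // prints a-b-c again and then throws a TypeError,
                 // because the trapped providedArgs still holds "a", "b" and "c"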

That explains why, when checking some articles, I always came across a different implementation that seemed less natural to me, but that is the good one. Based on what I've read here (excellent article where he also talks about the bug that I've just shown), I've implemented it "my way":


// THIS ONE IS CORRECT
function curry(fn) {
    function saveArgs(previousArgs) {
        return (...args) => {
            const newArgs = [...previousArgs, ...args];
            return newArgs.length >= fn.length 
                ? fn(...newArgs)
                : saveArgs(newArgs);
        };
    }
    // Start with no arguments passed.
    return saveArgs([]);
}

In this version, on each invocation of the "curried-flow" either we have reached the end and the original function is invoked, or a new function is created that traps the existing parameters, joins them with the ones it receives into a new array, and contains this "curry-logic". The creation of that function is done through another function that I've called saveArgs (the original article named it accumulator). saveArgs receives the existing parameters, allowing its child function to trap them. Notice that saveArgs is not a recursive function (neither directly nor indirectly): the next invocation of saveArgs is done from a different function (the child function) that is called from the next "curried-flow" invocation, so these calls do not get stacked.
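
With this version the problematic use case from before works as expected (again with the formatMessages function from above):

let curriedFormat = curry(formatMessages);
let withA = curriedFormat("a"); // a reference to an intermediate function

withA("b")("c"); // a-b-c
withA("d")("e"); // a-d-e: each call builds a new array, so there's no shared state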

In the above implementations I'm using named function expressions in several places where I could use an arrow function. I'm aware of the differences between normal functions and arrow functions (dynamic "this" vs lexical "this"), but in these cases they do not play any role, as none of those functions makes use of "this". Using a named function is useful to me because I try to use semantic names (it's sort of like adding a comment). Additionally, there is some trendy use of arrows for factories of other functions that I don't find particularly appealing. I mean, rather than this compact version:


let formatFactory = (pre, pos) => (msg) => `${pre}${msg}${pos}`;

let format1 = formatFactory("-- ", " ..");
console.log(format1("name"));


I prefer writing this more verbose version:


function formatFactory(pre, pos) {
    return (msg) => `${pre}${msg}${pos}`;
}
let format = formatFactory("-- ", " ..");
console.log(format("name"));

I don't write much JavaScript these days, and I was getting some problems because I was mixing Kotlin behaviour with JavaScript behaviour. In Kotlin lambdas, if you don't explicitly return a value using the qualified return syntax, the value of the last expression is implicitly returned. This is so regardless of how many expressions, declarations or statements the lambda contains. In JavaScript, arrow functions only have an implicit return when they are made up of just one expression.
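
A minimal example of the JavaScript side of that difference:

let inc1 = (x) => x + 1;      // single expression: implicit return
let inc2 = (x) => { x + 1; }; // block body: no implicit return

console.log(inc1(1)); // 2
console.log(inc2(1)); // undefined (in a Kotlin lambda the last expression would be returned)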

Wednesday 11 September 2024

Literal Syntax for Iterable with Lazy values

The Python discussion forum is a great source of interesting ideas (and also of frustration when some ideas are rejected in a rather harsh way based on the "it's not pythonic" bullshit). The other day one guy was proposing something that was like a sort of "generator literal" or "literal syntax for an iterable with lazy values". I mean, a syntax to define an iterable with values that won't be calculated until each of them is needed (that is, lazily). A straightforward way to express a "collection" made up of values where at least some of them need to be generated by a function call just at the moment when the value is needed by the iteration. These are "lazy values".

The syntax he was proposing was not feasible as it would clash with other existing uses, but what is particularly interesting is the workaround that he's using (I'm simplifying one of the lines to make it clearer). It took me a bit to understand it:


from typing import Iterator

values: Iterator[str] = (lambda: (
    (yield "abc"),
    (yield expensive_func("bbb")),
    (yield "something else"),
))()


So, what's that? First he's working around the painful limitations of Python's lambdas (they can't contain statements) with the technique that I mentioned in this post: an expression (a tuple) with nested expressions (each element of the tuple). We can yield from each nested expression (notice that we have to wrap yield in parentheses so that it's treated as an expression, not as a statement). The above code is equivalent to this:



def gen_fn():
    yield "abc"
    yield expensive_func("bbb")
    yield "something else"
    # each yield expression evaluates to None, so the tuple in the lambda is (None, None, None)
    return (None, None, None)

values: Iterator[str] = gen_fn()


There's another simple, more obvious approach: a helper class that wraps the values and "lazy values" (with the lazy values represented as functions that it has to invoke). Something like this:



from collections.abc import Iterable


class LazyIterable:
    def __init__(self, items: Iterable):
        self.items = items

    def __iter__(self):
        for item in self.items:
            yield item() if callable(item) else item


gen_ob = LazyIterable([
    "abc",
    lambda: expensive_func("bbb"),
    "something else",
])



There's no such literal syntax in JavaScript either, so we would have to use the same techniques I've just described, for example:


let genOb = (function*() {
    yield "abc";
    yield expensiveFunc("bbb");
    yield "something else";
})();

console.log("genOb created")

for (let it of genOb) {
    console.log(it);
}

Notice that we cannot use arrow functions to define a generator (there's a proposal for that from 5 years ago...), so we are using a "traditional" function expression.
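
The wrapper-class technique translates to JavaScript too. A minimal sketch (mirroring the Python LazyIterable class above, and assuming the same expensiveFunc):

class LazyIterable {
    constructor(items) {
        this.items = items;
    }

    *[Symbol.iterator]() {
        for (const item of this.items) {
            // functions are treated as lazy values and invoked on demand
            yield typeof item === "function" ? item() : item;
        }
    }
}

let lazyOb = new LazyIterable([
    "abc",
    () => expensiveFunc("bbb"),
    "something else",
]);

for (let it of lazyOb) {
    console.log(it);
}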

Thursday 5 September 2024

JVM Tiered Compilation

Over the years I've written multiple posts with basic information about JITs and interpreters (I'll just link a few of them: [1], [2]). We could say that the combination of an interpreter, multi-tier JIT compilation, profiling, optimization and deoptimization... really fascinates me. The Truffle interpreters and Graal JIT compilation combination [1], [2] is one of the latest advances in the area that has kept me dreaming of better understanding how all this works. While rethinking a bit about this topic I've realised that I had some misconceptions and doubts (kind of: "I think it's like that, but indeed I'm not sure whether I read it or just assumed it"). For example, I had come to wrongly think that the cycle of profiling-optimizing a given method was sort of neverending, which is not the case (at least in the JVM, either with C2 or Graal). So as of late I've been reading some pretty interesting material on this topic, mainly about the JVM, and I'll summarize here my findings.

When we have an interpreter, a "simple" and fast JIT and a more advanced and slower (in compilation time; of course the code it generates is faster) JIT, the base principle that drives the interaction between these 3 components is that the interpreter counts how many times it invokes a method, and when it exceeds a threshold (the method is hot) it compiles it with the "simple" JIT compiler. This compiler adds "counting logic" to the code it generates, so that the method invocation counting continues, and when another threshold is exceeded the method is compiled with the advanced JIT compiler. OK, things are more complex than that in the JVM. This article gives an excellent explanation (just notice that if we are using GraalVM in non-AOT mode we don't have C2, but Graal).
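
Just to make that base principle explicit, here is a toy sketch of the promotion logic in JavaScript (this is of course not how the JVM is implemented; the names and thresholds are made up for illustration):

// hypothetical thresholds; the real ones are JVM tuning parameters
const C1_THRESHOLD = 100;
const C2_THRESHOLD = 10000;

function makeTieredMethod(interpret, compileWithC1, compileWithC2) {
    let counter = 0; // invocation (and, in the real JVM, loop back-edge) counter
    let tier = "interpreter";
    let code = interpret;
    return function invoke(...args) {
        counter++;
        if (tier === "interpreter" && counter >= C1_THRESHOLD) {
            code = compileWithC1(); // C1 code still carries counting/profiling logic
            tier = "C1";
        } else if (tier === "C1" && counter >= C2_THRESHOLD) {
            code = compileWithC2(); // C2 code is final: no more counters or profiling
            tier = "C2";
        }
        return code(...args);
    };
}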

First, the interpreter is not just counting method invocations, it's also gathering profiling information. Well, from what I've read, the interpreter will only start to gather profiling information after a certain threshold of invocations. The "simple" C1 compiler also adds profile-gathering logic to the native code that it generates, so that in the future the "advanced" C2 compiler can use that profiling information to generate hyper-optimized code. It's important to note that profiling has a cost. When your code is not just doing its job but also gathering profile information, it's doing extra work. That's why the C2 compiler is the ending point; it will not do further optimizations. It does not generate code with "invocation counting" and profile-gathering logic so that in the future another more optimized version can be compiled. The code generated by C2 is just as optimized as "necessary", and that's all. Well, indeed there's an additional player, deoptimization. Both C1 and C2 make assumptions when compiling, with the intent to get optimized code, but those assumptions could be proven wrong in the future (the generated code contains assertions/guards to verify this), so in that case the code has to be deoptimized, returning to the interpretation stage and starting over. This was another source of doubts for me: I thought deoptimization would jump to a previous native code version, but no, it's mentioned in multiple sources that it just returns to the interpreted form. The diagrams in those articles make this crystal clear.

Furthermore, there are 3 levels of C1 compilations. It's well explained in that article, and also in this one.

So the levels of compilation are as follows:

  • 0: Interpreted code
  • 1: Simple C1 compiled code
  • 2: Limited C1 compiled code
  • 3: Full C1 compiled code
  • 4: C2 compiled code

C1 and C2 compilers run in separate threads (even several threads for each compiler depending on your number of cores). Each compiler has a queue where it receives compilation requests (so your program does not suddenly "stop" waiting for a compilation). Trivial methods that would not benefit from further profiling and C2 compilation are sent from the interpreter to C1 for a level 1 compilation, which won't add any profiling to the generated code (it won't even add the invocation-counting logic to it). That code is "final" unless it has to be deoptimized. Non-trivial methods are sent to C1 level 3 (which adds full profiling) or C1 level 2 (which adds light profiling). It's not clear to me yet when a method is sent to level 2 rather than level 3. Finally, when a method is hot enough it's sent to the C2 compiler along with all the profiling information.

There's another point that I should detail a bit more. When I say that the interpreter and the C1-generated code are counting how many times a method is invoked, it's indeed more complex: they also count how many loop iterations run inside that method. Invoking a method once, but then running a loop for 1000 iterations in that method, makes it hotter than invoking 100 times a method that has no loops.

Compilation is based on two counters in the JVM: the number of times the method has been called, and the number of times any loops in the method have branched back.

As a method evolves from its bytecode form (to be interpreted) to one of several native forms (and then back to bytecode form if it has been deoptimized), how does the code that invokes that method know where to jump to? We are not going to be patching the many possible callsites of that method, so the solution seems simple (once you read it from someone smarter than you): adding an extra level of indirection. Calls to a method will go through a table that points to the actual code (the native code version, or code that continues interpreting bytecodes). So it's that table that gets patched to point to the right address as the method evolves. We already have this kind of table (i-tables and v-tables) for virtual methods, so I wonder if this patching is just applied to the v-table itself, or if entries in that v-table point to an additional "bytecode vs native" table on which this patching happens. Wow, VMs are amazingly complex.
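
A toy model of that indirection, again in JavaScript just to illustrate the idea (the names are made up): callers never jump to the method directly, they go through a table slot, and it's the slot that gets patched.

const codeTable = {
    // starts pointing at an "interpreter" version of the method
    format: (msg) => `interpreting: ${msg}`,
};

function callFormat(msg) {
    return codeTable.format(msg); // every callsite goes through the table
}

console.log(callFormat("hi")); // interpreting: hi
// the "JIT" patches the single table slot; no callsite has to be touched
codeTable.format = (msg) => `native: ${msg}`;
console.log(callFormat("hi")); // native: hi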

Another point that seems amazingly complex to me is the deoptimization part. So you are executing the jitted, compiled version of a method, an assertion/guard fails, and we have to go back to the bytecode-interpreted version. It's not only that future invocations of that method will have to jump into the interpreted version (the indirection table I've just talked about); it's that we are in the middle of the execution of that method (well, this also sounds like the on-stack-replacement thing...). There's an article where they talk about this in the Truffle-Graal world.

The first problem is how to jump from compiled code back into an interpreter. This isn't as simple as a goto. We not only need to jump from one place to another, but we also need to recreate the state that the interpreter expects. That means that we need to recreate the stack frames for every method that was compiled but now is being interpreted. We then just have to restart execution in the interpreter in the AST node that corresponds to the machine code where we deoptimized - so we maintain a mapping between the two for all generated code.

Reading the different articles cited above, I've been amazed by the number of options that the JVM accepts (parameters that you pass to the java process in the command line). I was familiar with the -Xms and -Xmx settings to set the heap size, and also the -Xss option to set the thread stack size. But there are so many more! For example (see the sample invocation after this list):

  • -XX:CompileThreshold=invocations. Sets the number of interpreted method invocations before compilation. By default, in the server JVM, the JIT compiler performs 10,000 interpreted method invocations to gather information for efficient compilation.
  • -XX:-TieredCompilation. Disables tiered compilation (it's enabled by default).
  • -XX:+PrintCompilation. When this option is enabled, every time a method is compiled the JVM prints out a line with information about what has just been compiled.
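
For instance, a run combining a couple of those options would look like this (MyApp is just a placeholder for your main class): java -Xmx2g -XX:+PrintCompilation MyApp.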

I won't close this post without linking these 2 articles about meta-compilation, tracing and partial evaluation: [1] and [2].

Meta-compilation reduces the individual implementation effort by making just-in-time compilers reusable and independent of any particular language. RPython and GraalVM are the main systems in this space. Both approach the problem by compiling the executing program together with the interpreter that is executing it. This combination provides the necessary specialization opportunities to produce efficient native code that rivals custom-built state-of-the-art JIT compilers.