Thursday 10 October 2024

Partial Function Application in Python

I won't explain here the difference between partial function application and currying; there are millions of articles, posts and discussions about it everywhere. As Wikipedia explains, partial application refers to the process of fixing a number of arguments of a function, producing another function of smaller arity. JavaScript comes with Function.prototype.bind, and Python comes with functools.partial. I rarely use it, though. Most times I just need to bind a parameter to a function expecting a few parameters, and for that I tend to just use a lambda. I mean:


from functools import partial

def wrap_msg(wrapper: str, msg: str) -> str:
    return f"{wrapper}{msg}{wrapper}"

wrap1 = partial(wrap_msg, "||")
print(wrap1("aaa"))

wrap2 = lambda msg: wrap_msg("||", msg)
print(wrap2("aaa"))


In the above case it's a matter of taste, but for other use cases there are clear differences. First, let's see what functools.partial exactly does:

Return a new partial object which when called will behave like func called with the positional arguments args and keyword arguments keywords. If more arguments are supplied to the call, they are appended to args. If additional keyword arguments are supplied, they extend and override keywords. Roughly equivalent to:

So if we want to bind a parameter that is not the first one, we'll have to bind it as a named parameter, and then we will have to invoke the resulting function passing all the parameters located after it as named ones. That's a constraint that probably makes using a lambda more convenient. On the other side, given the "they extend and override keywords" feature, binding a parameter as a named one gives us a very nice feature: we can use it to turn parameters into default parameters. We use partial to bind those parameters for which we've realised we want a default value, and then we can still invoke the resulting function providing those values if we want. I mean:


from functools import partial
from typing import Any, Iterable

cities = ["Xixon", "Paris", "Lisbon"]

def join_and_wrap(items: Iterable[Any], jn: str, wr: str) -> str:
    return f"{wr}{jn.join(items)}{wr}"

join_and_wrap_with_defaults = partial(join_and_wrap, jn=".", wr="||")

# make use of both defaults
print(join_and_wrap_with_defaults(cities))

# "overwrite" one of the defaults
print(join_and_wrap_with_defaults(cities, jn=";"))
print(join_and_wrap_with_defaults(cities, wr="--"))
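
To see the constraint I mentioned above in action, here's a minimal sketch reusing join_and_wrap: once jn is bound as a named parameter, wr can no longer be passed positionally, because the positional argument would be matched against jn again.


join_dots = partial(join_and_wrap, jn=".")

# fine: wr passed as a named parameter
print(join_dots(cities, wr="||"))

try:
    # "||" is matched positionally against jn, which was already bound
    print(join_dots(cities, "||"))
except TypeError as ex:
    print(ex)
    # join_and_wrap() got multiple values for argument 'jn'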

I was checking what others had to say about all this, and this post is an interesting one. They mention how functools.partial is pretty convenient to prevent the "closure in a loop" issue. As you can see in the code below, it's less verbose than the strategy of creating a wrapper function that works as a scope. Notice that with partial we are indeed not creating a closure (it returns a callable object that is different from a function, and traps the bound arguments as attributes of the object, not in the cells of a __closure__).


greetings = ["Bonjour", "Bon día", "Hola", "Hi"]

# remember the "closure in a loop" issue, as the loop variable does not have an "iteration scope"

print("- 'Closure in a loop' issue")
greet_fns = []
for greeting in greetings:
    def say_hi(name):
        return f"{greeting} {name}"
    greet_fns.append(say_hi)

for fn in greet_fns:
    print(fn("Francois"))

#Hi Francois
#Hi Francois
#Hi Francois
#Hi Francois

print("------------------------------")

print("- Fixed 1")
greet_fns = []
def create_scope(greeting):
    def say_hi(name):
        return f"{greeting} {name}"
    return say_hi
for greeting in greetings:
    greet_fns.append(create_scope(greeting))

for fn in greet_fns:
    print(fn("Francois"))

#Bonjour Francois
#Bon día Francois
#Hola Francois
#Hi Francois

print("------------------------------")

print("- Fixed 2")
greet_fns = []
def say_hi(greeting, name):
    return f"{greeting} {name}"
for greeting in greetings:
    greet_fns.append(partial(say_hi, greeting))

for fn in greet_fns:
    print(fn("Francois"))

#Bonjour Francois
#Bon día Francois
#Hola Francois
#Hi Francois
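
A quick way to verify that claim: the partial object stores the bound data in plain attributes (func, args, keywords), while an equivalent function created by a factory traps the value in a __closure__ cell.


from functools import partial

def greet(greeting, name):
    return f"{greeting} {name}"

p = partial(greet, "Hola")
print(p.func, p.args, p.keywords)  # bound data lives in plain attributes
print(hasattr(p, "__closure__"))   # False: a partial object is not a function

def create_greet_scope(greeting):
    return lambda name: greet(greeting, name)

fn = create_greet_scope("Hola")
print(fn.__closure__)              # (<cell ...>,): greeting is trapped in a cell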

Reading about this stuff I've come across the better_partial module, which implements an interesting approach to partial application that feels like something in between "classic" partial application and currying. It creates partial functions that create other partial functions. You invoke them with real values and placeholders, until you have provided all the parameters. Sure, this explanation is rather unclear, so I'd better copy-paste here an example from the module's GitHub page.



from better_partial import partial, _
# Note that "_" can be imported under a different name if it clashes with your conventions

@partial
def f(a, b, c, d, e):
  return a, b, c, d, e

f(1, 2, 3, 4, 5)
f(_, 2, _, 4, _)(1, 3, 5)
f(1, _, 3, ...)(2, 4, 5)
f(..., a=1, e=5)(_, 3, _)(2, 4)
f(..., e=5)(..., d=4)(1, 2, 3)
f(1, ..., e=5)(2, ..., d=4)(3)
f(..., e=5)(..., d=4)(..., c=3)(..., b=2)(..., a=1)
f(_, _, _, _, _)(1, 2, 3, 4, 5)
f(...)(1, 2, 3, 4, 5)


Wednesday 2 October 2024

Using Suspend Functions to Prevent Stack Overflows in Kotlin

This post is the Kotlin counterpart to these 2 previous JavaScript and Python posts: avoiding stack overflows by means of suspendable functions (or async-await in JavaScript and Python). This technique does not have much of a real practical purpose; the standard mechanism to prevent stack overflows is using trampolines, so consider this post a brain exercise that has helped me to better understand how coroutines and suspendable functions work (same as the aforementioned posts helped me better understand async/await).
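
As an aside, here's a minimal trampoline sketch (in Python, to keep the extra snippets in this diary in a single language): the recursive step returns a thunk instead of calling itself, and a driver loop invokes the thunks iteratively, so the stack never grows.


def trampoline(fn, *args):
    result = fn(*args)
    # keep invoking thunks until we get a final (non-callable) value
    while callable(result):
        result = result()
    return result

def wrap_text(txt, times):
    if times <= 0:
        return txt
    # return a thunk rather than recursing, so no stack frames pile up
    return lambda: wrap_text(f"[{txt}]", times - 1)

print(len(trampoline(wrap_text, "b", 10000)))  # 20001 chars, no stack overflow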

When a Kotlin suspend function gets suspended at a suspension point it returns to the calling function, so the corresponding stack frame is released (the function will later be resumed, at the statement following that suspension point, by "something" calling the resume method of the continuation object corresponding to that function). If we are not stacking things on the stack, there's no risk of a stack overflow. So if we add to our recursive function a call to a trivial suspend function before the next call to the recursive function, we should avoid the stack overflow. The main thing here is clarifying what "trivial" suspend function we can use for this.

Let's start by showing a recursive function that causes a stack overflow:


fun wrapText1(txt: String, times: Int): String =
    if (times <= 0) txt
    else wrapText1("[$txt]", times - 1)

try {
    println(wrapText1("b", 10000))
}
catch (ex: StackOverflowError) {
    println("StackOverflow!!!")
}

// StackOverflow!!!


First of all, that case could be fixed by leveraging Kotlin's compiler support for Tail Call Optimizations. That function is Tail Recursive, so we can just mark it with the tailrec modifier and the compiler will transform that recursion into a loop, so no more overflows.


tailrec fun wrapText1(txt: String, times: Int): String = 
    if (times <= 0) txt
    else wrapText1("[$txt]", times - 1)

That said, let's forget about the tailrec power. That works for tail recursive functions, while the idea of leveraging suspendable functions to prevent stack overflows works the same for tail and non-tail recursion. Notice that I'll run my recursive function in a coroutine created with the runBlocking builder using an event loop dispatcher, so everything runs in a single thread, as in JavaScript and Python asyncio.

Let's first try using a suspend function that indeed does nothing; it does not invoke another "real" suspendable function. As you can see in my post from 2021, in JavaScript awaiting on a resolved Promise (or just on a normal object) is enough to avoid the stack overflow, as it causes the function to be suspended, returning and releasing its stack frame. The "then" part to run after the await is not executed immediately, but put on the microtask queue, meaning that the function returns to the event loop (which will execute the "then handler" taken from the microtask queue). Asynchrony in Kotlin is implemented in quite a different way (continuations...) and I was quite convinced that this "doNothing" function would not cause a suspension, but I had to confirm it.


suspend fun doNothingSuspend(): String {
    return "a"
}

// as expected this one DOES NOT prevent stack overflows. Calling a function marked
// as suspend but that is not really suspending is useless for this: there won't be
// any suspension. So it's like Python asyncio
suspend fun wrapTextAsync1(txt: String, times: Int): String {
    return if (times <= 0) txt
    else {
        doNothingSuspend()
        wrapTextAsync1("[$txt]", times - 1)
    }
}

try {
    println(wrapTextAsync1("b", 10000))
}
catch (ex: StackOverflowError) {
    println("StackOverflow!!!")
}

//StackOverflow!!!


Yes, we get an ugly stack overflow. Calling that suspend function does not cause a suspension because the complex code into which the compiler transforms a suspend function will not return a COROUTINE_SUSPENDED value, which is what the calling function checks in turn to decide between returning to its own caller (and so on down to the event loop) or continuing with the next statement (which is what happens in this case). This is a bit related to what happens in Python, where marking a function that "does not really do something asynchronous" as async (a sort of "fake async function") will not really suspend the calling function. Because of how coroutines are implemented in Python (similar to generators), the chain of async calls gets down to the Task that controls the chain of coroutine calls, and as what has been returned is not an awaitable, the Task will just invoke send() on the first coroutine in the chain, which will go through the different coroutines in the chain recreating their stacks, up to the coroutine from which we had invoked the "fake async function", which will continue running sequentially, not interrupted by a suspension, without having returned control to the event loop.
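
A minimal Python sketch of that "fake async function" behaviour (analogous to the Python post mentioned above): nothing in the chain ever really suspends, so the recursion runs synchronously and blows the stack anyway.


import asyncio

async def do_nothing_async():
    return "a"  # marked async, but it never awaits anything that really suspends

async def wrap_text_async(txt, times):
    if times <= 0:
        return txt
    await do_nothing_async()  # no real suspension: control never reaches the event loop
    return await wrap_text_async(f"[{txt}]", times - 1)

try:
    print(asyncio.run(wrap_text_async("b", 10000)))
except RecursionError:
    print("RecursionError!!!")  # Python's equivalent of the StackOverflowError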

So if we want a real suspension in our recursive function, returning from the function and avoiding a stack overflow, we'll have to invoke a function that really suspends. The immediate option that comes to mind is using a sort of sleep (delay, in the Kotlin coroutines universe) with a minimum value. I say "minimum value" because passing just 0 to kotlinx.coroutines.delay will return immediately without causing any suspension (it won't return a COROUTINE_SUSPENDED value).


suspend fun wrapTextAsync2(txt: String, times: Int): String {
    return if (times <= 0) txt
    else {
        // using a delay(0) is NOT an option, as it returns immediately and no suspension happens
        delay(1)
        wrapTextAsync2("[$txt]", times - 1)
    }
}


That works fine. delay(1) returns to the event loop, and after that delay the corresponding continuation will resume the recursive function at the next statement in the function (the next call to wrapTextAsync2) with a clean stack. But obviously there's an important performance penalty due to the delay.

If you've read this previous post you'll also be familiar with kotlinx.coroutines.yield. Reading the implementation details I was a bit dubious as to whether running my code in an environment consisting of an event loop created by runBlocking, with no other suspend functions running concurrently (so no items waiting in the event loop queue), would be enough for yield() to cause a suspension, but if you invoke the function below with whatever huge times value, you'll see that no stack overflow happens. So yes, yield() prevents a stack overflow.


suspend fun wrapTextAsync3(txt: String, times: Int): String {
    return if (times <= 0) txt
    else {
        yield()
        wrapTextAsync3("[$txt]", times - 1)
    }
}


Tuesday 24 September 2024

Recursive IIFE's

I guess most of my hobbyist programming revolves around stuff with no real use other than making me think. A good example of this is that the other day an existential question came to my mind: can we write recursive Immediately Invokable Function Expressions?

In modern JavaScript we have Function Expressions and Arrow Function Expressions. As Function Expressions can be named, making them recursive is not a problem at all. For example:


let txt = function fn(txt, times) {
    return times <= 0 
        ? txt
        : fn(`[${txt}]`, --times);
}("aa", 3);
    
console.log(txt);
// [[[aa]]]


Arrow functions do not have a name, so they have no way to call themselves unless they have been assigned to a variable. So you have to assign the arrow to a variable and then invoke it, and to run all that you'll have to wrap both things in an additional function (an IIA(rrow)E).


txt = (() => {
    const fn = (txt, times) => times <= 0 
        ? txt
        : fn(`[${txt}]`, --times);
    return fn("aa", 3);
})();
console.log(txt);
// [[[aa]]]


That's way more verbose and unintuitive than a Function Expression, so the only reason for using it that I can think of is if that code were running inside a method and the recursive function wanted to use/trap the "this" of that method (arrow functions use the lexical "this"). That would be more convenient than the old-times "self = this;" trick.


class A {
    constructor(logger) {
        this.logger = logger;
    }

    doSomething() {
        // 2 nested arrow functions trapping the "lexical this"
        return (() => {
            const fn = (txt, times) => {
                this.logger.log("in call");
                return times <= 0 
                    ? txt
                    : fn(`[${txt}]`, --times);
            };
            return fn("aa", 3);
        })();        
    }
}

let a = new A(console);
txt = a.doSomething();
console.log(txt);
// [[[aa]]]


Lambda functions in Python are anonymous, but Assignment Expressions (aka the walrus operator) come to the rescue, so we can write something as nice as this:



txt = (fn := lambda txt, times: txt 
    if times <= 0
    else fn(f"[{txt}]", times - 1)
)("aa", 3)

print(txt)
# [[[aa]]] 


So in the above code we have an expression that defines a lambda, assigns it to the 'fn' variable and returns it. Then we immediately invoke the returned function. That function has trapped the 'fn' variable in its closure (it's a freevar) and it can invoke itself recursively through it.

Tuesday 17 September 2024

Function Currying Revisited (fixed)

For whatever reason, some days ago I was looking into adding a curry function to a Python micro-library of utility functions that I use in some small projects. I took a look at this post that I wrote some years ago, and I realized that my JavaScript implementation was wrong. I'll start by writing it again with the same logic, but supporting calls with several parameters at a time, which was not supported in that old version.



//This one is WRONG
function buggyCurry(originalFunc) {
    let providedArgs = [];
    return function curriedFn(...args) {
        providedArgs.push(...args);
        return providedArgs.length >= originalFunc.length
            ? originalFunc(...providedArgs)
            : curriedFn;
    };
}


Well, the idea is that to create a curried version of a provided function we have to create a function with state, where that state is the original function and the list of already provided parameters. As we invoke the curried function these parameters are appended to that list and the same curried function is returned. Once we have provided all the parameters, the original function is invoked and its return value returned. OK, but the above implementation is wrong. Keeping the list of provided parameters as state of the curried function means that when we start another round of calls to the curried function (a "curried-flow"), the parameters from the previous round are already there, and it fails. I mean:


function formatMessages(msg1, msg2, msg3) {
    console.log(`${msg1}-${msg2}-${msg3}`);
}

let curriedFormat = buggyCurry(formatMessages);

try {
    curriedFormat("a")("b")("c");
    curriedFormat("d")("e")("f");
}
catch (ex) {
    console.log(`Error: ${ex}`);
}
// a-b-c
// a-b-c
// c:\@MyProjects\MyJavaScriptPlayground\currying\curry.js:35
// curriedFormat("d")("e")("f");
//TypeError: curriedFormat(...) is not a function

The first invocation is fine. But the second invocation fails, because rather than starting again with an empty list of parameters it starts with those of the previous execution.

To fix that, each time we start a new invocation of the curried function (a "curried-flow") we have to start with a new parameters list. We have to separate the curried function from the function that stores the state and returns itself until all parameters are provided (the curry logic). The function that starts the "curried-flow" is a sort of bootstrapper or factory. This is what I mean:



//This one is better, but still not fully correct
function curry(originalFunc) {
    // no difference between using an arrow function or a named function expression,
    // save that the named expression makes clear to me that it's a bootstrapper (or factory)
    //return (...args) => {
    return function curryBootStrap(...args) {
        let providedArgs = [];
        //II(named)FE
        return (function curriedFn(...args) {
            providedArgs.push(...args);
            return providedArgs.length >= originalFunc.length
                ? originalFunc(...providedArgs)
                : curriedFn;
        })(...args);
    };
}


function formatMessages(msg1, msg2, msg3) {
    console.log(`${msg1}-${msg2}-${msg3}`);
}

let curriedFormat = curry(formatMessages);
console.log(curriedFormat.name);

curriedFormat("a")("b")("c");
curriedFormat("d")("e")("f");
curriedFormat("g", "h")("i");
curriedFormat("j", "k", "l");

//a-b-c
//d-e-f
//g-h-i
//j-k-l

That (apparently) works! But indeed I've just realised that it's not fully correct. It works fine in the use case where we only invoke the initial function multiple times, but it could be that we want to store a reference to one of the intermediate functions and then invoke it multiple times. In that case we would face the same problem as with the first version. That explains why, when checking some articles, I always came across a different implementation that seemed less natural to me, but that is the good one. Based on what I've read here (an excellent article where he also talks about the bug that I've just shown), I've implemented it "my way":



// THIS ONE IS CORRECT
function curry(fn) {
    function saveArgs(previousArgs) {
        return (...args) => {
            const newArgs = [...previousArgs, ...args];
            return newArgs.length >= fn.length 
                ? fn(...newArgs)
                : saveArgs(newArgs);
        };
    }
    // Start with no arguments passed.
    return saveArgs([]);
}

In this version, on each invocation of the "curried-flow", either we have reached the end and the original function is invoked, or a new function is created that traps the existing parameters, joins them in a new array with the ones it receives, and contains this "curry logic". The creation of that function is done through another function that I've called saveArgs (the original article named it accumulator). saveArgs receives the existing parameters, allowing its child function to trap them. Notice that saveArgs is not a recursive function (neither directly nor indirectly); the next invocation of saveArgs is done from a different function (the child function) that is called from the next "curried-flow" invocation, so these calls do not get stacked.
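
Since all this started with a Python micro-library, here's a sketch of how this correct approach could look in Python, under the assumption that inspect.signature(fn).parameters is a good enough equivalent of JavaScript's fn.length for getting the arity:


import inspect

def curry(fn):
    arity = len(inspect.signature(fn).parameters)

    def save_args(previous_args):
        # each call creates a fresh function trapping its own parameters list
        def curried(*args):
            new_args = [*previous_args, *args]
            if len(new_args) >= arity:
                return fn(*new_args)
            return save_args(new_args)
        return curried

    # start with no arguments passed
    return save_args([])

def format_messages(msg1, msg2, msg3):
    return f"{msg1}-{msg2}-{msg3}"

curried_format = curry(format_messages)
print(curried_format("a")("b")("c"))  # a-b-c
print(curried_format("d", "e")("f"))  # d-e-f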

In the above implementations I'm using named function expressions in several places where I could use an arrow function. I'm aware of the differences between normal functions and arrow functions (dynamic "this" vs lexical "this"), but in these cases they do not play any role, as none of those functions makes use of "this". Using a named function is useful to me because I try to use semantic names (it's sort of like adding a comment). Additionally, there is some trendy use of arrows for factories of other functions that I don't find particularly appealing. I mean, rather than this compact version:


let formatFactory = (pre, pos) => (msg) => `${pre}${msg}${pos}`;

let format1 = formatFactory("-- ", " ..");
console.log(format1("name"));


I prefer writing this more verbose version:


function formatFactory(pre, pos) {
    return (msg) => `${pre}${msg}${pos}`;
}
let format = formatFactory("-- ", " ..");
console.log(format("name"));

I don't write much JavaScript these days, and I was getting some problems because I was mixing Kotlin behaviour with JavaScript behaviour. In Kotlin lambdas, if you don't explicitly return a value using the qualified return syntax, the value of the last expression is implicitly returned. This is so regardless of how many expressions, declarations or statements the lambda contains. In JavaScript, arrow functions only have an implicit return when they are made up of just one expression.

Wednesday 11 September 2024

Literal Syntax for Iterable with Lazy values

The Python discussion forum is a great source of interesting ideas (and also of frustration when some ideas are rejected in a rather harsh way based on the "it's not pythonic" bullshit). The other day one guy was proposing something like a sort of "generator literal", or "literal syntax for an iterable with lazy values". I mean, a syntax to define an iterable with values that won't be calculated until each of them is needed (that is, lazily): a straightforward way to express a "collection" where at least some of the values get generated by a function call just at the moment when the iteration needs them. These are "lazy values".

The syntax he was proposing was not feasible, as it would clash with other existing uses, but what is particularly interesting is the workaround that he's using (I'm simplifying one of the lines to make it clearer). It took me a bit to understand it:


from typing import Iterator

values: Iterator[str] = (lambda: (
    (yield "abc"),
    (yield expensive_func("bbb")),
    (yield "something else"),
))()


So, what's that? First he's working around Python's painful lambda limitations (they can't contain statements) with the technique that I mentioned in this post: an expression (a tuple) with nested expressions (each element of the tuple). We can yield from each nested expression (notice that we have to wrap each yield in parentheses so that it's considered an expression, not a statement). The above code is equivalent to this:



def gen_fn():
    yield "abc"
    yield expensive_func("bbb")
    yield "something else"
    return (None, None, None)

values: Iterator[str] = gen_fn()


There's another simple, obvious approach: a helper class that wraps the values and the "lazy values" (with the lazy values represented as functions that it has to invoke). Something like this:



from typing import Iterable

class LazyIterable:
    def __init__(self, items: Iterable):
        self.items = items

    def __iter__(self):
        for item in self.items:
            yield item() if callable(item) else item


gen_ob = LazyIterable([
    "abc",
    lambda: expensive_func("bbb"),
    "something else",
])
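
A quick usage sketch; expensive_func is hypothetical here, just a stand-in that lets us observe the laziness (it only runs when the iteration reaches its item):


def expensive_func(txt: str) -> str:
    print(f"expensive_func({txt!r}) called")  # only printed during iteration
    return txt.upper()

for item in gen_ob:
    print(item)

# abc
# expensive_func('bbb') called
# BBB
# something else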



JavaScript doesn't have this literal syntax either, so we would have to use the same techniques I've just described, for example:


let genOb = (function*() {
    yield "abc";
    yield expensiveFunc("bbb");
    yield "something else";
})();

console.log("genOb created")

for (let it of genOb) {
    console.log(it);
}

Notice that we cannot use arrow functions to define a generator (there's a proposal for that from 5 years ago...), so we are using a "traditional" function expression.

Thursday 5 September 2024

JVM Tiered Compilation

Over the years I've written multiple posts with basic information about JITs and interpreters (I'll just link a few of them: [1], [2]). We could say that the combination of an interpreter, multi-tier JIT compilation, profiling, optimization and deoptimization... really fascinates me. The Truffle interpreters plus Graal JIT compilation combination [1], [2] is one of the latest advances in the area that has kept me dreaming of better understanding how all this works. While rethinking this topic a bit I've realised that I had some misconceptions and doubts (kind of: "I think it's like that, but indeed I'm not sure whether I've read it or I'm just assuming it"). For example, I had come to wrongly think that the cycle of profiling-optimizing a given method was sort of neverending, which is not the case (at least in the JVM, either with C2 or Graal). So as of late I've been reading some pretty interesting material on this topic, mainly about the JVM, and I'll summarize here my findings.

When we have an interpreter, a "simple" and fast JIT, and a more advanced and slower (in compilation time; of course the code it generates is faster) JIT, the base principle that drives the interaction between these 3 components is that the interpreter counts how many times it invokes a method, and when that count exceeds a threshold (the method is hot) it compiles it with the "simple" JIT compiler. This compiler adds "counting logic" to the code it generates, so that the method invocation counting continues, and when another threshold is exceeded the method is compiled with the advanced JIT compiler. OK, things are more complex than that in the JVM. This article gives an excellent explanation (just notice that if we are using GraalVM in non-AOT mode we don't have C2, but Graal).

First, the interpreter is not just counting method invocations, it's also gathering profiling information. Well, from what I've read, the interpreter will only start to gather profiling information after a certain threshold of invocations. The "simple" C1 compiler also adds profile-gathering logic to the native code that it generates, so that in the future the "advanced" C2 compiler can use that profiling information to generate hyper-optimized code. It's important to note that profiling has a cost: when your code is not just doing its job but also gathering profile information, it's doing extra work. That's why the C2 compiler is the ending point; it will not do further optimizations. It does not generate code with "invocation counting" and profile-gathering logic so that in the future another more optimized version can be compiled. The code generated by C2 is just as optimized as "necessary", and that's all. Well, indeed there's an additional player: deoptimization. Both C1 and C2 make assumptions when compiling, with the intent to get optimized code, but those assumptions could be proven wrong in the future (the generated code contains assertions/guards to verify this), so in that case the code has to be deoptimized, returning to the interpretation stage and starting over. This was another source of doubts for me: I thought deoptimization would jump to a previous native code version, but no, it's mentioned in multiple sources that it just returns to the interpreted form. The 2 images in that article are crystal clear on this.

Furthermore, there are 3 levels of C1 compilation. It's well explained in that article, and also in this one.

So the levels of compilation are as follows:

  • 0: Interpreted code
  • 1: Simple C1 compiled code
  • 2: Limited C1 compiled code
  • 3: Full C1 compiled code
  • 4: C2 compiled code

C1 and C2 compilers run in separate threads (even several threads for each compiler, depending on your number of cores). Each compiler has a queue where it receives compilation requests (so your program does not suddenly "stop" waiting for a compilation). Trivial methods that would not benefit from further profiling and C2 compilation are sent from the interpreter to C1 for a level 1 compilation, which won't add any profiling to the generated code (it won't even add the invocation-counting logic to it). That code is "final" unless it has to be deoptimized. Non-trivial methods are sent to C1 level 3 (which adds full profiling) or C1 level 2 (which adds light profiling). It's not clear to me yet when a method is sent to level 2 rather than level 3. Finally, when a method is hot enough it's sent to the C2 compiler along with all the profiling information.

There's another point that I should detail a bit more. When I say that the interpreter and the C1-generated code count how many times a method is invoked, it's indeed more complex: they also count how many loop iterations run inside that method. Invoking a method once but then running a loop for 1000 iterations in that method makes it hotter than invoking 100 times a method that has no loops.

Compilation is based on two counters in the JVM: the number of times the method has been called, and the number of times any loops in the method have branched back.

As a method evolves from its bytecode form (to be interpreted) to one of several native forms (and then back to bytecode form if it has been deoptimized), how does the code that invokes that method know where to jump to? We are not going to be patching the many possible callsites to that method, so the solution seems simple (once you read it from someone smarter than you): adding an extra level of indirection. Calls to a method will go through a table that points to the actual code (the native code version, or code that continues interpreting bytecodes). So it's that table that gets patched to point to the right address as the method evolves. We already have this kind of table (i-tables and v-tables) for virtual methods, so I wonder if this patching is just applied on such a v-table, or if entries in that v-table point to an additional "bytecode vs native" table on which this patching happens. Wow, VMs are amazingly complex.
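
A toy sketch of that indirection idea (in Python, purely as an analogy, nothing like the real JVM data structures): callers always go through the table, so swapping the implementation just means patching one table entry, never the callsites.


def say_hi_interpreted(name):
    return f"Hi {name} (interpreted)"

def say_hi_native(name):
    return f"Hi {name} (optimized native)"

# the table maps a method to whatever code version is currently active
dispatch_table = {"say_hi": say_hi_interpreted}

def call(method, *args):
    return dispatch_table[method](*args)  # the extra level of indirection

print(call("say_hi", "Francois"))         # interpreted version
dispatch_table["say_hi"] = say_hi_native  # "JIT compilation": patch the table
print(call("say_hi", "Francois"))         # native version, callsites untouched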

Another point that seems amazingly complex to me is the deoptimization part. So you are executing the JIT-compiled version of one method, an assertion/guard fails, and we have to go back to the bytecode-interpreted version. It's not only that future invocations of that method will have to jump into the interpreted version (the indirection table I've just talked about); it's that we are in the middle of the execution of one method (well, this also sounds like the on-stack replacement thing...). There's an article where they talk about this in the Truffle-Graal world.

The first problem is how to jump from compiled code back into an interpreter. This isn't as simple as a goto. We not only need to jump from one place to another, but we also need to recreate the state that the interpreter expects. That means that we need to recreate the stack frames for every method that was compiled but now is being interpreted. We then just have to restart execution in the interpreter in the AST node that corresponds to the machine code where we deoptimized - so we maintain a mapping between the two for all generated code.

Reading the different articles cited above I've been amazed by the number of options that the JVM accepts (parameters that you pass to the java process in the command line). I was familiar with the -Xms and -Xmx settings to set the heap size, and also the -Xss option to set the thread stack size. But there are so many more! For example (see the example invocation after this list):

  • -XX:CompileThreshold=invocations. Sets the number of interpreted method invocations before compilation. By default, in the server JVM, the JIT compiler performs 10,000 interpreted method invocations to gather information for efficient compilation.
  • -XX:-TieredCompilation. Disables tiered compilation (it's enabled by default).
  • -XX:+PrintCompilation. When this option is enabled, every time a method is compiled the JVM prints out a line with information about what has just been compiled.
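
Combining some of these flags, a run could look like this (MyApp is just a hypothetical main class):


java -Xss2m -XX:CompileThreshold=5000 -XX:+PrintCompilation MyApp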

I won't close this post without linking these 2 articles about meta-compilation, tracing and partial evaluation: [1] and [2].

Meta-compilation reduces the individual implementation effort by making just-in-time compilers reusable and independent of any particular language. RPython and GraalVM are the main systems in this space. Both approach the problem by compiling the executing program together with the interpreter that is executing it. This combination provides the necessary specialization opportunities to produce efficient native code that rivals custom-built state-of-the-art JIT compilers.

Thursday 29 August 2024

Getting Source Code at Runtime

One surprising feature present in both JavaScript and Python is that we can get access at runtime to the source code of a function. The internals are a bit different, and from that stems the fact that in Python this feature is slightly more limited than in JavaScript. Let's see.

In JavaScript user defined functions have a toString() method that returns a function's source code (including comments).


function sayHi(name) {
    name = name.toUpperCase();
    console.log(`Hi ${name}`)
}

console.log(sayHi.toString())
// function sayHi(name) {
//     name = name.toUpperCase();
//     console.log(`Hi ${name}`)
// }


For "native" functions or bound functions (obtained with .bind()) toString will just return: function xxx() { [native code] }.


function prepend(pr, msg) {
    return `${pr}${msg}${pr}`;
}
let fn = prepend.bind(null, "||");

console.log(`- bound function: ${fn.toString()}`);
//- bound function: function () { [native code] }

Notice that toString() works perfectly fine for functions created at runtime via eval():


let fn4 = eval("() => console.log('hi');");

console.log(fn4.toString());
//() => console.log('hi')


toString also works for classes, returning the whole class. It feels a bit surprising to me, because under the covers classes are syntactic sugar, and for a class Person what we really have is a Person function that corresponds to the constructor. So indeed, I don't know how we would get just the constructor code (other than extracting it as a substring from MyClass.toString()).


class Person {
    constructor(name) {
        this.name = name;
    }

    walk() {
        console.log("I'm waking")
    }
}

console.log(Person.toString())
// class Person {
//     constructor(name) {
//         this.name = name;
//     }

//     walk() {
//     console.log("I'm walking")
//     }
// }

console.log(typeof Person); 
//function


So it seems that when we define a function its source code is stored in some internal property of the corresponding function object. Checking the Language Specification, it seems function objects have a [[SourceText]] internal slot for that.

Things in Python are a bit different. We can obtain the source code of non-native functions, classes and modules, but with some limitations; basically, getsource only works if it can open the file the source code lives in. The functionality is provided by the inspect.getsource() function. inspect is a standard Python module, but the implementation feels a bit "hackerish", like a functionality that was not initially intended and was added by leveraging some low-level details. I've just said that in JavaScript functions have a slot pointing to their source code. This is not so straightforward in Python.

In Python a function object has an associated code object (attribute __code__) that gives us access to the code of that function (through the co_code attribute). But that's a bytes object containing the Python bytecodes, not the Python source code. The __code__ object has 2 extra attributes: co_filename (with the full path to the Python module where the function is defined) and co_firstlineno (with the line in that file where the function starts) (this is well explained here). So if we have the file where that function was defined, inspect.getsource can extract its source code, like this:
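
Just to see that co_code really is raw bytecode rather than text, a quick peek (shorten_fn is a hypothetical throwaway function):


def shorten_fn(txt: str) -> str:
    return txt[:3]

print(type(shorten_fn.__code__.co_code))  # <class 'bytes'>
print(shorten_fn.__code__.co_code[:8])    # raw bytecodes, not source text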


import inspect

def format(txt: str):
    return f"[[{txt.upper()}]]"

print(inspect.getsource(format))
print(f"filename: {format.__code__.co_filename}")
print(f"firstlineno: {format.__code__.co_firstlineno}")

# def format(txt: str):
#     return f"[[{txt.upper()}]]"

# filename: /media/ntfsData/@MyProjects/MyPython_3.10_Playground/inspect/inspect_tests.py
# firstlineno: 27

This technique won't work for functions defined dynamically with exec/eval. There's no file from which to get the source, and we'll get an exception: OSError: could not get source code.


format2_st = """
def format2(txt: str):
    return f'[[{txt.upper()}]]'
"""

def create_function(fn_st, fn_name):
    exec(fn_st)
    return eval(fn_name)

format2 = create_function(format2_st, "format2")
print(format2("aaa"))
# [[AAA]]

try:
    print(inspect.getsource(format2))
except Exception as ex:
    print(ex)
    # OSError: could not get source code

print(f"filename: {format2.__code__.co_filename}")
print(f"firstlineno: {format2.__code__.co_firstlineno}")

# [[AAA]]
# could not get source code
# filename: 
# firstlineno: 2

inspect.getsource can also get the source code of a class. Classes do not have an associated code object, so the technique used has to be a bit different. You can check the inspect.py source code if you're intrigued.
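
A quick sketch of that difference: a class has no __code__, but inspect can still locate the file to scan through the class's module (Greeter is just a hypothetical throwaway class).


import inspect
import sys

class Greeter:
    pass

print(hasattr(Greeter, "__code__"))   # False: no code object to ask for a filename
module = sys.modules[Greeter.__module__]
print(inspect.getsourcefile(module))  # the .py file that inspect will scan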


class Person:
    def __init__(self):
        super().__init__()
        print("Person.__init__")

    def say_hi_to(self, to: str):
        return f"{self.name} says Hi to {to}"
        
print(inspect.getsource(Person))

# class Person:
#     def __init__(self):
#         super().__init__()
#         print("Person.__init__")

#     def say_hi_to(self, to: str):
#         return f"{self.name} says Hi to {to}"

By the way, inspect.getsource() can retrieve its own source code! Nice :-)


inspect.getsource(inspect.getsource)
Out[14]: 'def getsource(object):\n    """Return the text of the source code for an object.\n\n    The argument may be a module, class, method, function, traceback, frame,\n    or code object.  The source code is returned as a single string.  An\n    OSError is raised if the source code cannot be retrieved."""\n    lines, lnum = getsourcelines(object)\n    return \'\'.join(lines)\n'