Wednesday 24 April 2024

Python "Power Instances"

We saw in my recent post about making an instance callable (__call__()) that the implicit lookup of __dunder__ methods starts in the class of the instance, not in the instance itself. The same applies to __getattribute__ and __setattr__: we can change the attribute look-up mechanism for an object by defining __getattribute__/__setattr__ in its class, but defining them directly in the instance has no effect. Something similar happens with descriptors: they are executed (__get__(), __set__()...) when found in a class, but if found in an instance they are returned as-is. In that post I showed a technique for overcoming this "limitation": we create a new class, add the dunder method(s) to it, and change the class of our instance to that new class (by reassigning the __class__ attribute). We can generalize this technique, having classes that create "power-instances": instances that we can easily make callable, but for which we can also define a __getattribute__, add descriptors, and, why not, change the inheritance mechanism. All this for a specific instance, not for the whole class (which would affect all its instances).
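A minimal sketch of that type-level lookup rule (the Greeter class is a made-up example): attaching __call__ to the instance does nothing, but switching the instance to a subclass that defines __call__ works.

```python
class Greeter:
    pass

g = Greeter()
g.__call__ = lambda: "hi"  # set on the instance: ignored by the call protocol
try:
    g()
except TypeError as ex:
    print(ex)  # 'Greeter' object is not callable

# the workaround: move the instance to a subclass that defines __call__
class CallableGreeter(Greeter):
    def __call__(self):
        return "hi"

g.__class__ = CallableGreeter
print(g())  # hi
```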

So I've come up with the idea of a decorator for classes that turns them into classes whose instances are "power-instances", on which we can alter the __getattribute__ behaviour, make them callable, etc. The decorator returns a new class (PowerInstance) inheriting from the original one, with additional methods (to make an instance callable, to add instance methods, to add properties, to intercept attribute access, to add a base class...). When this class is used to create an instance, it returns an instance not of this class but of a new class (yes, we create a new class on each instantiation) that inherits from the PowerInstance class. The additional methods defined in PowerInstance perform their actions (adding the specific __call__ or __getattribute__, modifying inheritance...) on this per-instance class. This is possible because we can return an instance of a class different from the expected one by defining a __new__() method in the class. OK, let's see the implementation:


from types import MethodType

# decorator that enhances a class so that we can add functionality just to the instance, not to the whole class
# it creates a new class PowerInstance that inherits from the original class
def power_instance(cls):
    class PowerInstance(cls):
        # each time we create an instance of this class, we create an instance of new Child class 
        # (in this Child class is where we add the extra things that we can't directly add to the instance)
        def __new__(_cls, *args, **kwargs):
            # it's interesting to note that someone on Stack Overflow says that returning instances of different classes
            # in a sense breaks object orientation, because you would expect to get instances of the same class
            class PerInstanceClass(_cls):
                pass
            return super().__new__(PerInstanceClass)
            # remember this is how a normal __new__() looks:
                # def __new__(cls, *args, **kwargs):
                #     return super().__new__(cls)
                  
        def _add_to_instance(self, item, item_name):
            # type(self) returns the specific class (PerInstanceClass) created for this particular instance
            setattr(type(self), item_name, item)

        def add_instance_method(self, fn, fn_name: str):
            self._add_to_instance(fn, fn_name)

        def add_instance_property(self, prop, prop_name: str):
            self._add_to_instance(prop, prop_name)

        def do_callable(self, call_fn):
            type(self).__call__ = call_fn

        def intercept_getattribute(self, call_fn):
            type(self).__getattribute__ = call_fn

        def do_instance_inherit_from(self, cls):
            # create a new class that inherits from my current class and from the provided one
            class PerInstanceNewChild(type(self), cls):
                pass
            #NewChild.__name__ = type(self).__name__
            self.__class__ = PerInstanceNewChild
    
    # there's no functools.wraps for a class, but we can do this, so the new class has more meaningful attributes
    for attr in '__doc__', '__name__', '__qualname__', '__module__':
        setattr(PowerInstance, attr, getattr(cls, attr))
    return PowerInstance
    

And now a usage example. Notice how we enrich the p1 instance and this affects only that specific instance; if we create a new instance of the class, it's "clean" of those features.



@power_instance
class Person:
    def __init__(self, name: str):
        self.name = name

    def say_hi(self, who: str) -> str:
        return f"{self.name} says Bonjour to {who}"

print(f"Person.__name__: {Person.__name__}")
p1 = Person("Iyan")
print(p1.say_hi("Antoine"))

def say_bye(self, who: str):
    return f"{self.name} says Bye to {who}"
p1.add_instance_method(say_bye, "say_bye")
print(p1.say_bye("Marc"))

p1.add_instance_property(property(lambda self: self.name.upper()), "upper_name")
print(p1.upper_name)

p1.do_callable(lambda self: f"[[{self.name.upper()}]]")
print(p1())

# Person.__name__: Person
# Iyan says Bonjour to Antoine
# Iyan says Bye to Marc
# IYAN
# [[IYAN]]

print("------------------")

# verify that say_bye is only accessible by p1, not by p2
p2 = Person("Francois")
print(p2.say_hi("Antoine"))

try:
    print(p2.say_bye("Marc"))
except Exception as ex:
    print(f"Exception! {ex}")

try:
    print(p2())
except Exception as ex:
    print(f"Exception! {ex}")

# Francois says Bonjour to Antoine
# Exception! 'PerInstanceClass' object has no attribute 'say_bye'
# Exception! 'PerInstanceClass' object is not callable


Let's make our instance extend another class:



class Animal:
    def growl(self, who: str) -> str:
        return f"{self.name} is growling to {who}"
        
p1.do_instance_inherit_from(Animal)

print(p1.growl("aa"))
print(p1.say_hi("bb"))
print(p1.say_bye("cc"))
print(p1())
print(f"mro: {type(p1).__mro__}")
print(f"bases: {type(p1).__bases__}")

# Iyan is growling to aa
# Iyan says Bonjour to bb
# Iyan says Bye to cc
# [[IYAN]]
# mro: (<class '__main__.power_instance.<locals>.PowerInstance.do_instance_inherit_from.<locals>.PerInstanceNewChild'>, <class '__main__.power_instance.<locals>.PowerInstance.__new__.<locals>.PerInstanceClass'>, <class '__main__.Person'>, <class '__main__.Person'>, <class '__main__.Animal'>, <class 'object'>)
# bases: (<class '__main__.power_instance.<locals>.PowerInstance.__new__.<locals>.PerInstanceClass'>, <class '__main__.Animal'>)


And now some interception via __getattribute__:



# intercept attribute access in the instance
print("- Interception:")

# interceptor that does NOT use "self" in the interception code
def interceptor(instance, attr_name):
    attr = object.__getattribute__(instance, attr_name)
    if not callable(attr):
        return attr
    
    def wrapper(*args, **kwargs):
        print(f"before invoking {attr_name}")
        # attr is already a bound method (if that's the case)
        res = attr(*args, **kwargs)
        print(f"after invoking {attr_name}")
        return res
    return wrapper

p1.intercept_getattribute(interceptor)
print(p1.say_hi("Antoine"))

p3 = Person("Francois")
# interceptor that does use "self" in the interception code
def interceptor2(instance, attr_name):
    attr = object.__getattribute__(instance, attr_name)
    if not callable(attr):
        return attr
    
    def wrapper(self, *args, **kwargs):
        print(f"before invoking {attr_name} in instance: {type(self)}")
        # attr is already a bound method (if that's the case)
        res = attr(*args, **kwargs)
        print(f"after invoking {attr_name} in instance: {type(self)}")
        return res
    
    return MethodType(wrapper, instance)        

p3.intercept_getattribute(interceptor2)
print(p3.say_hi("Antoine"))

# before invoking say_hi
# after invoking say_hi
# Iyan says Bonjour to Antoine

# before invoking say_hi in instance: <class '__main__.power_instance.<locals>.PowerInstance.__new__.<locals>.PerInstanceClass'>
# after invoking say_hi in instance: <class '__main__.power_instance.<locals>.PowerInstance.__new__.<locals>.PerInstanceClass'>
# Francois says Bonjour to Antoine


This ability to have "constructors" that return something different from the expected "new instance of the class" is not unique to Python. In JavaScript a function used as a constructor (invoked via new) can return a new object, different from the one that new has created and passed to it as "this". In Python construction-initialization is done in 2 steps, the __new__ and __init__ methods, which are invoked by the __call__ method of the metaclass, so we normally end up in type.__call__. In this discussion you can see a rough implementation of such a type.__call__:


# A regular instance method of type; we use cls instead of self to
# emphasize that an instance of a metaclass is a class
def __call__(cls, *args, **kwargs):
    rv = cls.__new__(cls, *args, **kwargs)  # Because __new__ is static!
    if isinstance(rv, cls):
        rv.__init__(*args, **kwargs)
    return rv

# Note that __init__ is only invoked if __new__ actually returns an instance of the class calling __new__ in the first place.


It's interesting to note that someone on Stack Overflow says that returning from a constructor an instance of a class different from that of the constructor, particularly a different class each time, in a sense breaks object orientation, because normally you expect all instances returned by a constructor to be of the same class (this expectation does not exist if what you use is a factory function).
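We can see the __init__-skipping rule from that type.__call__ sketch with a tiny made-up example: when __new__ returns something that is not an instance of the class, __init__ is never invoked.

```python
class Wrapped:
    def __new__(cls, value):
        # return something that is NOT an instance of Wrapped
        return [value]

    def __init__(self, value):
        # never runs: __new__ returned a non-Wrapped object
        print("__init__ was called")

obj = Wrapped(42)
print(type(obj), obj)  # <class 'list'> [42]
```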

Thursday 18 April 2024

Static Members Comparison

Companion objects are a rather surprising Kotlin feature. As they are a replacement (allegedly an improvement) for the static members that we find in most other languages (Java, C#, Python), in order to grasp their advantages I've first needed to review how static members work in those other languages. That's what this post is about.

In Java static members (fields or methods) can be accessed from the class and also from instances of the class (which is not recommended, for reasons I'm about to explain). Static members are inherited, but static methods are not virtual: their resolution is done at compile time, based on the compile-time type, which is pretty important to take into account if we are going to invoke them through an instance rather than through the class (it seems to be a source of confusion, and one of the reasons why the Kotlin designers decided not to add "static" to the language). If we define the same static method in a Parent class and its Child class, and we invoke it through a Parent variable pointing to a Child instance, as the resolution is done at compile time (there's no polymorphism for static members) the method invoked will be the one in Parent rather than the one in Child. You can read more here.

Things are a bit different in C#. Probably aware of that problem in Java, C# designers decided to make static members only accessible from the class, not from instances. static members are inherited (you can use class Child to access a static member defined in class Parent) and you can redefine a static method (hide the inherited one with a new one) in a Child class using the new modifier.

Recent versions of JavaScript have seen the addition of static members to classes (of course, remember that classes in JavaScript are just syntactic sugar; the language continues to be prototype based). They work the same way as in C#: they can be accessed only through the class, not through instances; you have access to them through a Child class (they are inherited) and you can also redefine them in a Child class.


class Person {
    static planet = "Earth"
    
    constructor(name) {
        this.name = name;
    }

    static shout() {
        return `${this.planet} inhabitant AAAAAAAAAAAA`;
    }

}

class ExtendedPerson extends Person {

}

console.log(Person.shout())

try {
    console.log(new Person("Francois").shout());
}
catch (ex) {
    console.log(ex);
}

// inheritance of static fields/methods works OK
console.log(ExtendedPerson.shout());

//it works because of this:
console.log(Object.getPrototypeOf(ExtendedPerson) === Person);
//true

I assume static members are implemented by just setting properties on the object for that class (Person is indeed a function object), I mean: Person.shout = function(){};. Inheritance works because, as you can see in the last line, the [[Prototype]] of a Child "class" points to the Parent.

An interesting thing is that from a static method you can (and should) access other static methods of the same class using "this". This makes pretty good sense: "this" is dynamic, it's the "receiver", and in a static method that receiver is the class itself. Using "this" rather than the class name allows a form of polymorphism, let's see:


class Person {
    static shout() {
        return "I'm shouting";
    }

    static kick() {
        return "I'm kicking";
    }

    static makeTrouble() {
        return `${this.shout()}, ${Person.kick()}`;
    }

}

class StrongPerson extends Person {
    static shout() {
        return "I'm shouting Loud";
    }
    static kick() {
        return "I'm kicking Hard";
    }    
}

console.log(Person.makeTrouble());
console.log("--------------");
console.log(StrongPerson.makeTrouble());

// I'm shouting, I'm kicking
// --------------
// I'm shouting Loud, I'm kicking


Notice how, thanks to using this, we end up invoking the Child's shout() method, while for kick() we are stuck with the Parent's kick().

Static/class members in Python have some particularities. In Python any attribute declared at class level in a standard class belongs to the class. This means that for static data attributes we don't have to use any extra keyword, we just add them at the class level (rather than in the __init__() method). For static/class methods we have to use the @classmethod decorator (if it's going to call other class methods) or the @staticmethod decorator if not. When we invoke a method on an object, Python uses the attribute lookup algorithm to get the function that will then be invoked. As explained here, functions are indeed (non-data) descriptors that have a __get__ method, so when we retrieve a function via the attribute lookup, the __get__ method of the descriptor is executed, creating a method object bound to the instance, or bound to the class if the function has been decorated with classmethod, or returning the plain, unbound function if it has been decorated with staticmethod. Based on this, class/static methods can be invoked both via the class and via an instance, they are inherited, and the polymorphism we saw in JavaScript also works nicely in Python. Let's see some code:


class Person:
    planet = "Earth"
    
    def __init__(self, name: str):
        self.name = name

    def say_hi(self):
        return f"Bonjour, je m'appelle {self.name}"
    
    @staticmethod
    def shout():
        return "I'm shouting"

    @staticmethod   
    def kick():
        return "I'm kicking"

    @classmethod
    def makeTrouble(cls):
        return f"{cls.shout()}, {cls.kick()}"


class StrongPerson(Person):
    @staticmethod
    def shout():
        return "I'm shouting Loud"

    @staticmethod   
    def kick():
        return "I'm kicking hard"


print(Person.makeTrouble())
p1 = Person("Iyan")
print(p1.makeTrouble())

print("--------------")

# inheritance works fine, with polymorphism, both invoked through the class or through an instance
print(StrongPerson.makeTrouble())
p2 = StrongPerson("Iyan")
print(p2.makeTrouble())

# I'm shouting, I'm kicking
# I'm shouting, I'm kicking
# --------------
# I'm shouting Loud, I'm kicking hard
# I'm shouting Loud, I'm kicking hard


print(Person.planet) # Earth
print(p1.planet) # Earth

Person.planet = "New Earth"
print(Person.planet) # New Earth
print(p1.planet) # New Earth

# this assignment will set the attibute in the instance, not in the class
p1.planet = "Earth 22"
print(Person.planet) # New Earth
print(p1.planet) # Earth 22

Notice how we can read a static attribute (planet) both via the class and via an instance, but if we assign to it via an instance the attribute will be added to the instance rather than updated in the class.

One extra note. We know that when using dataclasses we declare the instance members at the class level (the dataclass decorator takes care of the logic for setting them on the instance at each instantiation), so to declare static/class attributes in our dataclasses we have to use the ClassVar type annotation: cvar: ClassVar[float] = 0.5
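A small sketch of that ClassVar distinction (the Circle dataclass here is just a hypothetical illustration): the ClassVar attribute is excluded from the generated __init__ and from fields().

```python
from dataclasses import dataclass, fields
from typing import ClassVar

@dataclass
class Circle:
    radius: float
    pi: ClassVar[float] = 3.14159  # class-level attribute, not a dataclass field

c = Circle(2.0)
# only 'radius' is a dataclass field; 'pi' lives on the class
print([f.name for f in fields(c)])  # ['radius']
print(c.pi, Circle.pi)  # 3.14159 3.14159
```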

Sunday 24 March 2024

Python Object Literals

Kotlin object expressions are pretty nice. They can be used as the object literals that we like so much in JavaScript, but furthermore you can extend other classes or implement interfaces. In Kotlin for the JVM these object expressions are instances of (anonymous) classes that the compiler creates for us.

Python does not provide syntax for object literals, but it's rather easy to get something similar to what we have in Kotlin. For easily creating a "bag of attributes" we can leverage the SimpleNamespace class. To add methods to our object we have to be careful, because if we just set an attribute to a function, when we later invoke it it will be called without the "receiver-self". We have to simulate the "bound methods" magic applied to functions declared inside a class (which are indeed descriptors that return bound methods); we just have to use types.MethodType to bind the function to the "receiver". Of course, we also want to allow our "object literals" to extend other classes. This turns out to be pretty easy too, given that in Python we can change the class of an existing object via the __class__ attribute (I tend to complain about Python syntax, but its dynamic features are so cool!), a feature that we combine with the sometimes disdained multiple inheritance of classes. So we'll create a new class (with types.new_class) that extends both the old class and the new one, and assign this class to our object.
So, less talk and more code! I ended up with a class extending SimpleNamespace with bind and extend methods. Both methods modify the object in place and return it to allow chaining.


import types
from types import MethodType, SimpleNamespace
from typing import Callable


class ObjectLiteral(SimpleNamespace):
    def bind(self, name: str, fn: Callable) -> "ObjectLiteral":
        setattr(self, name, MethodType(fn, self))
        return self
    
    def extend(self, cls, *args, **kwargs) -> "ObjectLiteral":
        parent_cls = types.new_class("Parent", (self.__class__, cls))
        self.__class__ = parent_cls
        cls.__init__(self, *args, **kwargs)
        return self

We'll use it like this:



class Formatter:
    def __init__(self, w1: str):
        self.w1 = w1
    
    def format(self, txt) -> str:
        return f"{self.w1}{txt}{self.w1}"
    

p1 = (ObjectLiteral(
        name="Xuan",
        age="50",
    )
    .extend(Formatter, "|")
    .bind("format2", lambda x, wr: f"{wr}{x.name}-{x.age}{wr}")
    .bind("say_hi", lambda x: f"Bonjour, je m'appelle {x.name} et j'ai {x.age} ans")
)

print(p1.format("Hey"))
print(p1.format2("|"))
print(p1.say_hi())
print(f"mro: {p1.__class__.__mro__}")
# let's add new attributes
p1.extra = "aaaa"
print(p1.extra)

# let's extend another class
class Calculator:
    def __init__(self, v1: int):
        self.v1 = v1
    
    def calculate(self, v2: int) -> str:
        return self.v1 * v2
    
p1.extend(Calculator, 2)
print(f"mro: {p1.__class__.__mro__}")
print(p1.format("Hey"))
print(p1.format2("|"))
print(p1.say_hi())
print(p1.calculate(4))

print(f"instance Formatter: {isinstance(p1, Formatter)}")
print(f"instance Calculator: {isinstance(p1, Calculator)}")


# |Hey|
# |Xuan-50|
# Bonjour, je m'appelle Xuan et j'ai 50 ans
# mro: (<class 'types.Parent'>, <class '__main__.ObjectLiteral'>, <class 'types.SimpleNamespace'>, <class '__main__.Formatter'>, <class 'object'>)
# aaaa
# mro: (<class 'types.Parent'>, <class 'types.Parent'>, <class '__main__.ObjectLiteral'>, <class 'types.SimpleNamespace'>, <class '__main__.Formatter'>, <class '__main__.Calculator'>, <class 'object'>)
# |Hey|
# |Xuan-50|
# Bonjour, je m'appelle Xuan et j'ai 50 ans
# 8
# instance Formatter: True
# instance Calculator: True

Notice how after the initial creation of our object we've continued to expand it with additional attributes, binding new methods and extending other classes.

Tuesday 19 March 2024

Named Arguments Differences

In 2 previous posts [1] and [2] we saw that Python, JavaScript and Kotlin have some differences (Python limitations, we could say) in how they deal with default arguments. I was wondering if there are any differences in how they handle named arguments.

As for JavaScript, given that it does not support named arguments (which is rather surprising), there's not much to say.

The main difference between Python and Kotlin named arguments (also known as keyword arguments in Python) has to do with mixing positional and named arguments in the same call. In Python named arguments can be provided in any order, but you cannot use a positional argument after a named argument. In Kotlin you can also provide named arguments in any order, as long as you are not providing positional arguments after them. So yes, that "as long as" means that you are also allowed to provide positional arguments after named arguments, but if you do, all the arguments (positional and named) have to be provided in exactly the same order they were defined in the function signature.

Let's see some examples in Python:


def format(w1, w2, txt):
    return f"{w1}{w2}{txt}{w2}{w1}"

print(format("-", "|", "Bonjour"))

print(format("-", "|", txt="Bonjour"))

print(format("-", w2="|", txt="Bonjour"))

print(format("-", txt="Bonjour", w2="|"))

# Positional argument cannot appear after keyword arguments
#print(format("-", txt="Bonjour", "|"))
#print(format("-", w2="|", "Bonjour"))


And in Kotlin:


package namedParameters


fun format(w1: String, w2: String, txt: String): String {
    return "$w1$w2$txt$w2$w1"
}


fun main() {
    println(format("-", "|", "Bonjour"))

    println(format("-", "|", txt = "Bonjour"))
    
    println(format("-", w2 = "|", txt = "Bonjour"))
    
    println(format("-", txt = "Bonjour", w2 = "|"))

    //error: Mixing named and positioned arguments is not allowed
    //println(format("-", txt="Bonjour", "|"))
    
    //If I'm going to provide an unnamed parameter after a named one (not available in Python), all the parameters in the call have to be passed in order
    
    println(format("-", w2 = "|", "Bonjour"))

}

From what I've read here, named arguments in C# behave the same as in Kotlin (you can provide positional arguments after named ones, but this forces you to provide all of them in order). At first sight this feature seemed of little use to me: once you provide some named argument, why would you skip the name of an ensuing argument if that's going to force you to provide all the arguments in the order defined in the signature? Well, it does make sense, let me explain. When we have variables with the same names as the function parameters, using named arguments can seem a bit redundant. So let's say we are passing as arguments some "inline values" and some variables whose names differ from the parameters'. In this case, even if we keep the same order as in the function signature, using named arguments for those "inline values" and variables will make our code clearer. But if after those arguments we are going to pass a variable that has the same name as its parameter, naming it is redundant, so it's nice not to be forced to.

Somewhat related to this, a discussion in the Python community comes to mind (there's a recently created PEP draft for it) about a syntax for shortening the use of named arguments when a variable and a parameter share the same name. The idea seems to be inspired by Ruby and would look like this:


#For example, the function invocation:

my_function(my_first_variable=, my_second_variable=, my_third_variable=)

#Will be interpreted exactly equivalently to following in existing syntax:

my_function(
  my_first_variable=my_first_variable,
  my_second_variable=my_second_variable,
  my_third_variable=my_third_variable,
)


This looks like an excellent idea to me, one that would make implementing the Kotlin/C# feature we've been discussing unnecessary.

There's one feature of named arguments in Python that I mentioned in my previous post: a function can force us to provide certain (or all) arguments as named ones by means of "*"; you can read more here and here. There seems to have been some discussion about adding this to Kotlin, but nothing has been done so far.
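As a quick illustration of that bare-* feature (the connect function and its parameters are invented for this sketch), everything after the * can only be passed by name:

```python
def connect(host, *, port=5432, timeout=10):
    return f"{host}:{port} (timeout={timeout})"

print(connect("db.local", port=5433))  # ok: port passed by name
try:
    connect("db.local", 5433)          # positional port is rejected
except TypeError as ex:
    print(ex)
```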

Sunday 10 March 2024

Default Arguments differences

After my previous post about late-bound default arguments/parameters (again, I'm still unsure whether to say "argument" or "parameter", and indeed the Kotlin documentation also seems to use them interchangeably on some occasions) I realised that there's another limitation of Python default arguments when compared to JavaScript or Kotlin. In Python, once you define a default argument, all the ensuing arguments also have to be defined as default ones, otherwise you'll get a SyntaxError.

This is not the case in Kotlin (nor, it seems, in Ruby), where:

If a default parameter precedes a parameter with no default value, the default value can only be used by calling the function with named arguments.


fun foo(
    bar: Int = 0,
    baz: Int,
) { /*...*/ }

foo(baz = 1) // The default value bar = 0 is used

So it's OK not to provide a default argument that has non-default arguments after it, but in that case you have to invoke the function naming the remaining arguments. There's an exception to this: if the last of those arguments is a lambda (trailing lambda), you can just use the nice syntax of passing it outside the parentheses:

If the last argument after default parameters is a lambda, you can pass it either as a named argument or outside the parentheses

As all this ends up being a bit confusing, I think it's generally advised to put all the default arguments at the end of the function signature (if the last one is a lambda, it's excluded from this rule).

In JavaScript you can also define them as in Kotlin, but given that there's no support for named arguments, the behaviour is that if you pass fewer arguments than the function defines in its signature, they are assigned in order, so it seems like it's not particularly useful. Well, it's useful if we explicitly pass undefined as the value of a default parameter, as in that case the default value is used rather than undefined (that's not the case if we pass null).


const format = (l1, l2 = "_", txt) => `${l1}${l2}${txt}${l2}${l1}`;

// not what we would like
console.log(format("|", "Iyan"));
//|IyanundefinedIyan|

console.log(format("|"));
// |_undefined_|

console.log(format("|", "_", "Iyan"));
// |_Iyan_|

// the thing of having declared a default parameter before a non default parameter is only useful if we decide to pass undefined, that will be replaced by the default value
console.log(format("|", undefined, "Iyan"));
// |_Iyan_|

Well, there is a syntax "trick" in Python that sort of allows the Kotlin behaviour, as explained here. It leverages a Python feature that was unknown to me so far: keyword-only arguments. You can declare non-default arguments after default arguments by placing a "*" between them, as it forces you to pass the remaining non-default arguments by name (so you get the same behaviour as in Kotlin). The disadvantage is that you are then forced to always pass those parameters by name, even when you are providing all the values and not using any of the defaults.


# We're not allowed to write this:
# Non-default argument follows default argument (Pylance)
# def format(l1, l2 = "_", txt):
#     return f"{l1}{l2}{txt}{l2}{l1}"

# but we can use the * feature
def format(l1, l2 = "_", *, txt):
    return f"{l1}{l2}{txt}{l2}{l1}"


# both fail:
# format() missing 1 required keyword-only argument: 'txt'
#print(format("|", "Iyan"))
#print(format("|"))

# works fine
print(format("|", txt="Iyan"))
# |_Iyan_|

# the disadvantage is that we are forced to always pass txt by name, even when we are passing all values
#TypeError: format() takes from 1 to 2 positional arguments but 3 were given
#print(format("|", "-", "Iyan"))

print(format("|", "-", txt="Iyan"))

Tuesday 5 March 2024

Destructuring, astuple, attrgetter

I already talked about destructuring in JavaScript/Python/Kotlin in this previous post. So, to sum up, we can use destructuring with any iterable Python object (that's equivalent to JavaScript array destructuring). That's nice, but it would be even nicer to have something that we could use with any object (a bit in the vein of JavaScript object destructuring) with no need to make it iterable. I've been looking into some possibilities.

If you are working with simple dataclasses the astuple() function comes in pretty handy. Notice, though, that it will recursively convert elements that are other dataclasses, lists, dictionaries... which is probably not what you want.


from dataclasses import dataclass, astuple

@dataclass
class Person:
    name: str
    city: str
    post_code: int
    age: int
    job: str

p1 = Person("Iyan", "Xixon", 33200, 49, "Coder")

name, _, pc, _, job = astuple(p1)
print(name, pc, job)
#Iyan 33200 Coder


That approach works in a positional way. We know the order of the attributes of the class, and we take the positions we want and discard the others (with the _ convention).
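To see the recursive behaviour of astuple() mentioned earlier (the Address/Employee dataclasses here are made up), note how the nested dataclass is converted to a tuple too, not kept as an Address:

```python
from dataclasses import dataclass, astuple

@dataclass
class Address:
    city: str
    post_code: int

@dataclass
class Employee:
    name: str
    address: Address

e = Employee("Iyan", Address("Xixon", 33200))
print(astuple(e))  # ('Iyan', ('Xixon', 33200))
```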

Another option, this one more nominal, as we take attributes based on their name, is using operator.attrgetter, like this:


name, pc, job = operator.attrgetter("name", "post_code", "job")(p1)
print(name, pc, job)
#Iyan 33200 Coder


It looks OK, but using attribute names in string form is a big risk: if you rename an attribute, refactoring tools will know nothing about your string, and you have to remember that you are accessing that attribute as a string in order to fix it manually. It's easy to make a mess... With that in mind, I think I prefer the redundancy of writing the object name multiple times:


name, pc, job = p1.name, p1.post_code, p1.job
print(name, pc, job)
#Iyan 33200 Coder


Related to this, what if we want to transform each of the values we are destructuring, but with a different transformation for each value? I'm not talking about a normal map() call where we apply the same function to all of them; I'm talking about applying different functions. For that, a transform function like this will do the trick:


def transform(funcs, items):
    return [func(item) for func, item in zip(funcs, items)]


That way we can write something like this:



name, pc, job = transform(
    (str.upper, lambda x: x, str.lower), 
    attrgetter("name", "post_code", "job")(p1), 
)
print(name, pc, job)
#IYAN 33200 coder


name, pc, job = transform(
    (str.upper, lambda x: x, str.lower), 
    (p1.name, p1.post_code, p1.job), 
)
print(name, pc, job)
#IYAN 33200 coder

Monday 19 February 2024

Yielding aka Allowing others to Run

At the end of my previous post I talked about the yield() Kotlin function, and how we use it to cooperate with other coroutines by asking if another coroutine wants to run and, if so, suspending the caller coroutine so that the other coroutine gets scheduled. We don't have a specific yield function in JavaScript or Python; we just leverage another, more generic function for that purpose. Let's see.

In Python we use await asyncio.sleep(0) for that. asyncio.sleep() will suspend the current coroutine in all cases, regardless of the interval provided (from the documentation: "sleep() always suspends the current task, allowing other tasks to run."). The event loop will take control and check if another coroutine wants to run; if not, and the interval is 0, it will resume the sleeping coroutine immediately. So it's the same as yield in Kotlin.
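A minimal sketch of that cooperative yielding with asyncio.sleep(0) (the worker tasks and their names are invented); each task gives the event loop a chance to run the other one on every iteration, so their work interleaves:

```python
import asyncio

async def worker(name, log):
    for i in range(2):
        log.append(f"{name}-{i}")
        # sleep(0) always suspends, letting the other task run
        await asyncio.sleep(0)

async def main():
    log = []
    await asyncio.gather(worker("a", log), worker("b", log))
    print(log)  # the tasks interleave: ['a-0', 'b-0', 'a-1', 'b-1']

asyncio.run(main())
```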

JavaScript follows the same strategy as Python here. The setTimeout() function always returns control to the event loop, even when the interval is 0, with the callback being added to the macrotask queue (from the documentation: "If this parameter is omitted, a value of 0 is used, meaning execute "immediately", or more accurately, the next event cycle"). If there's nothing in the microtask queue or the macrotask queue, the callback we've just added will run immediately (in the next event loop iteration), as we've provided a 0 wait interval. OK, I'm talking about callbacks, which seems a bit odd, so notice that we can easily "promisify" the setTimeout function, getting an equivalent of asyncio.sleep(), like this:


async function asleep(timeout) {
    return new Promise(res => setTimeout(res, timeout));
}

Additionally, in JavaScript we can await any value, not just a Promise. We can just write await "aa" or await Promise.resolve("aa") (both are equivalent, await "aa" is indeed transformed into await Promise.resolve("aa")). Awaiting an already resolved Promise will add the "then callback" for that promise (in an async function that's the invocation of the next state of the state-machine for that function) to the microtask queue. When the event loop gets control back it will schedule the next task in the microtask queue. So if there was some previous task it will be executed, else our "then callback" task will be executed, so this is equivalent to yielding.

So all in all in JavaScript we have 2 levels of yielding. We can yield to tasks in the microtask queue by awaiting a resolved Promise, or yield to tasks in the macrotask queue, by using setTimeout(0). You can read more about this here and here.

We know that Kotlin has a suspend delay() function, but we cannot use it as a replacement for yield(), because delay() only suspends when the interval is bigger than 0 (from the documentation: "If the given timeMillis is non-positive, this function returns immediately"). So delay(0) has no effect at all: the next line runs sequentially, with no suspension and no chance for other coroutines to run.

I must say that I quite like the Kotlin approach. Having a specific yield function, rather than leveraging a particular case of another function, is more semantic.

Thursday 15 February 2024

Cancelling Kotlin Coroutines

Kotlin coroutines come with a powerful cancellation mechanism. The first thing to make clear is that Cancellation is Cooperative. This is not like killing a process with kill -9, a coroutine has to cooperate with the cancellation mechanism by checking if a cancellation has been requested.

When we create a coroutine with a CoroutineBuilder like launch() or async() we obtain a Job (Deferred derives from Job) that we can cancel by invoking its cancel() method, which makes it transition into the cancelling state. The CoroutineContext has a reference to the Job, and a suspend function always has access to its context through the "magical" coroutineContext property defined in the kotlin.coroutines package. I call it magical because its source code looks like this:


@SinceKotlin("1.3")
@Suppress("WRONG_MODIFIER_TARGET")
@InlineOnly
public suspend inline val coroutineContext: CoroutineContext
    get() {
        throw NotImplementedError("Implemented as intrinsic")
    }

That "implemented as intrinsic" means this:

Intrinsic basically means that the implementation is internal to the compiler.
...
In general intrinsic thus means that it is something that is built in to the “translation” system rather than provided in other ways (library) but is not a specified element of the language syntax (the language itself is always built in to the compiler).

If we compile a Kotlin suspend function using that property to bytecode and then decompile it to Java code, the Java code looks like this: ((Continuation)$continuation).getContext(), which makes perfect sense. We know that each suspend function is associated to a continuation object, and a continuation object references the CoroutineContext of the coroutine (which, as I've said, in turn references a Job, a Dispatcher...). So the Kotlin implementation of coroutines and suspend makes cancellation very simple. As we know, every suspend function in a chain of suspend calls in a coroutine receives as parameter the continuation of the caller suspend function (we end up with a chain of continuations), and that continuation gives access to the CoroutineContext, hence the Job, and hence checking if the Job has been cancelled. Notice that JavaScript promises do not support cancellation (though there are alternative implementations like bluebird that do).

In JavaScript the chain of Promises of an "async call stack" is created from the deepest Promise outwards, while in Kotlin the chain of Continuations of an "async call stack" is created from the outer caller to the deepest callee, and I guess that this makes implementing cancellation more straightforward in Kotlin.

I mentioned above that "a coroutine has to cooperate with the cancellation mechanism by checking if a cancellation has been requested", but indeed most times we don't have to do anything: this is implicitly done for us, as stated in the documentation:

Coroutine cancellation is cooperative. A coroutine code has to cooperate to be cancellable. All the suspending functions in kotlinx.coroutines are cancellable. They check for cancellation of coroutine and throw CancellationException when cancelled.

So on most occasions your suspend functions will just call other suspend functions that already take care of cancellation (not only the ones in kotlinx.coroutines, but also, for example, Ktor requests). Let's think of a case where that's not so: for example, a function performing several sequential CPU-bound operations. We'll run that function in the ThreadPool (in its own coroutine with the Dispatchers.Default dispatcher) and it will check after each operation if a cancellation has been requested. We use coroutineContext.ensureActive() for that.


class Message(
    val header: String,
    val content: String,
    val footer: String,
    ) {}

// Our "decrypt" functions simulate CPU-bound work with the blocking Thread.sleep() (which is not cancellation-aware),
// so decryptMessage has to check for cancellation explicitly with ensureActive() after each step
suspend fun decryptHeader(txt: String): String {
    println("decryptHeader started")
    Thread.sleep(2000)
    return "[[$txt]]"
}

suspend fun decryptContent(txt: String): String {
    println("decryptContent started")
    Thread.sleep(2000)
    return "[[$txt]]"
}

suspend fun decryptFooter(txt: String): String {
    println("decryptFooter started")
    Thread.sleep(2000)
    return "[[$txt]]"
}

suspend fun decryptMessage(): String {
    val message = Message("Title", "Main", "notes")
    println("decryptMessage started")
    val header = decryptHeader(message.header)
    println("header obtained: $header")
    coroutineContext.ensureActive()

    val content = decryptContent(message.content)
    println("content obtained: $content")
    coroutineContext.ensureActive()

    val footer = decryptFooter(message.footer)
    println("footer obtained: $footer")
    coroutineContext.ensureActive()

    return "$header - $content - $footer"
}

suspend fun cancelComputation(c1: Deferred<String>) {
    val elapsed = measureTimeMillis {
        delay(1000)
        c1.cancel()
        println("after invoking cancel")
        val res = try {
            c1.await()
        } catch (ex: Exception) {
            ex.message
        }
        println("result: $res")
    }
    println("elapsed time: $elapsed")
}

fun runCancellableComputation() {
    println("started")
    runBlocking {
        // this runs in the eventloop
        val c1 = async (Dispatchers.Default) {
            // this runs in the ThreadPool
            decryptMessage()
        }
        cancelComputation(c1)

    }
    println("Finished, current thread: ${Thread.currentThread().name}")
}

/*
started
decryptMessage started
decryptHeader started
after invoking cancel
header obtained: [[Title]]
result: DeferredCoroutine was cancelled
elapsed time: 2023
Finished, current thread: main
*/

After running the first CPU-bound function we check in one step with ensureActive() if a cancellation has been requested, and exit the coroutine in that case; pretty nice. I've seen some articles that mention using yield() rather than ensureActive(). yield() will also work, as it's a "cancel aware" suspend function, but it's intended for something different. With yield we are cooperating, telling other coroutines in our same dispatcher to run if they need to. In cases like this, where we have fewer coroutines (just 1) than threads in the threadpool, that yielding would have no effect, but in cases where many coroutines fight for a thread, using yield is normally a good thing (if you are cooperative), though it could be different from what you want, as on some occasions you'll want a task to finish as soon as possible to the detriment of other tasks. Of course, do not confuse this yield with the SequenceScope.yield used for the "generator-like" functionality.

There's a pretty nice explanation about this on Stack Overflow:

I would answer the question in the context of 4 related things:

Sequence yield(value: T) is totally unrelated to coroutine yield()

isActive is just a flag to identify if the coroutine is still active or cancelled. You can check this flag periodically and decide to stop current coroutine or continue. Of course, normally, we only continue if it's true. Otherwise don't run anything or throws exception, ex. CancellationException.

ensureActive() checks the isActive flag above and throws CancellationException if it's false.

Coroutine yield() not only calls ensureActive() first, but then also politely tells other coroutines in the same dispatcher that: "Hey, you guys could go first, then I will continue." The reason could be "My job is not so important at the moment." or "I am sorry to block you guys for so long. I am not a selfish person, so it's your turn now." You can understand here exactly like this meaning in dictionary: "yield (to somebody/something): to allow vehicles on a bigger road to go first." SYNONYM: give way.

An additional note. I guess the check for cancellation could have been implemented implicitly after each call to a suspend function. When a suspending function finishes, the coroutine invokes continuation.resume() (or resumeWith()) to continue on, so that resume() method could perform that check each time it's invoked.
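As a side note, Python's asyncio follows the same cooperative model: Task.cancel() only takes effect at a suspension point, where a CancelledError is raised into the coroutine. A small sketch (function names are made up):

```python
import asyncio

log = []

async def crunch():
    try:
        while True:
            log.append("step")          # CPU-bound step
            await asyncio.sleep(0)      # suspension point: cancellation is delivered here
    except asyncio.CancelledError:
        log.append("cancelled")
        raise

async def main():
    task = asyncio.create_task(crunch())
    await asyncio.sleep(0)   # let the task run one step
    task.cancel()            # takes effect at the task's next suspension point
    try:
        await task
    except asyncio.CancelledError:
        print("task cancelled")

asyncio.run(main())
```

Just like in Kotlin, a loop with no await inside would never observe the cancellation request.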

Thursday 8 February 2024

Kotlin Coroutines and CPU and IO bound code

Time for a short follow-up to this recent post that showed my first non-theoretical fiddling with Kotlin Coroutines. In that post I was running several suspend functions concurrently, by starting several coroutines concurrently (with the async() coroutine builder), but everything was running in the same thread, in an event-loop created by the coroutine created by the runBlocking() coroutine builder. Obviously that's great for I/O bound code, but what if we have CPU bound code that we want to run in parallel in separate threads? In particular, let's say we have functions that do both CPU-crunching and IO bound tasks. For this we can use coroutines with a dispatcher that leverages the ThreadPool.

If we have a CPU bound function and we launch 2 invocations using the async coroutine builder just as we did in my previous post, the code will run sequentially. This is so because we have no suspension points inside the CPU bound function, so once the first coroutine is launched by async() it will run without suspending, and not until it finishes is the second coroutine launched (everything running in the event-loop thread).


fun doCalculation(id: Int, duration: Int): Int{
    var count = 0
    while (count++ < duration) {
        println("calculation: $id, step: $count, thread: ${Thread.currentThread().name}")
        Thread.sleep(500)
    }
    println("Finishing calculation: $id, step: $count, thread: ${Thread.currentThread().name}")
    return id
}

fun performCalculationsSequentially() {
    runBlocking {
        val time = measureTimeMillis {
            val c1 = async { doCalculation(id = 1, duration = 3) }
            val c2 = async { doCalculation(id = 2, duration = 6) }
            val results = listOf(c1, c2).awaitAll()
            println("tasks finished, ${results.joinToString(", ")}")
        }
        println("Time taken: $time") 
    }
}

/*
Started, cpubound
calculation: 1, step: 1, thread: main
calculation: 1, step: 2, thread: main
calculation: 1, step: 3, thread: main
Finishing calculation: 1, step: 4, thread: main
calculation: 2, step: 1, thread: main
calculation: 2, step: 2, thread: main
calculation: 2, step: 3, thread: main
calculation: 2, step: 4, thread: main
calculation: 2, step: 5, thread: main
calculation: 2, step: 6, thread: main
Finishing calculation: 2, step: 7, thread: main
tasks finished, 1, 2
Time taken: 4591
*/

So running coroutines with an event-loop is great for cooperating between multiple suspend functions that really have suspension points (yes, the equivalent to JavaScript async code or Python asyncio) but is useless when the function is just doing processing without suspending.

If we invoke the async coroutine builder with Dispatchers.Default (I mean: async(Dispatchers.Default)), the coroutine will run in a ThreadPool, so in the code below the calculations will run in parallel, each of them in a thread from the ThreadPool.



fun doCalculation(id: Int, duration: Int): Int{
    var count = 0
    while (count++ < duration) {
        println("calculation: $id, step: $count, thread: ${Thread.currentThread().name}")
        Thread.sleep(500)
    }
    println("Finishing calculation: $id, step: $count, thread: ${Thread.currentThread().name}")
    return id
}

fun performCalculations3() {
    runBlocking {
        println("coroutineContext $coroutineContext, thread: ${Thread.currentThread().name}")
        val time = measureTimeMillis {
            val c1 = async (Dispatchers.Default){
                println("inside async, coroutineContext $coroutineContext")
                doCalculation(id = 1, duration = 3)
            }
            val c2 = async (Dispatchers.Default){
                println("inside async, coroutineContext $coroutineContext")
                doCalculation(id = 2, duration = 6)
            }
            val results = listOf(c1, c2).awaitAll()
            println("tasks finished: ${results.joinToString(", ")}, coroutineContext $coroutineContext, thread: ${Thread.currentThread().name}")
        }
        println("Time taken: $time") // Time taken: 3079
    }
}

/*
coroutineContext [BlockingCoroutine{Active}@46f5f779, BlockingEventLoop@1c2c22f3], thread: main
inside async, coroutineContext [DeferredCoroutine{Active}@3a286e59, Dispatchers.Default]
inside async, coroutineContext [DeferredCoroutine{Active}@5e11571d, Dispatchers.Default]
calculation: 2, step: 1, thread: DefaultDispatcher-worker-2
calculation: 1, step: 1, thread: DefaultDispatcher-worker-1
calculation: 1, step: 2, thread: DefaultDispatcher-worker-1
calculation: 2, step: 2, thread: DefaultDispatcher-worker-2
calculation: 1, step: 3, thread: DefaultDispatcher-worker-1
calculation: 2, step: 3, thread: DefaultDispatcher-worker-2
Finishing calculation: 1, step: 4, thread: DefaultDispatcher-worker-1
calculation: 2, step: 4, thread: DefaultDispatcher-worker-2
calculation: 2, step: 5, thread: DefaultDispatcher-worker-2
calculation: 2, step: 6, thread: DefaultDispatcher-worker-2
Finishing calculation: 2, step: 7, thread: DefaultDispatcher-worker-2
Time taken: 3079
*/

So the main coroutine is using the event-loop, but the 2 coroutines created with async (Dispatchers.Default) are using the ThreadPool. It's clear that the 2 coroutines are running in parallel as the code takes 3 seconds to finish (duration 6 * 500 milliseconds); otherwise it would take 4.5 seconds (9 * 500).

In this example, given that our doCalculation function is just CPU-bound and never suspends, we could wonder why we are using coroutines at all if what we want is to run our code in a ThreadPool; we could use a ThreadPoolExecutor directly instead. Well, thanks to the async coroutine builder we obtain Deferred objects that we can await very easily, while I guess with a ThreadPoolExecutor this is not so straightforward. Anyway, the need for coroutines with thread-pool dispatchers is way more evident when we have a suspend function that is both suspending at some IO call and running some CPU-bound code (like doing an HTTP request to obtain a text and then encrypting it).

It is also possible to switch the kind of dispatcher that a coroutine is already using, by means of the withContext function. This does not create a new coroutine, it creates a new CoroutineContext (with that Dispatcher that we've passed to it), and the block will run with that Context and Dispatcher. Remember that when we create a coroutine it gets a CoroutineContext (containing among other things a Dispatcher). For each suspend function invoked from that coroutine a Continuation object is created, that also points to that CoroutineContext. When we invoke withContext(), the Continuation objects corresponding to the suspend functions invoked from the block passed to withContext will use that new CoroutineContext, rather than the one of the coroutine. This way we can start a coroutine running in an event-loop, and at some point invoke some suspend function that will run in the ThreadPool.
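For what it's worth, the closest Python analogue I can think of to withContext(Dispatchers.Default) is asyncio.to_thread(), which runs a blocking callable in a thread pool while the calling coroutine awaits it. A rough sketch (the encrypt function is just an illustrative stand-in):

```python
import asyncio
import threading

def encrypt(text: str) -> str:
    # stand-in for CPU-bound work; runs in a worker thread, not the event-loop thread
    print(f"encrypting in thread: {threading.current_thread().name}")
    return text[::-1]

async def fetch_and_encrypt() -> str:
    await asyncio.sleep(0.1)                          # simulated IO in the event loop
    return await asyncio.to_thread(encrypt, "secret") # CPU work moved off the loop

result = asyncio.run(fetch_and_encrypt())
print(result)  # terces
```

The coroutine starts on the event loop, hops to a pool thread for the CPU-bound part, and resumes on the loop with the result, much like the start-on-event-loop, withContext-to-ThreadPool pattern described above.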

Friday 2 February 2024

Python late-bound default parameters

Last year I wrote a post about the odd behaviour of default arguments/parameters in Python (by the way, I always get confused about whether I should say arguments or parameters in this case). This behaviour comes from the fact that default parameters are bound at function definition time rather than at call time (late binding). As explained in that post, JavaScript, Kotlin and Ruby behave differently: the value for a default parameter is evaluated each time the function is called. At that time I had not paid attention to how powerful such a feature is. Looking into the MDN documentation I've seen that parameters can use previous parameters in their definitions:


function greet(name, greeting, message = `${greeting} ${name}`) {
  return [name, greeting, message];
}

Kotlin documentation does not stress that much these advanced uses, but of course it also comes with them:


fun read(
    b: ByteArray,
    off: Int = 0,
    len: Int = b.size,
) { /*...*/ }
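Back in Python, the classic symptom of definition-time binding is the shared mutable default (a textbook sketch, not from the post above):

```python
def append_item(item, bucket=[]):   # the list is created once, at definition time
    bucket.append(item)
    return bucket

print(append_item(1))  # [1]
print(append_item(2))  # [1, 2]  -- the same list is reused across calls!

# the usual workaround, simulating call-time (late) binding by hand
def append_item_late(item, bucket=None):
    if bucket is None:
        bucket = []                 # a fresh list on every call
    bucket.append(item)
    return bucket

print(append_item_late(1))  # [1]
print(append_item_late(2))  # [2]
```

The if-None dance is exactly the boilerplate that late-bound defaults would make unnecessary.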

What has led me to review what default parameters allow in these languages is that I recently came across a draft for a PEP (671), which of course was not received with particular interest by part of the community (probably the same ones that tell us that optional chaining has no particular use case...), and that proposes taking Python default arguments to the next level by making them late-bound (and allowing them access to other parameters). As interesting as the proposal is the fact that one smart guy has sort of implemented it in the late module, by means of decorators.

The way to implement such a thing in pure Python is not so mysterious (particularly after checking the source code :-D). Given that parameters are bound at definition time, let's bind something that will produce a value, rather than a value itself. Then, in order to make that producer run each time the function is invoked, let's wrap that function with another one that will do the calling. Add to this the powerful inspect.signature() function, and we are done. What the late guy has implemented is really nice, but it does not seem as powerful as what the PEP proposes (and that is the same as what we have in JavaScript and Kotlin): it does not allow late-bound parameters to depend on other parameters (either also late-bound or normal). So after having checked the source code I went ahead with implementing my own version of late binding (or call-time binding) for default parameters. Here it is:



from dataclasses import dataclass
from typing import Callable, Any
import functools
import inspect

@dataclass
class LateBound:
    resolver: Callable

def _invoke_late_bound(callable: Callable, arg_name_to_value: dict[str, Any]) -> Any:
    """
    invokes a callable passing over to it the parameters defined in its signature
    we obtain those values from the arg_name_to_value dictionary
    """
    expected_params = inspect.signature(callable).parameters.keys()
    kwargs = {name: arg_name_to_value[name]
         for name in expected_params
    }
    return callable(**kwargs)


def _add_late_bounds(arg_name_to_value: dict[str, Any], late_bounds: list[tuple[str, Callable]]):
    """resolves late-bound values and adds them to the arg_name_to_value dictionary"""
    for name, callable in late_bounds:
        val = _invoke_late_bound(callable, arg_name_to_value)
        # this way one late-bound can depend on a previous late-bound
        arg_name_to_value[name] = val
    

def _resolve_args(target_fn: Callable, *args, **kwargs) -> dict[str, Any]:
    """returns a dictionary with the names and values of all the parameters (the ones already provided, the calculated late-bounds and the normal defaults)"""
    # dictionary of the arguments and values received by the function at runtime
    # we use it to be able to calculate late_bound values based on other parameters
    arg_name_to_value: dict[str, Any] = {}
    arg_names = list(inspect.signature(target_fn).parameters.keys())
    for index, arg in enumerate(args):
        arg_name_to_value[arg_names[index]] = arg
    arg_name_to_value = {**arg_name_to_value, **kwargs}
    
    # obtain the values for all default parameters that have not been provided
    # we obtain them all here so that late_bounds can depend on other (compile-time or late-bound) default parameters
    #late bounds to calculate (were not provided in args-kwargs)
    not_late_bounds  = {name: param.default 
        for name, param in inspect.signature(target_fn).parameters.items()
        if not isinstance(param.default, LateBound) and not name in arg_name_to_value
    }
    arg_name_to_value = {**arg_name_to_value, **not_late_bounds}

    # list rather than dictionary as order matters (so that a late-bound can depend on a previous late-bound)
    late_bounds = [(name, param.default.resolver) 
        for name, param in inspect.signature(target_fn).parameters.items()
        if isinstance(param.default, LateBound) and not name in arg_name_to_value
    ]

    _add_late_bounds(arg_name_to_value, late_bounds)
    return arg_name_to_value


#decorator function
def late_bind(target_fn: Callable | type) -> Callable | type:
    """decorates a function enabling late-binding of default parameters for it"""
    @functools.wraps(target_fn)
    def wrapper(*args, **kwargs):
        kwargs = _resolve_args(target_fn, *args, **kwargs)
        return target_fn(**kwargs)

    return wrapper

And you can use it like this:


from datetime import datetime
from dataclasses import dataclass
from late_bound_default_args import late_bind, LateBound

@late_bind
def say_hi(source: str, target: str, greet: str, 
    extra = LateBound(lambda: f"[{datetime.now():%Y-%m-%d_%H%M%S}]"),
    ):
    """"""
    return f"{greet} from {source} to {target}. {extra}"

@late_bind
def say_hi2(source: str, target: str, greet: str, 
    extra = LateBound(lambda greet: f"[{greet.upper()}!]"),
    ):
    """"""
    return f"{greet} from {source} to {target}. {extra}"

print(say_hi("Xuan", "Francois", "Bonjour"))
print(say_hi2("Xuan", "Francois", "Bonjour"))

#Bonjour from Xuan to Francois. [2024-02-02_002939]
#Bonjour from Xuan to Francois. [BONJOUR!]


# access to the "self" parameter in a late-bound method works also fine
@dataclass
class Person:
    name: str
    birth_place: str

    @late_bind
    def travel(self, by: str, 
        start_city: str = LateBound(lambda self: self.birth_place), 
        to: str = "Paris"
        ):
        """ """
        return(f"{self.name} is travelling from {start_city} to {to} by {by}")
    
p1 = Person("Xuan", "Xixon")
print(p1.travel("train"))
# Xuan is travelling from Xixon to Paris by train

I've uploaded it to a gist.

Friday 26 January 2024

Kotlin Coroutines Basic Example Explained

As I explained in this previous post, the approach used in Kotlin for asynchronous programming (and "generators") is a bit different from what I'm used to in JavaScript and Python (and even C#). Now that I've managed to mostly understand how suspend, continuation passing style, compiler magic... work together, it's time to fiddle with some real code. What should be clear is that asynchronous programming in Kotlin is more customizable than in JavaScript and Python. I'm thinking mainly about the different dispatchers, with which your code can run in an event loop or a thread pool. Indeed we have at least 2 different thread pools: Dispatchers.Default for "CPU bound tasks" (with as many threads as CPUs in your machine), and Dispatchers.IO for "IO bound tasks". Read this excellent article if you want to know more.

So I've started with one of the most common tasks in asynchronous programming that I can think of: launching several non-blocking IO operations (a simulated http request to get a blog-post), waiting for all of them to complete and doing something with their results.

The Python code for that looks like this:


delays = {
    "A1": 3,
    "B1": 1,
    "C1": 0.1,
}

async def get_post(id: str) -> str:
    print(f"getting post: {id}")
    if id not in delays:
        await asyncio.sleep(0.5)
        raise Exception(f"Missing post: {id}")   
     
    await asyncio.sleep(delays[id])
    return f"POST: [[{id}]]"

# async def retrieve_posts_sequentially():
#     print("started")
#     post_ids = ["A1", "B1", "C1"]
#     retrieved_posts = []
#     for id in post_ids:
#         retrieved_posts.append(await get_post(id))
#     print("all posts retrieved")
#     for post in retrieved_posts:
#         print(f"post: {post}")

async def retrieve_posts():
    print("started")
    post_ids = ["A1", "B1", "C1", "D1"]
    requested_posts = [asyncio.create_task(get_post(id)) for id in post_ids]
    retrieved_posts = await asyncio.gather(*requested_posts, return_exceptions= True)
    print("all posts retrieved")
    for post in retrieved_posts:
        print(f"post: {post}")


asyncio.run(retrieve_posts())	


First of all we have to create an event loop (with asyncio.run()) to run our asyncio code in it. Then we have an async function (called a coroutine function in Python). As we know, invoking a coroutine function creates a coroutine object, but does not start running the code. To run it we have to either await it, or launch it through create_task(). Awaiting the coroutine (await get_post()) would launch it and suspend the calling function, so we would not do the second call to get_post() until the first one finishes and the event loop resumes the calling function; the requests would run sequentially one after another. That's what would happen in the commented retrieve_posts_sequentially(). On the other hand, starting the coroutine through create_task() launches the function without forcing us to await it, so we can launch all our requests and then wait for all of them to complete (with await asyncio.gather()). This way the requests run concurrently in the event loop thread. That's what happens in retrieve_posts().
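The sequential vs concurrent difference is easy to measure with a small sketch (simulated delays only, names made up):

```python
import asyncio
import time

async def fake_request(delay: float) -> float:
    await asyncio.sleep(delay)  # stands in for a non-blocking IO call
    return delay

async def sequential():
    # each await suspends the caller until that request finishes
    return [await fake_request(0.1), await fake_request(0.1)]

async def concurrent():
    # create_task launches both requests before we wait on any of them
    tasks = [asyncio.create_task(fake_request(0.1)) for _ in range(2)]
    return await asyncio.gather(*tasks)

start = time.perf_counter()
asyncio.run(sequential())
seq = time.perf_counter() - start

start = time.perf_counter()
asyncio.run(concurrent())
conc = time.perf_counter() - start

print(f"sequential: {seq:.2f}s, concurrent: {conc:.2f}s")  # roughly 0.2s vs 0.1s
```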

Let's see now the Kotlin equivalent to the above:


val idsToDelays = mapOf(
    "A1" to 1000,
    "B1" to 2000,
    "C1" to 500,
)

suspend fun getPost(id: String): String {
    println("getPost $id start, context: $coroutineContext Thread: ${Thread.currentThread().name}")
    delay(idsToDelays[id] ?: 500)
    println("getPost $id end, context: $coroutineContext Thread: ${Thread.currentThread().name}")
    return if (idsToDelays[id] !== null) "POST: [[${id}]]"
        else throw Exception("missing ID")
}

val postIds = listOf("A1", "B1", "C1", "D1")

//runBlocking {
//	println("context: $coroutineContext")
//	postIds.forEach { getPost(it) }
//}
	
runBlocking {
	// same as the previous test, but now causing an exception and handling it
	val futures = postIds.map { async {
		//this is a try-catch expression, so we don't need to write "return"
		try {
			getPost(it)
		}
		catch (ex: Exception) {
			ex.message
		}
	}}

	val posts = futures.awaitAll()
	posts.forEach(::println)
}

/*
getPost A1 start, context: [DeferredCoroutine{Active}@566776ad, BlockingEventLoop@6108b2d7] Thread: main
getPost B1 start, context: [DeferredCoroutine{Active}@1554909b, BlockingEventLoop@6108b2d7] Thread: main
getPost C1 start, context: [DeferredCoroutine{Active}@6bf256fa, BlockingEventLoop@6108b2d7] Thread: main
getPost D1 start, context: [DeferredCoroutine{Active}@6cd8737, BlockingEventLoop@6108b2d7] Thread: main
getPost C1 end, context: [DeferredCoroutine{Active}@6bf256fa, BlockingEventLoop@6108b2d7] Thread: main
getPost D1 end, context: [DeferredCoroutine{Active}@6cd8737, BlockingEventLoop@6108b2d7] Thread: main
getPost A1 end, context: [DeferredCoroutine{Active}@566776ad, BlockingEventLoop@6108b2d7] Thread: main
getPost B1 end, context: [DeferredCoroutine{Active}@1554909b, BlockingEventLoop@6108b2d7] Thread: main
POST: [[A1]]
POST: [[B1]]
POST: [[C1]]
missing ID
Finished, current thread: main
*/

So as expected we use the suspend keyword (there's no async keyword in Kotlin) to designate our suspendable (asynchronous) functions. To run our suspend functions we have to do it from a coroutine (which is quite different from a Python coroutine), which we create through the runBlocking() function, a coroutine builder. Calling runBlocking without providing a CoroutineContext (as in this case) runs the code in an event loop, everything in the same thread. From the documentation:

Runs a new coroutine and blocks the current thread interruptibly until its completion. This function should not be used from a coroutine. It is designed to bridge regular blocking code to libraries that are written in suspending style, to be used in main functions and in tests.
The default CoroutineDispatcher for this builder is an internal implementation of event loop that processes continuations in this blocked thread until the completion of this coroutine.

If we call the suspend function directly (see the commented block) we would have the same behaviour as in the commented python block. The calling function would be suspended and would not resume until getPost is completed. It means that we would be gathering the posts sequentially. To run the suspend functions concurrently we use the async() function (another coroutine builder). This way a new coroutine is created to run each getPost(), with these coroutines running concurrently. Calling async() without providing a specific dispatcher means that the new coroutines will use the same dispatcher as the parent coroutine (the coroutine that we created with runBlocking), that is, the event loop dispatcher. async() returns a Deferred object (similar to a Python Task-Future) that will complete when the corresponding coroutine completes. We can wait for all these Deferred to complete with the awaitAll() function. So this code works almost exactly the same as the Python code, there are no additional threads, everything runs in an event loop in its single thread. Notice what I mentioned in the first paragraph, asynchronous programming in Kotlin is amazingly powerful, and we can "easily" write coroutines that use a thread pool rather than an event loop, but that will be for another post.

To summarize:
- asyncio.run() function == runBlocking() function == start an event loop
- asyncio.create_task() function == async() function
- asyncio.gather(list[Task]) function == List[Deferred].awaitAll() function

Well, as my other main language is JavaScript I think I should also write the JavaScript version:



let postIds = ["A1", "B1", "C1"];
let postPromises = postIds.map(id => getPost(id));
let retrievedPosts = await Promise.all(postPromises);
retrievedPosts.forEach(post => console.log(post));


As we know, in JavaScript invoking an async function returns a Promise, and there's no need to use await to launch the execution of the function. If we await the Promise, the calling function gets suspended, and hence we would be in the sequential case. So we first do all the getPost calls and gather their returned Promises, and then we await the completion of all of them with await Promise.all(). We don't need to start an event loop on our own, as that's the basic driver of any JavaScript runtime.

Saturday 20 January 2024

JavaScript async generators oddity

Recently I found a stackoverflow discussion where a weird feature of JavaScript async generators is mentioned. If the async generator yields a Promise, the generator itself (well, the next() method of the corresponding generator object) will wait for the resolution of that Promise and yield its resolved value.

I mean, a statement like this one:
yield Promise.resolve("AAA");
behaves as if it were:
yield await Promise.resolve("AAA");

This means that in this code below, the try-catch inside the generator will catch the exception:


function asleep(timeout) {
    return new Promise(res => setTimeout(res, timeout));
}

async function* asyncCities(){
    await asleep(1000);
    try {
        yield Promise.reject("rejected City");
    }
    catch (ex) {
        console.log(`Exception: ${ex} caught in the async generator`)
        yield "fixed";
    }
}

async function main() {
    let citiesGenerator = asyncCities();
    try {
        let city = await citiesGenerator.next();
        console.log(`city: ${city.value}`);
    }
    catch (ex){
        console.log(`Exception: ${ex} caught in the main function`)
    }
    //Exception: rejected City caught in the async generator
    //city: fixed
}

main();


This is slightly different from return in async functions, where (as we saw in my previous post) if we return a Promise, the wrapping Promise will resolve to the resolution of that inner Promise; but all this is managed by the caller, not by an "invisible await" inside the async function. This means that in the code below the exception will be caught in the main function:


async function getCapital() {
    await asleep(1000);
    // this try-catch here is useless; returning a rejected promise is fine, it's outside, when they await it, that they will get an exception and will have to handle it
    try {
        return Promise.reject("rejected City");
    }
    catch (ex) {
        console.log(`Exception: ${ex} caught in the async function`)
        return "fixed";
    }    
}

async function main() {
    // an exception happens here, as obviously in an async function the "return Promise" is not replaced by a "return await Promise" as happens with the async generator
    // so the internal try-catch was useless
    try {
        let capital = await getCapital();
        console.log(`capital: ${capital}`);

    }
    catch (ex) {
        console.log(`Exception: ${ex} caught in main function`)
    }
    //Exception: rejected City caught in main function

}


I can hardly think of any situation where this behaviour would cause a gotcha, but it seemed interesting enough to mention it here.

Wednesday 17 January 2024

Promise.race and Access to the Resolved Promise

With the JavaScript Promise.race method we obtain the result/exception of the first resolved/rejected Promise. Normally that's all we need, but in some cases we would like to know the Promise itself that caused that resolution or rejection (notice that in Python asyncio.wait we obtain the Futures, not their values/exceptions), so what could we do? The solution for me is creating a "wrapper" Promise that resolves/rejects when the original Promise resolves/rejects. This wrapper Promise will resolve to the original Promise (not to its result). Initially I was thinking of resolving to a tuple with the result and the original Promise, but that's not necessary, as we can get the result by awaiting again the already-resolved original Promise.

What we have to bear in mind is that a Promise A does not resolve to another Promise B: it waits for that Promise B to resolve to a "normal value", and then Promise A resolves to that "normal value". I mean:


let result = await Promise.resolve(Promise.resolve("AA"));
console.log(result);
//AA

async function getMsg() {
    return Promise.resolve("AA");
}
result = await getMsg();
console.log(result);
//AA

result = await Promise.resolve("").then(result => Promise.resolve("AA"));
console.log(result);
//AA

result = await new Promise(resFn => resFn(Promise.resolve("AA")));
console.log(result);
//AA

Because of that, we will resolve the wrapper Promise to an array containing the original Promise, rather than directly to the original Promise. Otherwise, when awaiting the wrapper Promise we would end up getting the result of the original Promise rather than the resolved original Promise itself.
So let's say we have this async getPost function.


async function asleep(timeout) {
    return new Promise(res => setTimeout(res, timeout));
}

const delays = {
    "A1": 1000,
    "B1": 2000,
    "C1": 50,
}

async function getPost(id) {
    console.log(`getting post: ${id}`)
    if (delays[id] === undefined) {
        await asleep(500);
        throw new Error(`Missing post: ${id}`);
    }

    await asleep(delays[id]);
    return `POST: [[${id}]]`;
}


We will launch a bunch of getPost actions, and we want to perform an action each time one of the posts is retrieved. We'll use Promise.race() for that, but we need to know the Promise that got resolved, so that then we can filter it out and invoke Promise.race again with the remaining ones (it's what we typically do in Python with asyncio.wait).


async function runPromises1(postPromises) {
    while (postPromises.length) {
        // Notice how we wrap the Promise in an array. That way we have a Promise that resolves to an array containing a Promise.
        // If we had a Promise p1 resolving to a Promise p2, p1 would not really resolve until p2 resolved, and it would resolve to p2's result.

        // this simple syntax works fine:
        let [pr] = await Promise.race(postPromises.map(p => p.then(result => [p]).catch(ex => [p])));
        try {
			// the "internal" pr Promise is already resolved/rejected at this point
            let result = await pr;
            console.log(`resolved index: ${pr._index}, result: ${result}`);
        }
        catch (ex) {
            console.log(`Error: resolved index: ${pr._index}, exception: ${ex}`);
        }        
        postPromises = postPromises.filter(p => p !== pr);
    }
}

async function main() {
    let postPromises = ["A1", "B1", "C1", "D1"].map(getPost); // (id => getPost(id));
    postPromises.forEach((pr, index) => pr._index = index);
    await runPromises1(postPromises);
}

main();


As you can see, the important thing is this line:
await Promise.race(postPromises.map(p => p.then(result => [p]).catch(ex => [p])))
where the .then and .catch create the new Promise that will resolve/reject to an array containing the original Promise.

Rather than using then-catch we could write the above leveraging async, using an Immediately Invoked Async Arrow Function. An async function returns a new Promise that gets resolved/rejected when the function completes. As in the previous case we have to use the trick of returning the original Promise wrapped in an Array.


async function runPromises2(postPromises) {
    while (postPromises.length) {
        // Notice how we wrap the Promise in an array. That way we have a Promise that resolves to an array containing a Promise.
        // If we had a Promise p1 resolving to a Promise p2, p1 would not really resolve until p2 resolved.

        // this more complex syntax also works fine, it's the same idea as above:
        // we have an Immediately Invoked Async Function Expression; it creates a Promise that resolves when the internal promise is resolved, returning the promise itself (wrapped in an array)
        let [pr] = await Promise.race(postPromises.map(p => (async () => {
            try {
                await p;
            }
            catch {}
            return [p];
        })()));
        try {
            let result = await pr;
            console.log(`resolved index: ${pr._index}, result: ${result}`);
        }
        catch (ex) {
            console.log(`Error: resolved index: ${pr._index}, exception: ${ex}`);
        }        
        postPromises = postPromises.filter(p => p !== pr);
    }
}

This is one of those few cases where using .then().catch() looks cleaner than using async-await. Also, this need to know the Promise that has been resolved, rather than just its value, is not particularly realistic. In most cases we would just pass to .race/.wait... not the bare getPost Promise, but a Promise for a function that both invokes getPost and then performs the ensuing "print" action.
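The asyncio side of that last idea could be sketched in Python like this. This is my own minimal sketch, with a hypothetical get_post coroutine mirroring the getPost used above: each coroutine both retrieves the post and performs the ensuing "print" action, so we never need to know which Future resolved first.

```python
import asyncio

# hypothetical delays, mirroring the JavaScript getPost of this post
delays = {"A1": 1.0, "B1": 2.0, "C1": 0.05}


async def get_post(id: str) -> str:
    if id not in delays:
        await asyncio.sleep(0.5)
        raise Exception(f"Missing post: {id}")
    await asyncio.sleep(delays[id])
    return f"POST: [[{id}]]"


async def get_and_print(id: str) -> None:
    # the retrieval and the ensuing "print" action live in one coroutine,
    # so the caller no longer cares which Future resolved
    try:
        print(f"post obtained: {await get_post(id)}")
    except Exception as ex:
        print(f"Error retrieving post: {ex}")


async def main():
    tasks = [asyncio.create_task(get_and_print(id)) for id in ["A1", "B1", "C1", "D1"]]
    await asyncio.wait(tasks)


asyncio.run(main())
```

With this shape there's no need for the array-wrapping trick at all, because the "which one finished?" question disappears.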

Tuesday 16 January 2024

Asyncio as_completed

When looking into a colleague's code recently I realised that I had been missing the right/simple way to deal with one common asyncio situation. I have a list of Awaitables (Futures/Tasks) and I want to run some code as soon as any of them completes, and continue to do so until all of them have completed. Given an async function like this one that I will be calling in parallel:



delays = {
    "A1": 2,
    "B1": 1,
    "C1": 0.1,
}

async def get_post(id: str) -> str:
    print(f"getting post: {id}")
    if id not in delays:
        await asyncio.sleep(0.5)
        raise Exception(f"Missing post: {id}")

    await asyncio.sleep(delays[id])
    return f"POST: [[{id}]]"


I was using asyncio.wait in a loop, like this:



async def retrieve_posts():
    post_ids = ["A1", "B1", "C1", "D1"]
    pending_tasks = [asyncio.create_task(get_post(id)) for id in post_ids]
    while len(pending_tasks):
        done_tasks, pending_tasks = await asyncio.wait(pending_tasks, return_when=asyncio.FIRST_COMPLETED)
        for done_task in done_tasks:
            try:
                # at this point, where done_task is a resolved Future, these 2 statements are equivalent:
                #post = done_task.result()
                post = await done_task
                print(f"post obtained: {post}")
            except Exception as ex:
                print(f"Error retrieving post: {ex}")
                continue

asyncio.run(retrieve_posts())

That works, but my colleague was using a more elegant approach: asyncio.as_completed. as_completed returns an iterator that on each iteration returns a Future (a new one, not one of those that you passed to it). That new Future resolves as soon as one of the provided awaitables completes. This means that you can rewrite the above like this:



async def retrieve_posts():
    post_ids = ["A1", "B1", "C1", "D1"]
    tasks = [asyncio.create_task(get_post(id)) for id in post_ids]
    for task in asyncio.as_completed(tasks):
        try:
            post = await task
            print(f"post obtained: {post}")
        except BaseException as ex:
            print(f"Exception getting post: {ex}")
            

asyncio.run(retrieve_posts())	

So asyncio.wait is a better fit when you have some awaitables and, as they resolve, you will be launching additional async tasks. asyncio.as_completed is the right choice when you have all the awaitables you are going to run beforehand.
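To illustrate the asyncio.wait side of that comparison, here is a small toy sketch of my own (the work coroutine and the numbers are made up for illustration): each completed task may spawn a follow-up task, growing the pending set mid-flight, which as_completed cannot accommodate since it needs the full list upfront.

```python
import asyncio


async def work(n: int) -> int:
    # hypothetical workload: sleep a bit and return its input
    await asyncio.sleep(0.01 * n)
    return n


async def main() -> list[int]:
    pending = {asyncio.create_task(work(n)) for n in range(3)}
    results = []
    while pending:
        done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            n = task.result()
            results.append(n)
            if n < 3:
                # launch an additional task based on a completed one;
                # the pending set keeps growing while we loop
                pending.add(asyncio.create_task(work(n + 3)))
    return sorted(results)


print(asyncio.run(main()))
# [0, 1, 2, 3, 4, 5]
```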

Looping over awaitables brings to my mind that Python feature, the async for construct (equivalent to JavaScript's for-await). I wonder why as_completed returns a plain iterable-iterator rather than an asynchronous one, but well, we can easily leverage as_completed to create an asynchronous generator, like this:



async def as_completed_generator(awaitables: list[Awaitable]):
    for aw in asyncio.as_completed(awaitables):
        try:
            res = await aw
            yield res
        except BaseException as ex:
            yield ex


async def retrieve_posts():
    post_ids = ["A1", "B1", "C1", "D1"]
    tasks = [asyncio.create_task(get_post(id)) for id in post_ids]
    async for post in as_completed_generator(tasks):
        if not isinstance(post, Exception):
            print(f"post retrieved: {post}")
        else:
            print(f"Exception: {post}")
				

Notice how, in order to let us handle rejected awaitables, our async generator yields either values or exceptions.

I'll leverage this post about asyncio to mention something that I did not include in my previous post about Futures vs Futures. The result() method in concurrent.futures.Future is a blocking method that blocks the current thread until a result is available, while the result() method in asyncio.Future is not: it will immediately return a value/throw an exception if the Future has been resolved/rejected, or throw an exception (InvalidStateError) if it's still pending.
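A quick sketch of that difference (my own toy example, not taken from the earlier post): the concurrent.futures result() blocks the thread (here we cap it with a timeout), while the asyncio one raises InvalidStateError right away on a pending Future.

```python
import asyncio
import concurrent.futures

# concurrent.futures.Future: result() blocks the calling thread;
# with a timeout it gives up after that long and raises TimeoutError
cf_future = concurrent.futures.Future()
try:
    cf_future.result(timeout=0.1)
except concurrent.futures.TimeoutError:
    print("blocked for 0.1s, then timed out")


# asyncio.Future: result() never blocks
async def main():
    fut = asyncio.get_running_loop().create_future()
    try:
        fut.result()  # still pending: raises InvalidStateError immediately
    except asyncio.InvalidStateError:
        print("pending, raised immediately")
    fut.set_result(42)
    print(fut.result())  # resolved: returns the value right away


asyncio.run(main())
```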