Thursday 28 September 2023

Python, abc's vs Protocols

In languages like Kotlin or C# the differences between interfaces and abstract classes are quite clear. Both constructs have different features. Interfaces cannot have state (save using very arcane Kotlin techniques) but abstract classes can. We can implement multiple interfaces but only extend one abstract class. In the past interfaces could not have "real" methods, just method declarations, but Kotlin and modern C# provide default interface methods. Because of the multiple inheritance from interfaces but single inheritance from classes there's normally a semantic difference: interfaces usually define capabilities (and a class can have multiple capabilities by implementing multiple interfaces), while using an abstract class implies an is-a relation. That said, I've worked on projects where the architects had defined interfaces without default methods that were not intended for multiple inheritance (they were not capabilities, they were a "main responsibility"), and I think pure abstract classes would have made more sense.

In Python, given its dynamic nature and its multiple inheritance from classes, it didn't seem necessary to add interfaces to the language. I think we can say that abstract classes in Python (abc's, which were not initially present either) play the role of both abstract classes and interfaces in other languages. Comparing the capabilities of Python, Kotlin and C#, the only reason that occurs to me for having a specific interface mechanism in Python would be to have a restricted form of abc's, lacking state. Well, there's <a href="https://pypi.org/project/python-interface/">a module in pip</a> for declaring interfaces, but reading the documentation on how they differ from abc's, the motivation is throwing errors earlier and making the method signatures part of the contract...
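To make the "both roles" claim concrete, here's a minimal sketch of an abc acting at once as an abstract class (holding state) and as an interface (declaring required methods). The Repository/InMemoryRepository names are mine, just for illustration:

```python
from abc import ABC, abstractmethod


class Repository(ABC):
    # like an abstract class in Kotlin/C#, an abc can hold state and concrete logic
    def __init__(self, connection_string: str):
        self.connection_string = connection_string

    # like an interface, it can declare methods that subclasses must implement
    @abstractmethod
    def find(self, key: str):
        ...


class InMemoryRepository(Repository):
    def __init__(self):
        super().__init__("memory://")
        self._items = {}

    def save(self, key: str, value):
        self._items[key] = value

    def find(self, key: str):
        return self._items.get(key)


repo = InMemoryRepository()
repo.save("a", 1)
print(repo.find("a"))  # 1

try:
    Repository("db://")
except TypeError as ex:
    # Can't instantiate abstract class Repository with abstract method find
    print(f"Exception: {ex}")
```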

As we've seen in a previous post, a few releases ago Python introduced a new feature, Protocols. The main motivation was adding structural typing, but when you see the non-basic examples and learn that they are a particular form of abc's, it seems like Protocols are abc's with extra powers (structural typing), and that unless you don't want the structural typing we should move from abc's to Protocols. Well, in Python 3.10 (and below) there was an important difference: Protocols could not have an __init__ method, which means they could not have useful state (you could add attributes to them when executing other methods, but in most use cases state is needed at construction/initialization time). This lack of state indeed made them feel like interfaces. This limitation seemed more like a byproduct than an intended feature. It was the way to prevent Protocols from being instantiated. It was implemented by replacing the __init__ method that you could have defined in the Protocol with an "empty" _no_init method. The problem with this was that it not only prevented you from instantiating the Protocol (which is good) but also prevented you from defining common state and initialization logic that could be invoked from child classes.

This limitation made sense when using Protocols in the basic way, just for structural typing, but when you combine structural typing with abc's features, decorating the Protocol methods with @abstractmethod, this __init__ replacement is not necessary, as having abstract methods will already prevent instantiation of the Protocol (same as they prevent instantiation of abstract classes).

You can read this stackoverflow discussion about the replacement of __init__ with _no_init and how many consider it a problem.

In Python 3.11 this replacement of __init__ with _no_init no longer happens (there's a long discussion about it here), so you can write code like this:



from abc import abstractmethod
from typing import Protocol

class Formatter(Protocol):
    # in Python 3.10 __init__ gets replaced by _no_init, so you can not instantiate a Protocol
    # but in Python 3.11 that's no longer the case
    def __init__(self, name):
        print("Formatter.__init__")
        self.name = name

    # If I don't declare it as abstract I'll be able to instantiate this Protocol, which makes no sense,
    # so declare it with @abstractmethod; then I'll get "Can't instantiate abstract class Formatter with abstract method format"
    @abstractmethod
    def format(self, txt):
        pass


class WrapFormatter(Formatter):
    def __init__(self, name, wrap):
        print("WrapFormatter.__init__")       
        # in Python 3.10 this call would hit the empty _no_init; in 3.11 it runs the Protocol's real __init__
        super().__init__(name)
        self.wrap = wrap

    def format(self, txt):
        return f"{self.wrap}{txt}{self.wrap}"


try:
    f = Formatter("abstract formatter")
except Exception as ex:
    print(f"Exception: {ex}")
    # as I have an abstract method (format) I get: "Can't instantiate abstract class Formatter with abstract method format"

f1 = WrapFormatter("wrapper", "||")
# it prints:
# WrapFormatter.__init__
# Formatter.__init__
print(f1.format("Francois"))



With all the above my conclusion is that, unless for some reason you don't want the structural typing feature, we should use Protocols instead of abc's.
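To make clear what that structural typing feature buys us: with a @runtime_checkable Protocol, a class that never inherits from the Protocol still matches it just by having the right methods. A quick sketch (the SupportsFormat/Shouter names are mine; note that runtime_checkable isinstance checks only check method presence, not signatures):

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsFormat(Protocol):
    def format(self, txt: str) -> str: ...


class Shouter:
    # no inheritance from SupportsFormat at all
    def format(self, txt: str) -> str:
        return txt.upper()


# structural, not nominal: Shouter matches because it has a format method
print(isinstance(Shouter(), SupportsFormat))  # True
print(Shouter().format("hello"))  # HELLO
```

An abc would require Shouter to explicitly inherit from (or be registered with) it; the Protocol matches it as-is.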

Monday 18 September 2023

Sorting and Comparison

Comparison for sorting/ordering (that is, determining if one value is less than, equal to or greater than another value) works slightly differently in Python from what I was used to in other languages. For example, in C# and Kotlin the design is almost the same, so let's see that first.

Classes in C# can implement the IComparable interface to define how to sort them. Additionally you can define comparer objects (implementing the IComparer interface), so that you can define different comparison strategies for the same class. The CompareTo or Compare methods of those interfaces return an integer to indicate if one value is less, equal or greater. Basically the same goes for Kotlin, where you have the Comparable interface and the Comparator interface, whose methods also return an integer.

So when we want to sort/order a collection in C# (with the Order method) we have 2 options: either our collection is made up of IComparable objects, or we provide an IComparer object. There's another sorting method, OrderBy, that receives a function that for each element in the collection returns a "key", that is an IComparable object.

The philosophy in Kotlin is almost the same (but it allows both sorting in place, with sort, and returning a new collection, with sorted). So collections have a sort method for sorting a collection of Comparable items, a sortWith method for sorting any collection by providing a Comparator object, and a sortBy method that receives a "selector" function that for each object in the collection returns a Comparable object. As for operator overloading, the different comparison operators work on Comparable objects by invoking the compareTo method.

In Python we make objects comparable by defining the Rich Comparison operators. Well, that makes a lot of methods, but indeed for sorting your objects you only need to define the __lt__ method, as explained in the documentation.

The sort routines use < when making comparisons between two objects. So, it is easy to add a standard sort order to a class by defining an __lt__() method

There are some advanced cases (not just sorting comparisons) where defining the additional rich comparison methods is useful. Apart from its usage in numpy, I've read somewhere that SQLAlchemy makes some use of them.
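If you do want the full set of operators without writing all six methods by hand, the stdlib's functools.total_ordering decorator can derive the rest from __eq__ plus one ordering method. A quick sketch (the Version class is mine, just for illustration):

```python
from functools import total_ordering


@total_ordering
class Version:
    def __init__(self, major: int, minor: int):
        self.major = major
        self.minor = minor

    def __eq__(self, other):
        return (self.major, self.minor) == (other.major, other.minor)

    def __lt__(self, other):
        # tuple comparison: major first, then minor
        return (self.major, self.minor) < (other.major, other.minor)


# >, >=, and <= are derived automatically by total_ordering
print(Version(3, 11) > Version(3, 10))   # True
print(Version(3, 10) <= Version(3, 10))  # True
```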

Same as in Kotlin, we can sort in place (using the sort method) or return a new collection (using the sorted function). If the collection contains comparable objects (implementing the __lt__ method) that's all; otherwise we'll have to provide a key function (aka selector). This function receives a collection item and must return a comparable object (so it's like the sortBy selector in Kotlin). I've never needed anything more than this in Python, but if you compare it with C# and Kotlin, where you can provide a Comparer object (apart from the key-selector function), it could seem like we are missing something. It would seem as if a Comparer object/function could better deal with complex comparison scenarios (for example comparing based on several properties and several combinations of values and ranges in those properties). No, not really, but let's clarify this.

Let's say I have a Country class that I want to compare by default based on its population. It's just a matter of adding a __lt__ method to the class:



class Country:
    def __init__(self, name: str, population: int):
        self.name = name
        self.population = population
    
    def __lt__(self, other: "Country"):
        return self.population < other.population

countries = [
    Country("Germany", 80),
    Country("Spain", 48),
    Country("France", 68),
    Country("Russia", 150),
    Country("Portugal", 10),
    Country("China", 1000),
]

print("standard sorting (by population):")
sorted_countries = sorted(countries)
print([country.name for country in sorted_countries])

#standard sorting (by population):
#['Portugal', 'Spain', 'France', 'Germany', 'Russia', 'China']
       

Now let's say I want to sort it by name. OK, it's just using as key function one that returns the name.



print("sorting by name:")
sorted_countries = sorted(countries, key=lambda x: x.name)
print([country.name for country in sorted_countries])

#sorting by name:
#['China', 'France', 'Germany', 'Portugal', 'Russia', 'Spain']


And now the interesting part. I want to sort by name, but giving the maximum weight to the name "France", so that France always ends up last; it's no longer just a matter of returning one property of the object as the sorting key. This could seem more natural to manage with a sort of Comparer object/function, but we can do just the same with a key/selector function that returns a comparable object implementing the comparison logic that we would have put in the Comparer object. Let's see:


class FrenchNameCentricKey:
    """Think of this Key as a Comparer object that compares Country objects in an "odd" way"""
    def __init__(self, country: Country):
        self.country = country

    def __lt__(self, key: "FrenchNameCentricKey") -> bool:
        # "France" is never less than anything, so it always sorts last
        if self.country.name == "France":
            return False
        if key.country.name == "France":
            return True
        return self.country.name < key.country.name
        
print("French centric sorting:")
sorted_countries = sorted(countries, key=FrenchNameCentricKey)
print([country.name for country in sorted_countries])       

#French centric sorting:
#['China', 'Germany', 'Portugal', 'Russia', 'Spain', 'France']  	

The thing to bear in mind is that a comparer object/function receives the 2 objects to compare, while a key-selector function returns a "comparable" object. This "comparable" object will get its comparison method invoked with another "comparable" instance as its argument.

In the past Python used compare functions (following the 1, 0, -1 school of thought) rather than key/selector functions. Because of that, Python provides a cmp_to_key function that transforms a compare function into a key function. At first the idea seemed like magic to me, but with what I've explained above the implementation seems pretty clear now (notice that it returns a class, not a function):



def cmp_to_key(mycmp):
    """Convert a cmp= function into a key= function"""
    class K(object):
        __slots__ = ['obj']
        def __init__(self, obj):
            self.obj = obj
        def __lt__(self, other):
            return mycmp(self.obj, other.obj) < 0
        def __gt__(self, other):
            return mycmp(self.obj, other.obj) > 0
        def __eq__(self, other):
            return mycmp(self.obj, other.obj) == 0
        def __le__(self, other):
            return mycmp(self.obj, other.obj) <= 0
        def __ge__(self, other):
            return mycmp(self.obj, other.obj) >= 0
        __hash__ = None
    return K
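For completeness, here's a small usage sketch of the stdlib functools.cmp_to_key, reproducing the French-centric sorting above with an old-style compare function (the compare function is mine):

```python
from functools import cmp_to_key


def french_centric_cmp(a: str, b: str) -> int:
    # old-style compare: negative if a < b, 0 if equal, positive if a > b,
    # with "France" always sorting last
    if a == b:
        return 0
    if a == "France":
        return 1
    if b == "France":
        return -1
    return -1 if a < b else 1


names = ["Germany", "France", "Spain", "China"]
print(sorted(names, key=cmp_to_key(french_centric_cmp)))
# ['China', 'Germany', 'Spain', 'France']
```

Under the hood, cmp_to_key wraps each name in an instance of that K class, and sorted then uses K's __lt__, which delegates to our compare function.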