Sunday, 29 May 2022

Some Python Tricks

I've been writing a lot of Python code these last months, and some simple tricks that were not so obvious to me in the first weeks have now become essential in my day-to-day work. Let's see:

Dictionary access and default

Initially, when getting a value from a dictionary and falling back to a default value if the key was missing, I was using the conditional expression (Python's "sort of" ternary operator), like this:


my_value = (my_dict["key"] if "key" in my_dict
    else "default")

Well, there's a much cleaner way to do this, the get method:


my_value = my_dict.get("key", "default")
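A small sketch of how get behaves in the different cases. The `settings` dictionary and its keys are made up for illustration. One detail worth noticing: the default is only used when the key is missing, not when the stored value happens to be None.

```python
# Hypothetical dictionary, used only for illustration.
settings = {"theme": "dark", "font": None}

# Key present: the stored value is returned.
theme = settings.get("theme", "light")

# Key missing: the default is returned (None when no default is given).
language = settings.get("language", "en")
timeout = settings.get("timeout")

# Key present with a None value: the default is NOT used.
font = settings.get("font", "monospace")

print(theme, language, timeout, font)
```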

First item matching condition

I'm pretty used to the .NET First(predicate) and FirstOrDefault(predicate) LINQ extension methods. It seems odd to me that Python lacks a "first(predicate)" function (either builtin or in itertools), but I've managed to internalize the Pythonic way to do this: use a generator expression to apply the filter (rather than a list comprehension, so that you get an iterator that advances lazily as you call it) and then call next on it.


def cities_generator():
    cities = ["Toulouse", "Paris", "Porto", "Prague", "Xixon"]
    for city in cities:
        print(f"reading: {city}")
        yield city

first_p = next(city 
    for city in cities_generator()
    if city.startswith("P")
)

That's equivalent to First(). As for FirstOrDefault(), the next() function happens to accept a default value to return if the iterator is exhausted. Nice:


first_o = next((city 
    for city in cities_generator() 
    if city.startswith("O")
    ), None
)
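The two patterns above can be wrapped in small helpers. This is only a sketch: the names `first`, `first_or_default` and `cities_source` are mine, not part of the standard library. It also makes two behaviours visible: without a default, next raises StopIteration when nothing matches, and the source is only consumed up to the first match.

```python
def first(iterable, predicate):
    """Return the first matching item; raises StopIteration if there is none."""
    return next(item for item in iterable if predicate(item))

def first_or_default(iterable, predicate, default=None):
    """Return the first matching item, or default if there is none."""
    return next((item for item in iterable if predicate(item)), default)

consumed = []  # records every city actually read from the source

def cities_source():
    for city in ["Toulouse", "Paris", "Porto", "Prague", "Xixon"]:
        consumed.append(city)
        yield city

first_p = first(cities_source(), lambda c: c.startswith("P"))
# Only the items up to and including the first match have been read so far.
consumed_for_first = list(consumed)

first_o = first_or_default(cities_source(), lambda c: c.startswith("O"))

try:
    first(cities_source(), lambda c: c.startswith("O"))
    raised = False
except StopIteration:
    raised = True
```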

itertools.takewhile and itertools.dropwhile. These are (particularly the former) pretty useful functions. They are the equivalents of .NET's TakeWhile and SkipWhile, nothing more to say.
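For completeness, a minimal example with made-up data (the `temperatures` list is just an illustration). The point to remember is that both functions only care about the first time the predicate fails.

```python
from itertools import dropwhile, takewhile

temperatures = [12, 15, 19, 24, 18, 26, 11]

# takewhile yields items until the predicate first fails, then stops for good.
warming_up = list(takewhile(lambda t: t < 20, temperatures))

# dropwhile skips items while the predicate holds, then yields everything else,
# including later items that would satisfy the predicate again.
from_first_hot = list(dropwhile(lambda t: t < 20, temperatures))

print(warming_up)
print(from_first_hot)
```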

I guess almost everyone agrees that slicing is one of the most beautiful Python features. Slicing does not work with iterators, so the obvious solution is to convert the iterator to a list and slice that. That's fine with small iterables, but if the iterator is going to return many items and you just need an intermediate slice, processing all of them for the list conversion is a real waste. itertools.islice comes to the rescue. Obviously, it still has to consume the items that precede the slice you are requesting, but it won't process the ones that come after it, and it won't keep the preceding (unneeded) elements in memory, so you save memory as well as CPU cycles. One important caveat: islice does not support negative indexes.
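A short sketch of that behaviour, using a hypothetical infinite generator (`squares`) that records what it produces, so we can check that nothing after the slice is consumed:

```python
from itertools import count, islice

produced = []  # records every value the generator was asked to compute

def squares():
    """Infinite generator; converting it to a list would never finish."""
    for n in count():
        produced.append(n)
        yield n * n

# Items at positions 3, 4 and 5, like some_list[3:6].
middle = list(islice(squares(), 3, 6))

# Negative indexes are rejected as soon as the islice object is created.
try:
    islice(squares(), -3, None)
    negative_ok = True
except ValueError:
    negative_ok = False

print(middle, produced, negative_ok)
```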

To get the last item in an iterator that matches a predicate, I think the best we can do is convert the iterator to a list, create a reverse iterator with reversed, and then call next. I use the reversed() function rather than list.reverse() because the latter does extra work, traversing the whole list to reverse it in place, whereas reversed() just returns an iterator that walks the list backwards.


last_p = next(city 
    for city in reversed(list(cities_generator()))
    if city.startswith("P")
)

Thursday, 19 May 2022

Dictionary Comprehensions and groupby

Dictionary comprehensions have been part of Python (2.7 and 3.0) since 2008-2010, so more or less when my long Python hiatus started, meaning that I did not discover them until recently. You won't need them as often as list comprehensions, but every now and then you'll have a chance to make your code cuter thanks to them. You can check several use cases here
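As a quick reminder of the syntax, here are three typical uses. The data is made up for illustration.

```python
words = ["Toulouse", "Paris", "Porto"]

# Build a mapping from each word to its length.
lengths = {word: len(word) for word in words}

# Invert a dictionary (assumes the values are unique and hashable).
capitals = {"France": "Paris", "Portugal": "Lisbon"}
countries_by_capital = {capital: country for country, capital in capitals.items()}

# Keep only the entries that satisfy a condition.
long_names = {word: size for word, size in lengths.items() if size > 5}

print(lengths, countries_by_capital, long_names)
```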

I've recently combined them with itertools.groupby, and it seems worth posting here. Let's say I have several cities that I want to group into a dictionary keyed by country.


class City:
    def __init__(self, name, country):
        self.name = name
        self.country = country

cities = [
    City("Toulouse", "France"),
    City("Prague", "Czech Republic"),
    City("Paris", "France"),
    City("Lisbon", "Portugal"),
    City("Porto", "Portugal")
]

A first approach to group them in a dictionary would be something like this:


import json

def traditional_approach(cities):
    countries = {}
    for city in cities:
        if city.country not in countries:
            countries[city.country] = [city]
        else:
            countries[city.country].append(city)
    return countries

countries = traditional_approach(cities)
# City objects are not JSON serializable, so fall back to their __dict__
print(json.dumps(countries, indent=4, default=lambda x: x.__dict__))

Using itertools.groupby and dictionary comprehensions we have this:


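import itertools

def group_by_approach(cities):
    key_fn = lambda city: city.country
    # groupby only groups consecutive items, so sort by the same key first
    # (sorted() returns a new list, leaving the caller's list untouched)
    sorted_cities = sorted(cities, key=key_fn)
    city_groups = itertools.groupby(sorted_cities, key_fn)   # iterator of (key, group) pairs
    return {key: list(group) for key, group in city_groups}

countries = group_by_approach(cities)
print(json.dumps(countries, indent=4, default=lambda x: x.__dict__))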

Maybe the second version is not clearer than the first one, but it looks so nice :-) Notice that itertools.groupby has a huge gotcha if you are used to SQL's GROUP BY or .NET LINQ's GroupBy: you have to sort your list beforehand by the same key that you'll use to group:

It generates a break or new group every time the value of the key function changes (which is why it is usually necessary to have sorted the data using the same key function). That behavior differs from SQL’s GROUP BY which aggregates common elements regardless of their input order.
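To see the gotcha in action, here is a small sketch using plain (city, country) tuples instead of the City class, just to keep it self-contained. Without sorting, the same country appears as several groups, and in a dictionary comprehension the later group silently overwrites the earlier one.

```python
from itertools import groupby

# (city, country) pairs, deliberately not sorted by country.
pairs = [
    ("Toulouse", "France"),
    ("Prague", "Czech Republic"),
    ("Paris", "France"),
]

def country_of(pair):
    return pair[1]

# Without sorting, "France" shows up as two separate groups.
unsorted_keys = [key for key, _ in groupby(pairs, country_of)]

# In a dict comprehension the second "France" group replaces the first one.
lossy = {key: [city for city, _ in group] for key, group in groupby(pairs, country_of)}

# Sorting by the same key first gives one group per country.
grouped = {
    key: [city for city, _ in group]
    for key, group in groupby(sorted(pairs, key=country_of), country_of)
}

print(unsorted_keys, lossy, grouped)
```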

Update 2022/08/17. I've found out today that there's another way to do this, leveraging the setdefault dictionary method:


def traditional_approach_2(cities):
    countries = {}
    for city in cities:
        countries.setdefault(city.country, []).append(city)
    return countries

Sunday, 8 May 2022

Python Lazy Object Initialization

Over the years I've posted several times about "Lazy objects" (Lazy initialization) in different languages. When implementing it in Python these days I've come up with a solution that I think is really interesting.

Of course the Lazy object is transparent. I mean, the client does not know whether the object is being lazily initialized or not; "laziness" is an implementation detail that does not affect the interface. This has nothing to do with things like C#'s Lazy<T>. The class of the object that we're going to lazily initialize does not require any changes either. We design our class normally, and then when we see fit to use lazy initialization we create a lazy object from it.

Both in C# and in JavaScript I've used proxy objects to implement lazy initialization. I think this is the common approach and it looks quite good to me, but it has some performance implications. Once the object has been initialized, the proxy mechanism remains in place and still checks on each property access whether the initialization has already been done (so that it either uses the initialized instance or invokes the initialization method).

Python has an amazing combination of features that allows the clean lazy initialization technique that I'm going to show here:

  • The __getattribute__ and __setattr__ methods. When defined in a class, these methods intercept access to attributes in instances of that class or instances of derived classes. They intercept the access both "directly" (I mean, instance.attribute) and through the getattr and setattr functions. Notice that adding these methods directly to an instance of a class, rather than to the class, will have no effect.
  • As in JavaScript, Python allows us to create a class inside a function. This class can inherit from the value held in a variable (so it can dynamically inherit from one class or another). We can return this dynamic class from our function for further use outside it.

I leverage these features to create lazy objects by creating a new class that inherits from the class of the instance that I want to behave lazily. In this class I define the __getattribute__ and __setattr__ hooks/traps, and in the __init__ method I store the parameters for later reuse. I carefully invoke the initialization method through the class rather than the instance, to avoid triggering the traps again. What is very interesting is that once I do the object initialization, I remove both hooks from the class, so they no longer interfere with subsequent attribute accesses, and hence there is no performance penalty.
So I have a factory function that creates instances of a dynamic _Lazy class:


def lazy(cls, *args, **kwargs):
    class _Lazy(cls):
        def __init__(self, *args, **kwargs):
            _Lazy.original_args = args
            _Lazy.original_kwargs = kwargs


        def _lazy_init(self):
            print("_lazy_init")
            # remove the traps so that they do not interfere in the next accesses
            del _Lazy.__setattr__
            del _Lazy.__getattribute__
            # invoke the __init__ of the "target" class
            # (keyword arguments must be unpacked with **, not *)
            super().__init__(*self.original_args, **self.original_kwargs)
            # change the class of the instance so that "type()" no longer returns "_Lazy", but the "real" class
            self.__class__ = _Lazy.__bases__[0]
            
        
        def __setattr__(self, name, value):
            print(f"setting attribute: {name}")
            #self._lazy_init() # can't do this as it will trigger the traps again
            # however, traps do not have effect on accesses to attributes through the class itself rather than through instances
            _Lazy._lazy_init(self)
            setattr(self, name, value)

        def __getattribute__(self, name):
            print(f"getting attribute: {name}")
            _Lazy._lazy_init(self)
            return getattr(self, name)
    
    return _Lazy(*args, **kwargs)

And I use it like this:


class Person:
    def __init__(self, name, age):
        print(f"{Person.__name__}.__init__")
        self.name = name
        self.age = age

    def say_hi(self, to_someone):
        print(f"{self.name} with age {self.age} says Bonjour to {to_someone}")

def test_1():
    lazy_p1 = lazy(Person, "Lazy Francois", 14)
    print(f"type: {type(lazy_p1).__name__}")
    # initialization takes place
    lazy_p1.say_hi("Xose") 
    
    # trap has been deactivated
    print(lazy_p1.name)
    lazy_p1.say_hi("Xose") 
    print(f"type: {type(lazy_p1).__name__}")

test_1()
# output:
# type: _Lazy

# getting attribute: say_hi
# _lazy_init
# Person.__init__

# Lazy Francois with age 14 says Bonjour to Xose
# Lazy Francois
# Lazy Francois with age 14 says Bonjour to Xose

# type: Person


As the icing on the cake, once the initialization is done I also change the __class__ of the instance (yes, one more powerful Python feature) to point again to the original class rather than the derived _Lazy one. This way, if we check the type with type we get the original type, meaning that after the initialization we can no longer tell whether the object was initialized normally or lazily.
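The __class__ reassignment can be tried on its own with two ordinary classes. `Draft` and `Published` are hypothetical names for this sketch. The instance keeps its attributes, but from then on it is looked up (and reported by type) as an instance of the new class.

```python
class Draft:
    def __init__(self, text):
        self.text = text

class Published(Draft):
    def headline(self):
        return self.text.upper()

doc = Published("hello")
before = type(doc).__name__

# Point the instance back to the parent class, as _lazy_init does.
doc.__class__ = Draft
after = type(doc).__name__

print(before, after, doc.text, hasattr(doc, "headline"))
```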

I've uploaded the code to this gist.