Tuesday 24 October 2023

Python Lazy Property

Recently I've needed a sort of lazy property and found out that recent versions of Python feature the functools.cached_property decorator. From the documentation:

Transform a method of a class into a property whose value is computed once and then cached as a normal attribute for the life of the instance. Similar to property(), with the addition of caching.

The mechanics of cached_property() are somewhat different from property(). A regular property blocks attribute writes unless a setter is defined. In contrast, a cached_property allows writes.

The cached_property decorator only runs on lookups and only when an attribute of the same name doesn’t exist. When it does run, the cached_property writes to the attribute with the same name. Subsequent attribute reads and writes take precedence over the cached_property method and it works like a normal attribute.

To understand how property and cached_property can have such different behaviour, this discussion is a good help. It's about a previous implementation by werkzeug, but it should be mostly valid for the functools version. First of all, we have to be aware of the complex attribute lookup process in Python (which I partially mentioned in my post about descriptors). It's well explained here:

instance attributes take precedence over class attributes – with a caveat. Contrary to what quite a lot of people think, this is not always the case, and sometimes class attributes shadow the instance attributes. Enter descriptors.

- Data descriptors (overriding) – they define the __set__() and/or the __delete__() method (but normally __set__() as well) and take precedence over instance attributes.

- Non-data descriptors (non-overriding) – they define only the __get__() method and are shadowed by an instance attribute of the same name.
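These two lookup rules can be illustrated with a minimal pair of hand-written descriptors (the class and attribute names here are just illustrative):

```python
class DataDesc:
    """Data descriptor: defines __get__ and __set__, so it wins over the instance dict."""
    def __get__(self, obj, objtype=None):
        return "from DataDesc"
    def __set__(self, obj, value):
        raise AttributeError("read-only")

class NonDataDesc:
    """Non-data descriptor: only __get__, so it is shadowed by an instance attribute."""
    def __get__(self, obj, objtype=None):
        return "from NonDataDesc"

class C:
    d = DataDesc()
    n = NonDataDesc()

c = C()
# write directly into the instance dict, bypassing the descriptors
c.__dict__["d"] = "instance d"
c.__dict__["n"] = "instance n"

print(c.d)  # from DataDesc  (the data descriptor takes precedence)
print(c.n)  # instance n     (the instance attribute shadows the non-data descriptor)
```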

Let's see some code:


from dataclasses import dataclass
from datetime import datetime
from functools import cached_property


@dataclass
class Book:
    name: str

    @property
    def short_name(self):
        return self.name[0:3]

    def _calculate_isbn(self):
        print("calculating ISBN")
        return f"ISBN{datetime.now():%Y-%m-%d_%H%M%S}"

    @cached_property
    def isbn(self):
        return self._calculate_isbn()
    

print("- Normal Property")
b1 = Book("Gattaca")
try:
    b1.short_name = "AAA"
except BaseException as ex:
    print(f"Exception: {ex}")
    #Exception: can't set attribute 'short_name'

try:
    del b1.short_name
except BaseException as ex:
    print(f"Exception: {ex}")
    #Exception: can't delete attribute 'short_name'


#I can set an attribute in the dictionary with that same name
b1.__dict__["short_name"] = "short"
print(f"b1.__dict__['short_name']: {b1.__dict__['short_name']}")
# but the normal attribute lookup will get the property
print(f"b1.short_name: {b1.short_name}")

#I can delete it from the dictionary
del b1.__dict__["short_name"]

"""
- Normal Property
Exception: can't set attribute 'short_name'
Exception: can't delete attribute 'short_name'
b1.__dict__['short_name']: short
b1.short_name: Gat
"""

So a normal @property decorator creates a data descriptor. It has __get__, __set__ and __delete__ regardless of whether you define the .setter and .deleter or not. If you haven't defined them, __set__ and __delete__ will throw an exception.
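We can verify this directly: the raw property object stored in the class dict always carries __set__ and __delete__, even with no setter or deleter defined. A quick sketch:

```python
class C:
    @property
    def x(self):
        return 1

# grab the raw property object from the class dict
prop = C.__dict__["x"]
print(hasattr(type(prop), "__set__"))     # True: it's a data descriptor
print(hasattr(type(prop), "__delete__"))  # True
```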

And now with the cached_property:


print("\n- Cached Property")
b1 = Book("Atomka")
print(f"b1: {b1.isbn}")
# the first access has set the value in the instance dict
print(f"b1.__dict__: {b1.__dict__}")

"""
calculating ISBN
b1: ISBN2023-10-24_210718
b1.__dict__: {'name': 'Atomka', 'isbn': 'ISBN2023-10-24_222659'}
"""

So a @functools.cached_property creates a non-data descriptor. It only has __get__, so when it creates an attribute with that same name (the first time you access the property), that instance attribute will shadow the property in the class in subsequent lookups.
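A quick check with a throwaway class confirms the asymmetry with property:

```python
from functools import cached_property

class D:
    @cached_property
    def y(self):
        print("computing")
        return 42

# the raw cached_property object in the class dict
cp = D.__dict__["y"]
print(hasattr(type(cp), "__get__"))  # True
print(hasattr(type(cp), "__set__"))  # False: non-data descriptor

d = D()
print(d.y)         # computing, then 42
print(d.__dict__)  # {'y': 42}: now cached in the instance dict
print(d.y)         # 42, the __get__ no longer runs
```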

It's very nice that you can delete the attribute that has been cached in the instance, and this way "clear the cache".


# force the property to be recalculated
print("delete to force the property to be recalculated")
del b1.isbn
print(f"b1.__dict__: {b1.__dict__}")

# it gets recalculated
print(f"b1: {b1.isbn}")

"""
delete to force the property to be recalculated
b1.__dict__: {'name': 'Atomka'}

calculating ISBN
b1: ISBN2023-10-24_210718
"""

Notice that we can set (force) the value in the instance (both before and after it's been read-cached for the first time), skipping the "get calculation".


# we can manually set the attribute, overwriting what had been calculated, maybe this is not so good
print("manually setting b1")
b1.isbn = "BBB"
print(f"b1: {b1.isbn}")

# force the property to be recalculated
print("delete to force the property to be recalculated")
del b1.isbn
print(f"b1: {b1.isbn}")

b3 = Book("Ange Rouge")
# manually set the value before its first read, so skipping the "calculation"
print("manually setting b3")
b3.isbn = "CCC"
print(f"b3: {b3.isbn}")

"""
manually setting b1
b1: BBB

delete to force the property to be recalculated
calculating ISBN
b1: ISBN2023-10-24_222659

manually setting b3
b3: CCC
"""

If you try to delete the attribute from the instance before it has been cached, you'll get an exception. So deleting works the same as setting: as it's not a data descriptor, the set and the delete are tried on the instance. On the other side, you can delete the property from the class, which will cause problems in new instances:


# if I have not cached the value yet and try to delete it from the instance I get an exception
b2 = Book("Syndrome E")
try:
    del b2.isbn
except Exception as ex:
    print(f"Exception: {ex}")
    #Exception: isbn

# I can delete the property itself from the class
print("del Book.isbn")
del Book.isbn
# so now I still have the one that was cached in the instance
print(f"b1: {b1.isbn}")

# but I no longer have the cached property in the class
b2 = Book("Syndrome E")
try:
    print(f"b2: {b2.isbn}")
except Exception as ex:
    print(f"Exception: {ex}")
    #Exception: 'Book' object has no attribute 'isbn'
	
"""
Exception: isbn

del Book.isbn
b1: ISBN2023-10-24_222659

Exception: 'Book' object has no attribute 'isbn'
"""

Sunday 15 October 2023

Python enums

Every now and then I go through the same basic doubts with enums in Python (how did I do that? can I do that?), so it seems sensible to write down those basic things here (though I guess this won't be more useful than what you can easily find here). Python enums are another example (like abstract classes) of a clever use of metaclasses to provide features that in other languages have to be managed specifically by the compiler and extra syntax.

We create enums by inheriting from the enum.Enum class, which has EnumType (in older versions it was called EnumMeta) as its metaclass. Then, when you define an enum by inheriting from Enum (let's say: class Color(Enum)), EnumType becomes its metaclass, and the __new__ method in the EnumType metaclass magically takes care of traversing the attributes that you have defined in your class (let's say: RED, BLUE...) and assigns to them instances of Color with the corresponding value. You can check the source code and see that for this it uses intermediate instances of the _proto_member class.
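We can see both halves of this at work with a tiny enum (note that the metaclass prints as EnumType on Python 3.11+ and EnumMeta on older versions):

```python
from enum import Enum

class Color(Enum):
    RED = 1
    BLUE = 2

# the metaclass of any Enum subclass is EnumType (EnumMeta before 3.11)
print(type(Color).__name__)

# the class attributes have been turned into instances of Color
print(isinstance(Color.RED, Color))       # True
print(Color.RED.name, Color.RED.value)    # RED 1
```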

From the documentation:

The EnumType metaclass is responsible for providing the __contains__(), __dir__(), __iter__() and other methods that allow one to do things with an Enum class that fail on a typical class, such as list(Color) or some_enum_var in Color. EnumType is responsible for ensuring that various other methods on the final Enum class are correct (such as __new__(), __getnewargs__(), __str__() and __repr__()).

Thanks to having those dunder methods in the metaclass we can iterate an enum class, use the "in" operator and so on.
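For example, with a small throwaway enum, iteration and membership work directly on the class:

```python
from enum import Enum

class Color(Enum):
    RED = 1
    BLUE = 2

print(list(Color))         # iteration, via EnumType.__iter__
print(Color.RED in Color)  # membership, via EnumType.__contains__
print([c.name for c in Color])
```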

I guess we can think of an enum as a class with a limited number of instances, each one having a name and a value and where each instance is accessible as a static attribute of the class. Given a Query enum:


from enum import Enum


class Query(Enum):
    SELECT = "select"
    INSERT = "insert"
    UPDATE = "update"
    DELETE = "delete"
    
my_query = Query.INSERT
print(my_query.name) # INSERT
print(my_query.value) # insert


We can obtain the enum member corresponding to a name like this


# Obtain an enum member from its name:
ins1 = Query["INSERT"]


And the enum member corresponding to a value like this


# Obtain an enum member from its value:
ins2 = Query("insert")

And remember that we have a unique member for each value, so:


print(ins1) # Query.INSERT
print(ins2) # Query.INSERT
print(ins1 == ins2) # True
print(ins1 is ins2) # True

The operations above will raise an error if the name or value does not exist, so it's better to write something like this:


try:
    ins1 = Query["INSSSEERRRRT"]
except KeyError as ex:
    print(f"error: {ex}")
    
# or:
name = "INSSSEERRRRT"
ins1 = Query[name] if any(name == query.name for query in Query) else None
print(f"ins1: {ins1}")

# ---------------------------

try:
    ins2 = Query("insssseeerrtt")
except ValueError as ex:
    print(f"error: {ex}")

# or:    
value = "insssseeerrtt"
ins2 = Query(value) if any(value == query.value for query in Query) else None
print(f"ins2: {ins2}")
    

The above makes me wonder why there isn't a pair of static methods in the Enum class, something like names() and values(), that would return just that: the possible names and values of an enum.
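Such helpers are trivial to sketch (the enum_names/enum_values functions below are hypothetical, not part of the stdlib):

```python
from enum import Enum

class Query(Enum):
    SELECT = "select"
    INSERT = "insert"

def enum_names(enum_cls):
    """Hypothetical names() helper: the possible names of an enum."""
    return [member.name for member in enum_cls]

def enum_values(enum_cls):
    """Hypothetical values() helper: the possible values of an enum."""
    return [member.value for member in enum_cls]

print(enum_names(Query))   # ['SELECT', 'INSERT']
print(enum_values(Query))  # ['select', 'insert']
```

In practice the stdlib does expose Query.__members__, a mapping from names to members, so its keys() already give you the names.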

The standard Python enums are enough in most cases, but if you want something more advanced like the Java/Kotlin enums, where we can have multiple attributes as values and have instance methods (the archetypical Java Planets example), you can use the powerful aenum module (advanced enums), that just implements the Planets example.

Sunday 8 October 2023

Multiple "this" in Kotlin

I wrote a post some months ago about the use of this in different languages. I explained there how Kotlin comes with this surprising Qualified this feature that allows access to the this of enclosing scopes (each of those this is the original receiver in that scope). Summarizing:

If this has no qualifiers, it refers to the innermost enclosing scope. To refer to this in other scopes, label qualifiers are used

I mentioned in that article that I don't particularly appreciate the feature in C#, Java and Kotlin of having an implicit this. I find it confusing and prefer to use an explicit this. To my surprise and greater confusion, I've learnt that Kotlin allows for multiple implicit this. You have access to the this of enclosing scopes (for which you would use a qualified this) in an implicit way, skipping the "this@scope". Let's see an example:



class Book (var title: String) {}

class Person (var name: String) {
    fun doTest(book: Book) {
        val fn: Book.(String) -> Unit = { how ->
            // I have access to 2 implicit "receivers": Book and Person
            println("${name} is reading ${title} with ${how}")
        }
        book.fn("interest")
    }
}

fun main() {
    val p1 = Person("Francois")
    p1.doTest(Book("Atomka"))
    //
}

The function with receiver, fn, has access to 2 implicit this: one, an instance of Book, for its receiver, and another, an instance of Person, for the receiver of the doTest member function. We can rewrite the above function using an explicit, qualified this:



class Person (var name: String) {
    fun doTest(book: Book) {
        val fn2: Book.(String) -> Unit = { how ->
            println("${this@Person.name} is reading ${this.title} with ${how}")
        }
        book.fn2("interest")
    }
}

If you wonder how the function has access to those this of other scopes, well it's basically the same as how closures are implemented. The function is desugared into a class that has those this of other scopes (and other variables that it could be trapping in a closure) as properties.

This feature is particularly useful for Type-safe builders for DSL's.

Apart from nested functions having access to the this of the enclosing scopes, there's another pretty odd case where a function has access to multiple this, when an extension function is declared as a member of another class. You can read about it here

You can declare extensions for one class inside another class. Inside such an extension, there are multiple implicit receivers - objects whose members can be accessed without a qualifier. An instance of a class in which the extension is declared is called a dispatch receiver, and an instance of the receiver type of the extension method is called an extension receiver.

Wednesday 4 October 2023

BrokenProcessPool Exception

Using some code at work pretty similar to this code from last December I've come across an exception that was confusing me quite a lot. So I have some code more or less like this:


async def parse_blocks(blocks):
    parser = BlockParser()
    pending_blocks = []
    for block in blocks:
        # wrap the concurrent.futures.Future in an asyncio.Future so it can be awaited
        future_result = asyncio.wrap_future(executor.submit(parser.parse, block, 0))
        pending_blocks.append(future_result)
    while len(pending_blocks):
        done_blocks, pending_blocks = await asyncio.wait(pending_blocks, return_when=asyncio.FIRST_COMPLETED)
        for done_block in done_blocks:
            # this is equivalent to: parsed_block = await done_block
            parsed_block = done_block.result()
            # do whatever
    print("all blocks parsed")


The parser.parse function that I submit to the pool has a try-except wrapping its content, I mean, something like this:


class Parser:
    def parse(self, block):
        try:
            # whatever
            result = NormalResult()
        except Exception as ex:
            result = FailedResult()
        return result
				

So there's no reason for a process in the process pool of the ProcessPoolExecutor to crash. But to my surprise the main process was crashing in parse_blocks(), with a BrokenProcessPool exception happening in parsed_block = done_block.result(). When you access the result of a "rejected" Future (OK, yes, resolved-rejected is JavaScript terminology; I mean a Future that does not complete with a result, but with an exception), the exception that was set with set_exception() is thrown. But how can that Future finish with an exception if, as I've said, the code (running in the process pool) that will "resolve-reject" that Future is in a try-except?

Well, that can happen if someone kills that child process. I guess the ProcessPoolExecutor detects that one of its processes has died and sets an exception in the corresponding Future (which is then propagated to the asyncio.Future; remember from my December post that I'm wrapping the concurrent.futures.Future in an asyncio.Future). That way, when you try to access the Future's result you get an exception.

As for who was killing that pool process, another previous post comes into play. On some occasions some of the tasks that I submit to the pool involve massive processing, and we end up using all the RAM and swap. At that point the OOM killer comes into play and kills the process that is using the most memory, that is, one of the processes of the pool. Checking the kernel ring buffer with dmesg I could find an "Out of memory: Kill process" entry that corresponded to one of the processes of the pool (I was writing the IDs of these processes to my application log).

While writing this post I've been thinking again about why we have 2 kinds of Futures: concurrent.futures.Future and asyncio.Future. Given that asyncio.Future is intended for use in an event loop, you cannot block on it to get a result. When you call result() on it, if the Future has not completed yet you'll get an exception. This is different in concurrent.futures.Future, where the result() method will block waiting for the Future to complete.
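A small side-by-side sketch of the two behaviours:

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

# concurrent.futures.Future.result() blocks until the result is available
with ThreadPoolExecutor() as pool:
    future = pool.submit(lambda: (time.sleep(0.1), "done")[1])
    print(future.result())  # blocks for about 0.1s, then prints: done

# asyncio.Future.result() never blocks: if the future is not done yet,
# it raises InvalidStateError instead of waiting
async def main():
    fut = asyncio.get_running_loop().create_future()
    try:
        fut.result()
    except asyncio.InvalidStateError as ex:
        print(f"Exception: {ex}")
    fut.set_result("done")
    return fut.result()

print(asyncio.run(main()))
```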