Thursday, 3 April 2025

Python, leverage f_locals with exec()

In this post from 2 years ago I talked about how to create a function from a string in Python. As I explained there, given that eval() only works with expressions and exec() returns nothing, to "return" the function we had to do something a bit tricky with an assignment, made trickier by the limitations on how exec() can interact with variables in its surrounding scope. I'll replicate here the code sample from that post:


fn_st = (
"def multiply(num1, num2):\n"
"    print('multiplying numbers')\n"
"    return num1 * num2\n"
)

def create_function(fn_st, fn_name):
    d = {}
    # exec() can read d from the enclosing scope; mutating the dict (rather
    # than assigning to a variable) is what lets the function "escape"
    exec(fn_st + "\n" + "d['fn'] = " + fn_name)
    return d["fn"]


new_fn = create_function(fn_st, "multiply")
print("created")
print(new_fn(2, 3))
# multiplying numbers
# 6

As I explain in that post, code compiled/executed by exec() or eval() can read variables from the surrounding scope, but if it writes to them or creates new variables, those changes won't be visible in the surrounding scope. To circumvent that we set the function as an entry in a dictionary rather than directly in a variable, and with that extra level of indirection it works. After writing that post I found somewhere else a slightly cleaner technique, let's see:


def create_function4(fn_st, fn_name):
    # the walrus-created dict is passed as the globals of the executed block
    exec(fn_st, scope := {})
    return scope[fn_name]

print("started option4")
new_fn = create_function4(fn_st, "multiply")
print("created")
print(new_fn(2, 3))
# multiplying numbers
# 6

We can pass to exec() dictionaries representing the global and local variables to use in the block to execute. In this case we pass a single dictionary, which will be used as both the global and local scope of the block, so the function defined by exec() gets defined in that dictionary, and we can retrieve it from the dictionary in the outer scope. This technique corresponds to this note in the documentation:

Pass an explicit locals dictionary if you need to see effects of the code on locals after function exec() returns.

On the other hand, the "unable to modify variables in the outer scope" behavior that we experience when we don't explicitly provide the globals and locals arguments corresponds to this in the documentation:

In an optimized scope (including functions, generators, and coroutines), each call to locals() instead returns a fresh dictionary containing the current bindings of the function’s local variables and any nonlocal cell references. In this case, name binding changes made via the returned dict are not written back to the corresponding local variables or nonlocal cell references, and assigning, reassigning, or deleting local variables and nonlocal cell references does not affect the contents of previously returned dictionaries.
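To see that behavior in isolation, here's a minimal sketch of my own (inside a function, i.e. an optimized scope, the assignment made by exec() lands in a throwaway snapshot of the locals, not in the real local variable):

def demo():
    x = "original"
    exec("x = 'changed'")  # writes to a snapshot dict, not to the real x
    print(x)

demo()
# original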

This brings me back to my post from late December about f_locals:

FrameType.f_locals now returns a write-through proxy to the frame’s local and locally referenced nonlocal variables in these scopes.

This means that we can write the above function also this way, passing the f_locals of the current frame:


import inspect

def create_function5(fn_st, fn_name):
    # f_locals can not be passed as globals: that raises
    # "TypeError: exec() globals must be a dict, not FrameLocalsProxy",
    # but it can be passed as locals (keyword arguments require Python 3.13+):
    # exec(compile(fn_st, "", "exec"), locals=inspect.stack()[0].frame.f_locals)
    exec(fn_st, locals=inspect.stack()[0].frame.f_locals)
    return locals()[fn_name]

print("started option5")
new_fn = create_function5(fn_st, "multiply")
print("created")
print(new_fn(2, 3))
# multiplying numbers
# 6

Notice that we have to pass f_locals as the locals parameter rather than as globals, because passing it as globals raises a TypeError: exec() globals must be a dict, not FrameLocalsProxy.

For this "function creation" case where we just want to retrieve the function, passing as parameter f_locals or passing a new dictionary does not make a particular difference (indeed it's way more verbose), but for cases where we want to modify local variables of the surrounding scope. f_locals is a game changer!


def another_test():
    print("another_test")
    a1 = "aaa"
    # the write-through FrameLocalsProxy (Python 3.13+) lets exec() modify a real local
    exec("a1 += 'bbb'", locals=inspect.stack()[0].frame.f_locals)
    print(f"a1: {a1}")

another_test()
# a1: aaabbb


Thursday, 27 March 2025

type.__call__ and more

In the past I've written some posts about Python metaclasses [1], [2] and [3]. Metaclasses are very powerful and very interesting, particularly because you won't find them in other languages (save Smalltalk, which I think is what inspired Python's metaclasses). Notice that Ruby has probably even more powerful metaprogramming constructs, but they work differently. Additionally, it seems like each time I look into metaclasses I find something that I had not thought about. I have some new stuff not covered in my previous posts, so I'll write it down here.

First, this is some of the best information about metaclasses that you can find. I guess each time I need to refresh my mind about how metaclasses work I'll jump into that article. Second, I've found a use of metaclasses I'd never thought about. Here one guy is using metaclasses to implement lazy objects. There are other ways to do that, but the use of metaclasses for it is an interesting approach.

There's a method that plays a crucial role in object creation, both when creating an instance of a regular class and when creating a class (an instance of a metaclass): the type.__call__ method.

Constructing an instance of a class: given a "normal" class, class Person:, when we create an instance of that class, a = Person(), the runtime searches for __call__ in Person's metaclass, that is type, so it ends up invoking type.__call__.

Constructing a class with a custom metaclass: given a metaclass Meta1, class Meta1(type), creating an instance of that metaclass, class A(metaclass=Meta1), ends up in a call like this: A = Meta1(xxx), which will search for __call__ in Meta1's metaclass, that is also type, so it's also a type.__call__ invocation.

The confusing thing is that type.__call__ is represented in several places with 2 different signatures:
For regular classes, __call__(cls, *args, **kwargs) handles instance creation.
For metaclasses, __call__(metacls, name, bases, namespace, **kwargs) manages class creation.

We should see those 2 different signatures as "virtual signatures": they are the parameters that we have to provide in each of those 2 cases. But underneath, type.__call__ is a C function that receives a bunch of positional and named arguments (don't ask me how that works in C...). It will check whether the first argument is a class or a metaclass (in Python we would do: issubclass(cls, type)), and depending on that it will interpret the rest of the parameters as if it were signature 1 or signature 2. In both cases, type.__call__ will invoke the __new__ and __init__ methods on the class or metaclass that it received as first parameter. Well, from this article I've learned that the call to __init__ won't happen if __new__ returns an object that is not an instance of cls:

Python will always first call __new__() and then call __init__(). How could I get one to run but not the other? It turns out that one way of doing this is by changing the type (i.e. class) of the object returned by __new__(). Python will call the __init__() constructor defined for the class of the object. If we change the object’s class to something else, then the original class’s __init__() will not get run. We can do this by modifying the __class__ attribute of the object returned by __new__(), swapping it to refer to some other class.
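A minimal toy example of my own to see that in action: __new__ returns an instance of an unrelated class, so type.__call__ skips __init__:

class Other:
    pass

class Skipping:
    def __new__(cls, *args, **kwargs):
        print("__new__ runs")
        return Other()          # not an instance of Skipping...

    def __init__(self, *args, **kwargs):
        print("__init__ runs")  # ...so this never executes

obj = Skipping()
# __new__ runs
print(type(obj))
# <class '__main__.Other'>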

You can see that also in the first article, which provides this implementation of a metaclass that behaves just like type does:


class Meta:
    @classmethod
    def __prepare__(metacls, name, bases, **kwargs):
        assert issubclass(metacls, Meta)
        return {}
    def __new__(metacls, name, bases, namespace, **kwargs):
        """Construct a class object for a class whose metaclass is Meta."""
        assert issubclass(metacls, Meta)
        cls = type.__new__(metacls, name, bases, namespace)
        return cls
    def __init__(cls, name, bases, namespace, **kwargs):
        assert isinstance(cls, Meta)
    def __call__(cls, *args, **kwargs):
        """Construct an instance of a class whose metaclass is Meta."""
        assert isinstance(cls, Meta)
        obj = cls.__new__(cls, *args, **kwargs)
        if isinstance(obj, cls):
            cls.__init__(obj, *args, **kwargs)
        return obj

I'll leverage this post to mention that while the metaclass of a class does not play a role when looking up an attribute in an instance of that class (p.something), it does play that role when looking up an attribute in the class itself. Given a MetaPerson metaclass, a Person class and a user object, user.city will search city in user and in type(user), that is Person. And Person.city will search city in Person and in type(Person), that is MetaPerson.
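A small sketch of that lookup difference, with toy classes of my own matching the names above:

class MetaPerson(type):
    city = "Vigo"

class Person(metaclass=MetaPerson):
    pass

user = Person()
print(Person.city)  # "Vigo": found via type(Person), that is MetaPerson
# user.city raises AttributeError: the metaclass does not take part
# in the lookup on the instance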

Sometimes you'll find comments stating that metaclasses only affect the class creation process. That's only true if the metaclass only implements the __new__ and __init__ methods. However, if we implement the __call__ method, the metaclass will affect the creation of instances of classes of that metaclass. Furthermore, we can think of other interesting uses: if we define __getattribute__ in a metaclass, it will come into play when an attribute lookup is done on a class of that metaclass (not when it's done on an instance).
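For example, a sketch with a hypothetical TracingMeta metaclass that logs class-level attribute lookups:

class TracingMeta(type):
    def __getattribute__(cls, name):
        print(f"class attribute lookup: {name}")
        return super().__getattribute__(name)

class Logged(metaclass=TracingMeta):
    species = "human"

Logged.species    # traced: the lookup on the class goes through the metaclass
instance = Logged()
instance.species  # not traced: the lookup on the instance does not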

It's also important to note that we could say that in the last few years metaclasses have become even more "esoteric", in the sense that after the inclusion of __init_subclass__ and __set_name__ they are no longer necessary for some of their most common use cases (but there are still things that can only be achieved via metaclasses). There's a good explanation here.

Finally, if you think metaclasses are complex, enter the world of metametaclasses! A metametaclass is a metaclass that is used as the metaclass of another metaclass, not just of a class. Well, indeed this is not that surprising: type is a metametaclass. All classes have a metaclass, which if not provided explicitly is type. So when you define a metaclass, which is indeed a class, its metaclass is type. This smart response exposes the 2 main uses I can envision for metametaclasses: interfering in the normal class creation process by defining __call__ in the metaclass of another metaclass, and allowing composition of metaclasses by defining __add__ in their metaclass.

Sunday, 23 March 2025

FileSystems and Inodes

When writing this recent post about file-locking I came across an interesting Unix/Linux feature: you can remove (or move) an open file. The file's entry in the filesystem is removed (so you can not open it again), but if any process already has the file open, the data-blocks that make up the file will remain until every process that has it open closes it. What they mention here is that the inode for that file remains (until all processes with a handle to the file have closed it). This applies not just to files opened by a process, but to the process itself: a process corresponds to an executable file, and I can remove that executable file and the process will keep running normally until it decides to finish. I've mentioned inodes and, phew, I think I had not thought about what an inode is and how filesystems work in almost 2 decades! So I think it's time to refresh my mind and write down a summary here.
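Here's a minimal sketch of that behavior (my own demo, Unix-only, with an illustrative /tmp path):

import os

path = "/tmp/unlink_demo.txt"
with open(path, "w") as f:
    f.write("still here")

f = open(path)                # a process keeps the file open
os.unlink(path)               # the directory entry is removed
print(os.path.exists(path))   # False: the path can no longer be opened
print(f.read())               # "still here": the data-blocks survive via the open handle
f.close()                     # last reference gone, now the inode and blocks are freed

Now, to the summary. From wikipedia: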

The inode (index node) is a data structure in a Unix-style file system that describes a file-system object such as a file or a directory. Each inode stores the attributes and disk block locations of the object's data. File-system object attributes may include metadata (times of last change, access, modification), as well as owner and permission data.

A directory is a list of inodes with their assigned names. The list includes an entry for itself, its parent, and each of its children.

So an inode contains data about a file (attributes containing metadata) and the file's data (via pointers to the file's data-blocks). How the file system resolves a path like "/usr/xose/myFile.txt" to an inode whose data and metadata it can use goes like this: a directory is a file, a file whose data (what is stored in the data-blocks pointed to from that directory's inode) are pairs of fileName -> inodeNumber. So in my example, for the "xose" directory we have a file (an inode) that contains an entry like this: "myFile.txt, 11111" (inode number). Walking back, usr is a file with an entry "xose, xose-inode", and the same for the "/" root directory. OK, and where is the entry that tells us the inode number of the "/" root directory? Well, that's a fixed number, which in principle for all unix filesystems is inode 2. This discussion makes a good read:

Directories are just special files that map an inode number to a string filename. Each inode is numbered and usually represents an offset in some array-like structure in the filesystem. This mapping between inode to filename is a hard link. A file must have 1 or more hard links to be accessible. If you create another hard link, you’re just pointing another filename to the same inode. All of them are equally “the file”, and there’s no way to detect which hard link came first. As part of the inode contents, there’s a counter of how many hard links each inode has. It’s eligible for cleanup and reuse when this count is zero.

The root directory is usually some specially reserved inode number.
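We can peek at those inode numbers from Python (assuming a Unix-style filesystem; st_ino is the number that the directory entry maps each name to):

import os

for path in ["/", "/usr", "/usr/bin"]:
    print(path, os.stat(path).st_ino)   # "/" is typically inode 2 on ext4

# and a directory's content really is a list of (name, inode) pairs:
with os.scandir("/usr") as entries:
    for entry in entries:
        print(entry.name, entry.inode())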

The quoted discussion mentions hard links. I have to shamefully admit that I'd always been a bit confused about the hard link vs symbolic link difference (I come from a Windows background...) when it's a damn simple thing. A hard link corresponds to the "name" part in the "name, inode number" pairs that we have in a directory (remember, a directory is a file containing "name to inode" pairs). You can have multiple hard links pointing to the same inode, and indeed you can not tell which one was created first. That's why you can read in the wikipedia article that "Inodes do not contain their hard link names, only other file metadata". Sure, because as multiple names can reference that inode, it would be a mess to keep track of that in the inode itself. Symbolic links (aka soft links) are quite a different thing. They are files that contain just a path to another file. As wikipedia explains:

A symbolic link contains a text string that is automatically interpreted and followed by the operating system as a path to another file or directory. This other file or directory is called the "target". The symbolic link is a second file that exists independently of its target.

So, as mentioned above, given a hard link "/usr/xose/myFile.txt", inside the "xose" directory-file we have an entry: [myFile.txt, 11111 (inode number)],
while for a soft link "/apps/important/file1.txt -> /usr/xose/myFile.txt" we have:
- an entry inside the "important" directory-file: [file1.txt, 22222 (inode number)]
- the data contained inside the 22222 inode, which is just "/usr/xose/myFile.txt"
- the OS (which knows how to treat symbolic links) will then handle that path as if it had been given to it in the first place.

From the previously linked discussion:

You asked about symbolic links. As I mentioned above, they’re a special kind of file. The filesystem knows to interpret its contents differently. The content of a directory is the mapping for filenames, but the content for symbolic links (soft links) is a file path string. Symlinks consume new inodes, and they do not increment the destination file’s hard link count. Deleting the destination file does not update any symlinks pointing to them.

So notice that an inode contains a counter of how many hard links point to it. There's another great discussion here:

The term hardlink is actually somewhat misleading. While for symlinks source and destination are clearly distinguishable (the symlink has its own entry in the inode table), this is not true for hardlinks. If you create a hardlink for a file, the original entry and the hardlink are indistinguishable in terms of what was there first. (Since they refer to the same inode, they share their file attributes such as owner, permissions, timestamps etc.) This leads to the statement that every directory entry is actually a hardlink, and that hardlinking a file just means to create a second (or third, or fourth...) hardlink. In fact, each inode stores a counter for the number of hardlinks to that inode.

The directory entries of "original file" and "hard link" are totally indistinguishable in quality: both establish a reference between a file name and the inode of a file.

One of the main visible differences between hardlinks and symlinks (a.k.a. softlinks) is that symlinks work across filesystems while hardlinks are confined to one filesystem. That is, a file on partition A can be symlinked to from partition B, but it cannot be hardlinked from there. This is clear from the fact that a hardlink is actually an entry in a directory, which consists of a file name and an inode number, and that inode numbers are unique only per file system.
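A short sketch of my own contrasting both (hypothetical paths under /tmp; run it once, as the links will already exist on a second run):

import os

base = "/tmp/links_demo"
os.makedirs(base, exist_ok=True)
original = os.path.join(base, "original.txt")
with open(original, "w") as f:
    f.write("data")

os.link(original, os.path.join(base, "hard.txt"))     # same inode, nlink goes up
os.symlink(original, os.path.join(base, "soft.txt"))  # new inode containing a path string

print(os.stat(original).st_nlink)  # 2: the hard link bumped the counter
print(os.stat(os.path.join(base, "hard.txt")).st_ino == os.stat(original).st_ino)   # True
print(os.lstat(os.path.join(base, "soft.txt")).st_ino == os.stat(original).st_ino)  # False
print(os.readlink(os.path.join(base, "soft.txt")))    # the stored path string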

Notice that a process references the files that it has open by inode, not by path. Well, this all comes down to File Descriptors, which will be the subject of a future post.

Normally users don't have to deal with inodes in their daily life, but there are at least a couple of situations where some basic knowledge about them will come in handy. In inode-based filesystems, normally (for example with ext4) the number of inodes of the filesystem is determined when the filesystem is created. This is so because inodes are preallocated and stored in an inode table. If we have many small files in our filesystem it could happen that we use up all the available inodes without having used all the disk space, so we won't be able to create new files even though our friend df -h tells us that there's disk space available. Using df -i we can see the inode usage.

There are chances that we have heard about another inode-related concept, orphan inodes. An orphan inode is an inode that is still allocated but no longer has any directory entry pointing to it. This can be something normal, like when we have deleted a file but it's still open by a process (when the process finishes, the inode will be released), or problematic, due to some system crash during a write operation, or a crash in the previous situation (a process keeps open a file that has been deleted and the OS crashes, so there's no time to release the inode). Fortunately, filesystems keep a list of orphan inodes and normally will be able to delete them on the next boot.

Friday, 14 March 2025

Kotlin cast operator vs !! operator

We know that null-safety is a key element in Kotlin programming. Kotlin will try by all means to prevent you from writing code that could throw an infamous NPE (Null Pointer Exception). Anyway, as the article mentions, there are a few cases where you still can get an NPE, like a problematic "leaky constructor" (something I'd never thought about) and the not-null assertion operator (!!).

The not-null assertion operator !! converts any value to a non-nullable type.

When you apply the !! operator to a variable whose value is not null, it's safely handled as a non-nullable type, and the code executes normally. However, if the value is null, the !! operator forces it to be treated as non-nullable, which results in an NPE.

So the !! operator allows us to "convert" a type from nullable to non-nullable. "Convert" here does not mean transforming a value (like transforming from int to string or vice versa) but transforming its "contract" (its type, indeed). We had the contract that the value was "string or null" and now we narrow that contract to just "string". If we are sure that the value adheres to this more restrictive contract, there are cases where this will be useful. If our assumption fails and the value is indeed null, we'll get a terrible NPE:


>>> val a: String? = null;
>>> a!!
java.lang.NullPointerException
	at Line_4.(Line_4.kts:1)

But this thing of changing the type feels familiar to me: it's what I've always used casting for in other languages. Of course Kotlin supports casting, with the as "unsafe" cast operator and the as? safe (nullable) cast operator. So we can write:


>>> val a: String? = "aaa"   

// this throws an exception
>>> val b: String = a
//error: type mismatch: inferred type is String? but String was expected
//val b: String = a

// but this works fine                
>>> val b: String = a as String

// and this one also
>>> val c: String = a!!

So using the not-null assertion operator (!!) and the "as" unsafe cast operator seem to be just equivalent; the only difference is that when they fail they throw different exceptions: NullPointerException for !! and ClassCastException for the "as" operator. They are so similar that one can wonder why the !! operator was introduced. I think the reasoning for having (and using) a dedicated not-null assertion operator is that it's more specific: it only serves one purpose, converting from nullable to non-nullable, while the cast operator is broader, as it can convert from any type to any other type. So when using !! you communicate more clearly the idea of skipping null safety.

Somewhat related to this, I came across this StackOverflow question about the difference between "x as? String" and "x as String?". If x is a String or null, both cases are equivalent: we get a nullable String. The difference appears when x is neither null nor a String; in that case the safe cast will return null while the unsafe cast will throw an exception. I just copy-paste the code from there:


fun <T> safeCast(t: T) {
    val res = t as? String //Type: String?
}

fun <T> unsafeCast(t: T) {
    val res = t as String? //Type: String?
}

fun test(){
    safeCast(1234);//No exception, `res` is null
    unsafeCast(null);//No exception, `res` is null
    unsafeCast(1234);//throws a ClassCastException
}

Wednesday, 5 March 2025

Exceptions vs Errors

I think I've always thought of Errors as "old-school" error codes, and Exceptions as that "more modern" thing that you throw/raise and catch. I also used to expect exceptions to have an Exception suffix (I guess because of my C# background). Over the years I've noticed how JavaScript and Python follow a different convention for Exceptions, and I've recently learnt about some Java particularities (beyond the checked-unchecked mess). So I thought it would be a good idea to write a post about this.

Java has a Throwable class, and any object that you intend to throw has to be an instance of a class inheriting from Throwable. Then, Java makes a clear distinction between Errors and Exceptions, as we have different classes for them, both inheriting from Throwable, so both can be thrown and caught, but there's a clear semantic difference. An Error represents something critical; you can catch it to log it, but most likely you can not recover from it and there's nothing more you can do. An Exception represents an "exceptional" situation from which in principle you can recover. We have to take into account one extra thing, that awful idea of checked exceptions (you are forced to handle them) and unchecked exceptions (you are not). I think checked Exceptions represent expected situations/problems (so in the end they are not so exceptional) and as such you have to be ready to deal with them and recover. Unchecked Exceptions (those inheriting from RuntimeException) represent conditions that probably should never happen, but if they were to occur, maybe you could manage to recover from them. This discussion has helped me wrap my head around all this.

In JavaScript you can throw any object, but there's an Error class, and your custom errors/exceptions should inherit from Error, as the different built-in errors/exceptions do. Most of these built-ins have the Error suffix, save for some cases like DOMException, which also inherits from Error. I'm not sure if there's a reason for having suffixed a few classes as "Exception" rather than "Error" or if it's just an inconsistency. Based on the Java logic one would think that the "Error" suffix is used for serious, unrecoverable errors/exceptions, and the "Exception" suffix for less critical stuff you can recover from. I've read some discussions, like this one, and have not found a clear answer. ChatGPT and Claude seem to recommend using the "Error" suffix for all custom exceptions, but I see much code that does just the contrary, so I'm inclined to use "Exception" if it's not critical. All in all, we can say that JavaScript does not impose any distinction between Error and Exception (as the base class used for all errors/exceptions in the standard library is named just Error), and we are free to establish that distinction by naming our classes one way or another.

In Python we raise objects that are instances of BaseException or any of its subclasses. For that we can either create the instance and raise it, or raise the class (and Python will take care of creating an instance):

The sole argument to raise indicates the exception to be raised. This must be either an exception instance or an exception class (a class that derives from BaseException, such as Exception or one of its subclasses). If an exception class is passed, it will be implicitly instantiated by calling its constructor with no arguments:
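So both of these forms are equivalent, with the class form being implicitly instantiated:

try:
    raise ValueError               # the class: Python instantiates it for us
except ValueError as e:
    print(repr(e))                 # ValueError()

try:
    raise ValueError("bad input")  # an instance, created explicitly
except ValueError as e:
    print(repr(e))                 # ValueError('bad input')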

We also have the concept of fatal (not recoverable) exceptions vs non-fatal ones:

BaseException is the common base class of all exceptions. One of its subclasses, Exception, is the base class of all the non-fatal exceptions. Exceptions which are not subclasses of Exception are not typically handled, because they are used to indicate that the program should terminate. They include SystemExit which is raised by sys.exit() and KeyboardInterrupt which is raised when a user wishes to interrupt the program.

And now the confusing part. Checking the Exception Hierarchy you can see that many built-in exceptions inheriting from Exception are named with the "Error" suffix. PEP-8 says this about naming:

Because exceptions should be classes, the class naming convention applies here. However, you should use the suffix “Error” on your exception names (if the exception actually is an error)

The "if the exception actually is an error" feels pretty important to me. In the Exception hierarchy you can also see that there are several xxxWarning classes, and the infamous StopIteration class. So in Python it seems like Errors are one kind of Exceptions, and there are other kinds of Exceptions that are not real Errors, just warnings or normal situations like finishing an iterator. I think we can say that Python does not differentiate between Exceptions and Errors (as the base class in the hierarchy is just BaseException), but between Exceptions that are Errors, and Exceptions that are not Errors, and does this based on the naming convention for the derived classes, using the Error suffix for errors.

I have to add that the use of an exception to indicate the end of an iterator (StopIteration in Python, NoSuchElementException in Java) is something that has always felt rather odd to me. It's a totally normal situation, so using an Exception for something that is part of the normal flow feels strange. I quite prefer the JavaScript approach of returning an object with the done property set to true.
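To make the contrast concrete, this is perfectly normal control flow in Python:

it = iter([1])
print(next(it))    # 1
try:
    next(it)       # the iterator is exhausted...
except StopIteration:
    print("done")  # ...and it signals that by raising an exception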

Sunday, 2 March 2025

Kotlin pseudo-constructor

There's an uncommon language feature that I first came across in JavaScript, and then in Python: the ability for a constructor to return an object different from the one being constructed. I've talked about JavaScript "constructors" several times in the past, [1] and [2] (I use quotes because any function, save arrow functions and methods (meaning those defined with the modern ES6 class syntax), can be used with new to construct an object), and the main idea of how they work is:

The new operator takes a function F and arguments: new F(arguments...). It does three easy steps:
1. Create the instance of the class. It is an empty object with its __proto__ property set to F.prototype.
2. Initialize the instance. The function F is called with the arguments passed and "this" set to be the instance.
3. If the function returns an object instead of undefined (which is the default return value), that object will "replace" "this" as the result of the new expression.

Just to make it clear:


class Person {
	constructor() {
		// note: only returning an *object* overrides the result of "new";
		// a primitive like a plain string would be silently ignored here
		return { message: "Not a real Person" };
	}
}

We can do the same in Python by defining a custom __new__() method in our class. Object creation and initialization in Python works like this: when creating an instance of a class A (a = A()), Python invokes its metaclass's __call__ (so for "normal" classes it's type.__call__, and for classes with a custom metaclass it's CustomMetaclass.__call__). Then type.__call__(cls, *args, **kwargs) basically does this: it invokes cls.__new__(cls, *args, **kwargs) (which unless overridden is object.__new__), and with the returned instance it invokes cls.__init__(instance, *args, **kwargs). So we can write something like this:


class Person:
    def __new__(cls, *args, **kwargs):
        print("In new")
        # this would be the normal implementation
        # return super().__new__(cls)
        # and this our odd one:
        return "Not a real Person"

The obvious question is: what is this feature useful for? Mainly, I think it's a very nice way to implement a transparent Singleton or Object pool. You invoke the constructor and it decides to always return the same object, or one object from a pool. The client is not aware of this "singletonness" or object caching. You could also think of a constructor that returns instances of derived classes based on its arguments, so the constructor becomes a factory.
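For instance, a minimal sketch of my own of a transparent Singleton via __new__ (Config is just a hypothetical class name):

class Config:
    _instance = None

    def __new__(cls, *args, **kwargs):
        # always hand back the same object; the caller never notices
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

a = Config()
b = Config()
print(a is b)  # True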

Apparently Kotlin lacks this feature. Constructors implicitly return an instance of their class, that's all. Well, the thing is that there is a "hack" to get a behaviour similar to that of JavaScript and Python, by combining the invoke() operator and companion objects. When a class has an invoke() operator it makes its instances invokable, so it's the equivalent of Python's __call__ and callable objects. We know that when a class has a companion object, the companion's methods are accessible through the class. So if we have a MyClass class that has a companion object, and that companion has an invoke method, we can do MyClass.invoke(), which can be rewritten as just MyClass(), which looks just like a constructor call (though it's not). The invoke in our companion can return whatever it wants, so we have a "constructor-like function" that can return whatever it wants. Notice that for Companion.invoke to get invoked through the MyClass() syntax, the class can not have a real constructor with the same signature as our invoke() (making the real constructor private will work). I mean:


class OsManager private constructor() {  // Private constructor prevents instantiation
    companion object {
        operator fun invoke(): String {
            println("inside invoke")
            return "OS Manager"
        }
    }
}


class OsManager2() {
    companion object {
        operator fun invoke(nm: String): String {
            println("inside invoke")
            return "OS Manager"
        }
    }
}

// These 2 will invoke the "pseudo-constructor"
val os1 = OsManager() // equivalent to OsManager.invoke()
val os2 = OsManager2("aa") // equivalent to OsManager2.invoke("aa")


This trick can hence be used for implementing transparent object pools or singletons (well, indeed a singleton is an object pool with a single element). Let's see a Singleton example:


class Singleton private constructor() {  // Private constructor prevents instantiation
    init {
        println("Singleton instance created")
    }

    companion object {
        private val instance: Singleton by lazy { Singleton() }

        operator fun invoke(): Singleton {
            return instance
        }
    }
}

fun main() {
    val obj1 = Singleton()  // Calls the `invoke` operator
    val obj2 = Singleton()  // Calls the `invoke` operator again

    println(obj1 === obj2)  // true, same instance
}

An interface can also have a companion object, which allows this pretty nice pattern that I found here: a sealed interface with a companion object whose invoke method acts as a factory for instances of the different classes implementing that interface.

Saturday, 22 February 2025

Python walrus limitation

Some time ago I talked about Python Assignment Expressions, aka the walrus operator, and over time I've really come to appreciate it. Some weeks ago I came across an odd limitation of this operator: it can not be used for assigning to an attribute of an object (so you can use it only with plain variables), as you'll get a "SyntaxError: cannot use assignment expressions with attribute" error. I don't remember what I was trying when I hit this problem, but now I can think of an example like the verify method below:


from datetime import datetime

class Country:
    def __init__(self, name):
        self.name = name
        self.cities = None
        self.last_verification = None

    def _lookup_cities(self):
        print("looking up cities")
        return ["Paris", "Toulouse", "Lyon"]

    def verify(self):
        # [code to perform verification here]
        print(f"last verification done at: {(self.last_verification := datetime.now())}")
        # SyntaxError: cannot use assignment expressions with attribute
        

So the above throws a SyntaxError: cannot use assignment expressions with attribute. I can think of one technique to circumvent this limitation: leveraging an "assign()" custom function that I sometimes use to conveniently set several properties in one go.


from typing import Any

def assign(val: Any, **kwargs) -> Any:
    for key, value in kwargs.items():
        setattr(val, key, value)
    return val

    # back in the Country class:
    def verify(self):
        # [code to perform verification here]
        print(f"last verification done at: {assign(self, last_verification=datetime.now()).last_verification}")

That syntax is cool, but having the print() call as the most visible part of the statement is probably confusing, as it makes us think that the important action in that line is the print, while setting the last_verification attribute is the real deal in that line. So probably using the "traditional" syntax would make more sense:


    def verify(self):
        # [code to perform verification here]
        self.last_verification = datetime.now()
        print(f"last verification done at: {self.last_verification}")

Another example for using this technique:


    def _lookup_cities(self):
        print("looking up cities")
        return ["Paris", "Toulouse", "Lyon"]
		
    def get_cities(self) -> list[str]:
        # return self.cities or (self.cities := self._lookup_cities())
        # SyntaxError: cannot use assignment expressions with attribute
        return self.cities or assign(self, cities=self._lookup_cities()).cities


Notice that this case could be rewritten using a lazy property via functools' @cached_property:


    # needs: from functools import cached_property
    @cached_property
    def cities(self):
        print("initializing lazy property")
        return self._lookup_cities()
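A quick usage sketch (assuming a Country variant that defines cities this way; note that __init__ must no longer pre-assign self.cities = None, since cached_property is a non-data descriptor and a value already present in the instance __dict__ would shadow it):

country = Country("France")
print(country.cities)  # prints "initializing lazy property", then the list
print(country.cities)  # served from the instance __dict__, no second lookup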


That looks really neat, but notice that I think we should be careful with the use of cached/lazy properties. On one hand, cities represents data belonging to the object, it's part of its state, so using a property rather than a method feels natural. But on the other hand, to obtain those cities maybe we do an http or db request, an external request. This kind of external interaction can be considered a side-effect, so in that sense we should use a method. In general, I think lazy properties should only be used for data that is calculated based on other data belonging to the object (and if that data is read-only, or we observe it and update the property accordingly, and if that calculation is not too lengthy, as accessing a property should always be fast). This is an interesting topic, and it has prompted me to revisit this stackoverflow question that I remember having read several times over the last 15 years.

I have to add that these examples would look much nicer if we had a pipe operator (like in Elixir) and we could write something like this (hypothetical syntax):
return self.cities or (self | assign(cities=self._lookup_cities()) | .cities)