Thursday, 27 March 2025

type.__call__ and more

In the past I've written some posts about Python metaclasses [1], [2] and [3]. Metaclasses are very powerful and very interesting, particularly because you won't find them in other languages (except in Smalltalk, which I think is what inspired Python's metaclasses). Notice that Ruby has probably even more powerful metaprogramming constructs, but they work differently. Additionally, it seems like each time I look into metaclasses I find something I had not thought about. I have some new material not covered in my previous posts, so I'll write it down here.

First, this is some of the best information about metaclasses that you can find. I guess each time I need to refresh my mind about how metaclasses work I'll jump back into that article. Second, I've found a use of metaclasses I'd never thought about: here someone uses metaclasses to implement lazy objects. There are other ways to do that, but metaclasses are an interesting approach.

There's a method that plays a crucial role in object creation, both when creating an instance of a regular class and when creating a class (an instance of a metaclass): the type.__call__ method.

Constructing an instance of a class: given a "normal" class, class Person:, when we create an instance of that class, a = Person(),
the runtime searches for __call__ in Person's metaclass, which is type, so it ends up invoking type.__call__.
Constructing a class with a custom metaclass: given a metaclass Meta1 (class Meta1(type)), creating an instance of that metaclass, class A(metaclass=Meta1),
ends up in a call like this: A = Meta1(xxx)
which searches for __call__ in Meta1's metaclass, which is also type, so again a type.__call__ invocation.
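Both paths can be checked in a minimal sketch (class names are mine, just for illustration):

```python
class Meta1(type):
    pass

class A(metaclass=Meta1):
    pass

# The class statement above is roughly equivalent to calling the metaclass,
# which goes through type.__call__ (Meta1's own metaclass is type):
B = Meta1("B", (), {})
assert type(B) is Meta1

# Creating an instance of a "normal" class also goes through type.__call__,
# found in A's metaclass:
a = A()
assert type(a) is A
```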

The confusing thing is that type.__call__ is represented in several places with 2 different signatures:
For regular classes, __call__(cls, *args, **kwargs) handles instance creation.
For metaclasses, __call__(metacls, name, bases, namespace, **kwargs) manages class creation.

We should see those 2 different signatures as "virtual signatures": the parameters we have to provide in each of the 2 cases. But underneath, type.__call__ is a C function that receives a bunch of positional and named arguments (don't ask me how that works in C...). It will check whether the first argument is a class or a metaclass (in Python we would do issubclass(cls, type)), and depending on that it will interpret the rest of the parameters as signature 1 or signature 2. In both cases, type.__call__ will invoke the __new__ and __init__ methods of the class or metaclass that it received as first parameter. Well, from this article I've learned that the call to __init__ won't happen if __new__ returns an object that is not an instance of cls:

Python will always first call __new__() and then call __init__(). How could I get one to run but not the other? It turns out that one way of doing this is by changing the type (i.e. class) of the object returned by __new__(). Python will call the __init__() constructor defined for the class of the object. If we change the object’s class to something else, then the original class’s __init__() will not get run. We can do this by modifying the __class__ attribute of the object returned by __new__(), swapping it to refer to some other class.
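A quick sketch of that behaviour (the class names are hypothetical):

```python
class Stub:
    pass

class Person:
    def __new__(cls, *args, **kwargs):
        obj = super().__new__(cls)
        obj.__class__ = Stub   # swap the class of the object being returned
        return obj

    def __init__(self):
        print("never runs")    # skipped: the returned object is not a Person

p = Person()
assert isinstance(p, Stub) and not isinstance(p, Person)
```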

You can see that also in the first article, which provides this implementation of a metaclass that behaves just like type does:


class Meta(type):
    @classmethod
    def __prepare__(metacls, name, bases, **kwargs):
        assert issubclass(metacls, Meta)
        return {}
    def __new__(metacls, name, bases, namespace, **kwargs):
        """Construct a class object for a class whose metaclass is Meta."""
        assert issubclass(metacls, Meta)
        cls = type.__new__(metacls, name, bases, namespace)
        return cls
    def __init__(cls, name, bases, namespace, **kwargs):
        assert isinstance(cls, Meta)
    def __call__(cls, *args, **kwargs):
        """Construct an instance of a class whose metaclass is Meta."""
        assert isinstance(cls, Meta)
        obj = cls.__new__(cls, *args, **kwargs)
        if isinstance(obj, cls):
            cls.__init__(obj, *args, **kwargs)
        return obj

I'll leverage this post to mention that while the metaclass of a class does not play a role when looking up an attribute in an instance of that class (p.something), it does play a role when looking up an attribute in the class itself. Given a MetaPerson metaclass, a Person class and a user instance, user.city will search for city in user and in type(user), which is Person. And Person.city will search for city in Person and in type(Person), which is MetaPerson.
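A small sketch of that lookup difference (hypothetical names):

```python
class MetaPerson(type):
    city = "Metropolis"        # lives in the metaclass

class Person(metaclass=MetaPerson):
    pass

user = Person()

# Class-level lookup falls back to type(Person), i.e. MetaPerson:
assert Person.city == "Metropolis"

# Instance-level lookup only searches user and type(user), i.e. Person:
try:
    user.city
except AttributeError:
    print("city is not visible from the instance")
```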

Sometimes you'll find comments stating that metaclasses only affect the class creation process. That's only true if the metaclass only implements the __new__ and __init__ methods. However, if we implement the __call__ method, the metaclass will affect the creation of instances of classes of that metaclass. Furthermore, we can think of other interesting uses. If we define __getattribute__ in a metaclass, it will come into action when an attribute lookup is done on a class of that metaclass (not when it's done on an instance).
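For instance, a sketch of a metaclass that traces class-level attribute lookups (all names are made up):

```python
class Traced(type):
    def __getattribute__(cls, name):
        print(f"class-level lookup: {name}")
        return super().__getattribute__(name)

class Service(metaclass=Traced):
    timeout = 30

value = Service.timeout             # triggers Traced.__getattribute__
instance_value = Service().timeout  # instance lookup: the metaclass hook is not involved
assert value == instance_value == 30
```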

It's also important to note that we could say that in the last years metaclasses have become even more "esoteric", in the sense that after the inclusion of __init_subclass__ and __set_name__ they are no longer necessary for some of their most common use cases (but there are still things that can only be achieved via metaclasses). There's a good explanation here.

Finally, if you think metaclasses are complex, enter the world of metametaclasses! A metametaclass is a metaclass that is used as the metaclass of another metaclass, not just of a class. Well, indeed this is not that surprising: type is a meta-metaclass. All classes have a metaclass that, if not provided explicitly, is type. So when you define a metaclass, which is indeed a class, its metaclass is type. This smart response exposes the 2 main uses I can envision for metametaclasses: interfering in the normal class creation process, by defining __call__ in the metaclass of another metaclass, and allowing composition of metaclasses, by defining __add__ in their metaclass.
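A tiny sketch of the first use, interfering in class creation from a metametaclass (hypothetical names):

```python
class MetaMeta(type):
    def __call__(cls, name, bases, namespace, **kwargs):
        # runs when a class whose metaclass is an instance of MetaMeta is created
        print(f"metametaclass building class {name}")
        return super().__call__(name, bases, namespace, **kwargs)

class Meta(type, metaclass=MetaMeta):
    pass

class A(metaclass=Meta):   # MetaMeta.__call__ runs here
    pass

assert type(A) is Meta and type(Meta) is MetaMeta
```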

Sunday, 23 March 2025

FileSystems and Inodes

When writing this recent post about file-locking I came across an interesting Unix/Linux feature: you can remove (or move) an open file. The file's entry in the filesystem is removed (so you can not open it again), but if any process already has the file open, the data-blocks that make up the file will remain until every process that has the file open closes it. What they mention here is that the inode for that file remains (until all processes with a handle to the file have closed it). This applies not just to files opened by a process, but to the process's own executable: a process corresponds to an executable file, and I can remove that executable file while the process keeps running normally until it decides to finish. I've mentioned inodes and, phew, I think I had not thought about what an inode is and how filesystems work in almost 2 decades! So I think it's time to refresh my mind and write down a summary here. From wikipedia:

The inode (index node) is a data structure in a Unix-style file system that describes a file-system object such as a file or a directory. Each inode stores the attributes and disk block locations of the object's data.[1] File-system object attributes may include metadata (times of last change,[2] access, modification), as well as owner and permission data.[3]

A directory is a list of inodes with their assigned names. The list includes an entry for itself, its parent, and each of its children.

So an inode contains data about a file (attributes containing metadata) and the file data (via pointers to the file's data-blocks). How the file system converts a path like "/usr/xose/myFile.txt" into an inode whose data and metadata it can use goes like this: a directory is a file, a file whose data (what is stored in the data-blocks pointed to by that directory's inode) are pairs of fileName -> inodeNumber. So in my example, for the "xose" directory we have a file (an inode) that contains an entry like this: "myFile.txt, 11111" (inode number). Walking back, usr is a file with an entry "xose, xose-inode", and the same for the "/" root directory. OK, and where is the entry that tells us the inode number of the "/" root directory? Well, that's a fixed number, which in principle for all Unix filesystems is inode 2. This discussion makes a good read:

Directories are just special files that map an inode number to a string filename. Each inode is numbered and usually represents an offset in some array-like structure in the filesystem. This mapping between inode to filename is a hard link. A file must have 1 or more hard links to be accessible. If you create another hard link, you’re just pointing another filename to the same inode. All of them are equally “the file”, and there’s no way to detect which hard link came first. As part of the inode contents, there’s a counter of how many hard links each inode has. It’s eligible for cleanup and reuse when this count is zero.

The root directory is usually some specially reserved inode number.
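We can poke at all of this from Python with os.stat, os.scandir and friends; a quick sketch (Linux-only behaviour):

```python
import os
import tempfile

d = tempfile.mkdtemp()
path = os.path.join(d, "myFile.txt")
with open(path, "w") as f:
    f.write("hello")

# A directory is essentially a list of (name, inode-number) pairs:
for entry in os.scandir(d):
    print(entry.name, "->", entry.inode())

print("root inode:", os.stat("/").st_ino)  # classically inode 2 on ext* filesystems

# Removing an open file: the name goes away, the inode survives
fr = open(path, "r")
os.remove(path)
assert os.fstat(fr.fileno()).st_nlink == 0  # no names left, but the inode is alive
print(fr.read())                            # and its data is still readable
fr.close()
os.rmdir(d)
```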

They mention hard links. I have to shamefully admit that I'd always been a bit confused about the hard link vs symbolic link difference (I come from a Windows background...) when it's a damn simple thing. A hard link corresponds to the "name" part in the "name, inode number" pairs that we have in a directory (remember, a directory is a file containing "name to inode" pairs). You can have multiple hard links pointing to the same inode, and indeed you can not tell which one was created first. That's why you can read in the wikipedia article: "Inodes do not contain their hard link names, only other file metadata". Sure, because as multiple names can reference that inode, it would be a mess to keep track of them in the inode itself. Symbolic links (aka soft links) are quite a different thing. They are files that contain just a path to another file. As wikipedia explains:

A symbolic link contains a text string that is automatically interpreted and followed by the operating system as a path to another file or directory. This other file or directory is called the "target". The symbolic link is a second file that exists independently of its target.

So, as mentioned above, given a hard link:
"/usr/xose/myFile.txt", inside the "xose" directory-file we have an entry: [myFile.txt, 11111 (inode number)]
while for a soft link "/apps/important/file1.txt -> /usr/xose/myFile.txt" we have:
- an entry inside the "important" directory-file: [file1.txt, 22222 (inode number)]
- the data contained inside the 22222 inode, that is just "/usr/xose/myFile.txt"
- The OS (which knows how to treat symbolic links) will handle that path as if it had been given to it in the first place.
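The difference is easy to see with os.link and os.symlink (a sketch with made-up paths):

```python
import os
import shutil
import tempfile

d = tempfile.mkdtemp()
target = os.path.join(d, "myFile.txt")
with open(target, "w") as f:
    f.write("hello")

hard = os.path.join(d, "hard.txt")
soft = os.path.join(d, "soft.txt")
os.link(target, hard)     # a second (name, inode) entry for the same inode
os.symlink(target, soft)  # a brand-new inode whose data is just the path string

assert os.stat(target).st_ino == os.stat(hard).st_ino    # same inode
assert os.lstat(soft).st_ino != os.stat(target).st_ino   # its own inode
assert os.stat(target).st_nlink == 2   # the inode counts its hard links
print(os.readlink(soft))               # the stored path string
shutil.rmtree(d)
```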

From the previously linked discussion:

You asked about symbolic links. As I mentioned above, they’re a special kind of file. The filesystem knows to interpret its contents differently. The content of a directory is the mapping for filenames, but the content for symbolic links (soft links) is a file path string. Symlinks consume new inodes, and they do not increment the destination file’s hard link count. Deleting the destination file does not update any symlinks pointing to them.

So notice that an inode contains a counter of how many hard links point to it. There's another great discussion here:

The term hardlink is actually somewhat misleading. While for symlinks source and destination are clearly distinguishable (the symlink has its own entry in the inode table), this is not true for hardlinks. If you create a hardlink for a file, the original entry and the hardlink are indistinguishable in terms of what was there first. (Since they refer to the same inode, they share their file attributes such as owner, permissions, timestamps etc.) This leads to the statement that every directory entry is actually a hardlink, and that hardlinking a file just means to create a second (or third, or fourth...) hardlink. In fact, each inode stores a counter for the number of hardlinks to that inode.

The directory entries of "original file" and "hard link" are totally indistinguishable in quality: both establish a reference between a file name and the inode of a file.

One of the main visible differences between hardlinks and symlinks (a.k.a. softlinks) is that symlinks work across filesystems while hardlinks are confined to one filesystem. That is, a file on partition A can be symlinked to from partition B, but it cannot be hardlinked from there. This is clear from the fact that a hardlink is actually an entry in a directory, which consists of a file name and an inode number, and that inode numbers are unique only per file system.

Notice that a process references the files it has open by inode, not by path. Well, this all comes down to file descriptors and will be the subject of a future post.

Normally users don't have to deal with inodes in their daily life, but there are at least a couple of situations where some basic knowledge about them will come in handy. In inode-based filesystems, normally (for example with ext4) the number of inodes of a filesystem is determined when that filesystem is created. This is so because inodes are preallocated and stored in an inode table. If we have many small files in our filesystem it could happen that we use up all the available inodes without having used all the disk space, so we won't be able to create new files even though our friend df -h tells us that there's disk space available. Using df -i we can see the inode usage.
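From Python, os.statvfs exposes the same counters that df -i reads (a sketch):

```python
import os

vfs = os.statvfs("/")
print("total inodes:", vfs.f_files)
print("free inodes:", vfs.f_ffree)
# Disk blocks can be free while inodes are exhausted, and vice versa:
print("total blocks:", vfs.f_blocks, "free blocks:", vfs.f_bfree)
```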

There are chances that we have heard about another inode-related concept: orphan inodes. An orphan inode is an inode that is still allocated but no longer has any directory entry pointing to it. This can be something normal, like when we have deleted a file but it's still open by a process (when the process finishes, the inode will be released), or problematic, due to a system crash during a write operation, or a crash in the previous situation (a process keeps open a file that has been deleted and the OS crashes, so there's no time to release the inode). Fortunately, filesystems keep a list of orphan inodes and normally will be able to clean them up on the next boot.

Friday, 14 March 2025

Kotlin cast operator vs !! operator

We know that null-safety is a key element in Kotlin programming. Kotlin will try by all means to prevent you from writing code that could throw the infamous NPE (Null Pointer Exception). Anyway, as this article mentions, there are a few cases where you can still get an NPE, like a problematic "leaky constructor" (something I'd never thought about) and the not-null assertion operator (!!).

The not-null assertion operator !! converts any value to a non-nullable type.

When you apply the !! operator to a variable whose value is not null, it's safely handled as a non-nullable type, and the code executes normally. However, if the value is null, the !! operator forces it to be treated as non-nullable, which results in an NPE.

So the !! operator allows us to "convert" a type from nullable to non-nullable. "Convert" here does not mean transforming a value (like transforming an int to a string or vice versa) but transforming its "contract" (its type, indeed). We had the contract that the value was "string or null" and now we narrow that contract to just "string". If we are sure that the value adheres to this more restrictive contract, there are cases where this will be useful. If our assumption fails and the value is indeed null, we'll get a terrible NPE:


>>> val a: String? = null;
>>> a!!
java.lang.NullPointerException
	at Line_4.(Line_4.kts:1)

But, this thing of changing the type, it feels familiar to me, it's what in other languages I've always used casting for. Of course Kotlin supports casting, with the as "unsafe" cast operator and as? safe (nullable) cast operator. So we can write:


>>> val a: String? = "aaa"   

// this fails to compile
>>> val b: String = a
//error: type mismatch: inferred type is String? but String was expected
//val b: String = a

// but this works fine                
>>> val b: String = a as String

// and this one also
>>> val c: String = a!!

So using the not-null assertion operator (!!) and the "as" unsafe cast operator seem to be just equivalent; the only difference is that when they fail they throw different exceptions: NullPointerException for !! and ClassCastException for the "as" operator. They are so similar that one can wonder why the !! operator was introduced at all. I think the reasoning for having (and using) a dedicated not-null assertion operator is that it's more specific. It serves only one purpose, converting from nullable to non-nullable, while the cast operator is broader: it can convert from any type to any other type. So when using !! you are more clearly communicating the idea of skipping null safety.

A bit related to this, I came across this StackOverflow question about the difference between "x as? String" and "x as String?". If x is a String or null, both expressions are equivalent: we get a nullable String. The difference shows when x is neither null nor a String; in that case the safe cast will return null while the unsafe cast will throw an exception. I just copy-paste the code from there:


fun <T> safeCast(t: T) {
    val res = t as? String // Type: String?
}

fun <T> unsafeCast(t: T) {
    val res = t as String? // Type: String?
}

fun test(){
    safeCast(1234);//No exception, `res` is null
    unsafeCast(null);//No exception, `res` is null
    unsafeCast(1234);//throws a ClassCastException
}

Wednesday, 5 March 2025

Exceptions vs Errors

I think I've always thought of Errors as "old-school" error codes, and Exceptions as that "more modern" thing that you throw/raise and catch. I also used to expect exceptions to have an Exception suffix (I guess due to my C# background). Over the years I've noticed how JavaScript and Python follow a different convention for exceptions, and I've recently learnt about some Java particularities (beyond the checked-unchecked mess). So I've thought it would be a good idea to write a post about this.

Java has a Throwable class, and any object that you intend to throw has to be an instance of a class inheriting from Throwable. Then, Java makes a clear distinction between Errors and Exceptions, as we have different classes for them, both inheriting from Throwable, so both can be thrown and caught, but there's a clear semantic difference. An Error represents something critical: you can catch it to log it, but most likely you can not recover from it and there's nothing more you can do. An Exception represents an "exceptional" situation from which in principle you can recover. We have to take into account one extra thing, that awful idea of checked exceptions (you are forced to handle them) and unchecked exceptions (you are not). I think checked exceptions represent expected situations/problems (so in the end they are not so exceptional) and as such you have to be ready to deal with them and recover. Unchecked exceptions (those inheriting from RuntimeException) represent conditions that probably should never happen, but if they were to occur, maybe you could manage to recover from them. This discussion has helped me wrap my head around all this.

In JavaScript you can throw any object, but there's an Error class, and your custom errors/exceptions should inherit from Error, as the different built-in errors/exceptions do. Most of these built-ins have the Error suffix, save for some cases like DOMException, but it also inherits from Error. I'm not sure if there's a reason for having suffixed a few classes with "Exception" rather than "Error" or if it's just an inconsistency. Based on the Java logic one would think that the "Error" suffix is used for serious, unrecoverable problems, and the "Exception" suffix for less critical stuff you can recover from. I've read some discussions, like this one, and have not found a clear answer. ChatGPT and Claude seem to recommend using the "Error" suffix for all custom exceptions, but I see much code that does just the contrary, so I'm inclined to use "Exception" if it's not critical. All in all, we can say that JavaScript does not impose any distinction between Error and Exception (as the base class used for all errors/exceptions in the standard library is named just Error), and we are free to establish that distinction by naming our classes one way or another.

In Python we raise objects that are instances of BaseException or any of its subclasses. For that we can either create the instance and raise it, or raise the class (and Python will take care of creating an instance):

The sole argument to raise indicates the exception to be raised. This must be either an exception instance or an exception class (a class that derives from BaseException, such as Exception or one of its subclasses). If an exception class is passed, it will be implicitly instantiated by calling its constructor with no arguments:
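Both forms side by side (ConfigError is just a made-up example):

```python
class ConfigError(Exception):
    pass

try:
    raise ConfigError          # a class: implicitly instantiated with no arguments
except ConfigError as e:
    print(type(e).__name__)

try:
    raise ConfigError("bad value")   # an instance works too
except ConfigError as e:
    print(e)
```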

We also have the concept of fatal (not recoverable) exceptions vs non-fatal ones:

BaseException is the common base class of all exceptions. One of its subclasses, Exception, is the base class of all the non-fatal exceptions. Exceptions which are not subclasses of Exception are not typically handled, because they are used to indicate that the program should terminate. They include SystemExit which is raised by sys.exit() and KeyboardInterrupt which is raised when a user wishes to interrupt the program.
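This is why `except Exception` is the usual catch-all: it lets the fatal ones through. A quick sketch:

```python
import sys

try:
    sys.exit(1)                # raises SystemExit: a BaseException, not an Exception
except Exception:
    print("not reached")       # SystemExit slips past this handler
except BaseException as e:
    print("caught", type(e).__name__)
```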

And now the confusing part. Checking the Exception Hierarchy you can see that many built-in exceptions inheriting from Exception are named with the "Error" suffix. PEP-8 says this about naming:

Because exceptions should be classes, the class naming convention applies here. However, you should use the suffix “Error” on your exception names (if the exception actually is an error)

The "if the exception actually is an error" feels pretty important to me. In the Exception hierarchy you can also see that there are several xxxWarning classes, and the infamous StopIteration class. So in Python it seems like Errors are one kind of Exceptions, and there are other kinds of Exceptions that are not real Errors, just warnings or normal situations like finishing an iterator. I think we can say that Python does not differentiate between Exceptions and Errors (as the base class in the hierarchy is just BaseException), but between Exceptions that are Errors, and Exceptions that are not Errors, and does this based on the naming convention for the derived classes, using the Error suffix for errors.

I have to add that the use of an exception to indicate the end of an iterator (StopIteration in Python, NoSuchElementException in Java) is something that has always felt rather odd to me. It's a totally normal situation, so using an exception for something that is part of the normal flow feels strange. I quite prefer the JavaScript approach, returning an object with the done property set to true.
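A tiny sketch of StopIteration acting as plain control flow:

```python
it = iter([1])
assert next(it) == 1
try:
    next(it)
except StopIteration:
    print("iterator exhausted")   # a completely normal situation, signalled via an exception
```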

Sunday, 2 March 2025

Kotlin pseudo-constructor

There's an uncommon language feature that I first came across in JavaScript, and then in Python: the ability for a constructor to return an object different from the one being constructed. I've talked about JavaScript "constructors" several times in the past [1] and [2] (I use quotes because any function, except arrow functions and methods defined with the modern ES6 class syntax, can be used with new to construct an object), and the main idea of how they work is:

The new operator takes a function F and arguments: new F(arguments...). It does three easy steps:
Create the instance of the class. It is an empty object with its __proto__ property set to F.prototype.
Initialize the instance.
The function F is called with the arguments passed and "this" set to be the instance.
If the function returns an object instead of undefined (which is the default return value), that object will "replace" "this" as the result of the new expression.

Just to make it clear:


class Person {
	constructor() {
		// returning an object replaces "this"; a returned primitive would be ignored
		return { message: "Not a real Person" };
	}
}

We can do the same in Python by defining a custom __new__() method in our class. Object creation and initialization in Python works like this:
When creating an instance of a class A (a = A()), Python invokes its metaclass's __call__ (so for "normal" classes it's type.__call__, and for classes with a custom metaclass it's CustomMetaclass.__call__). Then type.__call__(cls, *args, **kwargs) basically does this: it invokes cls.__new__(cls, *args, **kwargs) (which unless overridden is object.__new__), and with the returned instance it invokes cls.__init__(instance, *args, **kwargs), but only if that instance actually is an instance of cls.
So we can write something like this:


class Person:
	def __new__(cls, *args, **kwargs):
	    print("In new")
	    # this would be the normal implementation
	    #return super().__new__(cls,*args, **kwargs)
	    # and this our odd one:
	    return "Not a real Person"

The obvious question is: what's this feature useful for? Mainly I think it's a very nice way to implement a transparent Singleton or object pool. You invoke the constructor and it decides to always return the same object, or one object from a pool. The client is not aware of this "singletonness" or object caching. You could also think of a constructor that returns instances of derived classes based on its arguments, so the constructor becomes a factory.
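For example, a transparent singleton in Python via __new__ (a minimal sketch):

```python
class Singleton:
    _instance = None

    def __new__(cls):
        # Return the cached instance instead of building a new one each time
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

a = Singleton()
b = Singleton()
assert a is b   # callers just "construct" as usual, unaware of the caching
```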

Apparently Kotlin lacks this feature. Constructors implicitly return an instance of their class, that's all. Well, the thing is that there is a "hack" to get a behaviour similar to that of JavaScript and Python, by combining the invoke() operator and companion objects. When a class has an invoke() method its instances become invokable, so it's the equivalent of Python's __call__ and callable objects. We know that when a class has a companion object, the companion's methods are accessible through the class. So if we have a MyClass class with a companion object, and that companion has an invoke method, we can do MyClass.Companion.invoke(), which can be rewritten as just "MyClass()", and that looks like a constructor call (though it's not). The invoke in our companion can return whatever it wants, so we have a "constructor-like function" that can return whatever it wants. Notice that for Companion.invoke to be picked when writing MyClass(...), we can not have a real constructor in the main class with the same signature as our invoke() (making the real constructor private will work). I mean:


class OsManager private constructor() {  // Private constructor prevents instantiation
    companion object {
        operator fun invoke(): String {
            println("inside invoke")
            return "OS Manager"
        }
    }
}


class OsManager2() {
    companion object {
        operator fun invoke(nm: String): String {
            println("inside invoke")
            return "OS Manager"
        }
    }
}

// These 2 will invoke the "pseudo-constructor"
val os1 = OsManager() // equivalent to OsManager.invoke()
val os2 = OsManager2("aa") // equivalent to OsManager2.invoke("aa")


This trick can hence be used for implementing transparent object pools or singletons (well, indeed a singleton is an object pool with a single element). Let's see a Singleton example:


class Singleton private constructor() {  // Private constructor prevents instantiation
    init {
        println("Singleton instance created")
    }

    companion object {
        private val instance: Singleton by lazy { Singleton() }

        operator fun invoke(): Singleton {
            return instance
        }
    }
}

fun main() {
    val obj1 = Singleton()  // Calls the `invoke` operator
    val obj2 = Singleton()  // Calls the `invoke` operator again

    println(obj1 === obj2)  // true, same instance
}

An interface can also have a companion object, which allows this pretty nice pattern that I've found here. A sealed interface with a companion object with an invoke method that acts as a factory for instances of the different classes implementing that interface.

Saturday, 22 February 2025

Python walrus limitation

Some time ago I talked about Python Assignment Expressions, aka the walrus operator, and over time I've really come to appreciate it. Some weeks ago I came across an odd limitation of this operator: it can not be used to assign to an attribute of an object (you can use it only with plain variables), as you'll get a "SyntaxError: cannot use assignment expressions with attribute" error. I don't remember what I was trying when I hit this problem, but now I can think of an example like the verify method below:


from datetime import datetime


class Country:
    def __init__(self, name):
        self.name = name
        self.cities = None
        self.last_verification = None

    def _lookup_cities(self):
        print("looking up cities")
        return ["Paris", "Toulouse", "Lyon"]

    def verify(self):
        # [code to perform verification here]
        print(f"last verification done at: {(self.last_verification := datetime.now())}")
        # SyntaxError: cannot use assignment expressions with attribute
        

So the above throws a SyntaxError: cannot use assignment expressions with attribute. I can think of one technique to circumvent this limitation, leveraging a custom assign() function that I sometimes use to conveniently set several properties in one go.


def assign(val: Any, **kwargs) -> Any:
    for key, value in kwargs.items():
        setattr(val, key, value)
    return val

    def verify(self):
        # [code to perform verification here]
        print(f"last verification done at: {assign(self, last_verification=datetime.now()).last_verification}")

That syntax is cool, but having the print() call as the most visible part of the statement is probably confusing: it makes us think that the important action in that line is the print, while setting the last_verification attribute is the real deal. So probably using the "traditional syntax" would make more sense:


    def verify(self):
        # [code to perform verification here]
        self.last_verification = datetime.now()
        print(f"last verification done at: {self.last_verification}")

Another example for using this technique:


    def _lookup_cities(self):
        print("looking up cities")
        return ["Paris", "Toulouse", "Lyon"]
		
    def get_cities(self) -> list[str]:
        # return self.cities or (self.cities := self._lookup_cities())
        # SyntaxError: cannot use assignment expressions with attribute
        return self.cities or assign(self, cities=self._lookup_cities()).cities


Notice that this case could be rewritten using a lazy property via functools @cached_property


    @cached_property
    def cities(self):
        print("initializing lazy property")
        return self._lookup_cities()


That looks really neat, but notice that I think we should be careful with the use of cached/lazy properties. On one hand, cities represents data belonging to the object, it's part of its state, so using a property rather than a method feels natural. But on the other hand, to obtain those cities maybe we do an http or db request, an external request. This kind of external interaction can be considered a side-effect, so in that sense we should use a method. In general, I think lazy properties should only be used for data that is calculated from other data belonging to the object (and only if that data is read-only, or we observe it and update the property accordingly, and if the calculation is not too lengthy, as accessing a property should always be fast). This is an interesting topic and it has prompted me to revisit this stackoverflow question that I remember having read several times over the last 15 years.

I have to add that these examples would look much nicer if we had a pipe operator (like in Elixir) and could write something like this:
return self.cities or (self | assign(cities=self._lookup_cities()) | .cities)

Thursday, 13 February 2025

File Locking 2

One decade ago I wrote this post about file-locking (aka file-sharing) and I've been revisiting it lately. That post was focused on Windows, and indeed I had not realized that "the attitude" towards file-locking in Windows and Linux is pretty different. We can say that in Windows file-locking is an integral part of how the OS manages files. The sharing mode (dwShareMode) parameter of the CreateFile function (used to open files) determines file-sharing.

On the other hand, for Unix-like OSes file-locking does not seem to be a major concern. By default no locking of any kind is performed when opening a file (so multiple processes can read and write to the file at the same time). The open() system call does not have any parameter related to locking-sharing. It's true that there is support for file-locking (fcntl, flock, lockf), but it's rather loose, as we can say that it's cooperative:

File locks under Unix are by default advisory. This means that cooperating processes may use locks to coordinate access to a file among themselves, but uncooperative processes are also free to ignore locks and access the file in any way they choose. In other words, file locks lock out other file lockers only, not I/O.
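
The advisory nature is easy to verify from Python with fcntl.flock. A minimal sketch (Linux-only; note that flock locks are attached to the open file description, so two separate open() calls conflict even within a single process):

```python
import fcntl
import tempfile

with tempfile.NamedTemporaryFile(mode="w") as f1, open(f1.name, "w") as f2:
    fcntl.flock(f1, fcntl.LOCK_EX)  # exclusive advisory lock via the first descriptor
    try:
        # non-blocking attempt to lock via the second descriptor
        fcntl.flock(f2, fcntl.LOCK_EX | fcntl.LOCK_NB)
        locked_out = False
    except BlockingIOError:
        locked_out = True  # the second open file description is locked out
    # but the lock is only advisory: plain I/O is not blocked at all
    f2.write("still writable\n")
    print(locked_out)  # True
```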

The above was pretty reassuring because at work we have one application (not developed by us) that appends information to a file, and we wanted to write some code that would periodically read that file, with both applications running on Linux. We could not afford that, while we had the file open for reading, the other application tried to open it for writing and failed (maybe crashing, as we don't know what kind of error handling, if any, the application has).

For full peace of mind I did a fast check. Open a file for reading in Python (fr = open("test.txt", "r")) and while it's open append lines to it from the terminal (echo hi >> test.txt). No crash and the file gets updated without a problem.

The surprising thing is that doing just the same test on Windows also works fine! Well, indeed it makes good sense, let me explain. The Python open() function does not provide any sort of "File Sharing" parameter, while the different .Net file opening methods do. One can easily think that this is because while both languages are multiplatform, Python was born in a more Linux oriented community, while .Net was for many years Windows-centric, so both libraries reflect what exists in their "favorite" OS. But at the same time, I guess Python developers decided that it should try to show the same behaviour on any OS. Given that Python's open() on Linux can not provide any locking behaviour (cause as I've already mentioned the underlying Linux open() system call does not), it should do the same on Windows, so when it invokes the underlying Windows API CreateFileW function, it does so requesting Read and Write sharing. From here:

Python’s builtin open() shares read and write access, but not delete access. If you need a different share mode, you’ll have to call CreateFile directly via ctypes or PyWin32’s win32file module.

Sunday, 2 February 2025

JavaScript Arguments and Arrow Functions

After writing last week about how to sort of emulate JavaScript's arguments-object in Python, it seems a good idea to mention the special behaviour of the arguments-object in arrow functions. When arrow functions were added to JavaScript most articles promptly informed about the particular behaviour of the this-object in arrow functions (that is indeed one of its main features). The main idea that got stamped in my brain was "arrow functions do not have dynamic this, but lexical this". This means that they do not receive as "this" value the "receiver", the object on which they get invoked, but the "this" of their lexical environment. I used to think that such "this" value would get bound to the arrow function in a similar way to when we use function.bind(), but it's not like that. Arrow functions are not bound-functions with some particular property pointing to the "this", but they just look it up in their scope chain. From [1] and [2]:

[1] In essence, lexical this means that an arrow function will keep climbing up the scope chain until it locates a function with a defined this keyword. The this keyword in the arrow function is determined by the function containing it.

[2] Arrow functions lack their own this binding. Therefore, if you use the this keyword inside an arrow function, it behaves just like any other variable. It will lexically resolve to an enclosing scope that defines a this keyword.

What has prompted this post is another thing that I had not realized until recently (though it's right there in the MDN documentation): something similar happens with the arguments-object.

Arrow functions don't have their own bindings to this, arguments, or super, and should not be used as methods.

"don't have their own bindings" means that they'll be looked up in the scope chain (as any other variable). Arrow functions do not have their own arguments-object containing the arguments received by the arrow, but look up that arguments-object in the scope chain, so it will contain the arguments received by the first enclosing non-arrow function (at the time of the arrow creation, not of the arrow invocation). Notice how in the example below, it prints "aaa" (from its scope chain) rather than "bbb" (that is what it receives as parameter).


// arrow functions do not have an "arguments" or "this" binding, they take them from their scope chain
function f2(a) {
    console.log("inside f2");
    let fn1 = (b) => console.log(arguments);
    return fn1;
}

let arrow = f2("aaa")
arrow("bbb")
//[Arguments] { '0': 'aaa' }

I'll leverage this post to mention that changing where an entry in the arguments-object points to changes also where the variable itself points to. So this behaviour corresponds to what happens in Python with the FrameType.f_locals write-through proxy, not to what we have in Python via locals() (a snapshot through which we can not change where the original variables point to).


function test(user) {
    console.log(`user: ${user}`)
    console.log(`arguments[0]: ${arguments[0]}`);
    arguments[0] = "Iyán";
    console.log(`arguments[0]: ${arguments[0]}`);
    console.log(`user: ${user}`)
}

test("Xuan");

// user: Xuan
// arguments[0]: Xuan
// arguments[0]: Iyán
// user: Iyán
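
For contrast, a minimal Python sketch (assuming CPython) showing that mutating the mapping returned by locals() does not rebind the underlying variable:

```python
def demo(user):
    snapshot = locals()
    snapshot["user"] = "Iyán"  # mutate the snapshot...
    return user                # ...but the real local variable is unaffected

print(demo("Xuan"))  # Xuan
```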


Sunday, 26 January 2025

JavaScript vs Python: arguments vs locals()

It's not uncommon to have a function that acts as an adapter, wrapping another existing function, taking the same parameters and performing some action before and/or after invoking the original function with those parameters. JavaScript provides a very convenient way to invoke the original function without having to write again that full list of arguments in the call, just using the arguments array-like object and the spread operator. I mean:


function doAction(a, b, c) {
	// [real work here]
}

function checkParams(a, b, c) {
	if (a === undefined) {
		return null;
	}
	return doAction(...arguments);
}

In many cases the arguments object is not necessary, we can just use the rest operator in the adapter function rather than the list of arguments. That's great when the adapter function is not doing anything with the arguments, for example when creating generic wrappers like this:


function doAction(a, b, c) {

}

function createLoggingWrapper(fn) {
	return (...params) => {
		console.log("started");
		const res = fn(...params);
		console.log("finished");
		return res;
	};
}

In Python the equivalent to the rest-spread operators is the packing-unpacking operator (which, given that Python features named parameters, has extra support for them). So we can write the second example just like this:


from functools import wraps

def doAction(a, b, c):
    print(f"{a}{b}{c}")


def createLoggingWrapper(fn):
    @wraps(fn)
    def logged(*args, **kwargs):
        print("started")
        res = fn(*args, **kwargs)
        print("finished")
        return res
    return logged

doAction = createLoggingWrapper(doAction)
doAction("AA", c="CC", b="BB")

But what about the first example? We don't have a direct equivalent to the arguments object, but we have a workaround that is almost equivalent, using locals(). I already talked about the locals() function in a previous post. It returns a dictionary with the variables in the "local namespace" at that moment (that is, function arguments and local variables), so if we run it before any local variable is declared we can use it as the JavaScript arguments object. Let's see:


def format(name, age, city, country):
    return f"formatting: {name} - {age} [{city}, {country}]"

def format_with_validation(name, age, city, country):
    if city:
        print("validation OK")
        return format(**locals()) # equivalent to: format(name=name, age=age, city=city, country=country)
    else:
        print("validation KO")
        return None

print(format_with_validation("Francois", 40, "Paris", "France"))
print(format_with_validation("Francois", 40, "Lyon", "France"))

# validation OK
# formatting: Francois - 40 [Paris, France]
# validation OK
# formatting: Francois - 40 [Lyon, France]

There's one gotcha. locals() also returns those variables (free-vars) trapped by closures. So if our wrapping function happened to be working as a closure, this technique will not work, as we'll be passing extra parameters to the wrapped function (the "counter" variable in the example below).


def format(name, age, city, country):
    return f"formatting: {name} - {age} [{city}, {country}]"

def create_format_with_validation():
    counter = 0
    def format_with_validation(name, age, city, country):
        nonlocal counter
        counter += 1
        print(f"invocation number: {counter}")
        if city:
            print("validation OK")
            return format(**locals()) # equivalent to: format(name=name, age=age, city=city, country=country)
        else:
            print("validation KO")
            return None
    return format_with_validation

format_with_validation = create_format_with_validation()
try:
    print(format_with_validation("Francois", 40, "Paris", "France"))
except Exception as ex:
    print(f"Error: {ex}")
    # TypeError: format() got an unexpected keyword argument 'counter'. Did you mean 'country'?


Well, we can easily fix that by removing from the object returned by locals() the entries corresponding to trapped variables. Indeed, Python is so amazingly introspective that we can automate that. The code object of a function has a co_freevars attribute that gives us the names of the freevars used by that function. So we can write a "remove_free_vars" function and do this:


def remove_free_vars(loc_ns: dict, fn) -> dict:
    return {
        key: value for key, value in loc_ns.items()
        if key not in fn.__code__.co_freevars
    }

def create_format_with_validation_closure_aware():
    counter = 0
    def format_with_validation(name, age, city, country):
        nonlocal counter
        counter += 1
        print(f"invocation number: {counter}")
        if city:
            print("validation OK")
            return format(**(remove_free_vars(locals(), format_with_validation))) 
        else:
            print("validation KO")
            return None
    return format_with_validation

format_with_validation = create_format_with_validation_closure_aware()
print(format_with_validation("Francois", 40, "Paris", "France"))

There's one additional gotcha. As locals() returns a dictionary, if the function to be invoked has been defined with positional-only parameters (so that some of its arguments can not be provided as named arguments, that odd feature that I described here), unpacking locals() with ** will fail, as every argument gets passed as a named one.
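
A minimal sketch of that gotcha, using a positional-only parameter (the / marker; function names here are just for illustration):

```python
def format_name(name, /, greeting):
    # name is positional-only: it can not be passed as a keyword argument
    return f"{greeting}, {name}"

def wrapper(name, greeting):
    # **locals() passes everything as keyword arguments, including name
    return format_name(**locals())

try:
    wrapper("Xuan", "hola")
except TypeError as ex:
    print(f"Error: {ex}")
```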

Monday, 20 January 2025

Python Generator with Calculated Values

I frequently take a look into the Python discussion forum, mainly the "ideas" section, where people propose things they would like to see in the language or in libraries. Though sadly most of the time most requests for additions to the language are promptly rejected (sometimes with absurd reasons, as if given by people that do not use other modern languages...) the proposed alternatives and the ideas themselves can be pretty interesting. Some time ago I came across an idea that I was not immediately grasping, Literal syntax for generators.

We know generators generate values lazily, as requested. But if we had a literal syntax we would be already providing all values, so why not just use a list? Well, because the idea would be supporting that some values are generated by functions, that get invoked when the iteration requests it. Well, the guy showed the workaround that he's using, and it's pretty, pretty, interesting:


queries: Iterator[str] = (lambda: (
    (yield "abc"),
    (yield t if (t := expensive_func()) else None),
    (yield "something else"),
))()

Honestly it took me quite a while to understand what's going on there. So he's using an immediately invokable lambda whose body is a tuple expression containing multiple subexpressions (this is the trick that I explained some time ago), where each subexpression is yielding a value. In Python yield can be used both as a statement and as an expression. The clear case for the latter is when we write a = yield "whatever" (for use with the send(value) method). In this example, for the yield to be considered as an expression it has to be wrapped with parentheses.

It looks pretty nice, and it would have never occurred to me. The alternative that I could have envisioned would be a generator factory like this:


    from collections.abc import Generator, Iterable

    def lazy_gen(values: Iterable) -> Generator:
    """
    returns a Generator object
    """
    def gen_fn_wrapper():
        for it in values:
            yield it() if callable(it) else it
    return gen_fn_wrapper()

Indeed, maybe this approach is easier to understand. Let's see an example:


    from time import sleep
    from functools import partial
    from collections.abc import Generator

    def expensive_format(msg: str):
    sleep(2)
    return msg.upper()

print("started")
messages: Generator[str, None, None] = (lambda: (
    (yield "abc"),
    (yield expensive_format("my message")),
    (yield expensive_format("my message 2")),
    (yield "something else"),
))()

for message in messages:
    print(message)

print("- second approach")

messages = lazy_gen([
    "abc", 
    lambda: expensive_format("my message"),
    partial(expensive_format, "my message 2"),
    "something else",
])
for message in messages:
    print(message)


There's another approach that they mention in the discussion thread and that uses a trick that I find rather confusing. We can use a decorator to immediately invoke a function that we've just defined. This means that we can define a normal generator function that contains statements and invoke it to obtain a generator object that will get assigned to a variable with the name of the function that we've just decorated. As I've said, it feels terribly confusing, but I'll include it here for the sake of completeness. I have to say that defining the decorator inline with a lambda feels pretty cool :-) Let's see:



@lambda x: x() # Or @operator.call
def messages():
    yield "abc"
    yield expensive_format("my message")
    yield expensive_format("my message 2")
    yield "something else"

for message in messages:
    print(message)          


yield expressions are, well, expressions, which means that in principle we can use them anywhere an expression can be used, for example as an argument to a function. I mean:


def f1():
    yield "a"
    print(str((yield "b")))
    yield "d"
  

gen = f1()
print(next(gen))
# a
print(next(gen))
# b
print(gen.send(11))
# 11
# d


Wednesday, 15 January 2025

As Bestas

I had been meaning to watch As Bestas, a French-Spanish (Galician) film, for quite a while. I wanted to watch it with my parents, particularly with my Galician father, but as it runs for more than 2 hours it was not easy to find the appropriate moment. This Christmas Eve has been the time. I didn't know that it was based on a true story that also happened in rural Galicia, to a Dutch couple that settled there, between 1996 and 2014. The film is particularly appealing to me cause though I am and feel so deeply Asturian, France is my second country, my second identity, and Galicia is the land of all my paternal ancestors, so I consider it my third country and identity. Dialogs in the film are in French, Galician and, I think, also some Spanish. It felt odd to me that I could understand the parts in French better than those in Galician; you can never disappoint your ancestors enough...

So in the film we have a French couple of "neo-rurals" that have settled somewhere in the rural interior of Galicia. These are not "digital nomads" teleworking from a small paradise, but 2 hard-working idealists that want to make a living working the land (in a traditional way it seems). From the beginning we see that they do not get on well with some of their neighbours, 2 of them in particular, 2 brothers. At first we can think that this hostility, this distrust, is because of some sort of xenophobia, which would be paradoxical in a land, Galicia, that in spite of its wild natural wealth has never managed to feed all its children, making Galicians (even more than Asturians) one of the most migrant peoples in the world (you can find Galicians anywhere, indeed the joke says that North Americans came across one Galician on the Moon when they landed there).

The permanent tension between the French couple and these 2 brothers, who bully them and maintain a continuously threatening attitude towards them, creates a thick and oppressive atmosphere, accentuated by the Galician landscape and the backward feeling of the village. The "bar" where a good part of the interaction between the French man and the 2 brothers takes place feels as if taken out of another century. This reminds me of a discussion with a friend of mine (also an Asturian with Galician ancestry) when he told me about how poor and less developed some parts of rural Galicia feel when compared to their Asturian counterparts. It's strange, cause on the other side, Galician cities (particularly Vigo), that is the part of Galicia that I know, feel way more cosmopolitan and developed than Asturian cities.

As the film evolves, the reason for the resentment that the 2 brothers harbour against the French couple is unveiled. The couple were some of the main opponents that refused to sell their lands to an energy company that wanted to set up a wind farm in the village, and needed for it the lands of all the villagers, so this refusal prevented other villagers from selling. We arrive then at the most intense moment of the film, the discussion where the main brother explains, full of bitterness and hatred, how his miserable life would have changed if he had sold his lands and left the village for the city. It's a point where you end up empathizing with a character that up to that point had been profoundly revolting.

Well, I think that's all, I can not tell more without fully spoiling the film. Just reserve 2 hours of your life (OK, a bit more if you have to search the torrent and download it) and watch it.

Sunday, 5 January 2025

Closing Javascript Generators

In this previous post about closing Python generators I mentioned that JavaScript generators had a similar feature that would deserve a post on its own, so here it is.

JavaScript generators have a return() method. We can think of it as partially equivalent to Python's close() method. This is so for the simple (and main I guess) use cases that I explained in my previous post (use it as a replacement for break, and when you are passing the generator around to other methods and one of them can decide to close it). For example:


function* citiesGen() {
    yield "Paris";
    yield "Porto";
    return "Europe";
}

// using .return() rather than break
let cities = citiesGen();
for (let city of cities) {
    if (city == "Porto") {
        cities.return();
        console.log("closing generator");
    }
    console.log(city);
}
/*
Paris
closing generator
Porto
*/


Then we have the more advanced cases, for which, indeed, a use case is not so apparent to me. Here it's where the differences with Python's close() are important. JavaScript's return() accepts a value, that will be returned as part of the value-done pair returned when the generator is finished. This "when it's finished" is key, as a try-finally in the generator code can prevent the return() call from finishing the generator in that call. It will continue to produce values as instructed by the finally part, and once completed will return the value that we had passed in the return() call. The theory:

The return() method, when called, can be seen as if a return value; statement is inserted in the generator's body at the current suspended position, where value is the value passed to the return() method. Therefore, in a typical flow, calling return(value) will return { done: true, value: value }. However, if the yield expression is wrapped in a try...finally block, the control flow doesn't exit the function body, but proceeds to the finally block instead. In this case, the value returned may be different, and done may even be false, if there are more yield expressions within the finally block.

And the practice:




function* citiesGen2() {
    yield "Paris";
    try {
        yield "Lyon";
        yield "Porto";
        return "Stockholm";
    }
    finally {
        yield "Lisbon";
        yield "Berlin";
    }
}

cities = citiesGen2();
console.log(cities.next());
console.log(cities.next());
console.log(cities.return("Over"));
console.log(cities.next());
console.log(cities.next());
console.log(cities.next());


// { value: 'Paris', done: false }
// { value: 'Lyon', done: false }
// { value: 'Lisbon', done: false }
// { value: 'Berlin', done: false }
// { value: 'Over', done: true }
// { value: undefined, done: true }


If this feels odd to you, you're not alone :-D. This is quite different from Python, where a call to close() always finishes the generator, even if we are catching the Exception and returning something from it.

generator.close()

Raises a GeneratorExit at the point where the generator function was paused. If the generator function catches the exception and returns a value, this value is returned from close(). If the generator function is already closed, or raises GeneratorExit (by not catching the exception), close() returns None. If the generator yields a value, a RuntimeError is raised. If the generator raises any other exception, it is propagated to the caller. If the generator has already exited due to an exception or normal exit, close() returns None and has no other effect.
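
A minimal Python sketch of that difference: trying to keep yielding from a finally block while being closed is not allowed, the generator gets a RuntimeError instead (names are just for illustration, mirroring the JavaScript example above):

```python
def cities_gen():
    yield "Paris"
    try:
        yield "Lyon"
    finally:
        yield "Lisbon"  # yielding while being closed is illegal in Python

gen = cities_gen()
next(gen)  # Paris
next(gen)  # Lyon, the generator is now suspended inside the try block
try:
    gen.close()  # GeneratorExit raised at the yield, finally block yields again
except RuntimeError as ex:
    print(f"Error: {ex}")  # generator ignored GeneratorExit
```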

Like their Python counterparts, JavaScript generators also have a throw() method, and again, I don't see much use for it.