Sunday, 24 May 2026

Context Managers Part 1

Context Managers have existed in Python since version 2.5, while Assignment Expressions (the walrus operator) were added in version 3.8. Recently I started wondering whether we can replace the "as" with a ":=" assignment. I mean, can we do this?


with (it := MyContextManager()):  # the parentheses are required, a bare walrus here is a SyntaxError
	# do whatever with it

rather than this:


with MyContextManager() as it:
	# do whatever with it

The answer is NO, or well, more accurately, sometimes yes, sometimes no, but you should always avoid it. To understand this we have to review what a Context Manager is and how it works. Notice that context managers are part of a broader concept, Automatic Resource Management, which also includes Garbage Collection and RAII (Resource Acquisition Is Initialization).

A Python Context Manager handles the setup and cleanup of resources in your programs. A context manager is any object that implements:


__enter__(self)
__exit__(self, exc_type, exc_value, traceback)

And it's used like this:


with EXPR as target:
    BODY

And now the important part. Conceptually, Python does something roughly like this (as explained by a GPT):


resource_manager = EXPR
resource = resource_manager.__enter__()
try:
    target = resource
    BODY
finally:
    resource_manager.__exit__(...)

So the key point is: The object used to manage the context and the object bound after as do not have to be the same object. That is exactly why __enter__() is allowed to return anything.

Some Context Managers are implemented so that __enter__ returns the context manager itself, while others return a different object. Basically, in the first case the resource being managed and the Resource Manager (Context Manager) are the same object; in the second case they are different, as the management responsibility has been moved away from the resource itself to a different object.
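To make the difference concrete, here is a minimal sketch with two made-up context managers (ReturnsSelf, ReturnsOther and Connection are hypothetical names, not taken from any library). It uses the walrus and "as" in the same statement so that both bindings can be compared:

```python
class ReturnsSelf:
    """Hypothetical manager whose __enter__ returns the manager itself."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        return False  # do not swallow exceptions


class Connection:
    """Hypothetical resource handed out by ReturnsOther."""


class ReturnsOther:
    """Hypothetical manager whose __enter__ returns a different object."""

    def __enter__(self):
        self.connection = Connection()
        return self.connection

    def __exit__(self, exc_type, exc_value, traceback):
        return False


# ":=" binds the value of the expression (the manager),
# "as" binds whatever __enter__ returned
with (manager1 := ReturnsSelf()) as resource1:
    same_object = manager1 is resource1

with (manager2 := ReturnsOther()) as resource2:
    different_object = manager2 is not resource2

print(same_object)       # True
print(different_object)  # True
```

So the walrus gives the same result as "as" only for managers of the first kind. With the second kind it binds the manager, not the resource.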

Another interesting topic. We know that in Python a single with block can include multiple context managers. I mean:


with open('a.txt', 'r') as fr, open('b.txt', 'w') as fw:
    do_something(fr, fw)

I was wondering about the case where the second context manager makes use of the first one, which I think is more commonly written like this:


with ContextManager1("aaa") as ctx1:
    with ContextManager2(ctx1) as ctx2:
        do_something(ctx1, ctx2)

Could it be written with a single with (avoiding the additional nesting level)?


with ContextManager1("aaa") as ctx1, ContextManager2(ctx1) as ctx2:
	do_something(ctx1, ctx2)

The answer is YES.

Python evaluates multiple context managers in a single with statement sequentially from left to right. The moment the first context manager is entered, its return value is bound to the as variable, making it immediately available for the next context manager on the same line.
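The ordering can be observed with a small sketch. The two generator-based managers below (first and second are hypothetical names) just record when they are entered and exited:

```python
from contextlib import contextmanager

events = []


@contextmanager
def first(name):
    events.append(f"enter first({name})")
    try:
        yield name.upper()  # this is the value bound by "as"
    finally:
        events.append(f"exit first({name})")


@contextmanager
def second(dependency):
    events.append(f"enter second({dependency})")
    try:
        yield f"{dependency}-child"
    finally:
        events.append(f"exit second({dependency})")


# ctx1 is already bound when second(ctx1) is evaluated
with first("aaa") as ctx1, second(ctx1) as ctx2:
    events.append(f"body {ctx1} {ctx2}")

for event in events:
    print(event)
# enter first(aaa)
# enter second(AAA)
# body AAA AAA-child
# exit second(AAA)
# exit first(aaa)
```

The managers are exited in reverse order, exactly as with the nested version.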

Sunday, 17 May 2026

Type Hints Notes 2026

Type hints are more and more prevalent in recent Python code. I'm still not too severe about them, but my level of strictness continues to grow over time. I've lately learnt a couple of things:

Variadic Parameters. When typing functions that have packed (variadic) parameters in their signature (*args, **kwargs), we annotate the type of each individual argument; we don't type the parameter as a collection or dictionary (unless each argument really is a collection). I mean, for a pipe function we should do:


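# this is RIGHT: each function passed in fns is a Callable
def pipe(val: Any, *fns: Callable) -> Any: ...

# this is WRONG: it says that each argument passed in fns is a list of Callables
def pipe(val: Any, *fns: list[Callable]) -> Any: ...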
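A small runnable sketch of the same rule (describe is a hypothetical helper added only to show the **kwargs side). Inside the function the packed parameters are a tuple and a dict, but the annotation describes a single element:

```python
from typing import Any, Callable


def pipe(val: Any, *fns: Callable[[Any], Any]) -> Any:
    # inside the function, fns is a tuple[Callable[[Any], Any], ...]
    for fn in fns:
        val = fn(val)
    return val


def describe(**options: int) -> dict[str, int]:
    # inside the function, options is a dict[str, int]
    return options


print(pipe(3, lambda x: x + 1, str))  # 4
print(describe(width=10, height=20))  # {'width': 10, 'height': 20}
```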

Tuples. In Python we use tuples for "groups" of a fixed number of elements, a pair, a trio... We express it in the signature like this: tuple[str, str] or tuple[int, str, str]... But how to express that a function returns (or receives) a "group" of an unknown number of elements? We also use tuples, combined with ellipsis (...), like this:


tuple[int, ...]         # any number of ints: (), (1,), (1, 2, 99), ...
tuple[int | str, ...]   # any number of elements, where each element can be an int or a str: (), ("a", 1), ("a", "b", "c"), (1, 2, 1), ...
  
tuple[int, str, bool]   # exactly 3 elements: an int, a str, a bool
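As a quick illustration in function signatures (evens_below and split_name are hypothetical helpers, just for the example):

```python
def evens_below(limit: int) -> tuple[int, ...]:
    # variable length: the result can have any number of ints
    return tuple(range(0, limit, 2))


def split_name(full_name: str) -> tuple[str, str]:
    # fixed length: always exactly a pair of strings
    first, _, last = full_name.partition(" ")
    return first, last


print(evens_below(7))              # (0, 2, 4, 6)
print(evens_below(0))              # ()
print(split_name("Ada Lovelace"))  # ('Ada', 'Lovelace')
```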
  

An important detail that I've learnt thanks to a typing issue. We know that in a Python try-except block, the except clause can handle multiple exception types, I mean: except (RuntimeError, TypeError, NameError): (since Python 3.14 the parentheses can be omitted when there is no "as" clause). Those multiple exceptions are a tuple, not just any iterable. Let's see an example (the last line is what is WRONG):


# multiple exceptions
def multiple_exceptions(exceptions: tuple[type[Exception], ...]) -> None:
    try:
        raise ValueError("This is a ValueError")
    except exceptions as e:
        print(f"Caught an exception: {e}")

multiple_exceptions((ValueError, TypeError))  
# Caught an exception: This is a ValueError

# important, this is WRONG, we have to pass a tuple, not just any collection
multiple_exceptions([ValueError, TypeError])
# TypeError: catching classes that do not inherit from BaseException is not allowed


Indeed, an equivalent function with a variadic signature feels more natural and idiomatic than the above (and furthermore it prevents the mistake of passing any collection rather than exactly a tuple):


# this variadic signature feels more natural
def multiple_exceptions2(*exceptions: type[Exception]) -> None:
    if not exceptions:
        raise ValueError("pass at least one exception type")
    try:
        raise ValueError("This is a ValueError")
    except exceptions as e:
        print(f"Caught an exception: {e}")

multiple_exceptions2(ValueError, TypeError) 


And notice also how I've added a guard against the empty-call case (an empty tuple is accepted by except, but it matches nothing, so the ValueError would simply propagate uncaught).
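The behaviour of the different kinds of "collection" in an except clause can be checked with a short sketch (try_catching is a hypothetical helper written for this test):

```python
def try_catching(exceptions):
    try:
        raise ValueError("boom")
    except exceptions:
        return "caught"


# a tuple of exception types works as expected
print(try_catching((ValueError, TypeError)))  # caught

# an empty tuple is legal, but it matches nothing: the ValueError escapes
try:
    try_catching(())
except ValueError as error:
    outcome_empty = f"propagated: {error}"
print(outcome_empty)  # propagated: boom

# a list is rejected while the interpreter tries to match the exception
try:
    try_catching([ValueError])
except TypeError:
    outcome_list = "TypeError"
print(outcome_list)  # TypeError
```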

Sunday, 10 May 2026

Python partial and placeholders

When Python 3.14 was released I had already read about some of its main features (those that involve a PEP and that have been discussed in the Python discussion forums), like Lazy Annotations and Template Strings. Recently, when reading the release notes in depth, I came across a small feature added to functools.partial (and partialmethod) that I find particularly useful:

functools:
Add the Placeholder sentinel. This may be used with the partial() or partialmethod() functions to reserve a place for positional arguments in the returned partial object. (Contributed by Dominykas Grigonis in gh-119127.)

Just a reminder of what partial function application is (don't confuse it with the related concept of curried functions):

In computer science, partial application (or partial function application) refers to the process of fixing a number of arguments of a function, producing another function of smaller arity.

Indeed I already talked about functools.partial some time ago

The "basic" approach to partial function application is that we can just fix (pre-fill) arguments from left to right. This is what we also have in JavaScript with Function.prototype.bind (which takes the "this" value as its first argument). As Python supports named arguments, functools.partial already supported fixing named arguments.


import functools

def format_geo_info(country, region, city, population):
    return f"{city}, {region.upper()} ({country}) - {population}"
    
bound_format = functools.partial(format_geo_info, "France")
print(bound_format("Occitanie", "Toulouse", 500_000))
# Toulouse, OCCITANIE (France) - 500000
print(bound_format("Occitanie", city="Toulouse", population=500_000))
# Toulouse, OCCITANIE (France) - 500000
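A short sketch of the keyword side of functools.partial, reusing the same function. Pre-filled keyword arguments act as defaults that can still be overridden at call time, and the partial object exposes what it has stored:

```python
import functools


def format_geo_info(country, region, city, population):
    return f"{city}, {region.upper()} ({country}) - {population}"


toulouse = functools.partial(format_geo_info, "France", city="Toulouse", population=0)

print(toulouse("Occitanie"))
# Toulouse, OCCITANIE (France) - 0
print(toulouse("Occitanie", population=500_000))  # overrides the pre-filled keyword
# Toulouse, OCCITANIE (France) - 500000

print(toulouse.args)      # ('France',)
print(toulouse.keywords)  # {'city': 'Toulouse', 'population': 0}
```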


What was not possible until this version was fixing some intermediate non-named argument while leaving earlier ones open, but this is possible since version 3.14 thanks to the Placeholder sentinel value:



from functools import partial, Placeholder  # Placeholder requires Python 3.14+

format_french_city_with_unknown_population = partial(format_geo_info, "France", Placeholder, Placeholder, 0)
print(format_french_city_with_unknown_population("Ile de France", "Saint Denis"))
# Saint Denis, ILE DE FRANCE (France) - 0

Not a revolutionary feature, but one that I've missed occasionally. A trivial implementation could be something like this:


# supports positional and keyword arguments, but not placeholders
def my_basic_partial(func, *args, **kwargs):
    return lambda *fargs, **fkwargs: func(*args, *fargs, **(kwargs | fkwargs))
    
# add support for placeholders in the arguments
PLACEHOLDER = object()
def my_complete_partial(func, *args, **kwargs):
    def new_func(*fargs, **fkwargs):
        merged_args = []
        fargs_iter = iter(fargs)
        for arg in args:
            if arg is PLACEHOLDER:
                merged_args.append(next(fargs_iter))
            else:
                merged_args.append(arg)
        merged_args.extend(fargs_iter)
        return func(*merged_args, **(kwargs | fkwargs))
    return new_func

format_french_city_with_unknown_population = my_complete_partial(format_geo_info, "France", PLACEHOLDER, PLACEHOLDER, 0)
print(format_french_city_with_unknown_population("Ile de France", "Saint Denis"))
# Saint Denis, ILE DE FRANCE (France) - 0

format_2 = my_complete_partial(format_geo_info, "France", city="Toulouse")
print(format_2("Occitanie", population=500_000))
# Toulouse, OCCITANIE (France) - 500000
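One rough edge of the version above: if the caller provides fewer positional arguments than there are placeholders, next() fails with a bare StopIteration. A possible variation (checked_partial is a hypothetical name, and the error message is my own wording, not the one used by functools) counts the placeholders up front and raises a TypeError instead:

```python
PLACEHOLDER = object()


def checked_partial(func, *args, **kwargs):
    # number of positional arguments the caller must provide at least
    needed = sum(arg is PLACEHOLDER for arg in args)

    def new_func(*fargs, **fkwargs):
        if len(fargs) < needed:
            raise TypeError(
                f"expected at least {needed} positional arguments, got {len(fargs)}"
            )
        fargs_iter = iter(fargs)
        # fill each placeholder, in order, with the next call-time argument
        merged_args = [next(fargs_iter) if arg is PLACEHOLDER else arg for arg in args]
        merged_args.extend(fargs_iter)  # remaining arguments go at the end
        return func(*merged_args, **(kwargs | fkwargs))

    return new_func


def format_geo_info(country, region, city, population):
    return f"{city}, {region.upper()} ({country}) - {population}"


french_city = checked_partial(format_geo_info, "France", PLACEHOLDER, PLACEHOLDER, 0)
print(french_city("Ile de France", "Saint Denis"))
# Saint Denis, ILE DE FRANCE (France) - 0

try:
    french_city("Ile de France")
except TypeError as error:
    print(error)
# expected at least 2 positional arguments, got 1
```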


In my aforementioned previous post about partial in Python I gave some reasons for using partial over directly trapping the variables with a closure (of course internally partial has to use either closures or a callable class). I've just realised that I was missing the main reason, partial is more semantic.

- Intent-Revealing Code: partial(func, arg) explicitly states your intent to partially apply arguments, improving readability and self-documentation.
- Declarative Style: It focuses on the result (a new specialized function) rather than the imperative mechanics of capturing lexical scope.

Lodash, the excellent JavaScript library, also features placeholders in its implementation of partial.

Sunday, 3 May 2026

Glibc

Compiling to native code, and furthermore for a Linux system... wow, sounds scary, and very, very far away from what I've been doing in the last decade(s). Well, the thing is that my employer decided some months ago that we had to compile to native code some of our Python applications. It's not something performance related, it's for preventing access to the source code of these applications. We were looking into cython, but we settled on Nuitka, an amazing piece of software that has been serving us so well.

Normally almost every native application compiled for a Linux system has been dynamically linked against glibc. OK, and, what's glibc?

The GNU C Library, commonly known as glibc, is the GNU Project implementation of the C standard library. It provides a wrapper around the system calls of the Linux kernel and other kernels for application use. Despite its name, it now also directly supports C++ (and, indirectly, other programming languages).

So when a Linux native application (using glibc) starts, the dynamic linker (ld.so, on x86-64 typically ld-linux-x86-64.so.2) loads the shared objects (SO, .so files, the equivalent of Windows DLLs) needed by the application (like libc.so.6) and links the call sites to the functions imported from those libraries.

Obviously glibc evolves over time, so, what about versions? First, what glibc version is installed on my system? You can check the SOs loaded by a running process by doing: lsof -p PID | grep '\.so'. Normally you'll see that it's using libc.so.6 (in Ubuntu it's located here: /usr/lib/x86_64-linux-gnu/libc.so.6). That 6 is not the version number (libc.so.6 has been the name of the library since 1997!), the version number is something like 2.XX (2.39 in my Ubuntu 24.04). You find it by using: ldd --version
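The version can also be queried from Python itself with the standard library function platform.libc_ver(). A minimal sketch; the result depends on the system and on how the interpreter was built, and it can be a pair of empty strings when Python is not linked against glibc:

```python
import platform

# returns a (library_name, version) pair of strings for the running interpreter
lib, version = platform.libc_ver()

if lib:
    print(f"{lib} {version}")
else:
    print("could not determine the C library (probably not glibc)")
```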

So, what happens if I compile my application in a system with one version of glibc and try to run it in a system with a different version? Well, the situation is quite a bit more fine-grained than I thought. Version numbers are not checked at the glibc level, but at the function level. This is so because glibc uses symbolic versioning.

The "Symbol Versioning" Approach (Advanced)

This is what glibc uses. It is also used by heavy-hitters like OpenSSL, Qt, and libgcc.

    The Logic: The filename stays the same (e.g., libc.so.6 has been the name since 1997), but individual functions inside the file are tagged with versions.

    The Result: Multiple versions of the same function can coexist in one file. This allows for extreme backward compatibility without breaking the system every time a single function is updated.
    
    The primary goal of symbol versioning is backward compatibility. It allows a single library file to provide multiple versions of the same function so that:

    Old binaries compiled against v2.10 continue to use the v2.10 implementation.

    New binaries compiled against v2.11 use the new v2.11 implementation.

So multiple versions of the same function live inside glibc, and your binary will dynamically link against the one it was compiled for. And when does the version number of a function change? Normally it only changes if the function interface (the contract, the ABI) changes, but not if it's only its internal implementation that changes. So if we compare symbolic versioning to semantic versioning (SemVer, a more familiar versioning scheme), we could say that in symbolic versioning a version change corresponds to a Major version change in semantic versioning.

A GPT confirmed it: a new Symbol Version is functionally equivalent to a Major Version bump for that specific function. It signals to the linker that the "Contract" for that specific symbol has changed, and old programs should look for the previous contract elsewhere in the same file.

Notice how symbolic versioning is used for functions inside a library, while semantic versioning (when used) is normally used for libraries.

The glibc version (that one obtained with ldd --version) has no importance in terms of loading the library in memory (the dynamic linker will load libc.so.6 regardless of its "internal" version), the important part is the specific version of each function that we try to link.

I guess that when you program in C you are aware of the version of each function that you are using, as you have to adapt your code to the ABI of the function if it has changed, but when that happens behind the scenes things are quite different. In our case we just write Python code, and the beautiful Nuitka takes care of transforming it to C and then compiling it to native code, so the resulting binary ends up linked against the function versions present in the glibc of the build system. If you then run that binary in a system with an older glibc, your binary may be "pointing" to a function with a symbolic version (let's say openEncryptedFile@GLIBC_2.12) higher than the one present in the older glibc (let's say openEncryptedFile@GLIBC_2.10), and your application will fail to start. Basically this means that you have to compile your Python application in a system with a glibc version lower than or equal to the glibc version in the target system. It feels odd at first, as the starting point is just the same Python code: if in one system it can just use openEncryptedFile@GLIBC_2.10, why isn't it always compiled against that 2.10 even if a higher version (openEncryptedFile@GLIBC_2.12) is present? Well, that's how things work by default: when compiling, code is linked to the highest (default) version of that function present in the glibc of the compilation machine.

If you wonder if other .so libraries (SO, ELF libraries) also use symbolic versioning, it depends. For smaller, simpler libraries what is usually used is the SONAME approach, the library (.so file) name changes with each version (this is a coarse grained approach).

Symbolic versioning is the technically superior approach, but it is not the universal standard for all ELF libraries. It depends entirely on the library maintainers and their commitment to long-term ABI stability.

In the Linux ecosystem, there are two primary ways to manage library changes:  
1. The "SONAME" Approach (Common)

Most smaller or simpler libraries use the SONAME mechanism. 
You’ve likely seen files like libfoo.so.1 and libfoo.so.2.

    The Logic: If the developers change the interface, they increment the "Major" version number in the filename itself.  

    The Result: Programs linked against libfoo.so.1 will refuse to start 
    if only libfoo.so.2 is present. This is a "heavy-handed" fix because 
    it requires recompiling every program that uses the library even if 
    the specific function they use didn't actually change.

2. The "Symbol Versioning" Approach (Advanced)

This is what glibc uses. It is also used by heavy-hitters like OpenSSL, Qt, and libgcc.

    The Logic: The filename stays the same (e.g., libc.so.6 has been the name since 1997), 
    but individual functions inside the file are tagged with versions.

    The Result: Multiple versions of the same function can coexist in one file. 
    This allows for extreme backward compatibility without breaking the system every time a single function is updated.

To complete this post, I'll add some useful, related commands:

  • To check the SO's used by a given program.
    For a binary on disk: ldd /usr/bin/program_name
    For a running process: lsof -p [PID] | grep '\.so'
  • To view the symbols used by a program (the specific functions imported from SO's)
    All imported symbols: nm -D -u /usr/bin/program_name
    Symbols + Versions: objdump -T /usr/bin/program_name | grep -F '*UND*'
    Only glibc symbols: objdump -T /usr/bin/program_name | grep 'GLIBC_'
    Library Version Map: readelf -V /usr/bin/program_name
  • To view the symbols/functions exported by glibc in your system: objdump -T /usr/lib/x86_64-linux-gnu/libc.so.6

Some additional findings related to the last command. For example I want to see the versions of pthread_spin_init present in my glibc: objdump -T /usr/lib/x86_64-linux-gnu/libc.so.6 | grep pthread_spin_init

That gives me:

00000000000a4130 g    DF .text  000000000000000d  GLIBC_2.34  pthread_spin_init
00000000000a4130 g    DF .text  000000000000000d (GLIBC_2.2.5) pthread_spin_init

Which is very interesting, as it shows us that a symbol version is not a sequential counter for that specific function. Instead, it is a timestamp or a marker of the glibc release that defined that specific version of the function's ABI. From a GPT:

How glibc handles ABI changes with symbol versioning

Original version: Suppose foo() was introduced in GLIBC_2.2.5. That version is tagged as foo@GLIBC_2.2.5.

ABI change in glibc 2.32: If glibc developers change the ABI of foo() in version 2.32 (e.g., change its behavior, arguments, or return type in a way that breaks compatibility), they will:

Keep the old implementation as foo@GLIBC_2.2.5.
Add a new implementation as foo@GLIBC_2.32.

At runtime:

A binary linked against glibc 2.2.5 will request foo@GLIBC_2.2.5, and the dynamic linker will resolve it to the old implementation.
A binary linked against glibc 2.32 will request foo@GLIBC_2.32, and get the new implementation.

This mechanism ensures backward compatibility while allowing glibc to evolve.

Friday, 17 April 2026

Python Sentinel Values

Every now and then we need a flag or sentinel value: a special, unique value that we can distinguish from the normal values that we are processing and that has a particular meaning (for example, signaling the end of data processing, of a loop, or of an operation). For example, in my recent post about adding null-safety to a pipe function I was using 2 sentinels/flags: NULL_SAFE and COALESCE.

The essential function a sentinel value has to accomplish is to have a unique identity, so that comparing it by identity (Python: is, JavaScript: ===) with any other value/object in our system has to return false. So the most simple approach is just using a new object for each of our sentinels.



NO_INVEST = object()  # Sentinel value

def invest(amount: int | None | object) -> str:
	if amount is NO_INVEST:
		return "No investment"
	else:
		amount = amount or 0
		return f"we've invested {amount}"

print(invest(NO_INVEST))  # Output: No investment
print(NO_INVEST)  # Output: <object object at 0x...>


That simple approach works fine, but it's missing a few things. Printing the value is messy (we get an "<object object at 0x...>" representation, when it would be nice to get NO_INVEST) and it's rather typing unfriendly. Saying that invest can receive an object, apart from int or None, lacks any meaning. What kind of object is that? A sentinel value should (mainly) have these features:

  • A unique identity (is comparison)
  • A meaningful repr (debuggability)
  • Clear typing (especially for static type checkers)

Our basic sentinel only provides the first one (identity). That's why some smart guy came up with a very interesting PEP 661 proposing a new Sentinel class. Unfortunately that PEP is in deferred status. The document also presents different techniques commonly used for Sentinel values, like the simple object() that I've just shown, using an enum or using a class. Using a class is the best approach to me, I'll show several iterations until getting what I think is the best we can get so far.

Approach 1. Classes are objects in Python. We can use a class for each sentinel object, and in order to get a nice representation we can give it a metaclass with a custom __repr__. The missing piece is having some typing friendliness, we're still stuck with the Any signature. Additionally, declaring a class for something that is not intended to work as an object factory, but to be used as an object in itself is rather unnatural.



    from typing import Any

    class SentinelMeta(type):
        def __repr__(cls):
            return cls.__name__
        
    class Sentinel(metaclass=SentinelMeta):
        pass
        
    class NO_INVEST(Sentinel): pass


    #def invest(value: int | None | Type[NO_INVEST]) -> str: # this signature feels a bit strange, but it works
    #def invest(value: int | None |Literal[NO_INVEST]) -> str: # this one feels better, but only works with mypy, not with pylance
    def invest(amount: int | None | Any) -> str:
        if amount is NO_INVEST:
            return "No investment"
        else:
            amount = amount or 0
            return f"we've invested {amount}"

    print(invest(NO_INVEST))  # Output: No investment
    print(NO_INVEST)  # Output: NO_INVEST

Approach 2. We can make the usage quite more natural by hiding the class creation behind a function. That also allows us to skip the Sentinel base class.



    from typing import Any

    class SentinelMeta(type):
        def __repr__(cls):
            return cls.__name__

    def sentinel(name: str):
        return SentinelMeta(name, (), {})
    
    NO_INVEST = sentinel("NO_INVEST")


    #def invest(value: int | Type[NO_INVEST]) -> str: # pylance doesn't like it, using a variable as type
    #def invest(value: int | Sentinel) -> str: # we don't have a Sentinel class... so forget it
    def invest(amount: int | None | Any) -> str:
        if amount is NO_INVEST:
            return "No investment"
        else:
            amount = amount or 0
            return f"we've invested {amount}"
        
    print(invest(NO_INVEST))  # Output: No investment
    print(NO_INVEST)  # Output: NO_INVEST


This one feels quite natural to use, but we still have the problem with typing. We can leverage the Generic types/class subscripting that we saw in my previous post for getting something like this (Approach 3)



    class SentinelMeta(type):
        def __repr__(cls):
            return cls.__name__
    
    class Sentinel(metaclass=SentinelMeta):      
        def __class_getitem__(cls, item):
                return cls
    
    def sentinel(name: str):
        return SentinelMeta(name, (Sentinel,), {})
    
    NO_INVEST = sentinel("NO_INVEST")


    #def invest(value: int | Sentinel) -> str:
    def invest(amount: int | None | Sentinel[NO_INVEST]) -> str: # at a typing level it's equivalent to the above, but it provides extra meaning
        if amount is NO_INVEST:
            return "No investment"
        else:
            amount = amount or 0
            return f"we've invested {amount}"

    print(invest(NO_INVEST))  # Output: No investment
    print(NO_INVEST)  # Output: NO_INVEST
	

Hey, this one feels rather good to me. The Sentinel[NO_INVEST] looks pretty nice in that signature. The type checking is mainly off, because our sentinel() function is not typed, so it's considered as returning Any, and when we pass Any to a function it disables type checking. This means that for the type system any sentinel that we create with sentinel() is just an Any, so the type checker will allow passing any sentinel to a function that expects a specific sentinel. It's not a problem for me; what I'm mainly interested in is the semantics, the clarity, that this type annotation provides to the function signature.

Given that we are using Sentinel classes as objects, not as object factories, we can make these classes more object-like by preventing instantiation. Additionally, Sentinels do not have attributes and are not intended to be expanded with attributes, so we can prevent them from getting attributes added dynamically. All in all we get:



    class SentinelMeta(type):
        def __repr__(cls):
            return cls.__name__
        
        # prevent sentinel classes from being instantiated
        def __call__(cls, *args, **kwargs):
            raise TypeError(f"{cls.__name__} is a sentinel and cannot be instantiated") 
        
        # prevent sentinel classes from being modified
        def __setattr__(cls, name, value):
            raise AttributeError(f"Cannot modify sentinel {cls.__name__}")


    class Sentinel(metaclass=SentinelMeta):      
        def __class_getitem__(cls, item):
                return cls

    
    def sentinel(name: str):
        return SentinelMeta(name, (Sentinel,), {})

    
    NO_INVEST = sentinel("NO_INVEST")
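A quick check of how this final version behaves. The definitions are repeated so that the snippet runs on its own:

```python
class SentinelMeta(type):
    def __repr__(cls):
        return cls.__name__

    def __call__(cls, *args, **kwargs):
        raise TypeError(f"{cls.__name__} is a sentinel and cannot be instantiated")

    def __setattr__(cls, name, value):
        raise AttributeError(f"Cannot modify sentinel {cls.__name__}")


class Sentinel(metaclass=SentinelMeta):
    def __class_getitem__(cls, item):
        return cls


def sentinel(name: str):
    return SentinelMeta(name, (Sentinel,), {})


NO_INVEST = sentinel("NO_INVEST")

print(NO_INVEST)  # NO_INVEST

try:
    NO_INVEST()
except TypeError as error:
    print(error)  # NO_INVEST is a sentinel and cannot be instantiated

try:
    NO_INVEST.extra = 1
except AttributeError as error:
    print(error)  # Cannot modify sentinel NO_INVEST

# the subscripted form used in annotations is just the Sentinel class
print(Sentinel[NO_INVEST] is Sentinel)  # True
```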


There's something missing in this implementation, support for our sentinels to be pickled (particularly important if we plan to use them in multiprocessing scenarios). That complicates the design and I've deliberately left it aside for the moment. Maybe we'll see it in another post.

Thursday, 9 April 2026

Python Type Checkers and Type Expressions

In this previous post I explained how Python allows the usage of any object, not just type objects, for its annotations: "Annotations can be any valid Python expression". Annotations are used to provide metadata, and while normally that metadata is just typing information, we can provide any sort of metadata for custom use at runtime (and as we saw, we have the Annotated mechanism to combine both typing and custom metadata). While doing some tests at that time I noticed that VS Code (the Pylance extension) would warn (with "Call expression not allowed in type expression" or "Variable not allowed in type expression") against using annotations like this (which, as I've said, is perfectly valid):


import annotationlib  # Python 3.14+
from dataclasses import dataclass

# metadata for parameters
@dataclass
class ValueRange:
    lo: int
    hi: int

# pylance warning: Call expression not allowed in type expression
def create_post_1(
    title: ValueRange(5, 20), 
    content: ValueRange(5, 100),
) -> dict:
    return {"title": title, "content": content}


val_range = ValueRange(1, 10)
# pylance warning: Variable not allowed in type expression
def fn2(a: val_range) -> None:
    pass
    
# but it's stored OK in the function annotations 
print(annotationlib.get_annotations(fn2))
# {'a': ValueRange(lo=1, hi=10), 'return': None}


So if this works fine at runtime (you can see that the annotation is stored along with the function), why does Pylance warn against it? Because we have to differentiate what is valid for runtime use from what is valid at type checking time. From a GPT:

The runtime accepts arbitrary expressions. Static type checkers do not. This is so because static typing tools parse Python, but they don’t execute it, and they don’t compile it to bytecode either. Type checkers rely on syntactic patterns, not runtime behavior

This is something I had never thought about before. Python type checkers analyze your code without executing anything (they are static). So the types in the annotations that they are going to analyze have to be expressed in a direct, static form, not as the result of evaluating an expression (as they are not going to evaluate it). The different Python type checkers (mypy, Pylance, Pyre...) parse Python source code into an AST (normally different from the CPython AST) and analyze it; they do not run the code, and they do not even compile it to bytecode to create code objects. That's why they can only work with type expressions (annotation expressions) that follow a specific syntax, not with any expression. From here: "Note that while annotation expressions are the only expressions valid as type annotations in the type system, the Python language itself makes no such restriction: any expression is allowed."
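We can get a feeling for what "only the syntax" means with the standard ast module. This is just an illustration of parsing without executing, not how any particular type checker is implemented (create_post is a made-up function that lives only inside a string):

```python
import ast

source = '''
def create_post(title: ValueRange(5, 20), views: int, tags: list[str]) -> dict:
    return {}
'''

# parsing only: ValueRange is never looked up, let alone called
tree = ast.parse(source)
func = tree.body[0]

# the syntactic form of each annotation is available without running anything
kinds = {arg.arg: type(arg.annotation).__name__ for arg in func.args.args}
print(kinds)  # {'title': 'Call', 'views': 'Name', 'tags': 'Subscript'}
```

A tool working at this level can accept the Name and Subscript forms as type expressions and flag the Call one, which matches the "Call expression not allowed in type expression" warning.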

Type checkers operate on expressions that syntactically denote types, which basically is a type name or a generic type. And this has sparked my curiosity about how generic types (MyClass[T]) work. For the type checker it's simple, as it does not execute anything, it just has to parse that particular syntax. But what's the runtime meaning of such generic expression?

Well, it's subscripted access to an object (to a class). When we do my_instance["x"] this searches for a __getitem__ method in my_instance's type. So MyClass["x"] should just search for __getitem__ in MyClass's type (that is, its metaclass). That's correct, but given that Python designers have always considered metaclasses particularly complex and/or exotic, they decided to introduce (via PEP 560) a hook to make it easier to implement subscripted access to classes. Rather than having to define a metaclass for MyClass, we can directly define a __class_getitem__ method in MyClass.
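Both routes side by side, in a minimal sketch (Meta, WithMeta and WithHook are hypothetical names):

```python
class Meta(type):
    # found because Meta is the type of WithMeta
    def __getitem__(cls, item):
        return f"{cls.__name__} via metaclass: {item}"


class WithMeta(metaclass=Meta):
    pass


class WithHook:
    # implicitly a class method, no metaclass needed
    def __class_getitem__(cls, item):
        return f"{cls.__name__} via __class_getitem__: {item}"


print(WithMeta["x"])  # WithMeta via metaclass: x
print(WithHook["x"])  # WithHook via __class_getitem__: x

# the hook only affects the class, instances are still not subscriptable
try:
    WithHook()["x"]
except TypeError:
    print("instances of WithHook are not subscriptable")
```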

Sunday, 29 March 2026

Pipe Operator and Null Safety

I've talked a couple of times [1] and [2] about how beautiful it is to have a pipe operator in a language (though it's not particularly common), and about ways to simulate it in Python. Having a pipe operator makes applying functions to a value as convenient as chaining methods. When chaining methods we can leverage (if available) the safe navigation/optional chaining/elvis (?.) operator to deal with null values. So, I've been thinking about null safety and pipes (not applying a function if the value is null, and coalescing to a default value).

In my previous post I mentioned that JavaScript had 2 different proposals for a pipe operator, but one of them has been discarded. I've been checking whether the remaining proposal includes null safety and the answer is no. In the early stages it was discussed whether to have, apart from the normal |> operator, an additional ?|> operator for null-safe cases, but it was discarded.


// not null-safe, active proposal
user
  |> getProfile(%)
  |> formatProfile(%)
  
// null-safe, has been discarded
value ?|> fn
value |> fn ?? default

It was rejected on the basis that Pipelines should be pure syntax for data flow, not control flow.

To my surprise (I was not aware that PHP continues to be used and to evolve) PHP has recently added a pipe operator to the language, and for the moment it also lacks a null-safe version.

For Python decision makers, adding a pipe operator seems to be "making the language too complex for beginners"... (you can't imagine how much I hate that so common kind of "pythonic" reflection...), but as I explained in my previous post we can easily add a pipe function that does the trick (adding such a function to functools has also been requested multiple times, but no luck so far). An implementation is as simple as this:



import functools
from typing import Any, Callable

def pipe(val: Any, *fns: Callable[[Any], Any]) -> Any:
    """
    pipes function calls over an initial value
    """
    def _call(val, fn):
        return fn(val)
    return functools.reduce(_call, fns, val)


And we can use it like this:



from dataclasses import dataclass

@dataclass
class Post:
    id: str
    title: str
    author: str

def get_post(post_id: str) -> Post | None:
    # simulate a function that may return None
    if post_id == "1":
        return Post(id="1", title="First post", author="1")
    else:
        return None

def get_address(person_id: str) -> str | None:
    # simulate a function that may return None
    if person_id == "1":
        return "Rue de La Nation, Paris"
    else:
        return None

pipe("1",
    get_post,
    lambda post: get_address(post.author),
    str.upper,
    print,
)

# RUE DE LA NATION, PARIS


Creating a null-aware equivalent is quite simple. The idea I came up with is having pipe accept not just a sequence of callables, but a sequence where each step is either a callable, a (flag, callable) tuple or a (flag, value) tuple, with the flag indicating that we have to check for null before applying the callable, or that we have to coalesce to a value. Let's see the code:



# sentinel values
NULL_SAFE = object()
COALESCE = object()

def pipe2(val: Any, *steps: Callable[[Any], Any] | tuple[Any, Callable[[Any], Any] | Any]) -> Any:
    """
    pipes function calls over an initial value, with support for null safety and coalescing:
    """
    def _call(val, step: Callable[[Any], Any] | tuple[Any, Callable[[Any], Any] | Any]) -> Any:
        if callable(step):
            return step(val)
        else:
            option = step[0]
            if option is NULL_SAFE:
                fn = step[1]
                return None if val is None else fn(val)
                
            elif option is COALESCE:
                default_val = step[1]
                return default_val if val is None else val
            else:
                raise ValueError(f"Invalid option: {option}")
    
    return functools.reduce(_call, steps, val)

pipe2("2",
    (NULL_SAFE, get_post),
    (NULL_SAFE, lambda post: get_address(post.author)),
    (COALESCE, "Not found"),
    str.upper,
    print,
)

# NOT FOUND


The function is quite minimal. We should add proper error handling to it, raising meaningful exceptions for each potential incorrect usage. You can just ask a GPT to add it and you'll end up with something like this:


import functools
from typing import Any, Callable, Tuple, Union

def pipe2(val: Any, *steps: Union[Callable[[Any], Any], Tuple[object, Any]]) -> Any:
    """
    Pipe value through callables or option-tuples.
    Steps can be:
      - a callable: called as fn(acc)
      - null_safe(fn): tuple (NULL_SAFE, fn) — only call fn if acc is not None
      - coalesce(default): tuple (COALESCE, default) — replace None with default

    Raises TypeError or ValueError for invalid steps.
    """
    def _call(val: Any, step: Union[Callable[[Any], Any], Tuple[object, Any]]) -> Any:
        if callable(step):
            return step(val)

        if not (isinstance(step, tuple) and len(step) == 2):
            raise TypeError("pipe2 steps must be callables or 2-tuples from null_safe/coalesce")

        option, payload = step
        if option is NULL_SAFE:
            if val is None:
                return None
            if not callable(payload):
                raise TypeError("NULL_SAFE payload must be callable")
            return payload(val)

        if option is COALESCE:
            default = payload
            return default if val is None else val

        raise ValueError(f"Unknown pipe2 option: {option!r}")

    return functools.reduce(_call, steps, val)
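
The docstring above mentions null_safe(fn) and coalesce(default), but those helpers are not shown. A minimal sketch of how they could look, assuming they simply build the 2-tuples that the null-aware pipe expects (the helper bodies, the condensed pipe2 and the find_name lookup are my own assumptions for illustration):

```python
import functools
from typing import Any, Callable, Optional

# sentinel values, as in the post
NULL_SAFE = object()
COALESCE = object()

def null_safe(fn: Callable[[Any], Any]) -> tuple:
    """Hypothetical helper: mark fn so that it is skipped when the value is None."""
    return (NULL_SAFE, fn)

def coalesce(default: Any) -> tuple:
    """Hypothetical helper: replace a None value with default."""
    return (COALESCE, default)

def pipe2(val: Any, *steps: Any) -> Any:
    """Condensed version of the null-aware pipe (no validation of the steps)."""
    def _call(acc: Any, step: Any) -> Any:
        if callable(step):
            return step(acc)
        option, payload = step
        if option is NULL_SAFE:
            return None if acc is None else payload(acc)
        if option is COALESCE:
            return payload if acc is None else acc
        raise ValueError(f"Unknown pipe2 option: {option!r}")
    return functools.reduce(_call, steps, val)

def find_name(user_id: str) -> Optional[str]:
    # hypothetical lookup that may return None
    return {"1": "xuan"}.get(user_id)

found = pipe2("1", null_safe(find_name), null_safe(str.title), coalesce("Not found"))
missing = pipe2("2", null_safe(find_name), null_safe(str.title), coalesce("Not found"))
print(found)    # Xuan
print(missing)  # Not found
```

With the helpers the call site reads a bit closer to what a ?|> operator would give us, and the sentinels stay an implementation detail.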


Friday, 20 March 2026

Python Annotated

In Python there is this common mantra that type annotations (type hints) do not have any runtime effect. Well, that's mainly true, as those type hints are not used by the runtime to check whether your type assumptions/restrictions are correct (as the documentation says: "The Python runtime does not enforce function and variable type annotations."). But on the other hand, this type information exists at runtime (only that the runtime itself does not use it). Until recently you would use inspect.get_annotations to get that info, a dictionary that gets stored in the __annotations__ attribute (notice that annotations have been improved in Python 3.14: they're now evaluated lazily, and you should now use annotationlib.get_annotations). So that information is available at runtime, and your custom code can make use of it for whatever it sees fit.

Additionally, the type hints/annotations syntax allows us to use any object as an annotation, not just a type. Indeed Python’s grammar does not restrict the content of annotation expressions, as PEP 3107 explicitly states: "Annotations can be any valid Python expression". Put another way: Python allows arbitrary expressions (hence evaluating to anything) in annotations. This means that you can use the syntax for defining information (metadata) about these parameters, information that is then used by your custom code for something other than type-checking (for example, stating that a function expects a string following a certain pattern, let's say: create_user(msg: r"^[0-9a-f]{32}$")). That's very nice, but most likely you'll want to combine both the typing information and the extra metadata. With that in mind, the Annotated class (Annotated[T, x]) was introduced in Python 3.9 (PEP 593). T is a type, and type-checkers understand Annotated and just take care of the T part. The x part is for metadata, which can be any object and will be used by your custom code at runtime. Indeed, that x can be multiple values, not just one, I mean: Annotated[str, ValueRange(10, 20), Complexity("high")].

Apart from metadata that applies to the parameters we can have metadata that applies to the function itself or to a class ('cacheable', 'optimized', some sort of privacy mechanism, whatever). Normally we'll use a custom decorator that adds this information as an attribute to the function/class (__cached__, __private__). So all in all we have 2 mechanisms to provide metadata. Annotated for parameters, and decorators for classes/functions themselves.



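import annotationlib  # Python 3.14+
from dataclasses import dataclass
from typing import Annotated, get_type_hints

# to add metadata to the class or function itself, just use specific decorators that add metadata to specific attributes, for example: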
def non_critical(func):
    func._non_critical = True
    return func

# metadata for parameters
@dataclass
class ValueRange:
    lo: int
    hi: int

@non_critical
def create_post_2(
    title: Annotated[str, ValueRange(5, 20)], 
    content: Annotated[str, ValueRange(5, 100)],
) -> dict:
    return {"title": title, "content": content}

annotations = annotationlib.get_annotations(create_post_2)
print(f"annotations: {annotations}") 
# annotations: {'title': typing.Annotated[str, ValueRange(lo=5, hi=20)], 'content': typing.Annotated[str, ValueRange(lo=5, hi=100)], 'return': <class 'dict'>}


In other languages like Kotlin/Java we use annotations for both parameter metadata and function/class metadata. It's important to note that while in Python we can provide any object as metadata (either when using Annotated or when using a decorator and passing any expression as argument), in Kotlin/Java annotation metadata is managed at compile time, so you are limited to compile-time constants. This means that in Python we have enormous power regarding what we can provide as metadata.

Kotlin and Java annotations cannot take arbitrary runtime objects as parameters. Their allowed values are strictly limited because annotation arguments must be compile‑time constants and the annotation instances themselves are created by the compiler, not at runtime.

The Annotated class (well, what we actually have are instances of _AnnotatedAlias) has an __origin__ attribute (that points to the type hint) and a __metadata__ attribute for the metadata. However, if we only want to get the typing information we can directly use the typing.get_type_hints function, which strips the Annotated metadata by default.


annotations = annotationlib.get_annotations(create_post_2)

print(annotations["title"].__origin__) # to get the original type hint, which is str in this case
# <class 'str'>

metadata = {key: value.__metadata__ 
    for key, value in annotations.items() if hasattr(value, "__metadata__")
}
print(f"metadata: {metadata}")
# metadata: {'title': (ValueRange(lo=5, hi=20),), 'content': (ValueRange(lo=5, hi=100),)}

print(f"type hints: {get_type_hints(create_post_2)}")
# type hints: {'title': <class 'str'>, 'content': <class 'str'>, 'return': <class 'dict'>}
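
To make the "used by your custom code at runtime" part concrete, here is a minimal sketch of a decorator that reads the Annotated metadata and enforces it. The validate_lengths decorator is hypothetical, and I'm assuming that ValueRange expresses the allowed length of the string; the sketch uses typing.get_type_hints with include_extras=True so that it does not depend on annotationlib:

```python
import functools
import inspect
from dataclasses import dataclass
from typing import Annotated, get_type_hints

@dataclass
class ValueRange:
    lo: int
    hi: int

def validate_lengths(func):
    """Hypothetical decorator: checks len() of each argument annotated with a ValueRange."""
    hints = get_type_hints(func, include_extras=True)  # keeps the Annotated metadata
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            # parameters without Annotated have no __metadata__, so nothing is checked
            for meta in getattr(hints.get(name), "__metadata__", ()):
                if isinstance(meta, ValueRange) and not (meta.lo <= len(value) <= meta.hi):
                    raise ValueError(f"{name}: length {len(value)} not in [{meta.lo}, {meta.hi}]")
        return func(*args, **kwargs)
    return wrapper

@validate_lengths
def create_post(
    title: Annotated[str, ValueRange(5, 20)],
    content: Annotated[str, ValueRange(5, 100)],
) -> dict:
    return {"title": title, "content": content}

print(create_post("Hello world", "Some content"))
# {'title': 'Hello world', 'content': 'Some content'}

try:
    create_post("Hi", "Some content")
except ValueError as err:
    print(err)  # title: length 2 not in [5, 20]
```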

Thursday, 12 March 2026

La Mort de Quentin Deranque, un meurtre raciste / a racist murder

On February 12th, 2026, Quentin Deranque, a 23-year-old French man, was beaten to death (receiving multiple kicks to the head while he lay on the floor) by a far-left, pro-Islam, anti-French militia, a terrorist organization called "la Jeune Garde." Why? Because he was a French patriot, because he loved his country and culture, and because he intended to defend a few French girls belonging to Nemesis, a female organization that tries to raise awareness about the dangers that mass immigration from Muslim countries represents for women's rights and safety.

Of course, most mainstream media (which in France range from far-left to left, except for the excellent CNews) hurried to talk about a "fight" rather than a lynching. When the video of the lynching was made public, they tried to minimize it by claiming he was a far-right militant and an ultra-conservative Catholic. When that failed to silence the scandal, they escalated their lies, calling him a fascist, a racist, and a xenophobe. The far-left political movements that have instigated this violence for decades even dared to call him a "Nazi."

No, he was not any of those things; as I said, he was just a French patriot, a French nationalist. One could also say he was a conservative Catholic. It seems he had turned to Catholicism because of the link he established between French identity and the Catholicism. As someone who, even to this day, is not religious, I can clearly see and embrace that link, and I feel deeply grateful for having grown up in a place where Catholicism forms the basis of our moral system (regardless of whether most people consider themselves religious or not) rather than having grown up in a Muslim society.

Quentin was the child of a French father and a Peruvian mother, and he had mixed European and Amerindian features. Unless he was an illiterate idiot (he was a math student who loved philosophy and reading, so that does not seem to be the case), it is obvious that he could not be a racist and that his French nationalism was not based on a "legacy of blood" but on a "legacy of culture."

The fact that Quentin had partial extra-European origins leads to very interesting reflections. We saw his friends on TV; they were devastated by his assassination, yet they paid him tribute with enormous dignity and emotion. Many of these friends were clearly French nationalists, and for them, Quentin, with his 50% Peruvian ancestry, was just another French comrade. This is interesting for those who try to scare us with lies about French nationalism being inherently racist and xenophobic.

The even more interesting reflection is that I firmly believe Quentin’s death was a racist crime. His assassins, the ten far-left scumbags, all of them "white" and mostly coming from white, French (anti-France) bourgeois families, who beat him to death most likely focused on him because of his extra-European features. There were two other nationalist guys lying on the floor being kicked, but not with such cruelty. Am I saying that these far-left terrorists, who are supposed to fight against fascism and racism, killed him because he was not 100% white? YES, that is exactly what I am saying.

The far-left movement in Europe, and particularly in France, has become a cult of racial obsession. They have fully traded traditional class struggle for the radical, segregationist dogmas of the decolonial and indigenist movements. For these oikophobes —people who despise their own civilization— anyone of non-European descent is viewed strictly as a perpetual victim of a 'white system.' In their eyes, such a person is 'required' to hate their 'oppressors' and reject every facet of French culture. This means that for these lunatics, someone like Quentin, who chose to assimilate and embrace his French heritage, is seen as the ultimate 'traitor' to their narrative. To these self-loathing bourgeois radicals, Quentin should have been a grievance-filled victim, weaponizing his skin tone against the state. Instead, he chose the dignity of belonging to a national community, a history, and a culture. He was a French nationalist by choice and by love, proving their 'systemic' lies wrong. It was that clarity of spirit, that refusal to be a pawn in their racial war, that the far-left found truly intolerable.

Finally, I'll put here a list with the names and information (just taken from some other blogs) about the assasins. First three of the pieces of shit that directly kicked Quentin's head to his death:

Trois militants antifas lyonnais ont été formellement identifiés lors du lynchage du jeune Quentin.
1- Jacques-Élie Favrot (surnom “Jef”).
Assistant parlementaire de Raphaël Arnault, M2 à Sciences Po Saint-Étienne.
Militant à la Jeune Garde Lyon ainsi qu’à OSE CGT (syndicat étudiant de la CGT à Saint-Étienne).
2- Adrian Besseyre
Militant très actif de la Jeune Garde Lyon, né en 2001, il a également effectué un stage à l’Assemblée Nationale pour Raphaël Arnault.
3- Lelio Le Besson
Membre du service d’ordre de la Jeune Garde Lyon, et désormais militant actif de « Génération antifasciste », le mouvement qui a succédé à la JG.
Il a fait ses études à l’IG2E en Gestion des Risques et Traitement des pollutions

Then we have the scumbag that founded the far-left terror group, Raphaël Arnault. In this country in decay called France, a criminal identified as Fiche S (someone considered a serious threat to National Security), already condemned for a violent arbitrary aggression, can get a seat in the National Assembly. This ultra-violent illiterate piece of shit should be considered as one of the "intellectual" authors of this crime.

And then we have LFI, the anti-French, Pro-Islam political sect that has funded and empowered this terrorist group and has been sowing hatred in the country for years. Particularly, the leader of the sect, Melenchon, and its main and most violent and ignorant subordinates: Thomas Portes, Bompart, Rima Hassan, Mathilde Panot and Bilongo. They have minimized and even justified the murder, vomited lie after lie about Quentin (plainly calling him a "nazi") and even made fun of his execution. Hope one day all these traitors and scumbags will rot in hell.

Repose en Paix, Quentin. Les hommes sont morts, mais la dignité est éternelle.

Sunday, 22 February 2026

Python Class-Level Type Hints

Notice that in this post I'm talking about "standard" Python classes, not about dataclasses. I recently became aware of the possibility of using class-level type hints in your classes. The thing is that when reading the documentation I found it rather confusing. To make sense of it we have to be very aware of the difference between the intent that we express with those class-level hints and their runtime effects. So we have this example in the documentation:


from typing import ClassVar

class BasicStarship:
    captain: str = 'Picard'               # instance variable with default
    damage: int                           # instance variable without default
    stats: ClassVar[dict[str, int]] = {}  # class variable

The 'damage: int' part is what I already knew about class-level type hints, and it was clear to me. We declare an attribute and its type, but we don't initialize it. Python takes this just as typing information; it has no runtime impact (other than being added to the class's __annotations__), and we are not creating an attribute in the class object.

The 'captain: str = 'Picard'' part is what I could not understand. To me it looks like the normal way of adding a class attribute, only that additionally you indicate the type, so how can the doc say that it's an "instance variable with default"? Well, it's the type-checking meaning vs the runtime effect. I am right that we get an attribute created at the class level (in the class __dict__), just see:


>>> class User:
...     continent: str = "Europe"
...     active = True
...

>>> User.__dict__
mappingproxy({'__module__': '__main__', '__firstlineno__': 1, '__annotations__': {'continent': <class 'str'>}, 'continent': 'Europe', 'active': True, '__static_attributes__': (), '__dict__': <attribute '__dict__' of 'User' objects>, '__weakref__': <attribute '__weakref__' of 'User' objects>, '__doc__': None})

>>> User.continent
'Europe'

>>> User.active
True

But for the type checker what that typed declaration means is that instances of that class will have a captain (or continent in my example) attribute. This could feel contradictory, but given how attribute lookup works it's perfectly fine. Initially the 'captain' attribute is created at the class level. If we read it through an instance (my_ship.captain) the lookup mechanism won't find it in the instance, but in the class, and will return it. Then, when we write to it through an instance (not through the class), the write is done in the instance, so a 'captain' attribute is added to the instance. That's fine, indeed it's very nice: while the attribute is only being read, not written to, it's shared between instances and kept in the class (saving memory); then, as soon as you write to it, it's shadowed by the instance.


s = BasicStarship()
print(s.captain)       # "Picard" via class lookup
s.captain = "Xuan"     # creates an instance attribute
print(s.__dict__)      # {'captain': 'Xuan'}
print(BasicStarship.__dict__['captain'])  # 'Picard'

We can summarize it like this:

Type hints alone do not create attributes; they only declare intent.
If you want the attribute to exist on the class (and thus be visible via Foo.x), you must assign a default value.

By the way, this is not the first time I see this behaviour of reading values from a "parent object" until we write the value to the object itself, shadowing it. This is just how things work in JavaScript with the [[Prototype]] chain.

I'm not much of a fan of defining instance attributes at the class level. It's true that it makes very explicit that an attribute is part of the public contract of the class, but I think most of the time it's just a bit of boilerplate. Type-checkers and autocomplete work perfectly fine with the classical style of initializing in the __init__ method, and if an attribute is internal/private and should not be considered part of the public API we should just follow the convention of starting it with '_'. So normally I would write the above code like this:



class AdvancedStarship:
    # stats = {} mypy will complain about this, because it is not a ClassVar
    stats: ClassVar[dict[str, int]] = {}  # class variable
    
    def __init__(self, damage: int, captain: str = 'Picard') -> None:
        self.captain = captain
        self.damage = damage



The case where these class-level type hints feel very useful to me is Protocols, as they make it unnecessary to declare the "data part" of the protocol with properties (get/set descriptors), which is the approach I had been following so far.



from typing import Protocol

class Foo(Protocol):
    x: int  # part of the interface

class Bar:
    def __init__(self):
        self.x = 42  # matches Foo
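
A slightly extended sketch of the same idea, to see the protocol in use. The names HasX, Baz and double_x are made up for the example, and @runtime_checkable is only there to show that isinstance checks just the presence of the attribute:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class HasX(Protocol):
    x: int  # data member declared with a class-level type hint

class Bar:
    def __init__(self) -> None:
        self.x = 42  # instance attribute, satisfies HasX structurally

class Baz:
    pass  # no x attribute at all

def double_x(obj: HasX) -> int:
    # a type checker accepts any object exposing an int attribute x here
    return obj.x * 2

print(double_x(Bar()))          # 84
print(isinstance(Bar(), HasX))  # True
print(isinstance(Baz(), HasX))  # False
```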


It's also useful if we have attributes that won't be set in __init__, but in some later method call. This way we make them part of the class contract and initialize them to a default value (probably None), shared by all instances via the class attribute (as we saw with BasicStarship.captain), and then get it added to each instance when it gets set to a specific value.
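
A small sketch of that last case (the Job class and its run method are hypothetical, just for illustration): the attribute is declared at the class level with a None default, and only lands in the instance __dict__ once a method sets it.

```python
from typing import Optional

class Job:
    result: Optional[str] = None  # declared at class level, set later by run()

    def __init__(self, name: str) -> None:
        self.name = name

    def run(self) -> None:
        self.result = f"{self.name} done"  # creates the instance attribute

job = Job("backup")
print(job.result)    # None (found in the class)
print(job.__dict__)  # {'name': 'backup'}

job.run()
print(job.__dict__)  # {'name': 'backup', 'result': 'backup done'}
print(Job.result)    # None (the class-level default is untouched)
```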

Sunday, 15 February 2026

Logical Assignment Operator and More

I've recently come across the Logical OR Assignment (||=), and the Nullish Coalescing Assignment (??=) operators in JavaScript. They are not a revolution, just a shortcut for the usage of the OR (||) operator and the nullish coalescing operator in assignment situations. We use "||=" for falsy values and "??=" for nullish (null, undefined) values. Let's see:


// for "falsy" values
> let name = "";
> name ||= "default";
'default'
> name ||= "default2";
'default'

// is equivalent to:
> name = ""
> name = name || "default";
'default'
> name = name || "default2";
'default'

// for strict null or undefined values:
> let name = null; // or name = undefined
> name ??= "default";
'default'
> name ??= "default2";
'default'

// is equivalent to:
> name = null;
> name = name ?? "default";
'default'
> name = name ?? "default2";
'default'


Python does not have a 'None coalescing' operator (so obviously it does not have a 'None coalescing assignment' operator), so as an equivalent we have to use an if-else expression. We have the 'or' operator (that we can use with falsy values), but not an "or assignment" operator. So the equivalent code to the above JavaScript is quite a bit more verbose:


# for "falsy" values
>>> name = ""
>>> name = name or "default"
>>> name
'default'
>>> name = name or "default2"
>>> name
'default'

# for strict None values:
>>> name = None
>>> name = name if name is not None else "default"
>>> name
'default'
>>> name = name if name is not None else "default2"
>>> name
'default'

As the if-else pattern is quite verbose, we can write a simple coalesce function (I've just remembered that such a function, COALESCE, is standard in SQL) to make the code more straightforward.


def coalesce(value, default_value):
    return value if value is not None else default_value

a = coalesce(a, "default value")

As for other languages, Kotlin has the || operator and the ?: (Elvis) null coalescing operator, but not a shortcut form to use during assignment. Ruby has a logical OR assignment operator (||=) that we can use with nil and false (the only falsy values in Ruby). It feels strange that Ruby does not have a null coalescing operator, so if we want to be strict and deal only with null (nil), we have to use the rich Ruby syntax differently:


# for null coalescing assignment
# like JavaScript: a = a ?? "default" 
# or Kotlin: a = a ?: "default"

a = "default" if a.nil?
# or
a = a.nil? ? "default" : a


At this point I think it's good to remember which values are considered falsy (those that, when evaluated in a boolean context, are considered as false) in different languages:

  • JavaScript: false, null, undefined, 0, NaN, ""
  • Python: False, None, 0, "", [], {}, set() (and other empty containers)
  • Ruby: nil, false
  • Kotlin: false. Kotlin does NOT perform truthy/falsy coercion; it's strictly typed: trying to use a non-boolean value in a condition causes a compilation error.

As you can see the main (and very important) difference between JavaScript and Python is that in Python empty containers are falsy.
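
That difference is exactly where 'or' and a None-only coalesce stop being interchangeable in Python. A minimal sketch (the retries and tags variables are just illustrative values):

```python
from typing import Optional, TypeVar

T = TypeVar("T")

def coalesce(value: Optional[T], default_value: T) -> T:
    # only None triggers the default, other falsy values are kept
    return value if value is not None else default_value

retries = 0
tags: list = []

print(retries or 3)                 # 3  (0 is falsy, so it gets replaced)
print(coalesce(retries, 3))         # 0  (0 is not None, so it is kept)
print(tags or ["default"])          # ['default']
print(coalesce(tags, ["default"]))  # []
print(coalesce(None, 3))            # 3
```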

Saturday, 7 February 2026

Python Attribute Lookup and Dunders

I already talked in the past about Python descriptors, [1] and [2] (referencing also the complex attribute lookup process). Somehow I've recently realised how some commonly used attributes are managed with descriptors present in classes or metaclasses. First, I'll paste here the conclusions after an interesting chat with a GPT regarding the attribute lookup process:

1) Instance attribute lookup (obj.attr)

This is (conceptually) what object.__getattribute__(obj, name) does:

a) Check for a data descriptor on the class or its MRO
Search type(obj).__mro__ for name in each class’s __dict__.
If found and it’s a data descriptor (has __set__ or __delete__), return descriptor.__get__(obj, type(obj)).

b) Check the instance’s own dictionary
If obj.__dict__ exists and contains name, return obj.__dict__[name].
Note: If the class defines __slots__ without __dict__, this step may not exist.

c) Check for a non-data descriptor or other attribute on the class/MRO
Search type(obj).__mro__ for name.
If found and it’s a non-data descriptor (has __get__ only), return descriptor.__get__(obj, type(obj)).
Otherwise, return the found value as-is.

d) Fallback: __getattr__
If nothing above produced a value, and type(obj) defines __getattr__(self, name), call it and return its result.

e) Otherwise
Raise AttributeError.

2) Class attribute lookup (C.attr)

Conceptually, type.__getattribute__(C, name) does this:

a) Metaclass MRO — data descriptors first
Search type(C).__mro__. If name is found and it’s a data descriptor (__set__ or __delete__ present), return descriptor.__get__(None, C).

b) Class MRO (C and its bases) — regular attributes & descriptors
Search C.__mro__ (starting with C, then bases):
If found and it’s a descriptor (__get__), return descriptor.__get__(None, C) (note obj=None).
Otherwise, return the raw value.

c) Metaclass MRO — non-data descriptors and other attributes
If found on the metaclass MRO and it’s a descriptor, return descriptor.__get__(C, type(C)) (here, the “instance” is the class C).
Otherwise return the value.

d) Fallback
If not found and the metaclass defines __getattr__(cls, name), call it.
Else raise AttributeError.
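
The instance lookup order above can be checked with a small experiment. This is just a sketch with made-up classes (DataDesc, NonDataDesc, Demo), writing directly into the instance __dict__ to force name clashes:

```python
class DataDesc:
    """Data descriptor: defines both __get__ and __set__."""
    def __get__(self, obj, objtype=None):
        return "from data descriptor"

    def __set__(self, obj, value):
        raise AttributeError("read-only")

class NonDataDesc:
    """Non-data descriptor: defines only __get__."""
    def __get__(self, obj, objtype=None):
        return "from non-data descriptor"

class Demo:
    data = DataDesc()
    non_data = NonDataDesc()

    def __getattr__(self, name):
        # only called when the normal lookup fails
        return f"fallback for {name}"

d = Demo()
# bypass the descriptors and write straight into the instance dictionary
d.__dict__["data"] = "from instance dict"
d.__dict__["non_data"] = "from instance dict"

print(d.data)         # from data descriptor (step a wins over the instance dict)
print(d.non_data)     # from instance dict (step b wins over a non-data descriptor)
print(d.missing)      # fallback for missing (step d)
print(Demo.non_data)  # from non-data descriptor (class lookup, __get__(None, Demo))
```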

Let's see now some examples of attributes that are indeed descriptors:

__name__ of a class (Person.__name__). One could think that it's just an attribute stored directly in the class object, but if that were the case I could access it via an instance of the class (person1.__name__), which is not possible. So __name__ is indeed a descriptor in the metaclass. The same goes for __bases__ and __doc__, which are also descriptors in type (although __doc__ is additionally stored in the class __dict__, so it can still be read through an instance):


>>> class Person:
...     pass
...     
>>> Person().__name__
Traceback (most recent call last):
    Person().__name__
AttributeError: 'Person' object has no attribute '__name__'

>>> Person.__name__
'Person'

>>> Person.__dict__["__name__"]
Traceback (most recent call last):
    Person.__dict__["__name__"]
    ~~~~~~~~~~~~~~~^^^^^^^^^^^^
KeyError: '__name__'

>>> type(Person).__dict__["__name__"]
<attribute '__name__' of 'type' objects>
>>> type(type(Person).__dict__["__name__"])
<class 'getset_descriptor'>

>>> type(Person).__dict__["__bases__"]
<attribute '__bases__' of 'type' objects>
>>> type(type(Person).__dict__["__bases__"])
<class 'getset_descriptor'>

>>> type(type(Person).__dict__["__doc__"])
<class 'getset_descriptor'>


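__class__ of an instance or __class__ of a class. At first this one does not seem to be based on descriptors (my discussion with a GPT was a bit confusing), as it is neither in the instance __dict__ nor in the class __dict__. But it is in fact a getset_descriptor defined in object (object.__dict__["__class__"]), so it's found through the MRO (of the class, or of the metaclass when we do Person.__class__) by the normal lookup algorithm.


>>> p1 = Person()
>>> p1.__class__
<class '__main__.Person'>

>>> type.__class__
<class 'type'>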

>>> type(p1.__dict__["__class__"])
Traceback (most recent call last):
    type(p1.__dict__["__class__"])
         ~~~~~~~~~~~^^^^^^^^^^^^^
KeyError: '__class__'

>>> type(Person.__dict__["__class__"])
Traceback (most recent call last):
    type(Person.__dict__["__class__"])
         ~~~~~~~~~~~~~~~^^^^^^^^^^^^^
KeyError: '__class__'
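
The KeyErrors above only tell us that __class__ is not stored in the instance or in Person itself. A quick sketch to check where it actually lives, which is in object, the last class of every MRO:

```python
class Person:
    pass

p1 = Person()

# __class__ is defined once, in object, as a getset_descriptor
descriptor = object.__dict__["__class__"]
print(type(descriptor).__name__)       # getset_descriptor
print(hasattr(descriptor, "__set__"))  # True, so it is a data descriptor

# invoking the descriptor by hand gives the same result as the attribute access
print(descriptor.__get__(p1, Person) is p1.__class__)      # True
print(descriptor.__get__(Person, type) is Person.__class__)  # True
```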


Dunder attributes. It's interesting to note that there are 2 categories of __dunder__ attributes (those that start and end with "__").
- On one hand we have those like the ones we've just seen, these are Special Attributes (Metadata), that are used to store metadata: __name__, __class__, __bases__, __mro__, __dict__, __module__, __doc__, __annotations__.
- And on the other hand we have Special Methods (Behavioral Hooks), that are used to implement Python's syntactic sugar:

__call__: ob(), Invocation
__getitem__: ob[key]
__setitem__: ob[key] = value 
__getattr__: Fallback for missing attributes
__getattribute__: Intercepts all attribute access
__iter__, __next__: Iteration
__str__, __repr__: String representation
__eq__, __lt__, etc: Comparisons
__enter__, __exit__: Context managers
__add__, __mul__, etc: Arithmetic operations

Notice that if you access a Behavioral Hook "on your own" (I mean, you explicitly do: obj.__call__() or obj.__iter__()) the normal look up mechanism applies (using the object and its class). However, when used in the intended way (when you do obj(), or iter(obj)) the look up is done only in the class of the object (and if an object is a class it's done in its metaclass) not in the object itself.
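
A minimal sketch that shows the difference (the Greeter class is made up for the example): we store "special methods" directly in the instance and see that only the explicit access uses them.

```python
class Greeter:
    def __call__(self):
        return "call defined in the class"

    def __len__(self):
        return 1

g = Greeter()
# store same-named attributes directly in the instance
g.__call__ = lambda: "call stored in the instance"
g.__len__ = lambda: 99

print(g.__call__())  # call stored in the instance (normal lookup, instance first)
print(g())           # call defined in the class (implicit lookup skips the instance)
print(g.__len__())   # 99
print(len(g))        # 1
```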

Friday, 30 January 2026

Python Multinested Closure

After my previous post (that mentions other related posts with similar details) about Python closure introspection (and a bit of internals) I came across a detail that at first seemed strange to me, but that makes a lot of sense (and made me dive further into the implementation).

Let's say we have these nested functions (with 3 levels of nesting). We return the most nested function (inner_2) that traps variables in the most outer function (becoming a closure):


def outer():
    print("outer")
    x = "a"
    y = "b"
    def inner_1():
        # it's using x
        nonlocal x
        x += "b"
        print(f"inner_1: {x}")
        def inner_2():
            # it's using both x and y
            nonlocal x
            x += "c"
            print(f"inner_2, x:{x} y:{y}")
        return inner_2
    return inner_1

in_1 = outer()
in_2 = in_1()


inner_2 is trapping 2 variables defined in outer: x and y. We can see it by checking its __closure__ and the co_freevars in its __code__ object, and the co_cellvars of the outer function code object:


print(f"in_2.__closure__: {in_2.__closure__}.") # 2 cells, for the x and y values
print(f"in_2.__code__.co_freevars: {in_2.__code__.co_freevars}.") # ('x', 'y')

# in_2.__closure__: (<cell at 0x78f58889d570: str object at 0x78f58886f930>, <cell at 0x78f58889d540: str object at 0x5a0cc7144e08>).
# in_2.__code__.co_freevars: ('x', 'y').

print(f"outer.__code__.co_cellvars: {outer.__code__.co_cellvars}") # ('x', 'y')
# outer.__code__.co_cellvars: ('x', 'y')

But checking these attributes for the intermediate inner function comes with some surprise:


print(f"in_1.__closure__: {in_1.__closure__}.") # 2 cells, for the x and y values
print(f"in_1.__code__.co_freevars: {in_1.__code__.co_freevars}.") # ('x', 'y')
print(f"in_1.__code__.co_cellvars: {in_1.__code__.co_cellvars}") # ()
print(f"outer.__code__.co_cellvars: {outer.__code__.co_cellvars}") # ('x', 'y')

# in_1.__closure__: (<cell at 0x78f58889d570: str object at 0x78f58886f930>, <cell at 0x78f58889d540: str object at 0x5a0cc7144e08>).
# in_1.__code__.co_freevars: ('x', 'y').
# in_1.__code__.co_cellvars: ()
# outer.__code__.co_cellvars: ('x', 'y')


inner_1 is trapping x in its closure, which is normal as it's using it, but it's also trapping y, which it's not using. Why? Well, indeed inner_1 is not using y in a direct, visible way, but it needs it, because when the inner_2 function object is created, it needs both x and y for its closure. The cells for x and y are created in the heap when outer is executed. outer creates inner_1 and returns it, so when inner_1 is executed and creates inner_2, outer is long gone, and we need to have the references to the x and y cells somewhere in order to put them in inner_2.__closure__. That "somewhere" is the closure of inner_1. So yes, even if inner_1 only works directly with x, it also gets y in its closure.

Discussing this with a GPT you get a nice explanation:

This is sometimes described as “transitive closure capture” or “cell promotion/relaying”: an intermediate function (inner_1) must carry closure cells that it doesn’t itself use, so that functions nested within it can close over them.

In other words: If a nested function needs a variable from an outer scope, every function layer in between must carry that variable as a closure cell, even if those intermediate layers don’t use it directly.

Only the immediate lexical parent can provide the closure cells to a newly created function.
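
We can make the relaying even more visible with an intermediate function that uses none of the variables itself, and check that both functions share the very same cell objects. A minimal sketch (the middle/innermost names are mine):

```python
def outer():
    x = "a"
    y = "b"

    def middle():
        # middle never reads x or y itself
        def innermost():
            return x + y
        return innermost

    return middle

middle = outer()
innermost = middle()

# map each free variable name to its cell object
middle_cells = dict(zip(middle.__code__.co_freevars, middle.__closure__))
innermost_cells = dict(zip(innermost.__code__.co_freevars, innermost.__closure__))

print(sorted(middle_cells))                       # ['x', 'y']
print(middle_cells["x"] is innermost_cells["x"])  # True, same cell object
print(middle_cells["y"] is innermost_cells["y"])  # True, same cell object
print(innermost())                                # ab
```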

The approach followed by Python for creating its closures is rather different from that of JavaScript, and explains the limitation that I mentioned in this post. In Python the compiler checks whether a function closes over variables of its outer scopes, and if so, it sets the co_freevars and co_cellvars of the corresponding code objects and adds the necessary instructions so that at execution time the cell objects get created and, when the function object is created, its __closure__ is correctly set with exactly the cells that it needs. If some "dynamic code" (code compiled dynamically with exec()) tries to access a variable of an outer scope that has not been trapped by the __closure__ of the function that invokes exec, it can't, as it's not there.

In JavaScript this is quite different. eval() has access to any variable of the outer scopes, because indeed all functions in JavaScript have access to all their outer scopes through the scope chain. When a function is created, it gets its [[scope]] property set to the scope (the activation object, I think it's called) of its parent function. So if we have a certain level of nesting when defining functions, we end up with a chain of scopes, and the variable lookup mechanism will search in this chain if it does not find a variable in the current scope. This is very powerful, but at the same time has serious performance implications. Outer scopes are kept alive regardless of whether the inner functions access them or not (because we allow eval to access them, and we don't know what eval will be evaluating). This also involves longer lookups.

Nicely explained by a GPT:

JavaScript keeps the entire lexical scope chain alive, whereas Python collapses scopes into minimal “cell objects” and releases frames as soon as possible.

In JavaScript, every function carries a scope chain because dynamic features like eval() force engines to preserve the full lexical environment at runtime. Python does not need this because its lexical scope is fixed at compile time and not accessible to exec()/eval().

I was wondering how the most powerful and dynamic language that I can think of, Ruby, manages this. I have no practical Ruby knowledge, so I just asked a GPT and, as expected, it follows an approach very similar to JavaScript's, keeping a sort of chain of "scopes" that allows eval to access variables in any of them. From a GPT:

Ruby’s closures sit right between Python and JavaScript, but they lean much closer to JavaScript in philosophy:

  • They close over entire lexical scopes, not a minimal set of cell-like variables.
  • Ruby scopes are runtime objects (not a purely compile‑time fiction like Python’s).
  • Blocks, Procs, and lambdas capture the full environment, not a pruned subset.
  • Ruby supports eval within a Binding, which preserves the whole lexical + dynamic scope much like JavaScript’s eval.

Wednesday, 21 January 2026

Python Closure Introspection

I talked some time ago about a minor limitation (related to eval) of Python closures when compared to JavaScript ones. That's true, but Python closures are particularly powerful in terms of introspection. In this previous post (and some older ones) I already talked about fn.__code__.co_cellvars, fn.__code__.co_freevars and fn.__closure__. As a reminder, taken from here:

  • co_varnames — is a tuple containing the names of the local variables (starting with the argument names).
  • co_cellvars — is a tuple containing the names of local variables that are referenced by nested functions.
  • co_freevars — is a tuple containing the names of free variables; co_code is a string representing the sequence of bytecode instructions.

And the __closure__ attribute of a function object is a tuple containing the cells for the variables that it has trapped (the free variables).


from typing import Callable

# closure example (closing over wrapper and counter variables from the enclosing scope)
def create_formatter(wrapper: str) -> Callable[[str], str]:
    counter = 0
    def _format(st: str) -> str:
        nonlocal counter
        counter += 1
        # interpolate the st argument (not the literal text "st") between the wrappers
        return f"{wrapper}{st}{wrapper}"
    return _format

format = create_formatter("|")

print(format("a"))
# |a|

# the __closure__ attribute is a tuple of cells, one per trapped variable
print(f"closure: {format.__closure__}")
print(f"freevars: {format.__code__.co_freevars}")
# closure: (cell at 0x731017299ea0: int object at 0x6351ad1bd1b0, cell at 0x731017299de0: str object at 0x6351ad1cd2e8)
# freevars: ('counter', 'wrapper')


A cell is a wrapper object pointing to a value (the trapped variable). It's an additional level of indirection that allows the closure to share the value with the enclosing function and with any other closures trapping that same variable, so that if any of them changes the value, the change is visible to all of them.



from typing import Callable

def create_formatters(format_st: str) -> tuple[Callable[[str], str | None], Callable[[str], str]]:
    """
    creates two formatter closures that share the same 'format_st' free variable.
    one of them can disable the formatting by setting the format string to an empty string.
    """
    def _prepend(st: str) -> str | None:
        nonlocal format_st
        if st == "disable":
            format_st = ""  # Example of modifying the closed-over variable
            return None
        return f"{format_st}{st}"
    
    def _append(st: str) -> str:
        return f"{st}{format_st}"
    
    return _prepend, _append


prepend, append = create_formatters("!")
print(prepend("Hello"))  
print(append("Hello"))    
# !Hello
# Hello!

prepend("disable")
print(prepend("World"))  # format_st was modified to "", so nothing is prepended
print(append("World"))   # _append sees the same cell, so nothing is appended
# World
# World
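
That two sibling closures really share one cell (rather than each holding a copy) can be checked directly. A small sketch, with a hypothetical create_pair factory written just for this check:

```python
def create_pair(suffix: str):
    def set_suffix(value: str) -> None:
        nonlocal suffix
        suffix = value

    def get_suffix() -> str:
        return suffix

    return set_suffix, get_suffix


set_suffix, get_suffix = create_pair("!")

# each closure has a single free variable, and both point to the same cell object
print(set_suffix.__closure__[0] is get_suffix.__closure__[0])  # True

set_suffix("?")
# the write done through one closure is visible through the other one's cell
print(get_suffix.__closure__[0].cell_contents)  # ?
print(get_suffix())                             # ?
```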


Here you can find a perfect explanation of co_freevars, co_cellvars and closure cells:

Closure cells refer to values needed by the function but are taken from the surrounding scope.

When Python compiles a nested function, it notes any variables that it references but are only defined in a parent function (not globals) in the code objects for both the nested function and the parent scope. These are the co_freevars and co_cellvars attributes on the __code__ objects of these functions, respectively.

Then, when you actually create the nested function (which happens when the parent function is executed), those references are then used to attach a closure to the nested function.

A function closure holds a tuple of cells, one each for each free variable (named in co_freevars); cells are special references to local variables of a parent scope, that follow the values those local variables point to.

If we have a function factory that creates a closure, each time we invoke it we'll get a new function object with its __closure__ attribute pointing to its own object (a tuple), but with __code__ pointing to the same code object. So all those instances of the function have the same bytecodes and metainformation, but each instance has its own state (closure cells/freevars).
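
A quick sketch of that, with a hypothetical make_counter factory: the two functions it returns share one code object, but each has its own closure tuple and its own cell, so their states evolve independently.

```python
def make_counter():
    count = 0

    def increment() -> int:
        nonlocal count
        count += 1
        return count

    return increment


a = make_counter()
b = make_counter()
a()
a()
b()

print(a.__code__ is b.__code__)           # True: same bytecode and metadata
print(a.__closure__ is b.__closure__)     # False: each instance has its own tuple
print(a.__closure__[0].cell_contents)     # 2
print(b.__closure__[0].cell_contents)     # 1
```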

The closure "superpowers" that Python features are:

1) As we saw above, we can easily check if a function is a closure (has cells/freevars) just by checking if its __closure__ attribute is not None (or if its __code__.co_freevars tuple is not empty).

2) We can see "from outside" the values of the closure freevars (the names, the values, and combine both with a simple "show_cell_values" function). And furthermore, we can modify them, just by modifying the contents of the cells in fn.__closure__. It's what we could call "closure introspection".



from typing import Any, Callable

# combining the names in co_freevars and the values in closure cells to nicely see the trapped values
def show_cell_values(fn) -> dict[str, Any]:
    return {name: fn.__closure__[i].cell_contents
        for i, name in enumerate(fn.__code__.co_freevars)
    }

# maps each free variable name to the position of its cell in fn.__closure__
def cell_name_to_index_map(fn) -> dict[str, int]:
    return {name: i for i, name in enumerate(fn.__code__.co_freevars)}

def get_freevar(fn, name: str) -> Any:
    name_to_index = cell_name_to_index_map(fn)
    return fn.__closure__[name_to_index[name]].cell_contents

def set_freevar(fn, name: str, value: Any) -> None:
    name_to_index = cell_name_to_index_map(fn)
    fn.__closure__[name_to_index[name]].cell_contents = value


def create_formatter(wrapper: str) -> Callable[[str], str]:
    counter = 0
    def _format(st: str) -> str:
        nonlocal counter
        counter += 1
        return f"{wrapper}{st}{wrapper}"
    return _format

format = create_formatter("|")

print(f"format cells: {show_cell_values(format)}")
print(f"format 'wrapper' freevar before: {get_freevar(format, 'wrapper')}")
print(format("a"))
# format cells: {'counter': 0, 'wrapper': '|'}
# format 'wrapper' freevar before: |
# |a|

set_freevar(format, 'wrapper', '-')

print(f"format 'wrapper' freevar after: {get_freevar(format, 'wrapper')}")
print(format("a"))
# format 'wrapper' freevar after: -
# -a-
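
As for the first "superpower" (checking whether a function is a closure at all), it can be wrapped in a tiny helper. This is just a sketch, and is_closure, plain and factory are hypothetical names. I use getattr because built-in functions such as len do not have a __closure__ attribute at all:

```python
def is_closure(fn) -> bool:
    # plain Python functions have __closure__ set to None;
    # built-in functions do not have the attribute, hence the default
    return getattr(fn, "__closure__", None) is not None


def plain(x):
    return x


def factory():
    y = 1

    def inner():
        return y

    return inner


print(is_closure(plain))      # False
print(is_closure(factory()))  # True
print(is_closure(len))        # False
```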