Sunday, 21 June 2026

Python Class Body and Lexical Scope

In my previous post about class creation in Python I mentioned how a code object is created for the code in the class body, and then that code object is executed as a function receiving a namespace object (created by __prepare__) as its locals. The class body will add attributes to that namespace, and can use whatever is already present in that namespace (if __prepare__ has put something there). This has made me wonder if apart from that, the class body can have access to its enclosing scope. Just remember that in this post we saw that methods in a class have access to its enclosing scope (they close over variables defined outside the class).

So the answer is YES, and it's just the closures mechanism in action. Let's see an example:


import inspect

def create_class(id: str):
    class MyClass:
        # the class initialization (the class body) has access to "external" variables in the enclosing scope, such as "id"
        # the code in the class body is compiled into a code object that runs like a function that has trapped (closed over) the "id" free variable
        class_id = id
        
        print(f"Free variables: {inspect.currentframe().f_code.co_freevars}")
        # Free variables: ('id',)

        def __init__(self, value):
            self.value = value

        def display(self):
            print(f"MyClass value: {self.value}, Class ID: {self.class_id}")

    return MyClass

cl = create_class("123")
instance = cl("aa")
instance.display()

# Free variables: ('id',)
# MyClass value: aa, Class ID: 123


As you can see, that class body (my understanding is that Python executes the code object corresponding to the class body by putting it in a "synthetic function") has access to the id variable in the outer scope and can assign it to one of its attributes. We can see 'id' in the list of freevars of the class body's code object (which we get by accessing the current frame from the class body itself). However, I came across something that confused me. If I print locals() from the class body, I can't see 'id' there. That's very strange: if I do the same from a normal function, locals() shows both the "normal" variables and those that the function has trapped in its closure but, as I've said, for the class body 'id' is missing from locals():


import inspect

def create_class(id: str):
    class MyClass:
        # the class initialization (the class body) has access to "external" variables in the enclosing scope, such as "id"
        # the code in the class body is compiled into a code object that runs like a function that has trapped (closed over) the "id" free variable
        class_id = id
        
        print(f"Free variables: {inspect.currentframe().f_code.co_freevars}")
        # Free variables: ('id',)

        # notice that locals() does not show the "id" free variable.
        # That's because we are running in "class scope" and locals() just shows the class namespace, not the closure cells.
        # The closure is a separate object that holds references to the free variables; it is not part of the
        # local namespace of the class body, but the class body can still read "id" through it.
        print(f"locals: {locals()}")
        # locals: {'__module__': '__main__', '__qualname__': 'create_class.<locals>.MyClass', '__firstlineno__': 4, 'class_id': '123'}

    return MyClass

create_class("123")

In a normal function, by contrast, it is there:



def outer(id):
    def inner(value):
        # locals() shows "id" free variable because we are in a function scope (optimized scope)
        print(f"locals: {locals()}")
        # {'value': 'bb', 'id': '456'}
        print(f"Inner function value: {value}, Outer ID: {id}")
    return inner

inner_func = outer("456")
inner_func("bb")


So, why is that? When Python 3.13 was released I wrote a post about locals(), f_locals and the "local namespace" (this is related to PEP 667). Well, what I said in that post (based on other articles) about the "local namespace" being a sort of dictionary is not correct for normal functions in recent Python versions. "Normal" functions run in an optimized scope. In this optimized scope local variables are not placed in a dictionary and accessed by key, but in an array of "fast locals" and accessed by index (you can see this by using dis to check the bytecode of a function). This fast locals array, part of the _PyInterpreterFrame of each running function, contains both the local variables (including function arguments and the cellvars, those local variables that are cells because inner functions trap them in their closures) and the variables that the function itself has trapped in its closure:

In CPython's frame object, the `fastlocals` array is laid out as:

[regular locals] [cellvars] [freevars]

At the beginning of a function that has variables in its __closure__, the `COPY_FREE_VARS` bytecode instruction copies the cell references from `__closure__` into the frame's fastlocals array for quick access.
After `COPY_FREE_VARS` executes, all variables (normal locals, cellvars and freevars) are accessed from the fastlocals array during function execution.
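
To see which names fall into each of those three groups we can look at the code object of a function. The following is just a minimal sketch (the functions outer, middle and innermost are made up for this illustration) that relies on the documented co_varnames, co_cellvars and co_freevars attributes:

```python
def outer(id):
    def middle(value):
        plain = value * 2        # regular local variable
        shared = plain + 1       # cellvar: it is trapped by innermost()
        def innermost():
            return shared, id    # both names are free variables of innermost()
        return innermost
    return middle

middle = outer("456")
code = middle.__code__

print(code.co_varnames)   # regular locals (arguments included)
print(code.co_cellvars)   # ('shared',)
print(code.co_freevars)   # ('id',)

# the free variable lives in a cell referenced from __closure__
print(middle.__closure__[0].cell_contents)   # 456
```

Notice that id is a free variable of middle even though only innermost uses it: the cell has to travel through the intermediate function to reach the innermost one.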

By the way, regarding the aforementioned _PyInterpreterFrame, I'll take the opportunity to copy here some GPT wisdom about frames in recent Python versions (Python 3.11 and above):

The _PyInterpreterFrame is an internal C struct introduced in Python 3.11 that represents a stack frame for execution, aiming to improve performance by reducing the overhead of allocating full Python PyFrameObject objects.

Purpose: It holds the execution state for code objects, including local variables, globals, builtins, and the instruction pointer (f_lasti).
Performance: Unlike older Python versions where every frame was a full heap-allocated PyFrameObject, _PyInterpreterFrame is designed to be lightweight and usually lives in a contiguous, per-thread block of memory managed by the interpreter, reducing allocation overhead and garbage collection pressure.

The traditional PyFrameObject still exists, but it has been relegated to a "shadow" role. It is now treated purely as a compatibility API wrapper.

Python only creates a PyFrameObject on demand when a tool or a piece of code explicitly asks to inspect the call stack. This process is often referred to as materializing a frame.

Thursday, 11 June 2026

Class Statement vs Dynamic Class Creation

We know that along with the standard class statement, Python also allows us to create classes dynamically by calling type() (or another metaclass if our class has a metaclass other than type)

I already dedicated a rather thick post to type. Basically we use it for creating a new class like this: type(classname, superclasses, namespace). (the namespace is just a dictionary with the attributes).

So I was wondering if the compiler translates a class statement into a call to type(), and yes, more or less we can say so, but there are some extras. I've had a really insightful conversation with a GPT about this, and additionally I've found an excellent article that explains it in full detail

The steps that Python follows when it comes across a class statement (class Foo(Base, metaclass=Meta): x = 1) are these (I'm taking them from a GPT discussion, it's basically the same as what is explained in the linked article):

  • Step 1: Determine the metaclass. Python calls __build_class__ (a builtin), which inspects the bases and the explicit metaclass= kwarg to resolve which metaclass to use (the most derived metaclass among the explicit one and the metaclasses of the bases is selected, and a TypeError is raised if there is a conflict).
  • Step 2: Prepare the namespace. The metaclass's __prepare__ classmethod is called: namespace = Meta.__prepare__('Foo', (Base,), **kwargs). This returns the dict (or dict-like object) that will serve as the class namespace. For type, this is just a plain dict. For enum.EnumMeta, for example, it returns a special _EnumDict.
  • Step 3: Execute the class body. The compiled code object for the class body is executed as a function, with the namespace from step 2 as its locals(). This is the key insight: it's essentially exec(body_code, globals(), namespace). After this, namespace contains {'x': 1, '__module__': ..., '__qualname__': ...}.
  • Step 4: Call the metaclass. Meta('Foo', (Base,), namespace) is called, which, for the default type, invokes type.__call__ → type.__new__ → type.__init__. This is where the actual class object is constructed.
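
The four steps can be reproduced by hand, more or less. This is only a sketch under some assumptions: Meta, Base, the "injected" name and the class body source are made up for the example, and I use types.prepare_class from the standard library (which resolves the metaclass and calls __prepare__) in place of the internal logic of __build_class__:

```python
import types

calls = []

class Meta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        calls.append("prepare")
        # whatever we put here is visible to the class body
        return {"injected": 42}

class Base:
    pass

# the class body, compiled into a code object (hypothetical body for this example)
body_code = compile("x = 1\ny = injected + 1", "<class body>", "exec")

# Steps 1 and 2: resolve the metaclass and prepare the namespace
meta, namespace, kwds = types.prepare_class("Foo", (Base,), {"metaclass": Meta})

# Step 3: execute the class body with the namespace as its locals
exec(body_code, globals(), namespace)

# Step 4: call the metaclass to build the class object
Foo = meta("Foo", (Base,), namespace, **kwds)

print(type(Foo).__name__, Foo.x, Foo.y)   # Meta 1 43
```

The real __build_class__ does more than this (for example it deals with the __class__ cell used by zero-argument super()), so take it as an approximation of the flow, not as the actual implementation.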

How do the above steps look at the bytecode level? In Python compilation is a sort of hidden step that happens the first time our code is run (or after it has changed), so it's sometimes confusing to establish the difference between compilation time and execution time. When the Python compiler comes across a class statement, it creates a code object for the code that we've placed inside that statement (the body of the class statement), along with code objects for each function (method) defined in that code. It also emits a sequence of bytecode instructions that at runtime will make use of that code object (and many more things) to create a class object (yes, remember that classes are objects).

That sequence of bytecode instructions can vary slightly with Python versions (what I'll show below, that corresponds to python 3.14 is slightly different from what is shown in the aforementioned article), but the intent is the same.


class Person:
	pass

# translates into:

 0           RESUME                   0

  1           LOAD_BUILD_CLASS
              PUSH_NULL
              LOAD_CONST               0 (code object Person at 0x78d07576e730, file "class_creation.py", line 1)
              MAKE_FUNCTION
              LOAD_CONST               1 ('Person')
              CALL                     2
              STORE_NAME               0 (Person)
              LOAD_CONST               2 (None)
              RETURN_VALUE


Those seem like very few instructions for the complex 4 steps that I've just described! Well, that's because all the magic happens in a builtin function, __build_class__, which is loaded by the LOAD_BUILD_CLASS bytecode instruction. The article does a great job explaining these opcodes.

When we create a class dynamically using type() (or any other metaclass), we are directly at step 4, skipping the first 3 steps. Obviously it's us who choose the metaclass to use, and there's no class body to execute. And as we create the namespace object that we pass to the metaclass ourselves, the __prepare__ method that helps prepare that namespace is not executed. That's maybe the main difference then: in dynamic class creation the metaclass __prepare__ method does not intervene. That's interesting, because indeed I was not familiar with that __prepare__ method (also referred to as a hook).
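
A quick way to check that __prepare__ is skipped in the dynamic case is to record its invocations. This is a small sketch with a made-up metaclass (Meta) and made-up class names:

```python
prepare_calls = []

class Meta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        prepare_calls.append(name)
        return super().__prepare__(name, bases, **kwargs)

# class statement: __prepare__ is invoked before the body runs
class Static(metaclass=Meta):
    x = 1

# dynamic creation: we provide the namespace ourselves, __prepare__ is not invoked
Dynamic = Meta("Dynamic", (), {"x": 1})

print(prepare_calls)                    # ['Static']
print(type(Static) is type(Dynamic))    # True
```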

When talking about metaclasses I always think about __new__ and __init__ (and __call__ that intervenes when instances of a class created by the metaclass are created), I've talked about them in different posts, one of the most interesting being this, but was unfamiliar with __prepare__. We've seen that it allows us to prepare the namespace, OK, but when can we need that? Well, very rarely (this is particularly dark metaclass stuff). That can be material for another post, for now I'll just say that enum.EnumMeta makes use of it.

Sunday, 7 June 2026

SQL, NULL, Unknown

Lately I've been revisiting the rather particular behaviour of NULL in SQL, and it has led me to a better understanding of how different SQL is from general-purpose programming languages.

- Binary Logic vs Ternary Logic

General-purpose programming languages (Python, JavaScript, Ruby, Java...) use binary logic (Boolean logic in particular, and indeed that's the only logic I was aware of). Conditions are either True or False.
SQL uses a ternary logic (Kleene logic), where we have TRUE, FALSE, and UNKNOWN.

- The meaning of Missing Data

Both in general-purpose programming languages and in SQL we use null (None in Python) to represent missing data. There are 2 reasons for missing data: either it does not apply to that object, or we don't know it. Let's say we have an instance of a ShopItem class. Its expirationDate attribute can be null either because this object is a Book, and books do not expire, or because the date printed on this can of beans is blurry (or we've had no time to read it yet), and then we don't know it, it's unknown.

In general-purpose programming languages null is a value (it represents that there's nothing here, that there is no value here, for whatever reason, either because it does not apply or because we don't know it), and with binary logic comparing a value to another value is either true or false. So "a" == null is false, and null == null is true.

In SQL we have a sort of mismatch. On one hand we have ternary logic with that additional UNKNOWN concept, but on the other hand we still have a single value, NULL, to represent both that it does not apply and that we don't know it. So how should NULL behave in comparisons? SQL designers decided to treat NULL as a marker that represents that the value is unknown (so we cannot express that the value does not apply).

Once we have understood that, the apparently odd behaviour of NULL in comparisons suddenly makes sense. Any comparison using the standard operators (=, !=, <, >, <>) involving a NULL value will return UNKNOWN; even NULL = NULL or NULL != NULL return UNKNOWN. The negation of UNKNOWN (NOT UNKNOWN) is also UNKNOWN.
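
We can observe this from Python with the sqlite3 module of the standard library. It's just a sketch: the items table is invented for the example, and keep in mind that SQLite has no separate UNKNOWN value, it gives back NULL (None in Python) for an unknown result. I'm also using the standard IS NULL predicate, which is the way to test for NULL precisely because = cannot do it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def scalar(sql):
    """Run a query returning a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

print(scalar("SELECT NULL = NULL"))         # None (UNKNOWN)
print(scalar("SELECT NULL <> NULL"))        # None (UNKNOWN)
print(scalar("SELECT NOT (NULL = NULL)"))   # None (NOT UNKNOWN is UNKNOWN)
print(scalar("SELECT NULL IS NULL"))        # 1 (TRUE)

# hypothetical table: a WHERE clause only keeps rows whose condition is TRUE
conn.execute("CREATE TABLE items (name TEXT, expiration_date TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [("beans", None), ("milk", "2026-07-01")],
)
equals_null = conn.execute(
    "SELECT name FROM items WHERE expiration_date = NULL").fetchall()
is_null = conn.execute(
    "SELECT name FROM items WHERE expiration_date IS NULL").fetchall()
print(equals_null)   # []
print(is_null)       # [('beans',)]

# compare with Python's binary logic, where None is just a value
print(None == None, "a" == None)   # True False
```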

What is odd is what I've just said, that SQL lacks a way to indicate that the value does not apply. It seems that one of the main influences on the design of SQL ended up realizing this was a serious problem, but too late:

Codd actually realized this flaw later in his life and proposed that SQL should have two different kinds of NULLs: A-marks (the value is applicable but missing, that is, unknown) and I-marks (the value is inapplicable). Sadly, by then, SQL was already set in stone.

Sunday, 24 May 2026

Context Managers Part 1

Context Managers have existed in Python since version 2.5, while Assignment Expressions (walrus operator) were added in version 3.8. Recently I started wondering if we can replace the "as" with a ":=" assignment. I mean, can we do this?:


# the parentheses are required, an unparenthesized := is a syntax error here
with (it := MyContextManager()):
    # do whatever with it

rather than this:


with MyContextManager() as it:
	# do whatever with it

The answer is NO, or well, more accurately, sometimes yes, sometimes no, but you should always avoid it. To understand this we have to review what a Context Manager is and how it works. Notice that Context Managers are part of a broader concept, Automatic Resource Management, that also includes Garbage Collection and RAII (Resource Acquisition Is Initialization).

A Python Context Manager handles the setup and cleanup of resources in your programs. A context manager is any object that implements:


__enter__(self)
__exit__(self, exc_type, exc_value, traceback)

And it's used like this:


with EXPR as target:
    BODY

And now the important part. Conceptually, Python does something roughly like this (as explained by a GPT):


resource_manager = EXPR
resource = resource_manager.__enter__()
try:
    target = resource
    BODY
finally:
    resource_manager.__exit__(...)

So the key point is: The object used to manage the context and the object bound after as do not have to be the same object. That is exactly why __enter__() is allowed to return anything.

Some Context Managers are implemented so that __enter__ returns the context manager itself, while others return a different object. Basically, in the first case the resource being managed and the Resource Manager (Context Manager) are the same object, while in the second case they are different, as the management responsibility has been moved away from the resource itself to a different object.
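
This is exactly where "as" and ":=" diverge: "as" binds whatever __enter__ returns, while the walrus binds the value of the expression, that is, the context manager itself. A minimal sketch with two made-up classes (Connection and ConnectionManager) shows the difference:

```python
class Connection:
    """The resource being managed (hypothetical)."""
    def __init__(self):
        self.is_open = True

class ConnectionManager:
    """A context manager whose __enter__ returns a different object."""
    def __enter__(self):
        self.conn = Connection()
        return self.conn

    def __exit__(self, exc_type, exc_value, traceback):
        self.conn.is_open = False
        return False  # do not swallow exceptions

with ConnectionManager() as a:
    as_type = type(a).__name__

with (b := ConnectionManager()):
    walrus_type = type(b).__name__

print(as_type)       # Connection
print(walrus_type)   # ConnectionManager
```

With a context manager whose __enter__ returns self (like the file objects returned by open) both forms end up bound to the same object, which is why the walrus version "sometimes" works.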

Another interesting topic. We know that in Python a single with block can include multiple context managers. I mean:


with open('a.txt', 'r') as fr, open('b.txt', 'w') as fw:
    do_something(fr, fw)

I was wondering whether the cases where the second context manager makes use of the first one, which I think are more commonly found written like this:


with ContextManager1("aaa") as ctx1:
    with ContextManager2(ctx1) as ctx2:
        do_something(ctx1, ctx2)

Could be written with a single with (avoiding the additional nesting level):


with ContextManager1("aaa") as ctx1, ContextManager2(ctx1) as ctx2:
	do_something(ctx1, ctx2)

The answer is YES.

Python evaluates multiple context managers in a single with statement sequentially from left to right. The moment the first context manager is entered, its return value is bound to the as variable, making it immediately available for the next context manager on the same line.
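
We can verify that order with a small experiment. This is a sketch that uses contextlib.contextmanager and a made-up managed() helper that records what happens in a list:

```python
from contextlib import contextmanager

events = []

@contextmanager
def managed(name, depends_on=None):
    # hypothetical helper: logs when the context is entered and exited
    events.append(f"enter {name} (depends_on={depends_on})")
    try:
        yield name
    finally:
        events.append(f"exit {name}")

with managed("ctx1") as ctx1, managed("ctx2", depends_on=ctx1) as ctx2:
    events.append(f"body {ctx1} {ctx2}")

for event in events:
    print(event)
# enter ctx1 (depends_on=None)
# enter ctx2 (depends_on=ctx1)
# body ctx1 ctx2
# exit ctx2
# exit ctx1
```

The contexts are exited in reverse order, the same as with the nested version.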

Sunday, 17 May 2026

Type Hints Notes 2026

Type hints are more and more prevalent in recent Python code. I'm still not too severe about them, but my level of strictness continues to grow over time. I've lately learnt a couple of things:

Variadic Parameters. When typing functions that have packed (variadic) parameters in their signature (*args, **kwargs), we put the type of each individual argument; we don't have to type the parameter as a collection or dictionary (unless each argument really is a collection). I mean, for a pipe function we should do:


# this is RIGHT
def pipe(val: Any, *fns: Callable) -> Any: ...

# this is WRONG: it says that each positional argument is itself a list of callables
def pipe(val: Any, *fns: list[Callable]) -> Any: ...
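
A complete version of the RIGHT signature could look like this. It's only a sketch: the body of pipe and the more precise Callable[[Any], Any] annotation are my own assumptions about what such a function does:

```python
from collections.abc import Callable
from typing import Any

def pipe(val: Any, *fns: Callable[[Any], Any]) -> Any:
    # inside the function fns is a tuple[Callable[[Any], Any], ...],
    # but we only annotate the type of each individual argument
    for fn in fns:
        val = fn(val)
    return val

print(pipe(3, lambda x: x + 1, str))   # 4 (as a str)
```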

Tuples. In Python we use tuples for "groups" of a fixed number of elements, a pair, a trio... We express it in the signature like this: tuple[str, str] or tuple[int, str, str]... But how to express that a function returns (or receives) a "group" of an unknown number of elements? We also use tuples, combined with ellipsis (...), like this:


tuple[int, ...]         # any number of ints: (), (1,), (1, 2, 99), ...
tuple[int | str, ...]   # any number of elements, where each element can be an int or a str (), ("a", 1), ("a", "b", "c"), (1, 2, 1) ...
  
tuple[int, str, bool]   # exactly 3 elements: an int, a str, a bool
  

An important detail that I've learnt thanks to a typing issue. We know that in a Python try-except block, the except clause can handle multiple exception types, I mean: except (RuntimeError, TypeError, NameError):. Those multiple exceptions are a tuple, not just any iterable. Let's see an example (the last call is the one that is WRONG):


# multiple exceptions
def multiple_exceptions(exceptions: tuple[type[Exception], ...]) -> None:
    try:
        raise ValueError("This is a ValueError")
    except exceptions as e:
        print(f"Caught an exception: {e}")

multiple_exceptions((ValueError, TypeError))  
# Caught an exception: This is a ValueError

# important, this is WRONG, we have to pass a tuple, not just any collection
multiple_exceptions([ValueError, TypeError])
# TypeError: catching classes that do not inherit from BaseException is not allowed


Indeed, an equivalent function with a variadic signature feels more natural and idiomatic than the above (and furthermore prevents the confusion of passing over any collection rather than exactly a tuple):


# this variadic signature feels more natural
def multiple_exceptions2(*exceptions: type[Exception]) -> None:
    if not exceptions:
        raise ValueError("pass at least one exception type")
    try:
        raise ValueError("This is a ValueError")
    except exceptions as e:
        print(f"Caught an exception: {e}")

multiple_exceptions2(ValueError, TypeError) 


And notice also how I've added a guard against the empty-call case (except () is accepted at runtime, but an empty tuple matches nothing, so the exception would simply propagate uncaught).
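
What an empty tuple does in an except clause is easy to check. In this sketch (catch_with is a made-up helper) the empty tuple matches no exception, so the ValueError escapes the inner handler and reaches the outer one:

```python
def catch_with(exceptions: tuple[type[BaseException], ...]) -> str:
    try:
        try:
            raise ValueError("boom")
        except exceptions:
            return "caught"
    except ValueError:
        # we only get here if the inner except did not match
        return "propagated"

print(catch_with((ValueError,)))   # caught
print(catch_with(()))              # propagated
```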

Sunday, 10 May 2026

Python partial and placeholders

When Python 3.14 was released I had already read about some of its main features (those that involve a PEP and that have been discussed in the Python discussion forums), like Lazy Annotations and Template Strings. When reading in depth recently the release notes I came across a small feature added to functools.partial (and partialmethod) that I find particularly useful:

functools:
Add the Placeholder sentinel. This may be used with the partial() or partialmethod() functions to reserve a place for positional arguments in the returned partial object. (Contributed by Dominykas Grigonis in gh-119127.)

Just a reminder of what partial function application is (don't confuse it with the related concept of curried functions):

In computer science, partial application (or partial function application) refers to the process of fixing a number of arguments of a function, producing another function of smaller arity.

Indeed I already talked about functools.partial some time ago

The "basic" approach to partial function application is that we can just fix (pre-fill) arguments from left to right. This is what we also have in JavaScript with Function.prototype.bind (which additionally binds the "this" value as its first argument). As Python supports named arguments, functools.partial already supported fixing named arguments.


import functools

def format_geo_info(country, region, city, population):
    return f"{city}, {region.upper()} ({country}) - {population}"
    
bound_format = functools.partial(format_geo_info, "France")
print(bound_format("Occitanie", "Toulouse", 500_000))
# Toulouse, OCCITANIE (France) - 500000
print(bound_format("Occitanie", city="Toulouse", population=500_000))
# Toulouse, OCCITANIE (France) - 500000


What was not possible until this version was fixing some intermediate non-named argument, but this is possible since version 3.14 thanks to the Placeholder sentinel value:



from functools import partial, Placeholder  # Placeholder requires Python 3.14+

format_french_city_with_unknown_population = partial(format_geo_info, "France", Placeholder, Placeholder, 0)
print(format_french_city_with_unknown_population("Ile de France", "Saint Denis"))
# Saint Denis, ILE DE FRANCE (France) - 0

Not a revolutionary feature, but one that I've missed occasionally. A trivial implementation could be something like this:


# supports positional and keyword arguments, but not placeholders
def my_basic_partial(func, *args, **kwargs):
    return lambda *fargs, **fkwargs: func(*args, *fargs, **(kwargs | fkwargs))
    
# add support for placeholders in the arguments
PLACEHOLDER = object()
def my_complete_partial(func, *args, **kwargs):
    def new_func(*fargs, **fkwargs):
        merged_args = []
        fargs_iter = iter(fargs)
        for arg in args:
            if arg is PLACEHOLDER:
                merged_args.append(next(fargs_iter))
            else:
                merged_args.append(arg)
        merged_args.extend(fargs_iter)
        return func(*merged_args, **(kwargs | fkwargs))
    return new_func

format_french_city_with_unknown_population = my_complete_partial(format_geo_info, "France", PLACEHOLDER, PLACEHOLDER, 0)
print(format_french_city_with_unknown_population("Ile de France", "Saint Denis"))
# Saint Denis, ILE DE FRANCE (France) - 0

format_2 = my_complete_partial(format_geo_info, "France", city="Toulouse")
print(format_2("Occitanie", population=500_000))
# Toulouse, OCCITANIE (France) - 500000
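
One weakness of that trivial version is that, if the caller provides fewer positional arguments than placeholders, next() fails with a bare StopIteration, which is quite confusing. A slightly stricter variant could check it upfront. This is just a sketch of one possible design (my_strict_partial and its error message are my own, not what functools does internally):

```python
PLACEHOLDER = object()

def format_geo_info(country, region, city, population):
    return f"{city}, {region.upper()} ({country}) - {population}"

def my_strict_partial(func, *args, **kwargs):
    # number of positional arguments that the caller must provide
    needed = sum(arg is PLACEHOLDER for arg in args)

    def new_func(*fargs, **fkwargs):
        if len(fargs) < needed:
            raise TypeError(
                f"expected at least {needed} positional arguments, got {len(fargs)}"
            )
        fargs_iter = iter(fargs)
        merged_args = [
            next(fargs_iter) if arg is PLACEHOLDER else arg for arg in args
        ]
        # the remaining call-time arguments go after the pre-filled ones
        return func(*merged_args, *fargs_iter, **(kwargs | fkwargs))

    return new_func

format_city = my_strict_partial(format_geo_info, "France", PLACEHOLDER, PLACEHOLDER, 0)
print(format_city("Ile de France", "Saint Denis"))
# Saint Denis, ILE DE FRANCE (France) - 0

try:
    format_city("Ile de France")
except TypeError as ex:
    print(f"TypeError: {ex}")
# TypeError: expected at least 2 positional arguments, got 1
```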


In my aforementioned previous post about partial in Python I gave some reasons for using partial over directly trapping the variables with a closure (of course internally partial has to use either closures or a callable class). I've just realised that I was missing the main reason, partial is more semantic.

- Intent-Revealing Code: partial(func, arg) explicitly states your intent to partially apply arguments, improving readability and self-documentation.
- Declarative Style: It focuses on the result (a new specialized function) rather than the imperative mechanics of capturing lexical scope.

Lodash, the excellent JavaScript library, also features placeholders in its implementation of partial.

Sunday, 3 May 2026

Glibc

Compiling to native code, and furthermore for a Linux system... wow, sounds scary, and very, very far away from what I've been doing in the last decade(s). Well, the thing is that my employer decided some months ago that we had to compile some of our Python applications to native code. It's not something performance related, it's for preventing access to the source code of these applications. We were looking into Cython, but we settled on Nuitka, an amazing piece of software that has been serving us really well.

Normally almost every native application compiled for a Linux system has been dynamically linked against glibc. OK, and, what's glibc?

The GNU C Library, commonly known as glibc, is the GNU Project implementation of the C standard library. It provides a wrapper around the system calls of the Linux kernel and other kernels for application use. Despite its name, it now also directly supports C++ (and, indirectly, other programming languages).

So when a Linux native application (using glibc) starts, the dynamic linker (ld.so, for example /lib64/ld-linux-x86-64.so.2 on x86-64) will dynamically load the shared objects (SO, .so files, the equivalent of Windows DLLs) needed by the application (like libc.so.6) and link the call sites to the functions imported from those libraries.

Obviously glibc evolves over time, so, what about versions? First, what glibc version is installed on my system? You can check the SOs loaded by a running process by doing: lsof -p PID | grep '\.so'. Normally you'll see that it's using libc.so.6 (in Ubuntu it is located here: /usr/lib/x86_64-linux-gnu/libc.so.6). That 6 is not the version number (libc.so.6 has been the name of the library since 1997!), the version number is something like 2.XX (2.39 in my Ubuntu 24.04). You find it by using: ldd --version
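
As what we are compiling are Python applications, it can be handy to get that version from Python itself. The standard library has platform.libc_ver() for this; bear in mind that it reports the C library that the running Python executable is linked against, and that on systems not based on glibc it may just return empty strings:

```python
import platform

lib_name, lib_version = platform.libc_ver()
# e.g. "glibc 2.39" on a glibc-based Linux such as the Ubuntu 24.04 mentioned above
print(lib_name, lib_version)
```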

So, what happens if I compile my application on a system with one version of glibc and try to run it on a system with a different version? Well, the situation is much more fine-grained than I thought. Version numbers are not checked at the glibc level, but at the function level. This is so because glibc uses symbolic versioning:

The "Symbol Versioning" Approach (Advanced)

This is what glibc uses. It is also used by heavy-hitters like OpenSSL, Qt, and libgcc.

    The Logic: The filename stays the same (e.g., libc.so.6 has been the name since 1997), but individual functions inside the file are tagged with versions.

    The Result: Multiple versions of the same function can coexist in one file. This allows for extreme backward compatibility without breaking the system every time a single function is updated.
    
    The primary goal of symbol versioning is backward compatibility. It allows a single library file to provide multiple versions of the same function so that:

    Old binaries compiled against v2.10 continue to use the v2.10 implementation.

    New binaries compiled against v2.11 use the new v2.11 implementation.

So multiple versions of the same function live inside glibc, and your binary will dynamically link against the one it was compiled for. And when does the version number of a function change? Normally it only changes if the function interface (the contract, the ABI) changes, but not if it's only its internal implementation that changes. So if we compare symbolic versioning to semantic versioning (SemVer, a more familiar versioning scheme), we could say that in symbolic versioning a version change corresponds to a Major version change in semantic versioning.

You are exactly right: A new Symbol Version is functionally equivalent to a Major Version bump for that specific function. It signals to the linker that the "Contract" for that specific symbol has changed, and old programs should look for the previous contract elsewhere in the same file.

Notice how symbolic versioning is used for functions inside a library, while semantic versioning (when used) is normally used for libraries.

The glibc version (that one obtained with ldd --version) has no importance in terms of loading the library in memory (the dynamic linker will load libc.so.6 regardless of its "internal" version), the important part is the specific version of each function that we try to link.

I guess that when you program in C you are aware of the version of each function that you are using, as you have to adapt your code to the ABI of the function if it has changed, but when that happens behind the scenes things are quite different. In our case we just write Python code, and the beautiful Nuitka takes care of transforming it to C and then compiling it to native code. So it's Nuitka that takes care of writing the C code in accordance with the function versions inside the glibc of the build system. If you then run that binary on a system with an older glibc version, it could happen that your binary is "pointing" to a function with a symbolic version (let's say openEncryptedFile@GLIBC_2.12) higher than the one present in the older glibc of the current system (let's say openEncryptedFile@GLIBC_2.10), and your application will fail (typically right at startup). Basically this means that you have to compile your Python application on a system with a glibc version <= the glibc version of the target system. It feels odd at first, as the starting point is just the same Python code: if on one system it can just use openEncryptedFile@GLIBC_2.10, why isn't it always compiled against that 2.10, even if a higher version (openEncryptedFile@GLIBC_2.12) is present? Well, that's how things work by default: when compiling, code is linked against the highest version of that function present in the glibc of the build machine.
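
That rule can be turned into a small check: the binary needs a glibc at least as new as the highest GLIBC_x.y tag among the symbols it imports. The following is only a sketch; the sample text is invented for the illustration (it imitates the lines that objdump -T prints for imported symbols) and the function names are my own:

```python
import re

# invented sample imitating "objdump -T some_binary" output for imported symbols
SAMPLE_OBJDUMP_OUTPUT = """\
0000000000000000      DF *UND*  0000000000000000 (GLIBC_2.2.5) memcpy
0000000000000000      DF *UND*  0000000000000000 (GLIBC_2.34) pthread_spin_init
0000000000000000      DF *UND*  0000000000000000 (GLIBC_2.17) clock_gettime
"""

def parse_version(text):
    """Turn '2.34' into (2, 34) so that versions compare numerically."""
    return tuple(int(part) for part in text.split("."))

def required_glibc_version(objdump_output):
    """Highest GLIBC_x.y symbol version found, or None if there is none."""
    found = re.findall(r"GLIBC_(\d+(?:\.\d+)*)", objdump_output)
    return max((parse_version(v) for v in found), default=None)

def can_run_on(objdump_output, target_glibc):
    required = required_glibc_version(objdump_output)
    return required is None or required <= parse_version(target_glibc)

print(required_glibc_version(SAMPLE_OBJDUMP_OUTPUT))   # (2, 34)
print(can_run_on(SAMPLE_OBJDUMP_OUTPUT, "2.39"))       # True
print(can_run_on(SAMPLE_OBJDUMP_OUTPUT, "2.31"))       # False
```

Comparing the versions as tuples of integers matters: compared as plain strings, "2.2.5" would wrongly sort above "2.17".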

If you wonder if other .so libraries (SO, ELF libraries) also use symbolic versioning, it depends. For smaller, simpler libraries what is usually used is the SONAME approach, the library (.so file) name changes with each version (this is a coarse grained approach).

Symbolic versioning is the technically superior approach, but it is not the universal standard for all ELF libraries. It depends entirely on the library maintainers and their commitment to long-term ABI stability.

In the Linux ecosystem, there are two primary ways to manage library changes:  
1. The "SONAME" Approach (Common)

Most smaller or simpler libraries use the SONAME mechanism. 
You’ve likely seen files like libfoo.so.1 and libfoo.so.2.

    The Logic: If the developers change the interface, they increment the "Major" version number in the filename itself.  

    The Result: Programs linked against libfoo.so.1 will refuse to start 
    if only libfoo.so.2 is present. This is a "heavy-handed" fix because 
    it requires recompiling every program that uses the library even if 
    the specific function they use didn't actually change.

2. The "Symbol Versioning" Approach (Advanced)

This is what glibc uses. It is also used by heavy-hitters like OpenSSL, Qt, and libgcc.

    The Logic: The filename stays the same (e.g., libc.so.6 has been the name since 1997), 
    but individual functions inside the file are tagged with versions.

    The Result: Multiple versions of the same function can coexist in one file. 
    This allows for extreme backward compatibility without breaking the system every time a single function is updated.

To complete this post, I'll add some useful, related commands:

  • To check the SO's used by a given program.
    For a binary on disk: ldd /usr/bin/program_name
    For a running process: lsof -p [PID] | grep '\.so'
  • To view the symbols used by a program (the specific functions imported from SO's)
    All imported symbols: nm -Du /usr/bin/program_name
    Symbols + Versions: objdump -T /usr/bin/program_name | grep -F '*UND*'
    Only glibc symbols: objdump -T /usr/bin/program_name | grep 'GLIBC_'
    Library Version Map: readelf -V /usr/bin/program_name
  • To view the symbols/functions exported by glibc in your system: objdump -T /usr/lib/x86_64-linux-gnu/libc.so.6

Some additional findings related to the last command. For example I want to see the versions of pthread_spin_init present in my glibc: objdump -T /usr/lib/x86_64-linux-gnu/libc.so.6 | grep pthread_spin_init

That gives me:

00000000000a4130 g    DF .text  000000000000000d  GLIBC_2.34   pthread_spin_init
00000000000a4130 g    DF .text  000000000000000d (GLIBC_2.2.5) pthread_spin_init

Which is very interesting, as it shows us that a symbol version is not a sequential counter for that specific function. Instead, it is a marker of the glibc release that defined that specific version of the function's ABI. From a GPT:

How glibc handles ABI changes with symbol versioning

Original version: Suppose foo() was introduced in GLIBC_2.2.5. That version is tagged as foo@GLIBC_2.2.5.

ABI change in glibc 2.32: If glibc developers change the ABI of foo() in version 2.32 (e.g., change its behavior, arguments, or return type in a way that breaks compatibility), they will:

Keep the old implementation as foo@GLIBC_2.2.5.
Add a new implementation as foo@GLIBC_2.32.

At runtime:

A binary linked against glibc 2.2.5 will request foo@GLIBC_2.2.5, and the dynamic linker will resolve it to the old implementation.
A binary linked against glibc 2.32 will request foo@GLIBC_2.32, and get the new implementation.

This mechanism ensures backward compatibility while allowing glibc to evolve.

Friday, 17 April 2026

Python Sentinel Values

Every now and then we need a flag or sentinel value: a special, unique value that we can distinguish from the normal values that we are processing and that has a particular meaning (for instance, signaling the end of data processing, a loop, or an operation). For example, in my recent post about adding null-safety to a pipe function I was using 2 sentinels/flags: NULL_SAFE and COALESCE.

The essential requirement for a sentinel value is to have a unique identity, so that comparing it by identity (Python: is, JavaScript: ===) with any other value/object in our system returns false. So the simplest approach is just using a new object for each of our sentinels.



NO_INVEST = object()  # Sentinel value

def invest(amount: int | None | object) -> str:
    if amount is NO_INVEST:
        return "No investment"
    else:
        amount = amount or 0
        return f"we've invested {amount}"

print(invest(NO_INVEST))  # Output: No investment
print(NO_INVEST)  # Output: <object object at 0x...>


That simple approach works fine, but it's missing a few things. Printing the value is messy (we get an "<object object at 0x...>" representation, when it would be nice to get NO_INVEST) and it's rather typing-unfriendly. Saying that invest can receive an object, apart from int or None, lacks any meaning. What kind of object is that? A sentinel value should (mainly) have these features:

  • A unique identity (is comparison)
  • A meaningful repr (debuggability)
  • Clear typing (especially for static type checkers)

Our basic sentinel only provides the first one (identity). That's why some smart guy came up with the very interesting PEP 661, proposing a new Sentinel class. Unfortunately that PEP is in deferred status. The document also presents different techniques commonly used for sentinel values, like the simple object() that I've just shown, using an enum or using a class. To me, using a class is the best approach; I'll show several iterations until reaching what I think is the best we can get so far.

Approach 1. Classes are objects in Python. We can use a class for each sentinel object, and in order to get a nice representation we can give it a metaclass with a custom __repr__. The missing piece is having some typing friendliness, we're still stuck with the Any signature. Additionally, declaring a class for something that is not intended to work as an object factory, but to be used as an object in itself is rather unnatural.



    class SentinelMeta(type):
        def __repr__(cls):
            return cls.__name__
        
    class Sentinel(metaclass=SentinelMeta):
        pass
        
    class NO_INVEST(Sentinel): pass


    #def invest(value: int | None | Type[NO_INVEST]) -> str: # this signature feels a bit strange, but it works
    #def invest(value: int | None |Literal[NO_INVEST]) -> str: # this one feels better, but only works with mypy, not with pylance
    def invest(amount: int | None | Any) -> str:
        if amount is NO_INVEST:
            return "No investment"
        else:
            amount = amount or 0
            return f"we've invested {amount}"

    print(invest(NO_INVEST))  # Output: No investment
    print(NO_INVEST)  # Output: NO_INVEST

Approach 2. We can make the usage much more natural by hiding the class creation behind a function. That also allows us to skip the Sentinel base class.



    class SentinelMeta(type):
        def __repr__(cls):
            return cls.__name__

    def sentinel(name: str):
        return SentinelMeta(name, (), {})
    
    NO_INVEST = sentinel("NO_INVEST")


    #def invest(value: int | Type[NO_INVEST]) -> str: # pylance doesn't like it, using a variable as type
    #def invest(value: int | Sentinel) -> str: # we don't have a Sentinel class... so forget it
    def invest(amount: int | None | Any) -> str:
        if amount is NO_INVEST:
            return "No investment"
        else:
            amount = amount or 0
            return f"we've invested {amount}"
        
    print(invest(NO_INVEST))  # Output: No investment
    print(NO_INVEST)  # Output: NO_INVEST


This one feels quite natural to use, but we still have the problem with typing. We can leverage the generic types/class subscripting that we saw in my previous post to get something like this (Approach 3):



    class SentinelMeta(type):
        def __repr__(cls):
            return cls.__name__
    
    class Sentinel(metaclass=SentinelMeta):
        def __class_getitem__(cls, item):
            return cls
    
    def sentinel(name: str):
        return SentinelMeta(name, (Sentinel,), {})
    
    NO_INVEST = sentinel("NO_INVEST")


    #def invest(value: int | Sentinel) -> str:
    def invest(amount: int | None | Sentinel[NO_INVEST]) -> str: # at a typing level it's equivalent to the above, but it provides extra meaning
        if amount is NO_INVEST:
            return "No investment"
        else:
            amount = amount or 0
            return f"we've invested {amount}"

    print(invest(NO_INVEST))  # Output: No investment
    print(NO_INVEST)  # Output: NO_INVEST
	

Hey, this one feels rather good to me. The Sentinel[NO_INVEST] looks pretty nice in that signature. The type checking is mostly off, because our sentinel() function is not typed, so it's considered as returning Any, and passing an Any to a function disables type checking for that argument. This means that for the type system any sentinel that we create with sentinel() is just an Any, so the type checker will allow passing any sentinel to a function that expects a specific sentinel. It's not a problem for me; what I'm mainly interested in is the semantics, the clarity, that this type annotation provides to the function signature.

Given that we are using Sentinel classes as objects, not as object factories, we can make these classes more object-like by preventing instantiation. Additionally, sentinels do not have attributes and are not intended to be expanded with attributes, so we can prevent them from getting attributes added dynamically. All in all we get:



    class SentinelMeta(type):
        def __repr__(cls):
            return cls.__name__
        
        # prevent sentinel classes from being instantiated
        def __call__(cls, *args, **kwargs):
            raise TypeError(f"{cls.__name__} is a sentinel and cannot be instantiated") 
        
        # prevent sentinel classes from being modified
        def __setattr__(cls, name, value):
            raise AttributeError(f"Cannot modify sentinel {cls.__name__}")


    class Sentinel(metaclass=SentinelMeta):
        def __class_getitem__(cls, item):
            return cls

    
    def sentinel(name: str):
        return SentinelMeta(name, (Sentinel,), {})

    
    NO_INVEST = sentinel("NO_INVEST")
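To see those two protections in action, here is a small self-contained check. It repeats the definitions above so that it runs on its own; the attribute name "extra" is just an arbitrary example.

```python
class SentinelMeta(type):
    def __repr__(cls):
        return cls.__name__

    # prevent sentinel classes from being instantiated
    def __call__(cls, *args, **kwargs):
        raise TypeError(f"{cls.__name__} is a sentinel and cannot be instantiated")

    # prevent sentinel classes from being modified
    def __setattr__(cls, name, value):
        raise AttributeError(f"Cannot modify sentinel {cls.__name__}")


class Sentinel(metaclass=SentinelMeta):
    def __class_getitem__(cls, item):
        return cls


def sentinel(name: str):
    return SentinelMeta(name, (Sentinel,), {})


NO_INVEST = sentinel("NO_INVEST")

print(repr(NO_INVEST))  # NO_INVEST

try:
    NO_INVEST()
except TypeError as ex:
    print(ex)  # NO_INVEST is a sentinel and cannot be instantiated

try:
    NO_INVEST.extra = 1
except AttributeError as ex:
    print(ex)  # Cannot modify sentinel NO_INVEST
```

Notice that sentinel() can still call SentinelMeta(...) to create the class: that call looks up __call__ in the type of SentinelMeta (that is, type), not in SentinelMeta itself, so our overridden __call__ only affects the sentinel classes.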


There's something missing in this implementation: support for our sentinels to be pickled (particularly important if we plan to use them in multiprocessing scenarios). That complicates the design and I've deliberately left it aside for the moment. Maybe we'll see it in another post.

Thursday, 9 April 2026

Python Type Checkers and Type Expressions

In this previous post I explained how Python allows the usage of any object, not just type objects, for its annotations: "Annotations can be any valid Python expression". Annotations are used to provide metadata, and while normally that metadata is just typing information, we can provide any sort of metadata for custom use at runtime (and as we saw, we have the Annotated mechanism to combine both typing and custom metadata). While doing some tests at that time I noticed that VS Code (the Pylance extension) would warn (with "Call expression not allowed in type expression" or "Variable not allowed in type expression") against using annotations like this (which, as I've said, is perfectly valid at runtime):


import annotationlib  # Python 3.14+
from dataclasses import dataclass

# metadata for parameters
@dataclass
class ValueRange:
    lo: int
    hi: int

# pylance warning: Call expression not allowed in type expression
def create_post_1(
    title: ValueRange(5, 20), 
    content: ValueRange(5, 100),
) -> dict:
    return {"title": title, "content": content}


val_range = ValueRange(1, 10)
# pylance warning: Variable not allowed in type expression
def fn2(a: val_range) -> None:
    pass
    
# but it's stored OK in the function annotations 
print(annotationlib.get_annotations(fn2))
# {'a': ValueRange(lo=1, hi=10), 'return': None}


So if this works fine at runtime (you can see that the annotation is stored along with the function), why does Pylance warn against it? Because we have to differentiate what is valid for runtime use from what is valid at type-checking time. From a GPT:

The runtime accepts arbitrary expressions. Static type checkers do not. This is so because static typing tools parse Python, but they don’t execute it, and they don’t compile it to bytecode either. Type checkers rely on syntactic patterns, not runtime behavior

This is something I had never thought about before. Python type checkers analyze your code without executing anything (they are static). So the types in the type annotations that they are going to analyze have to be expressed in a direct, static form, not as the result of executing an expression (as they are not going to execute that expression). The different Python type checkers (mypy, Pylance, pyre...) parse Python source code into an AST (normally different from the CPython AST) and analyze it; they do not run code, indeed they do not even compile the code to bytecode to create code objects. That's why they can only work with type expressions (annotation expressions) that follow a specific syntax, not with any expression. From here: Note that while annotation expressions are the only expressions valid as type annotations in the type system, the Python language itself makes no such restriction: any expression is allowed.
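To get a feeling for what "parsing without executing" means, we can use the standard ast module as a stand-in. This is only an illustration: real type checkers use their own parsers and do far more analysis, and the function names below are hypothetical.

```python
import ast

source = '''
def fn1(a: int) -> None: ...
def fn2(a: ValueRange(1, 10)) -> None: ...
def fn3(a: val_range) -> None: ...
'''

# parsing only: ValueRange and val_range are never defined, and nothing is executed
tree = ast.parse(source)

found = []
for node in tree.body:
    if isinstance(node, ast.FunctionDef):
        annotation = node.args.args[0].annotation
        found.append((node.name, type(annotation).__name__, ast.unparse(annotation)))

for entry in found:
    print(*entry)

# fn1 Name int
# fn2 Call ValueRange(1, 10)
# fn3 Name val_range
```

The annotation in fn2 is a Call node, which a tool can reject just by looking at the syntax. The one in fn3 is a plain Name, the same node kind as int, so a checker presumably has to resolve what that name refers to (a class or a variable) through its own symbol analysis, still without running anything.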

Type checkers operate on expressions that syntactically denote types, which basically is a type name or a generic type. And this has sparked my curiosity about how generic types (MyClass[T]) work. For the type checker it's simple, as it does not execute anything, it just has to parse that particular syntax. But what's the runtime meaning of such generic expression?

Well, it's subscripted access to an object (to a class). When we do my_instance["x"] this searches for a __getitem__ method in my_instance's type. So MyClass["x"] should just search for __getitem__ in MyClass's type (that is, its metaclass). That's correct, but given that Python designers have always considered metaclasses as particularly complex and/or exotic, they decided to introduce (via PEP 560) a hook to make it easier to implement subscripted access to classes. Rather than having to define a metaclass for MyClass, we can directly define a __class_getitem__ method in MyClass.
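A minimal sketch comparing both routes (the class names are made up for the example):

```python
class SubscriptableMeta(type):
    # metaclass route: Boxed["x"] looks up __getitem__ in type(Boxed)
    def __getitem__(cls, item):
        return f"{cls.__name__} via metaclass __getitem__: {item!r}"


class Boxed(metaclass=SubscriptableMeta):
    pass


class Plain:
    # PEP 560 route: no metaclass needed, the method is implicitly a classmethod
    def __class_getitem__(cls, item):
        return f"{cls.__name__} via __class_getitem__: {item!r}"


print(Boxed["x"])  # Boxed via metaclass __getitem__: 'x'
print(Plain["x"])  # Plain via __class_getitem__: 'x'
```

If a class has both, the __getitem__ of the metaclass is the one that gets called; __class_getitem__ is only the fallback.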

Sunday, 29 March 2026

Pipe Operator and Null Safety

I've talked a couple of times [1] and [2] about how beautiful it is to have a pipe operator in a language (though it's not particularly common), and about ways to simulate it in Python. Having a pipe operator makes applying functions to a value as convenient as chaining methods. When chaining methods we can leverage (if available) the safe navigation/optional chaining/elvis (?.) operator, to deal with null values. So, I've been thinking about null safety and pipes (not applying a function if the value is null, and coalescing to a default value).

In my previous post I mentioned that JavaScript had 2 different proposals for a pipe operator, but one of them has been discarded. I've been checking whether the remaining proposal includes null safety, and the answer is no. In the early stages there was a discussion about having, apart from the normal |> operator, an additional ?|> operator for null-safe cases, but it was discarded.


// not null-safe, active proposal
user
  |> getProfile(%)
  |> formatProfile(%)
  
// null-safe, has been discarded
value ?|> fn
value |> fn ?? default

It was rejected on the basis that Pipelines should be pure syntax for data flow, not control flow.

To my surprise (I was not aware that PHP continues to be used and to evolve), PHP has recently added a pipe operator to the language, and for the moment it also lacks a null-safe version.

For Python decision makers adding a pipe operator seems to be "making the language too complex for beginners"... (you can't imagine how much I hate that all too common kind of "pythonic" reflection...), but as I explained in my previous post we can easily add a pipe function that does the trick (adding such a function to functools has also been requested multiple times, but no luck so far). An implementation is as simple as this:



import functools
from typing import Any, Callable

def pipe(val: Any, *fns: Callable[[Any], Any]) -> Any:
    """
    pipes function calls over an initial value
    """
    def _call(val, fn):
        return fn(val)
    return functools.reduce(_call, fns, val)


And we can use it like this:



from dataclasses import dataclass

@dataclass
class Post:
    id: str
    title: str
    author: str

def get_post(post_id: str) -> Post | None:
    # simulate a function that may return None
    if post_id == "1":
        return Post(id="1", title="First post", author="1")
    else:
        return None

def get_address(person_id: str) -> str | None:
    # simulate a function that may return None
    if person_id == "1":
        return "Rue de La Nation, Paris"
    else:
        return None

pipe("1",
    get_post,
    lambda post: get_address(post.author),
    str.upper,
    print,
)

# RUE DE LA NATION, PARIS


Creating a null-aware equivalent is quite simple. The idea I came up with is having pipe accept not just a sequence of callables, but a sequence where each step is either a callable, a (flag, callable) pair or a (flag, value) pair, with the flag indicating that we have to check for null before applying the callable, or that we have to coalesce to a value. Let's see the code:



# sentinel values
NULL_SAFE = object()
COALESCE = object()

def pipe(val: Any, *steps: Callable[[Any], Any] | tuple[Any, Callable[[Any], Any] | Any]) -> Any:
    """
    pipes function calls over an initial value, with support for null safety and coalescing:
    """
    def _call(val, step: Callable[[Any], Any] | tuple[Any, Callable[[Any], Any] | Any]) -> Any:
        if callable(step):
            return step(val)
        else:
            option = step[0]
            if option is NULL_SAFE:
                fn = step[1]
                return None if val is None else fn(val)
                
            elif option is COALESCE:
                default_val = step[1]
                return default_val if val is None else val
            else:
                raise ValueError(f"Invalid option: {option}")
    
    return functools.reduce(_call, steps, val)

pipe("2",
    (NULL_SAFE, get_post),
    (NULL_SAFE, lambda post: get_address(post.author)),
    (COALESCE, "Not found"),
    str.upper,
    print,
)

# NOT FOUND


The function is quite minimal. We should add proper error handling to it, throwing meaningful exceptions for each potential incorrect usage. You can just ask a GPT to add it and you'll end up with something like this:


import functools
from typing import Any, Callable, Tuple, Union

def pipe(val: Any, *steps: Union[Callable[[Any], Any], Tuple[object, Any]]) -> Any:
    """
    Pipe value through callables or option-tuples.
    Steps can be:
      - a callable: called as fn(acc)
      - null_safe(fn): tuple (NULL_SAFE, fn) — only call fn if acc is not None
      - coalesce(default): tuple (COALESCE, default) — replace None with default

    Raises TypeError or ValueError for invalid steps.
    """
    def _call(val: Any, step: Union[Callable[[Any], Any], Tuple[object, Any]]) -> Any:
        if callable(step):
            return step(val)

        if not (isinstance(step, tuple) and len(step) == 2):
            raise TypeError("pipe steps must be callables or 2-tuples from null_safe/coalesce")

        option, payload = step
        if option is NULL_SAFE:
            if val is None:
                return None
            if not callable(payload):
                raise TypeError("NULL_SAFE payload must be callable")
            return payload(val)

        if option is COALESCE:
            default = payload
            return default if val is None else val

        raise ValueError(f"Unknown pipe option: {option!r}")

    return functools.reduce(_call, steps, val)


Friday, 20 March 2026

Python Annotated

In Python there is this common mantra that type annotations (type hints) do not have any runtime effect. Well, that's mainly true, as those type hints are not used by the runtime to check whether your type assumptions/restrictions are correct (as the documentation says: "The Python runtime does not enforce function and variable type annotations."). But on the other hand, this type information exists at runtime (it's only that the runtime itself does not use it). Until recently you would use inspect.get_annotations to get that info, a dictionary that gets stored in the __annotations__ attribute (notice that annotations have been improved in Python 3.14: they're now evaluated lazily, and you should now use annotationlib.get_annotations). So that information is available at runtime, and your custom code can make use of it for whatever it sees fit.

Additionally, the type hints/annotations syntax allows us to use any object as an annotation, not just a type. Indeed Python's grammar does not restrict the content of annotation expressions, as PEP 3107 explicitly states: "Annotations can be any valid Python expression". Put another way: Python does allow arbitrary expressions (hence returning anything) in type annotations. This means that you can use the syntax for defining information (metadata) about these parameters, information that is then used by your custom code for something other than type-checking (for example, stating that a function expects a string following a certain pattern, let's say: create_user(msg: r"^[0-9a-f]{32}$")). That's very nice, but most likely you'll want to combine both the typing information and the extra metadata. With that in mind, the Annotated class (Annotated[T, x]) was introduced some versions ago. T is a type, and type checkers understand the Annotated class and just take care of the T part. The x part is for metadata, which can be any object, and which will be used by your custom code at runtime. Indeed, that x can be multiple values, not just one, I mean: Annotated[str, ValueRange(10, 20), Complexity("high")].

Apart from metadata that applies to the parameters we can have metadata that applies to the function itself or to a class ('cacheable', 'optimized', some sort of privacy mechanism, whatever). Normally we'll use a custom decorator that adds this information as an attribute to the function/class (__cached__, __private__). So all in all we have 2 mechanisms to provide metadata. Annotated for parameters, and decorators for classes/functions themselves.



import annotationlib  # Python 3.14+
from dataclasses import dataclass
from typing import Annotated

# to add metadata to the class or function itself, just use specific decorators that add metadata to specific attributes, for example:
def non_critical(func):
    func._non_critical = True
    return func

# metadata for parameters
@dataclass
class ValueRange:
    lo: int
    hi: int

@non_critical
def create_post_2(
    title: Annotated[str, ValueRange(5, 20)], 
    content: Annotated[str, ValueRange(5, 100)],
) -> dict:
    return {"title": title, "content": content}

annotations = annotationlib.get_annotations(create_post_2)
print(f"annotations: {annotations}") 
# annotations: {'title': typing.Annotated[str, ValueRange(lo=5, hi=20)], 'content': typing.Annotated[str, ValueRange(lo=5, hi=100)], 'return': <class 'dict'>}


In other languages like Kotlin/Java we use annotations for both parameters metadata and function/class metadata. It's important to note that while in Python we can provide any object as metadata (both when using Annotated or when using a decorator and passing any expression as argument), in Kotlin/Java annotations metadata is managed at compile time, so you are limited to compile-time constants. This means that in Python we have an enormous power with what we can provide as metadata.

Kotlin and Java annotations cannot take arbitrary runtime objects as parameters. Their allowed values are strictly limited because annotation arguments must be compile‑time constants and the annotation instances themselves are created by the compiler, not at runtime.

The Annotated class (well, what we actually have are instances of _AnnotatedAlias) has an __origin__ attribute (that points to the type hint) and a __metadata__ attribute for the metadata. However, if we only want to get the typing information we can directly use the typing.get_type_hints function.


from typing import get_type_hints

annotations = annotationlib.get_annotations(create_post_2)

print(annotations["title"].__origin__) # to get the original type hint, which is str in this case
# <class 'str'>

metadata = {key: value.__metadata__ 
    for key, value in annotations.items() if hasattr(value, "__metadata__")
}
print(f"metadata: {metadata}")
# metadata: {'title': (ValueRange(lo=5, hi=20),), 'content': (ValueRange(lo=5, hi=100),)}

print(f"type hints: {get_type_hints(create_post_2)}")
# type hints: {'title': <class 'str'>, 'content': <class 'str'>, 'return': <class 'dict'>}
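As an example of custom code consuming that metadata at runtime, here is a rough sketch of a decorator that validates arguments against ValueRange. Two things are my own assumptions here: the validate_lengths decorator is hypothetical, and I'm interpreting ValueRange as the allowed length of the string. It uses typing.get_type_hints with include_extras=True (which keeps the Annotated wrappers) so that it also runs on versions prior to 3.14.

```python
import functools
import inspect
from dataclasses import dataclass
from typing import Annotated, get_type_hints


@dataclass
class ValueRange:
    lo: int
    hi: int


def validate_lengths(func):
    """Hypothetical decorator: checks len(arg) against any ValueRange metadata."""
    hints = get_type_hints(func, include_extras=True)  # keep the Annotated wrappers
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            # parameters without Annotated metadata yield an empty tuple here
            for meta in getattr(hints.get(name), "__metadata__", ()):
                if isinstance(meta, ValueRange) and not meta.lo <= len(value) <= meta.hi:
                    raise ValueError(f"{name}: length {len(value)} not in [{meta.lo}, {meta.hi}]")
        return func(*args, **kwargs)

    return wrapper


@validate_lengths
def create_post(
    title: Annotated[str, ValueRange(5, 20)],
    content: Annotated[str, ValueRange(5, 100)],
) -> dict:
    return {"title": title, "content": content}


print(create_post("Hello world", "Some content"))
# {'title': 'Hello world', 'content': 'Some content'}

try:
    create_post("Hi", "Some content")
except ValueError as ex:
    print(ex)  # title: length 2 not in [5, 20]
```

The type checker only sees str for both parameters, while the decorator only cares about the metadata part.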

Thursday, 12 March 2026

La Mort de Quentin Deranque, un meurtre raciste / a racist murder

On February 12th, 2026, Quentin Deranque, a 23-year-old French man, was beaten to death (receiving multiple kicks to the head while he lay on the floor) by a far-left, pro-Islam, anti-French militia, a terrorist organization called "la Jeune Garde." Why? Because he was a French patriot, because he loved his country and culture, and because he intended to defend a few French girls belonging to Nemesis, a female organization that tries to raise awareness about the dangers that mass immigration from Muslim countries represents for women's rights and safety.

Of course, most mainstream media (which in France range from far-left to left, except for the excellent CNews) hurried to talk about a "fight" rather than a lynching. When the video of the lynching was made public, they tried to minimize it by claiming he was a far-right militant and an ultra-conservative Catholic. When that failed to silence the scandal, they escalated their lies, calling him a fascist, a racist, and a xenophobe. The far-left political movements that have instigated this violence for decades even dared to call him a "Nazi."

No, he was not any of those things; as I said, he was just a French patriot, a French nationalist. One could also say he was a conservative Catholic. It seems he had turned to Catholicism because of the link he established between French identity and the Catholicism. As someone who, even to this day, is not religious, I can clearly see and embrace that link, and I feel deeply grateful for having grown up in a place where Catholicism forms the basis of our moral system (regardless of whether most people consider themselves religious or not) rather than having grown up in a Muslim society.

Quentin was the child of a French father and a Peruvian mother, and he had mixed European and Amerindian features. Unless he was an illiterate idiot (he was a math student who loved philosophy and reading, so that does not seem to be the case), it is obvious that he could not be a racist and that his French nationalism was not based on a "legacy of blood" but on a "legacy of culture."

The fact that Quentin had partial extra-European origins leads to very interesting reflections. We saw his friends on TV; they were devastated by his assassination, yet they paid him tribute with enormous dignity and emotion. Many of these friends were clearly French nationalists, and for them, Quentin, with his 50% Peruvian ancestry, was just another French comrade. This is interesting for those who try to scare us with lies about French nationalism being inherently racist and xenophobic.

The even more interesting reflection is that I firmly believe Quentin’s death was a racist crime. His assassins, the ten far-left scumbags, all of them "white" and mostly coming from white, French (anti-France) bourgeois families, who beat him to death most likely focused on him because of his extra-European features. There were two other nationalist guys lying on the floor being kicked, but not with such cruelty. Am I saying that these far-left terrorists, who are supposed to fight against fascism and racism, killed him because he was not 100% white? YES, that is exactly what I am saying.

The far-left movement in Europe, and particularly in France, has become a cult of racial obsession. They have fully traded traditional class struggle for the radical, segregationist dogmas of the decolonial and indigenist movements. For these oikophobes —people who despise their own civilization— anyone of non-European descent is viewed strictly as a perpetual victim of a 'white system.' In their eyes, such a person is 'required' to hate their 'oppressors' and reject every facet of French culture. This means that for these lunatics, someone like Quentin, who chose to assimilate and embrace his French heritage, is seen as the ultimate 'traitor' to their narrative. To these self-loathing bourgeois radicals, Quentin should have been a grievance-filled victim, weaponizing his skin tone against the state. Instead, he chose the dignity of belonging to a national community, a history, and a culture. He was a French nationalist by choice and by love, proving their 'systemic' lies wrong. It was that clarity of spirit, that refusal to be a pawn in their racial war, that the far-left found truly intolerable.

Finally, I'll put here a list with the names and information (just taken from some other blogs) about the assasins. First three of the pieces of shit that directly kicked Quentin's head to his death:

Trois militants antifas lyonnais ont été formellement identifiés lors du lynchage du jeune Quentin.
1- Jacques-Élie Favrot (surnom “Jef”).
Assistant parlementaire de Raphaël Arnault, M2 à Sciences Po Saint-Étienne.
Militant à la Jeune Garde Lyon ainsi qu’à OSE CGT (syndicat étudiant de la CGT à Saint-Étienne).
2- Adrian Besseyre
Militant très actif de la Jeune Garde Lyon, né en 2001, il a également effectué un stage à l’Assemblée Nationale pour Raphaël Arnault.
3- Lelio Le Besson
Membre du service d’ordre de la Jeune Garde Lyon, et désormais militant actif de « Génération antifasciste », le mouvement qui a succédé à la JG.
Il a fait ses études à l’IG2E en Gestion des Risques et Traitement des pollutions

Then we have the scumbag that founded the far-left terror group, Raphaël Arnault. In this country in decay called France, a criminal identified as Fiche S (someone considered a serious threat to National Security), already condemned for a violent arbitrary aggression, can get a seat in the National Assembly. This ultra-violent illiterate piece of shit should be considered as one of the "intellectual" authors of this crime.

And then we have LFI, the anti-French, Pro-Islam political sect that has funded and empowered this terrorist group and has been sowing hatred in the country for years. Particularly, the leader of the sect, Melenchon, and its main and most violent and ignorant subordinates: Thomas Portes, Bompart, Rima Hassan, Mathilde Panot and Bilongo. They have minimized and even justified the murder, vomited lie after lie about Quentin (plainly calling him a "nazi") and even made fun of his execution. Hope one day all these traitors and scumbags will rot in hell.

Repose en Paix, Quentin. Les hommes sont morts, mais la dignité est éternelle.

Sunday, 22 February 2026

Python Class-Level Type Hints

Notice that in this post I'm talking about "standard" Python classes, not about dataclasses. I recently became aware of the possibility of using class-level type hints in your classes. The thing is that when reading the documentation I found it rather confusing. To make sense of it we have to be well aware of the difference between the intent that we express with those class-level hints and their runtime effects. So we have this example in the documentation:


class BasicStarship:
    captain: str = 'Picard'               # instance variable with default
    damage: int                           # instance variable without default
    stats: ClassVar[dict[str, int]] = {}  # class variable

The 'damage: int' part is the one that I knew about "class-level typehints" and was clear to me. We declare an attribute and its type, but we don't initialize it. Python takes this just as typing information, it has no runtime impact (other than being added to that class __annotations__), we are not creating an attribute in the class object.
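A quick check of that claim, reusing the class from the documentation example:

```python
from typing import ClassVar


class BasicStarship:
    captain: str = "Picard"
    damage: int
    stats: ClassVar[dict[str, int]] = {}


# the hint is recorded as an annotation...
print(sorted(BasicStarship.__annotations__))  # ['captain', 'damage', 'stats']

# ...but no 'damage' attribute exists, neither in the class nor in its instances
print("damage" in vars(BasicStarship))   # False
print(hasattr(BasicStarship, "damage"))  # False

ship = BasicStarship()
try:
    ship.damage
except AttributeError as ex:
    print(type(ex).__name__)  # AttributeError
```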

The 'captain: str = 'Picard'' part is what I could not understand. For me it's just the normal way of adding a class attribute, only that you additionally indicate the type, so how can the doc say that it's an "instance variable with default"? Well, it's the type-checking meaning vs the runtime effect. I am right that we get an attribute created at the class level (in the class __dict__), just see:


>>> class User:
...     continent: str = "Europe"
...     active = True
...

>>> User.__dict__
mappingproxy({'__module__': '__main__', '__firstlineno__': 1, '__annotations__': {'continent': <class 'str'>}, 'continent': 'Europe', 'active': True, '__static_attributes__': (), '__dict__': <attribute '__dict__' of 'User' objects>, '__weakref__': <attribute '__weakref__' of 'User' objects>, '__doc__': None})

>>> User.continent
'Europe'

>>> User.active
True

But for the type checker what that typed declaration means is that instances of that class will have a captain (or continent in my example) attribute. This could feel contradictory, but given how attribute lookup works it's perfectly fine. Initially the 'captain' attribute is created at the class level. If we read it through an instance (my_ship.captain) the lookup mechanism won't find it in the instance, but in the class, and return it. Then, when we write to it through an instance (not through the class) the writing will be done in the instance, so a 'captain' attribute will be added to the instance. That's fine; indeed, it's very nice. While the attribute is only being read, not written to, it's shared between instances and kept in the class (saving memory). Then, as soon as you write to it, it's shadowed by the instance.


s = BasicStarship()
print(s.captain)       # "Picard" via class lookup
s.captain = "Xuan"     # creates an instance attribute
print(s.__dict__)      # {'captain': 'Xuan'}
print(BasicStarship.__dict__['captain'])  # 'Picard'

We can summarize it like this:

Type hints alone do not create attributes; they only declare intent.
If you want the attribute to exist on the class (and thus be visible via Foo.x), you must assign a default value.

By the way, this is not the first time I see this behaviour of reading values from a "parent object" until we write the value to the object itself, shadowing it. This is just how things work in JavaScript with the [[Prototype]] chain.

I'm not much of a fan of defining instance attributes at the class level. It's true that it makes very explicit that an attribute is part of the public contract of the class, but I think most of the time it's a bit boilerplate. Type-checkers and autocomplete work perfectly fine with the classical style of initializing in the __init__ method, and if an attribute is internal/private and should not be considered part of the public API we should just follow the convention of starting it with '_'. So normally I would write the above code like this:



from typing import ClassVar

class AdvancedStarship:
    # stats = {} mypy will complain about this, because it is not a ClassVar
    stats: ClassVar[dict[str, int]] = {}  # class variable
    
    def __init__(self, damage: int, captain: str = 'Picard') -> None:
        # instance attributes, declared and typed in the classical way
        self.captain = captain
        self.damage = damage



The case where these class-level type hints feel very useful to me is Protocols, where they make it unnecessary to declare the "data part" of the protocol with properties (get/set descriptors), which is the approach I had followed so far.



from typing import Protocol

class Foo(Protocol):
    x: int  # part of the interface

class Bar:
    def __init__(self):
        self.x = 42  # matches Foo


It's also useful if we have attributes that won't be set in __init__, but in some later method call. This way we make them part of the class contract and initialize them to a default value (probably None), shared by all instances via the class attribute (as we saw with BasicStarship.captain), and then get it added to each instance when it gets set to a specific value.
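
A minimal sketch of that last case, assuming a hypothetical Shuttle class whose destination attribute is only set by a later method call (the names are illustrative, not taken from the earlier examples):

```python
from typing import Optional


class Shuttle:
    # Declared at class level because __init__ never sets it.
    # The None default lives on the class and is shared until overwritten.
    destination: Optional[str] = None

    def __init__(self, name: str) -> None:
        self.name = name

    def plot_course(self, destination: str) -> None:
        # First write through the instance: creates the instance attribute.
        self.destination = destination


a = Shuttle("Galileo")
b = Shuttle("Copernicus")
a.plot_course("Vulcan")

print(a.__dict__)     # {'name': 'Galileo', 'destination': 'Vulcan'}
print(b.__dict__)     # {'name': 'Copernicus'}
print(b.destination)  # None, still read from the class
```

Only the instance that was written to carries its own 'destination' entry; the other one keeps reading the shared class-level default.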

Sunday, 15 February 2026

Logical Assignment Operator and More

I've recently come across the Logical OR Assignment (||=), and the Nullish Coalescing Assignment (??=) operators in JavaScript. They are not a revolution, just a shortcut for the usage of the OR (||) operator and the nullish coalescing operator in assignment situations. We use "||=" for falsy values and "??=" for nullish (null, undefined) values. Let's see:


// for "falsy" values
> let name = "";
> name ||= "default";
'default'
> name ||= "default2";
'default'

// is equivalent to:
> name = ""
> name = name || "default";
'default'
> name = name || "default2";
'default'

// for strict null or undefined values:
> let name = null; // or name = undefined
> name ??= "default";
'default'
> name ??= "default2";
'default'

// is equivalent to:
> name = null;
> name = name ?? "default";
'default'
> name = name ?? "default2";
'default'


Python does not have a 'None coalescing' operator (so obviously it does not have a 'None coalescing assignment' operator either), so as an equivalent we have to use an if-else expression. We have the 'or' operator (which we can use with falsy values), but not an "or assignment" operator. So the code equivalent to the above JavaScript is considerably more verbose:


# for "falsy" values
> name = ""
> name = name or "default"
'default'
> name = name or "default2"
'default'

# for strict None values:
> name = None
> name = name if name is not None else "default"
'default'
> name = name if name is not None else "default2"
'default'

As the if-else pattern is quite verbose, we can write a simple coalesce function (I've just remembered that such a function, COALESCE, is standard in SQL) to make the code more straightforward.


def coalesce(value, default_value):
    return value if value is not None else default_value

a = coalesce(a, "default value")

As for other languages, Kotlin has the || operator and the ?: (Elvis) null coalescing operator, but no shortcut form to use during assignment. Ruby has a logical OR assignment operator (||=) that we can use with nil and false (the only falsy values in Ruby). It feels strange that Ruby does not have a null coalescing operator, so if we want to be strict and deal only with null (nil), we have to use Ruby's rich syntax differently:


# for null coalescing assignment
# like JavaScript: a = a ?? "default" 
# or Kotlin: a = a ?: "default"

a = "default" if a.nil?
# or
a = a.nil? ? "default" : a


At this point I think it's worth recalling which values are considered falsy (those that, when evaluated in a boolean context, are treated as false) in different languages:

  • JavaScript: false, null, undefined, 0 (also -0 and 0n), NaN, ""
  • Python: False, None, 0 (and 0.0), "", and empty containers such as [], (), {}, set()
  • Ruby: nil, false
  • Kotlin: false. Kotlin does NOT perform truthy/falsy coercion; it's strictly typed, and trying to use a non-boolean value in a condition causes a compilation error.

As you can see the main (and very important) difference between JavaScript and Python is that in Python empty containers are falsy.
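
This difference matters when choosing between 'or' and a strict None check. The sketch below redefines the coalesce helper shown earlier so that it is self-contained; the tags variable and its "untagged" default are just illustrative names.

```python
def coalesce(value, default_value):
    """Return default_value only when value is None."""
    return value if value is not None else default_value


# Every one of these values is falsy in Python.
for candidate in ([], {}, set(), "", 0, None):
    print(repr(candidate), bool(candidate))

tags = []                            # a legitimate, empty list
print(tags or ["untagged"])          # ['untagged']  (the empty list is replaced)
print(coalesce(tags, ["untagged"]))  # []            (the empty list is kept)
print(coalesce(None, ["untagged"]))  # ['untagged']
```

So 'or' is only a safe default mechanism when an empty container (or 0, or "") is never a meaningful value.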

Saturday, 7 February 2026

Python Attribute Lookup and Dunders

I already talked in the past about Python descriptors [1] and [2] (also referencing the complex attribute lookup process). I've recently realised how some commonly used attributes are managed with descriptors present in classes or metaclasses. First, I'll paste here the conclusions from an interesting chat with a GPT regarding the attribute lookup process:

1) Instance attribute lookup (obj.attr)

This is (conceptually) what object.__getattribute__(obj, name) does:

a) Check for a data descriptor on the class or its MRO
Search type(obj).__mro__ for name in each class’s __dict__.
If found and it’s a data descriptor (has __set__ or __delete__), return descriptor.__get__(obj, type(obj)).

b) Check the instance’s own dictionary
If obj.__dict__ exists and contains name, return obj.__dict__[name].
Note: If the class defines __slots__ without __dict__, this step may not exist.

c) Check for a non-data descriptor or other attribute on the class/MRO
Search type(obj).__mro__ for name.
If found and it’s a non-data descriptor (has __get__ only), return descriptor.__get__(obj, type(obj)).
Otherwise, return the found value as-is.

d) Fallback: __getattr__
If nothing above produced a value, and type(obj) defines __getattr__(self, name), call it and return its result.

e) Otherwise
Raise AttributeError.
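
The precedence described in steps a) to d) can be observed with a small experiment. This is only a sketch: DataDesc, NonDataDesc and Demo are hypothetical names, and the instance dictionary is filled directly so that the descriptors' own setters are bypassed.

```python
class DataDesc:
    """Data descriptor: defines both __get__ and __set__."""

    def __get__(self, obj, objtype=None):
        return "from data descriptor"

    def __set__(self, obj, value):
        raise AttributeError("read-only")


class NonDataDesc:
    """Non-data descriptor: defines only __get__."""

    def __get__(self, obj, objtype=None):
        return "from non-data descriptor"


class Demo:
    data = DataDesc()
    non_data = NonDataDesc()

    def __getattr__(self, name):
        # Only reached when steps a) to c) found nothing.
        return f"fallback for {name}"


d = Demo()
# Write straight into the instance dict, bypassing DataDesc.__set__.
d.__dict__["data"] = "from instance dict"
d.__dict__["non_data"] = "from instance dict"

print(d.data)      # from data descriptor  (step a wins over step b)
print(d.non_data)  # from instance dict    (step b wins over step c)
print(d.missing)   # fallback for missing  (step d)
```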

2) Class attribute lookup (C.attr)

Conceptually, type.__getattribute__(C, name) does this:

a) Metaclass MRO — data descriptors first
Search type(C).__mro__. If name is found and it’s a data descriptor (__set__ or __delete__ present), return descriptor.__get__(None, C).

b) Class MRO (C and its bases) — regular attributes & descriptors
Search C.__mro__ (starting with C, then bases):
If found and it’s a descriptor (__get__), return descriptor.__get__(None, C) (note obj=None).
Otherwise, return the raw value.

c) Metaclass MRO — non-data descriptors and other attributes
If found on the metaclass MRO and it’s a descriptor, return descriptor.__get__(C, type(C)) (here, the “instance” is the class C). Otherwise return the value.

d) Fallback
If not found and the metaclass defines __getattr__(cls, name), call it.
Else raise AttributeError.
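
The same kind of experiment works for class attribute lookup. Again this is a sketch with hypothetical names (Meta, C, label, kind, only_meta), meant to make the ordering of the steps above visible rather than to mirror CPython's implementation.

```python
class DataDesc:
    """Data descriptor placed on the metaclass."""

    def __get__(self, obj, objtype=None):
        return "metaclass data descriptor"

    def __set__(self, obj, value):
        raise AttributeError("read-only")


class Meta(type):
    label = DataDesc()               # data descriptor on the metaclass
    kind = "metaclass value"         # plain attribute on the metaclass
    only_meta = "only on metaclass"  # exists nowhere else

    def __getattr__(cls, name):
        # Reached only when the regular lookup fails.
        return f"fallback for {name}"


class C(metaclass=Meta):
    label = "class value"  # loses to the metaclass data descriptor
    kind = "class value"   # wins over the plain metaclass attribute


print(C.label)      # metaclass data descriptor  (metaclass data descriptor first)
print(C.kind)       # class value                (class MRO next)
print(C.only_meta)  # only on metaclass          (other metaclass attributes)
print(C.missing)    # fallback for missing       (metaclass __getattr__)
```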

Let's see now some examples of attributes that are indeed descriptors:

__name__ of a class (Person.__name__). One could think that it's just an attribute stored directly in the class object, but if that were the case, I could access it via an instance of the class (person1.__name__), which is not possible. So __name__ is indeed a descriptor in the metaclass (and __bases__ and __doc__ are likewise descriptors in the metaclass):


>>> class Person:
...     pass
...     
>>> Person().__name__
Traceback (most recent call last):
    Person().__name__
AttributeError: 'Person' object has no attribute '__name__'

>>> Person.__name__
'Person'

>>> Person.__dict__["__name__"]
Traceback (most recent call last):
    Person.__dict__["__name__"]
    ~~~~~~~~~~~~~~~^^^^^^^^^^^^
KeyError: '__name__'

>>> type(Person).__dict__["__name__"]
<attribute '__name__' of 'type' objects>
>>> type(type(Person).__dict__["__name__"])
<class 'getset_descriptor'>

>>> type(Person).__dict__["__bases__"]
<attribute '__bases__' of 'type' objects>
>>> type(type(Person).__dict__["__bases__"])
<class 'getset_descriptor'>

>>> type(type(Person).__dict__["__doc__"])
<class 'getset_descriptor'>


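__class__ of an instance or __class__ of a class. It is neither in the instance's __dict__ nor in the class's __dict__, so at first it does not look descriptor-based (and my discussion with a GPT was a bit confusing on this point). It is in fact also a getset_descriptor, but it is defined on object (object.__dict__["__class__"]), so the lookup finds it through the MRO: the MRO of Person for an instance, and the MRO of the metaclass (type, object) for a class.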


>>> p1 = Person()
>>> p1.__class__
<class '__main__.Person'>

>>> type.__class__
<class 'type'>

>>> type(p1.__dict__["__class__"])
Traceback (most recent call last):
    type(p1.__dict__["__class__"])
         ~~~~~~~~~~~^^^^^^^^^^^^^
KeyError: '__class__'

>>> type(Person.__dict__["__class__"])
Traceback (most recent call last):
    type(Person.__dict__["__class__"])
         ~~~~~~~~~~~~~~~^^^^^^^^^^^^^
KeyError: '__class__'
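
Looking one level further up the MRO, in object.__dict__, shows where __class__ actually lives. A short check (Person is redefined here only to keep the snippet self-contained):

```python
class Person:
    pass


p1 = Person()

# __class__ is stored on object, the root of every MRO.
descriptor = object.__dict__["__class__"]

print(type(descriptor).__name__)           # getset_descriptor
print(descriptor.__get__(p1) is Person)    # True
print(descriptor.__get__(Person) is type)  # True
```

Since object is in the MRO of every class and also in the MRO of type, the same descriptor serves both p1.__class__ and Person.__class__.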


Dunder attributes. It's interesting to note that there are 2 categories of __dunder__ attributes (those that start and end with "__").
- On one hand we have those like the ones we've just seen. These are Special Attributes (Metadata), which are used to store metadata: __name__, __class__, __bases__, __mro__, __dict__, __module__, __doc__, __annotations__.
- And on the other hand we have Special Methods (Behavioral Hooks), that are used to implement Python's syntactic sugar:

__call__: ob(), Invocation
__getitem__: ob[key]
__setitem__: ob[key] = value 
__getattr__: Fallback for missing attributes
__getattribute__: Intercepts all attribute access
__iter__, __next__: Iteration
__str__, __repr__: String representation
__eq__, __lt__, etc: Comparisons
__enter__, __exit__: Context managers
__add__, __mul__, etc: Arithmetic operations

Notice that if you access a Behavioral Hook "on your own" (I mean, you explicitly do: obj.__call__() or obj.__iter__()) the normal look up mechanism applies (using the object and its class). However, when used in the intended way (when you do obj(), or iter(obj)) the look up is done only in the class of the object (and if an object is a class it's done in its metaclass) not in the object itself.
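
A small sketch of that difference, using a hypothetical Box class and an instance attribute deliberately named like a special method:

```python
class Box:
    def __len__(self):
        return 3


b = Box()
# Instance attribute with the same name as the special method.
b.__len__ = lambda: 99

print(b.__len__())  # 99: explicit access uses the normal lookup (instance first)
print(len(b))       # 3: the implicit use only looks at type(b)
```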