Sunday, 17 May 2026

Type Hints Notes 2026

Type hints are more and more prevalent in recent Python code. I'm still not too severe about them, but my level of strictness continues to grow over time. I've lately learnt a couple of things:

Variadic Parameters. When typing a function that has packed (variadic) parameters in its signature (*args, **kwargs), we annotate the type of each individual argument; we don't type the parameter as a collection or dictionary (unless each argument really is a collection). For example, for a pipe function we should write:


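from typing import Any, Callable

# this is RIGHT: each positional argument is a Callable
def pipe(val: Any, *fns: Callable) -> Any: ...

# this is WRONG: it says that each positional argument is a list of Callables
def pipe(val: Any, *fns: list[Callable]) -> Any: ...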

Tuples. In Python we use tuples for "groups" of a fixed number of elements, a pair, a trio... We express it in the signature like this: tuple[str, str] or tuple[int, str, str]... But how to express that a function returns (or receives) a "group" of an unknown number of elements? We also use tuples, combined with ellipsis (...), like this:


tuple[int, ...]         # any number of ints: (), (1,), (1, 2, 99), ...
tuple[int | str, ...]   # any number of elements, each one an int or a str: (), ("a", 1), ("a", "b", "c"), (1, 2, 1), ...
  
tuple[int, str, bool]   # exactly 3 elements: an int, a str, a bool
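
The two ideas above meet inside the function body: a parameter annotated as *fns: Callable is received as a tuple of callables, which is what tuple[Callable, ...] describes. A minimal sketch to check this (describe_fns is a made-up helper, used only for illustration):

```python
from typing import Any, Callable


def describe_fns(val: Any, *fns: Callable[[Any], Any]) -> tuple[Callable[[Any], Any], ...]:
    # Each argument is annotated individually, but inside the function
    # the packed parameter is always a tuple (possibly an empty one).
    return fns


packed = describe_fns(1, str, len)
print(type(packed).__name__)  # tuple
print(describe_fns(1))        # ()
```

So the annotation on *fns talks about one element, while the value we work with inside the function is the whole tuple.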
  

An important detail that I've learnt thanks to a typing issue. We know that in a Python try-except block the except clause can handle multiple exception types, for example: except RuntimeError, TypeError, NameError:. Those multiple exceptions form a tuple, and only a tuple will do, not just any iterable. Let's see an example (the last call is what is WRONG):


# multiple exceptions
def multiple_exceptions(exceptions: tuple[type[Exception], ...]) -> None:
    try:
        raise ValueError("This is a ValueError")
    except exceptions as e:
        print(f"Caught an exception: {e}")

multiple_exceptions((ValueError, TypeError))  
# Caught an exception: This is a ValueError

# important, this is WRONG, we have to pass a tuple, not just any collection
multiple_exceptions([ValueError, TypeError])
# TypeError: catching classes that do not inherit from BaseException is not allowed


Indeed, an equivalent function with a variadic signature feels more natural and idiomatic than the above (and furthermore it prevents the confusion of passing any collection rather than exactly a tuple):


# this variadic signature feels more natural
def multiple_exceptions2(*exceptions: type[Exception]) -> None:
    if not exceptions:
        raise ValueError("pass at least one exception type")
    try:
        raise ValueError("This is a ValueError")
    except exceptions as e:
        print(f"Caught an exception: {e}")

multiple_exceptions2(ValueError, TypeError) 


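Notice also how I've added a guard against the empty-call case: an except clause given an empty tuple is accepted at runtime but matches nothing, so without the guard the ValueError would simply propagate uncaught.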
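
To see what that guard protects against, here is a small sketch of how an except clause behaves when it receives an empty tuple (catch_with is a made-up helper, not part of the code above):

```python
def catch_with(exceptions: tuple[type[BaseException], ...]) -> str:
    try:
        try:
            raise ValueError("boom")
        except exceptions:
            # Reached only if ValueError matches one of the given types.
            return "caught by the inner handler"
    except ValueError:
        # Reached when the inner except clause matched nothing.
        return "propagated to the outer handler"


print(catch_with((ValueError,)))  # caught by the inner handler
print(catch_with(()))             # propagated to the outer handler
```

An empty tuple is not rejected; it just never matches, which is an easy way to end up with a handler that silently handles nothing.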

Sunday, 10 May 2026

Python partial and placeholders

When Python 3.14 was released I had already read about some of its main features (those that involve a PEP and that have been discussed in the Python discussion forums), like Lazy Annotations and Template Strings. Recently, when reading the release notes in depth, I came across a small feature added to functools.partial (and partialmethod) that I find particularly useful:

functools:
Add the Placeholder sentinel. This may be used with the partial() or partialmethod() functions to reserve a place for positional arguments in the returned partial object. (Contributed by Dominykas Grigonis in gh-119127.)

Just a reminder of what partial function application is (don't confuse it with the related concept of curried functions):

In computer science, partial application (or partial function application) refers to the process of fixing a number of arguments of a function, producing another function of smaller arity.

Indeed I already talked about functools.partial some time ago

The "basic" approach to partial function application is that we can just fix (pre-fill) arguments from left to right. This is what we also have in JavaScript with Function.prototype.bind (which additionally takes the "this" value as its first argument). As Python supports named arguments, functools.partial already supported fixing named arguments too.


import functools

def format_geo_info(country, region, city, population):
    return f"{city}, {region.upper()} ({country}) - {population}"
    
bound_format = functools.partial(format_geo_info, "France")
print(bound_format("Occitanie", "Toulouse", 500_000))
# Toulouse, OCCITANIE (France) - 500000
print(bound_format("Occitanie", city="Toulouse", population=500_000))
# Toulouse, OCCITANIE (France) - 500000


What was not possible until this version was fixing some intermediate positional (non-named) argument, but this is possible since version 3.14 thanks to the Placeholder sentinel value:



from functools import partial, Placeholder  # Placeholder requires Python 3.14+

format_french_city_with_unknown_population = partial(format_geo_info, "France", Placeholder, Placeholder, 0)
print(format_french_city_with_unknown_population("Ile de France", "Saint Denis"))
# Saint Denis, ILE DE FRANCE (France) - 0

Not a revolutionary feature, but one that I've missed occasionally. A trivial implementation could be something like this:


# supports positional and keyword arguments, but not placeholders
def my_basic_partial(func, *args, **kwargs):
    return lambda *fargs, **fkwargs: func(*args, *fargs, **(kwargs | fkwargs))
    
# add support for placeholders in the arguments
PLACEHOLDER = object()
def my_complete_partial(func, *args, **kwargs):
    def new_func(*fargs, **fkwargs):
        merged_args = []
        fargs_iter = iter(fargs)
        for arg in args:
            if arg is PLACEHOLDER:
                merged_args.append(next(fargs_iter))
            else:
                merged_args.append(arg)
        merged_args.extend(fargs_iter)
        return func(*merged_args, **(kwargs | fkwargs))
    return new_func

format_french_city_with_unknown_population = my_complete_partial(format_geo_info, "France", PLACEHOLDER, PLACEHOLDER, 0)
print(format_french_city_with_unknown_population("Ile de France", "Saint Denis"))
# Saint Denis, ILE DE FRANCE (France) - 0

format_2 = my_complete_partial(format_geo_info, "France", city="Toulouse")
print(format_2("Occitanie", population=500_000))
# Toulouse, OCCITANIE (France) - 500000


In my aforementioned previous post about partial in Python I gave some reasons for using partial over directly trapping the variables with a closure (of course internally partial has to use either closures or a callable class). I've just realised that I was missing the main reason, partial is more semantic.

- Intent-Revealing Code: partial(func, arg) explicitly states your intent to partially apply arguments, improving readability and self-documentation.
- Declarative Style: It focuses on the result (a new specialized function) rather than the imperative mechanics of capturing lexical scope.
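
One concrete side of that "more semantic" argument is that a partial object can be inspected, while a closure is opaque. A small sketch (power and the two square_* names are invented for the example):

```python
import functools


def power(base: int, exponent: int) -> int:
    return base ** exponent


# Same behaviour, two ways of fixing exponent=2.
square_partial = functools.partial(power, exponent=2)
square_closure = lambda base: power(base, exponent=2)

print(square_partial(5), square_closure(5))  # 25 25

# The partial object documents what was fixed and on which function.
print(square_partial.func is power)  # True
print(square_partial.args)           # ()
print(square_partial.keywords)       # {'exponent': 2}
```

The lambda only shows up as an anonymous function object, so a reader (or a debugger) cannot tell from the object itself what it wraps.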

Lodash, the excellent JavaScript library, also features placeholders in its implementation of partial.

Sunday, 3 May 2026

Glibc

Compiling to native code, and furthermore for a Linux system... wow, sounds scary, and very, very far away from what I've been doing in the last decade(s). Well, the thing is that some months ago my employer decided that we had to compile some of our Python applications to native code. It's not performance related; it's for preventing access to the source code of these applications. We were looking into Cython, but we settled on Nuitka, an amazing piece of software that has been serving us very well.

Normally almost every native application compiled for a Linux system has been dynamically linked against glibc. OK, and, what's glibc?

The GNU C Library, commonly known as glibc, is the GNU Project implementation of the C standard library. It provides a wrapper around the system calls of the Linux kernel and other kernels for application use. Despite its name, it now also directly supports C++ (and, indirectly, other programming languages).

So when a native Linux application (using glibc) starts, the dynamic linker (ld.so, for example /lib64/ld-linux-x86-64.so.2 on x86-64) loads the shared objects (SO, .so files, the equivalent of Windows DLLs) needed by the application (like libc.so.6) and links the call sites to the functions imported from those libraries.

Obviously glibc evolves over time, so what about versions? First, what glibc version is installed on my system? You can check the SOs loaded by a running process with: lsof -p PID | grep '\.so'. Normally you'll see that it's using libc.so.6 (on Ubuntu it's located at /usr/lib/x86_64-linux-gnu/libc.so.6). That 6 is not the version number (libc.so.6 has been the name of the library since 1997!); the version number is something like 2.XX (2.39 on my Ubuntu 24.04). You can find it with: ldd --version
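
Since the applications in question are Python ones, the same check can also be done from Python with the standard library. A minimal sketch; on a non-glibc system (musl, macOS, Windows) the result may well be a pair of empty strings:

```python
import platform

# Returns a (lib, version) tuple of strings, e.g. ('glibc', '2.39') on a
# glibc-based Linux; both may be empty when it cannot be determined.
lib, version = platform.libc_ver()

if lib == "glibc":
    print(f"glibc version: {version}")
else:
    print(f"no glibc detected (lib={lib!r}, version={version!r})")
```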

So, what happens if I compile my application on a system with one version of glibc and try to run it on a system with a different version? Well, the situation is quite a bit more fine-grained than I thought. Version numbers are not checked at the glibc level, but at the function level. This is so because glibc uses symbolic versioning:

The "Symbol Versioning" Approach (Advanced)

This is what glibc uses. It is also used by heavy-hitters like OpenSSL, Qt, and libgcc.

    The Logic: The filename stays the same (e.g., libc.so.6 has been the name since 1997), but individual functions inside the file are tagged with versions.

    The Result: Multiple versions of the same function can coexist in one file. This allows for extreme backward compatibility without breaking the system every time a single function is updated.
    
    The primary goal of symbol versioning is backward compatibility. It allows a single library file to provide multiple versions of the same function so that:

    Old binaries compiled against v2.10 continue to use the v2.10 implementation.

    New binaries compiled against v2.11 use the new v2.11 implementation.

So multiple versions of the same function live inside glibc, and your binary will dynamically link against the one it was compiled for. And when does the version number of a function change? Normally it only changes if the function interface (the contract, the ABI) changes, not if it's only its internal implementation that changes. So if we compare symbolic versioning to semantic versioning (SemVer, a more familiar versioning scheme), we could say that in symbolic versioning a version change corresponds to a Major version bump in semantic versioning.

A new Symbol Version is functionally equivalent to a Major Version bump for that specific function. It signals to the linker that the "Contract" for that specific symbol has changed, and old programs should look for the previous contract elsewhere in the same file.

Notice how symbolic versioning is used for functions inside a library, while semantic versioning (when used) is normally used for libraries.

The glibc version (that one obtained with ldd --version) has no importance in terms of loading the library in memory (the dynamic linker will load libc.so.6 regardless of its "internal" version), the important part is the specific version of each function that we try to link.

I guess that when you program in C you are aware of the version of each function that you are using, as you have to adapt your code to the function's ABI if it has changed; but when that happens behind the scenes, things are quite different. In our case we just write Python code, and the beautiful Nuitka takes care of transforming it to C and then compiling it to native code. So the C code that Nuitka generates ends up linked against the function versions of the glibc present in the build system. If you then run that binary on a system with an older glibc, it may happen that your binary is "pointing" to a function with a symbol version (let's say openEncryptedFile@GLIBC_2.12, a made-up name) higher than the one available in the older glibc (let's say openEncryptedFile@GLIBC_2.10), and your application will fail to start. Basically this means that you have to compile your Python application on a system whose glibc version is <= the glibc version of the target system. It feels odd at first, as the starting point is the very same Python code: if on one system it can just use openEncryptedFile@GLIBC_2.10, why isn't it always linked against that 2.10 version even when a higher one (openEncryptedFile@GLIBC_2.12) is present? Well, that's how things work by default: when compiling, the code is linked to the highest version of each function present in the glibc of the build machine.

If you wonder if other .so libraries (SO, ELF libraries) also use symbolic versioning, it depends. For smaller, simpler libraries what is usually used is the SONAME approach, the library (.so file) name changes with each version (this is a coarse grained approach).

Symbolic versioning is the technically superior approach, but it is not the universal standard for all ELF libraries. It depends entirely on the library maintainers and their commitment to long-term ABI stability.

In the Linux ecosystem, there are two primary ways to manage library changes:  
1. The "SONAME" Approach (Common)

Most smaller or simpler libraries use the SONAME mechanism. 
You’ve likely seen files like libfoo.so.1 and libfoo.so.2.

    The Logic: If the developers change the interface, they increment the "Major" version number in the filename itself.  

    The Result: Programs linked against libfoo.so.1 will refuse to start 
    if only libfoo.so.2 is present. This is a "heavy-handed" fix because 
    it requires recompiling every program that uses the library even if 
    the specific function they use didn't actually change.

2. The "Symbol Versioning" Approach (Advanced)

This is what glibc uses. It is also used by heavy-hitters like OpenSSL, Qt, and libgcc.

    The Logic: The filename stays the same (e.g., libc.so.6 has been the name since 1997), 
    but individual functions inside the file are tagged with versions.

    The Result: Multiple versions of the same function can coexist in one file. 
    This allows for extreme backward compatibility without breaking the system every time a single function is updated.

To complete this post, I'll add some useful, related commands:

  • To check the SO's used by a given program.
    For a binary on disk: ldd /usr/bin/program_name
    For a running process: lsof -p [PID] | grep '\.so'
  • To view the symbols used by a program (the specific functions imported from SO's)
    All imported symbols: nm -Du /usr/bin/program_name
    Symbols + Versions: objdump -T /usr/bin/program_name | grep -F '*UND*'
    Only glibc symbols: objdump -T /usr/bin/program_name | grep 'GLIBC_'
    Library Version Map: readelf -V /usr/bin/program_name
  • To view the symbols/functions exported by glibc in your system: objdump -T /usr/lib/x86_64-linux-gnu/libc.so.6

Some additional findings related to the last command. For example I want to see the versions of pthread_spin_init present in my glibc: objdump -T /usr/lib/x86_64-linux-gnu/libc.so.6 | grep pthread_spin_init

That gives me:

00000000000a4130 g    DF .text  000000000000000d  GLIBC_2.34   pthread_spin_init
00000000000a4130 g    DF .text  000000000000000d (GLIBC_2.2.5) pthread_spin_init

Which is very interesting, as it shows us that a symbol version is not a sequential counter for that specific function. Instead, it is a marker of the glibc release that defined that specific version of the function's ABI. From a GPT:

How glibc handles ABI changes with symbol versioning

Original version: Suppose foo() was introduced in GLIBC_2.2.5. That version is tagged as foo@GLIBC_2.2.5.

ABI change in glibc 2.32: If glibc developers change the ABI of foo() in version 2.32 (e.g., change its behavior, arguments, or return type in a way that breaks compatibility), they will:

Keep the old implementation as foo@GLIBC_2.2.5.
Add a new implementation as foo@GLIBC_2.32.

At runtime:

A binary linked against glibc 2.2.5 will request foo@GLIBC_2.2.5, and the dynamic linker will resolve it to the old implementation.
A binary linked against glibc 2.32 will request foo@GLIBC_2.32, and get the new implementation.

This mechanism ensures backward compatibility while allowing glibc to evolve.

Friday, 17 April 2026

Python Sentinel Values

Every now and then we need a flag or sentinel value: a special, unique value that we can distinguish from the normal values that we are processing and that has a particular meaning (classically, signalling the end of the data, of a loop, or of an operation). For example, in my recent post about adding null-safety to a pipe function I was using 2 sentinels/flags: NULL_SAFE and COALESCE.

The essential requirement for a sentinel value is to have a unique identity, so that comparing it by identity (Python: is, JavaScript: ===) with any other value/object in our system returns false. So the simplest approach is just using a new object for each of our sentinels.



NO_INVEST = object()  # Sentinel value

def invest(amount: int | None | object) -> str:
    if amount is NO_INVEST:
        return "No investment"
    else:
        amount = amount or 0
        return f"we've invested {amount}"

print(invest(NO_INVEST))  # Output: No investment
print(NO_INVEST)  # Output: <object object at 0x...>


That simple approach works fine, but it's missing a few things. Printing the value is messy (we get an "<object object at 0x...>" representation, when it would be nice to get NO_INVEST) and it's rather typing-unfriendly. Saying that invest can receive an object, apart from int or None, lacks any meaning. What kind of object is that? A sentinel value should (mainly) have these features:

  • A unique identity (is comparison)
  • A meaningful repr (debuggability)
  • Clear typing (especially for static type checkers)

Our basic sentinel only provides the first one (identity). That's why some smart guy came up with a very interesting PEP 661 proposing a new Sentinel class. Unfortunately that PEP is in deferred status. The document also presents different techniques commonly used for Sentinel values, like the simple object() that I've just shown, using an enum or using a class. Using a class is the best approach to me, I'll show several iterations until getting what I think is the best we can get so far.

Approach 1. Classes are objects in Python. We can use a class for each sentinel object, and in order to get a nice representation we can give it a metaclass with a custom __repr__. The missing piece is having some typing friendliness, we're still stuck with the Any signature. Additionally, declaring a class for something that is not intended to work as an object factory, but to be used as an object in itself is rather unnatural.



    class SentinelMeta(type):
        def __repr__(cls):
            return cls.__name__
        
    class Sentinel(metaclass=SentinelMeta):
        pass
        
    class NO_INVEST(Sentinel): pass


    #def invest(value: int | None | Type[NO_INVEST]) -> str: # this signature feels a bit strange, but it works
    #def invest(value: int | None |Literal[NO_INVEST]) -> str: # this one feels better, but only works with mypy, not with pylance
    def invest(amount: int | None | Any) -> str:
        if amount is NO_INVEST:
            return "No investment"
        else:
            amount = amount or 0
            return f"we've invested {amount}"

    print(invest(NO_INVEST))  # Output: No investment
    print(NO_INVEST)  # Output: NO_INVEST

Approach 2. We can make the usage quite more natural by hiding the class creation behind a function. That also allows us to skip the Sentinel base class.



    class SentinelMeta(type):
        def __repr__(cls):
            return cls.__name__

    def sentinel(name: str):
        return SentinelMeta(name, (), {})
    
    NO_INVEST = sentinel("NO_INVEST")


    #def invest(value: int | Type[NO_INVEST]) -> str: # pylance doesn't like it, using a variable as type
    #def invest(value: int | Sentinel) -> str: # we don't have a Sentinel class... so forget it
    def invest(amount: int | None | Any) -> str:
        if amount is NO_INVEST:
            return "No investment"
        else:
            amount = amount or 0
            return f"we've invested {amount}"
        
    print(invest(NO_INVEST))  # Output: No investment
    print(NO_INVEST)  # Output: NO_INVEST


This one feels quite natural to use, but we still have the problem with typing. We can leverage the Generic types/class subscripting that we saw in my previous post for getting something like this (Approach 3)



    class SentinelMeta(type):
        def __repr__(cls):
            return cls.__name__
    
    class Sentinel(metaclass=SentinelMeta):      
        def __class_getitem__(cls, item):
                return cls
    
    def sentinel(name: str):
        return SentinelMeta(name, (Sentinel,), {})
    
    NO_INVEST = sentinel("NO_INVEST")


    #def invest(value: int | Sentinel) -> str:
    def invest(amount: int | None | Sentinel[NO_INVEST]) -> str: # at a typing level it's equivalent to the above, but it provides extra meaning
        if amount is NO_INVEST:
            return "No investment"
        else:
            amount = amount or 0
            return f"we've invested {amount}"

    print(invest(NO_INVEST))  # Output: No investment
    print(NO_INVEST)  # Output: NO_INVEST
	

Hey, this one feels rather good to me. The Sentinel[NO_INVEST] looks pretty nice in that signature. Type checking is mostly off, though, because our sentinel() function is not typed, so it's considered to return Any, and passing an Any to a function effectively disables type checking for that argument. This means that for the type system any sentinel that we create with sentinel() is just an Any, so the type checker will allow passing any sentinel to a function that expects a specific sentinel. It's not a problem for me; what I'm mainly interested in is the semantics, the clarity, that this type annotation provides to the function signature.

Given that we are using Sentinel classes as objects, not as object factories, we can make these classes more object-like by preventing instantiation. Additionally, sentinels do not have attributes and are not intended to be extended with attributes, so we can prevent attributes from being added to them dynamically. All in all we get:



    class SentinelMeta(type):
        def __repr__(cls):
            return cls.__name__
        
        # prevent sentinel classes from being instantiated
        def __call__(cls, *args, **kwargs):
            raise TypeError(f"{cls.__name__} is a sentinel and cannot be instantiated") 
        
        # prevent sentinel classes from being modified
        def __setattr__(cls, name, value):
            raise AttributeError(f"Cannot modify sentinel {cls.__name__}")


    class Sentinel(metaclass=SentinelMeta):      
        def __class_getitem__(cls, item):
                return cls

    
    def sentinel(name: str):
        return SentinelMeta(name, (Sentinel,), {})

    
    NO_INVEST = sentinel("NO_INVEST")
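
A quick check of the two guards, repeating the definitions above so that the snippet runs on its own (outcome is just a made-up helper to report what happened):

```python
class SentinelMeta(type):
    def __repr__(cls):
        return cls.__name__

    def __call__(cls, *args, **kwargs):
        raise TypeError(f"{cls.__name__} is a sentinel and cannot be instantiated")

    def __setattr__(cls, name, value):
        raise AttributeError(f"Cannot modify sentinel {cls.__name__}")


class Sentinel(metaclass=SentinelMeta):
    def __class_getitem__(cls, item):
        return cls


def sentinel(name: str):
    return SentinelMeta(name, (Sentinel,), {})


NO_INVEST = sentinel("NO_INVEST")


def outcome(action) -> str:
    # Run the action and describe the exception it raised, if any.
    try:
        action()
    except (TypeError, AttributeError) as exc:
        return f"{type(exc).__name__}: {exc}"
    return "no error"


print(outcome(lambda: NO_INVEST()))
# TypeError: NO_INVEST is a sentinel and cannot be instantiated
print(outcome(lambda: setattr(NO_INVEST, "x", 1)))
# AttributeError: Cannot modify sentinel NO_INVEST
print(Sentinel[NO_INVEST] is Sentinel)  # True
```

Note that only attribute assignment is guarded here; a determined caller can still go around the metaclass, so this is a guard against accidents rather than a hard guarantee.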


There's something missing in this implementation, support for our sentinels to be pickled (particularly important if we plan to use them in multiprocessing scenarios). That complicates the design and I've deliberately left it aside for the moment. Maybe we'll see it in another post.

Thursday, 9 April 2026

Python Type Checkers and Type Expressions

In this previous post I explained how Python allows the use of any object, not just type objects, in its annotations: "Annotations can be any valid Python expression". Annotations are used to provide metadata, and while normally that metadata is just typing information, we can provide any sort of metadata for custom use at runtime (and, as we saw, we have the Annotated mechanism to combine both typing and custom metadata). While doing some tests at that time I noticed that VS Code (the Pylance extension) would warn (with "Call expression not allowed in type expression" or "Variable not allowed in type expression") against annotations like the following, which as I've said are perfectly valid at runtime:


import annotationlib  # Python 3.14+
from dataclasses import dataclass

# metadata for parameters
@dataclass
class ValueRange:
    lo: int
    hi: int

# pylance warning: Call expression not allowed in type expression
def create_post_1(
    title: ValueRange(5, 20), 
    content: ValueRange(5, 100),
) -> dict:
    return {"title": title, "content": content}


val_range = ValueRange(1, 10)
# pylance warning: Variable not allowed in type expression
def fn2(a: val_range) -> None:
    pass
    
# but it's stored OK in the function annotations 
print(annotationlib.get_annotations(fn2))
# {'a': ValueRange(lo=1, hi=10), 'return': None}


So if this works fine at runtime (you can see that the annotation is stored along with the function), why does Pylance warn against it? Because we have to differentiate what is valid for runtime use from what is valid at type-checking time. From a GPT:

The runtime accepts arbitrary expressions. Static type checkers do not. This is so because static typing tools parse Python, but they don’t execute it, and they don’t compile it to bytecode either. Type checkers rely on syntactic patterns, not runtime behavior

This is something I had never thought about before. Python type checkers analyze your code without executing anything (they are static). So the types in the annotations that they are going to analyze have to be expressed in a direct, static form, not as the result of executing an expression (as they are not going to execute that expression). The different Python type checkers (mypy, Pylance, pyre...) parse Python source code into an AST (normally different from CPython's AST) and analyze it; they do not run code, indeed they do not even compile the code to bytecode to create code objects. That's why they can only work with type expressions (annotation expressions) that follow a specific syntax, not with any expression. From here: Note that while annotation expressions are the only expressions valid as type annotations in the type system, the Python language itself makes no such restriction: any expression is allowed.
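
To get a feeling for what "syntactic" means here, we can look at annotations the way a parser sees them. This sketch uses CPython's own ast module purely as an illustration (real type checkers use their own parsers); nothing in the parsed source is executed, so ValueRange does not even need to exist:

```python
import ast

source = "def f(a: ValueRange(5, 20), b: list[int], c: int): ...\n"

# Parse only: the function is never defined and ValueRange is never called.
func = ast.parse(source).body[0]
kinds = {arg.arg: type(arg.annotation).__name__ for arg in func.args.args}

print(kinds)  # {'a': 'Call', 'b': 'Subscript', 'c': 'Name'}
```

A name or a subscript has a shape that can denote a type, while a call node can only be understood by running it, which is what a static tool will not do.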

Type checkers operate on expressions that syntactically denote types, which basically is a type name or a generic type. And this has sparked my curiosity about how generic types (MyClass[T]) work. For the type checker it's simple, as it does not execute anything, it just has to parse that particular syntax. But what's the runtime meaning of such generic expression?

Well, it's subscripted access to an object (to a class). When we do my_instance["x"], Python searches for a __getitem__ method in my_instance's type. So MyClass["x"] should just search for __getitem__ in MyClass's type (that is, its metaclass). That's correct, but given that Python's designers have always considered metaclasses particularly complex and/or exotic, they decided to introduce (via PEP 560) a hook that makes it easier to implement subscripted access to classes. Rather than having to define a metaclass for MyClass, we can directly define a __class_getitem__ method in MyClass.
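
Both routes side by side, as a minimal sketch (the class names are invented, and returning a string is only to make the dispatch visible; the hook is primarily meant for generic types):

```python
class SubscriptableMeta(type):
    # Classic route: __getitem__ is looked up on the type of the class.
    def __getitem__(cls, item):
        return f"{cls.__name__}[{item!r}] via metaclass"


class ViaMeta(metaclass=SubscriptableMeta):
    pass


class ViaHook:
    # PEP 560 route: no metaclass needed, implicitly a class method.
    def __class_getitem__(cls, item):
        return f"{cls.__name__}[{item!r}] via __class_getitem__"


print(ViaMeta["x"])  # ViaMeta['x'] via metaclass
print(ViaHook["x"])  # ViaHook['x'] via __class_getitem__
```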

Sunday, 29 March 2026

Pipe Operator and Null Safety

I've talked a couple of times [1] and [2] about how beautiful it is to have a pipe operator in a language (though it's not particularly common), and about ways to simulate it in Python. Having a pipe operator makes applying functions to a value as convenient as chaining methods. When chaining methods we can leverage (if available) the safe navigation/optional chaining/elvis (?.) operator to deal with null values. So, I've been thinking about null safety and pipes (not applying a function if the value is null, and coalescing to a default value).

In my previous post I mentioned that JavaScript had 2 different proposals for a pipe operator, but one of them has been discarded. I've been checking whether the remaining proposal includes null safety, and the answer is no. In the early stages there was a discussion about having, apart from the normal |> operator, an additional ?|> operator for null-safe cases, but it was discarded:


// not null-safe, active proposal
user
  |> getProfile(%)
  |> formatProfile(%)
  
// null-safe, has been discarded
value ?|> fn
value |> fn ?? default

It was rejected on the basis that Pipelines should be pure syntax for data flow, not control flow.

To my surprise (I was not aware that PHP continues to be used and to evolve), PHP has recently added a pipe operator to the language, and for the moment it also lacks a null-safe version.

For Python decision makers adding a pipe operator seems to be "making the language too complex for beginners"... (you can't imagine how much I hate that all-too-common kind of "pythonic" reflection...), but as I explained in my previous post we can easily add a pipe function that does the trick (adding such a function to functools has also been requested multiple times, but no luck so far). An implementation is as simple as this:



import functools
from typing import Any, Callable

def pipe(val: Any, *fns: Callable[[Any], Any]) -> Any:
    """
    pipes function calls over an initial value
    """
    def _call(val, fn):
        return fn(val)
    return functools.reduce(_call, fns, val)


And we can use it like this:



from dataclasses import dataclass

@dataclass
class Post:
    id: str
    title: str
    author: str

def get_post(post_id: str) -> Post | None:
    # simulate a function that may return None
    if post_id == "1":
        return Post(id="1", title="First post", author="1")
    else:
        return None

def get_address(person_id: str) -> str | None:
    # simulate a function that may return None
    if person_id == "1":
        return "Rue de La Nation, Paris"
    else:
        return None

pipe("1",
    get_post,
    lambda post: get_address(post.author),
    str.upper,
    print,
)

# RUE DE LA NATION, PARIS


Creating a null-aware equivalent is quite simple. The idea I came up with is having pipe accept not just a sequence of callables, but a sequence where each step is either a callable, a (flag, callable) pair, or a (flag, value) pair, with the flag indicating that we have to check for null before applying the callable, or that we have to coalesce to a value. Let's see the code:



# sentinel values
NULL_SAFE = object()
COALESCE = object()

def pipe2(val: Any, *steps: Callable[[Any], Any] | tuple[Any, Callable[[Any], Any] | Any]) -> Any:
    """
    pipes function calls over an initial value, with support for null safety and coalescing:
    """
    def _call(val, step: Callable[[Any], Any] | tuple[Any, Callable[[Any], Any] | Any]) -> Any:
        if callable(step):
            return step(val)
        else:
            option = step[0]
            if option is NULL_SAFE:
                fn = step[1]
                return None if val is None else fn(val)
                
            elif option is COALESCE:
                default_val = step[1]
                return default_val if val is None else val
            else:
                raise ValueError(f"Invalid option: {option}")
    
    return functools.reduce(_call, steps, val)

pipe2("2",
    (NULL_SAFE, get_post),
    (NULL_SAFE, lambda post: get_address(post.author)),
    (COALESCE, "Not found"),
    str.upper,
    print,
)

# NOT FOUND


The function is quite minimal. We should add proper error handling to it, raising meaningful exceptions for each potential incorrect usage. You can just ask a GPT to add it and you'll end up with something like this:


from typing import Any, Callable, Tuple, Union

def pipe2(val: Any, *steps: Union[Callable[[Any], Any], Tuple[object, Any]]) -> Any:
    """
    Pipe value through callables or option-tuples.
    Steps can be:
      - a callable: called as fn(acc)
      - null_safe(fn): tuple (NULL_SAFE, fn) — only call fn if acc is not None
      - coalesce(default): tuple (COALESCE, default) — replace None with default

    Raises TypeError or ValueError for invalid steps.
    """
    def _call(val: Any, step: Union[Callable[[Any], Any], Tuple[object, Any]]) -> Any:
        if callable(step):
            return step(val)

        if not (isinstance(step, tuple) and len(step) == 2):
            raise TypeError("pipe2 steps must be callables or 2-tuples from null_safe/coalesce")

        option, payload = step
        if option is NULL_SAFE:
            if val is None:
                return None
            if not callable(payload):
                raise TypeError("NULL_SAFE payload must be callable")
            return payload(val)

        if option is COALESCE:
            default = payload
            return default if val is None else val

        raise ValueError(f"Unknown pipe2 option: {option!r}")

    return functools.reduce(_call, steps, val)
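
The generated docstring talks about null_safe(fn) and coalesce(default) as if they were helper functions that build the option tuples. They are not defined anywhere above, so the following is just a sketch of how such helpers could look. The helper names, the compact restatement of pipe2 and the users dictionary are my own assumptions, included only to make the example run on its own:

```python
import functools
from typing import Any, Callable

NULL_SAFE = object()
COALESCE = object()

def null_safe(fn: Callable[[Any], Any]) -> tuple[object, Callable[[Any], Any]]:
    """Hypothetical helper: mark fn as 'skip it when the piped value is None'."""
    return (NULL_SAFE, fn)

def coalesce(default: Any) -> tuple[object, Any]:
    """Hypothetical helper: replace a None piped value with default."""
    return (COALESCE, default)

def pipe2(val: Any, *steps: Any) -> Any:
    # compact restatement of the function above, without the error handling
    def _call(acc: Any, step: Any) -> Any:
        if callable(step):
            return step(acc)
        option, payload = step
        if option is NULL_SAFE:
            return None if acc is None else payload(acc)
        if option is COALESCE:
            return payload if acc is None else acc
        raise ValueError(f"Unknown pipe2 option: {option!r}")
    return functools.reduce(_call, steps, val)

# hypothetical data, only for the example
users = {"1": {"name": "Ada"}}

result_found = pipe2("1", users.get, null_safe(lambda u: u["name"]), coalesce("Not found"), str.upper)
result_missing = pipe2("2", users.get, null_safe(lambda u: u["name"]), coalesce("Not found"), str.upper)

print(result_found)    # ADA
print(result_missing)  # NOT FOUND
```

With helpers like these the call site reads a bit closer to the ?. and ?? operators of other languages, and the sentinels become an implementation detail.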


Friday, 20 March 2026

Python Annotated

In Python there is this common mantra that type annotations (type hints) do not have any runtime effect. Well, that's mainly true, as type hints are not used by the runtime to check whether your type assumptions/restrictions are correct (as the documentation says: "The Python runtime does not enforce function and variable type annotations."). On the other hand, this type information does exist at runtime (it's just that the runtime itself does not use it). It's a dictionary stored in the __annotations__ attribute, and until recently you would use inspect.get_annotations to get it (notice that annotations have been improved in Python 3.14: they're now evaluated lazily, and you should now use annotationlib.get_annotations). So that information is available at runtime, and your custom code can make use of it for whatever it sees fit.

Additionally, the type hints/annotations syntax allows us to use any object as an annotation, not just a type. Indeed, Python's grammar does not restrict the content of annotation expressions, as PEP 3107 explicitly states: "Annotations can be any valid Python expression". Put another way: Python allows arbitrary expressions (hence returning anything) in type annotations. This means that you can use the syntax for defining information (metadata) about the parameters, information that is then used by your custom code for something other than type-checking (for example, stating that a function expects a string following a certain pattern, let's say: create_user(msg: r"^[0-9a-f]{32}$")). That's very nice, but most likely you'll want to combine both the typing information and the extra metadata. With that in mind, the Annotated class (Annotated[T, x]) was introduced in Python 3.9 (PEP 593). T is a type; type-checkers understand the Annotated class and just take care of the T part. The x part is for metadata, which can be any object and will be used by your custom code at runtime. In fact, that x can be multiple values, not just one, I mean: Annotated[str, ValueRange(10, 20), Complexity("high")].
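
To make this concrete, here is a small sketch. ValueRange and Complexity are hypothetical metadata classes, and create_user and set_level are made-up functions; nothing here is a standard API apart from Annotated and typing.get_args:

```python
from dataclasses import dataclass
from typing import Annotated, get_args

@dataclass
class ValueRange:      # hypothetical metadata class
    lo: int
    hi: int

@dataclass
class Complexity:      # hypothetical metadata class
    level: str

# any expression is accepted as an annotation, here a regex pattern string
def create_user(msg: r"^[0-9a-f]{32}$") -> None:
    ...

# Annotated combines a real type (int) with any number of metadata objects
def set_level(level: Annotated[int, ValueRange(10, 20), Complexity("high")]) -> None:
    ...

print(create_user.__annotations__["msg"])
# ^[0-9a-f]{32}$

# get_args returns the type first, followed by the metadata objects
base, *extras = get_args(set_level.__annotations__["level"])
print(base)
# <class 'int'>
print(extras)
# [ValueRange(lo=10, hi=20), Complexity(level='high')]
```

A type-checker will most likely complain about the plain string annotation in create_user, which is precisely the problem that Annotated solves.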

Apart from metadata that applies to the parameters, we can have metadata that applies to the function itself or to a class ('cacheable', 'optimized', some sort of privacy mechanism, whatever). Normally we'll use a custom decorator that adds this information as an attribute to the function/class (__cached__, __private__). So all in all we have 2 mechanisms to provide metadata: Annotated for parameters, and decorators for the classes/functions themselves.



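import annotationlib  # Python 3.14+
from dataclasses import dataclass
from typing import Annotated, get_type_hints

# to add metadata to the class or function itself, just use decorators that store it in specific attributes, for example: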
def non_critical(func):
    func._non_critical = True
    return func

# metadata for parameters
@dataclass
class ValueRange:
    lo: int
    hi: int

@non_critical
def create_post_2(
    title: Annotated[str, ValueRange(5, 20)], 
    content: Annotated[str, ValueRange(5, 100)],
) -> dict:
    return {"title": title, "content": content}

annotations = annotationlib.get_annotations(create_post_2)
print(f"annotations: {annotations}") 
# annotations: {'title': typing.Annotated[str, ValueRange(lo=5, hi=20)], 'content': typing.Annotated[str, ValueRange(lo=5, hi=100)], 'return': <class 'dict'>}
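
The decorator metadata is read back with a plain attribute lookup. As a sketch of how custom code could use it, here is a hypothetical run_task runner (my own invention, as are the two task functions) that swallows errors only for functions marked as non critical:

```python
def non_critical(func):
    # same decorator as above: it just tags the function
    func._non_critical = True
    return func

@non_critical
def refresh_cache():
    raise RuntimeError("cache backend down")

def charge_customer():
    raise RuntimeError("payment failed")

def run_task(func):
    """Hypothetical runner: ignore failures of functions tagged as non critical."""
    try:
        return func()
    except Exception as exc:
        # getattr with a default, as most functions will not carry the tag
        if getattr(func, "_non_critical", False):
            return f"ignored: {exc}"
        raise

print(run_task(refresh_cache))
# ignored: cache backend down
```

Calling run_task(charge_customer) re-raises the RuntimeError, as that function has no _non_critical attribute.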


In other languages like Kotlin/Java we use annotations for both parameter metadata and function/class metadata. It's important to note that while in Python we can provide any object as metadata (both when using Annotated and when using a decorator and passing any expression as an argument), in Kotlin/Java annotation metadata is managed at compile time, so you are limited to compile-time constants. This means that in Python we have enormous power in what we can provide as metadata.

Kotlin and Java annotations cannot take arbitrary runtime objects as parameters. Their allowed values are strictly limited because annotation arguments must be compile-time constants, which the compiler stores in the class file rather than evaluating them as expressions at runtime.

The Annotated class (well, what we actually have are instances of _AnnotatedAlias) has an __origin__ attribute (that points to the type hint) and a __metadata__ attribute for the metadata. However, if we only want to get the typing information we can directly use the typing.get_type_hints function, which strips the Annotated metadata by default.


annotations = annotationlib.get_annotations(create_post_2)

print(annotations["title"].__origin__) # to get the original type hint, which is str in this case
# <class 'str'>

metadata = {key: value.__metadata__ 
    for key, value in annotations.items() if hasattr(value, "__metadata__")
}
print(f"metadata: {metadata}")
# metadata: {'title': (ValueRange(lo=5, hi=20),), 'content': (ValueRange(lo=5, hi=100),)}

print(f"type hints: {get_type_hints(create_post_2)}")
# type hints: {'title': <class 'str'>, 'content': <class 'str'>, 'return': <class 'dict'>}
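
As an example of custom code consuming that metadata at runtime, here is a sketch of a validating decorator. The validated decorator is hypothetical, and I'm assuming that ValueRange on a str parameter means "allowed length", which is my own interpretation. It relies on get_type_hints(..., include_extras=True), which keeps the Annotated wrappers instead of stripping them:

```python
import functools
import inspect
from dataclasses import dataclass
from typing import Annotated, get_type_hints

@dataclass
class ValueRange:
    lo: int
    hi: int

def validated(func):
    """Hypothetical decorator: enforce ValueRange metadata as a length range."""
    hints = get_type_hints(func, include_extras=True)  # keep Annotated metadata
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()
        for name, value in bound.arguments.items():
            # parameters that are not Annotated have no __metadata__
            for meta in getattr(hints.get(name), "__metadata__", ()):
                if isinstance(meta, ValueRange) and not meta.lo <= len(value) <= meta.hi:
                    raise ValueError(
                        f"{name}: length {len(value)} not in [{meta.lo}, {meta.hi}]"
                    )
        return func(*args, **kwargs)
    return wrapper

@validated
def create_post(
    title: Annotated[str, ValueRange(5, 20)],
    content: Annotated[str, ValueRange(5, 100)],
) -> dict:
    return {"title": title, "content": content}

print(create_post("Hello world", "Some content"))
# {'title': 'Hello world', 'content': 'Some content'}

try:
    create_post("Hi", "Some content")
except ValueError as exc:
    print(exc)
# title: length 2 not in [5, 20]
```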