Monday, 25 August 2025

Python Adaptive Specializing Interpreter

It's clear that I have a fascination with the interaction between interpreters and JITs: optimizations, deoptimizations and so on. I've written multiple posts about that, with this one being the most recent. I recently came across this interesting thesis, whose introduction mentions:

For both variants, we can quicken certain operations. Quickening entails replacing one operation with another, which usually handles a subset of possible values or a more restrictive set of preconditions but performs the operation quicker. In tree-walk interpreters, this is performed by node replacement, and in bytecode-based interpreters, by changing the instruction stream, replacing one instruction with another. In both cases, the target of the dispatch changes. How this is implemented in Operation DSL is detailed in section 3.7.

The "replacing one instruction with another" suddenly reminded me of something I had read some months ago regarding Python performance improvements but that I had forgotten to dive into and had indeed forgotten. I'm talking about Adaptive Specializing Interpreter, that is something pretty surprising to me. I've talked in my previous posts about interpreters that find hotspots in your code and send those hot methods to a fast JIT compiler to turn them into native code. Then that native code continues to be monitored and if it's hot enough it's sent to a more aggressive JIT compiler that spends more time in producing a more optimized native code. Then we have cases where the code has to be deoptimized, returning the function to its interpreted form, and the cycle starts again. But the idea of monitoring the code to find specific bytecode instructions (opcodes) that can be replaced by an optimized (specialized/quickened) version of that bytecode instruction is something that was pretty new to me

As of today (Python 3.13) the main Python environment, CPython, does not come with a JIT compiler (an experimental one is due for 3.14, I think). CPython just uses a bytecode interpreter (historical note: the move from the initial tree-walk interpreter to a bytecode interpreter happened between Python 0.9.x and Python 1.0, likely around 1992–1993 during the pre-1.0 development phase). Python 3.11 implemented PEP 659 (Specializing Adaptive Interpreter) as part of the Faster CPython project. It introduced bytecode instructions that are generic (for example the one for adding 2 items, BINARY_OP (+)) and that, if a consistent execution pattern is found, will be replaced (specialized/quickened) by a specialized version (BINARY_OP_ADD_FLOAT, BINARY_OP_ADD_INT, BINARY_OP_ADD_UNICODE). If that pattern changes, the instruction will be reverted to the initial, generic bytecode. This discussion has some interesting information.

You probably know that (as I mention in this post) Python compiles functions to code objects: each function object points to a code object via its __code__ attribute, and that code object has a co_code attribute pointing to a bytes object containing the bytecodes.
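We can check those pieces directly (f here is just a throwaway function):

def f(a, b):
    return a + b

print(f.__code__)                # <code object f at 0x...>
print(type(f.__code__.co_code))  # <class 'bytes'>
print(f.__code__.co_code)        # b'...' the raw, compiler-emitted bytecodes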

bytes objects are immutable, so the bytecode specialization has to happen somewhere else. I could not find much information about this, so ChatGPT came to the rescue. So yes, there's an additional, mutable copy of the bytecodes. It's from that copy that the interpreter reads the bytecodes to execute for that given function, applying adaptations/specializations/quickening as it sees fit.

  • The co_code attribute itself remains an immutable bytes object and is not rewritten at runtime. It continues to show the canonical, "baseline" bytecode sequence as emitted by the compiler.
  • The instructions the interpreter actually executes live in a separate, mutable copy of the bytecode kept inside the code object itself (at the C level, the co_code_adaptive array embedded in the code object; it is not a regular Python-visible attribute).
  • That runtime bytecode buffer is where the interpreter patches in "quickened" instructions (specialized opcodes). For example, a generic BINARY_OP might be replaced at runtime with BINARY_OP_ADD_INT if it sees enough hot integer additions.

That mutable copy of the bytecodes seems to be referenced from the code object via a private _co_code_adaptive attribute (but this is an internal, undocumented detail that can change from version to version). Python allows us to very easily check the bytecodes for a given function by using the standard dis module: dis.dis(my_function). By default dis.dis shows the immutable bytecodes in co_code, but since Python 3.12 we can pass the adaptive=True flag to see the adapted/quickened instructions. This is pretty amazing, because we can so easily see how a function's bytecodes evolve over time!


import dis

def f(a, b):
    return a + b

print("before warming up")
dis.dis(f, adaptive=True)  # Only in 3.12+
# BINARY_OP                0 (+)

# Warm it up
for _ in range(10_000):
    f(1, 2)

# Disassemble with quickening shown
print("after warming up with ints")
dis.dis(f, adaptive=True)
# BINARY_OP_ADD_INT        0 (+)

# now let's try to break the quickening by passing strings rather than ints
print("first call with strings")
f("a", "b")
dis.dis(f, adaptive=True)
# it's still quickened
# BINARY_OP_ADD_INT        0 (+)

print("second call with strings")
f("c", "b")
dis.dis(f, adaptive=True)
# it's still quickened
# BINARY_OP_ADD_INT        0 (+)

print("Warm it up again, this time with strings")
for _ in range(10_000):
    f("a", "b")

print("after warming up again")
dis.dis(f, adaptive=True)
# BINARY_OP_ADD_UNICODE    0 (+)

So initially the bytecode for an addition of 2 values uses the BINARY_OP opcode (Python is a dynamic language where a and b could be of any type, so BINARY_OP is a generic (and slower) instruction for adding any values). Then we do a good bunch of additions, all of them with int values, so the interpreter decides to specialize the generic addition to a fast BINARY_OP_ADD_INT opcode. After that we do a couple of invocations using strings rather than ints. The specialized opcode checks if the operands are the expected types (here, two ints); as they are not, it falls back to the generic implementation of the operation (the slow path), but for the moment it still keeps the specialized opcode. It takes note of these divergences so that if they continue it will revert the specialization. The thing is that in my tests I have not managed to find the number of failed executions that makes the interpreter revert the specialization; what we can see is that after a good bunch of executions using strings, the int specialization is changed to a string specialization (BINARY_OP_ADD_UNICODE).

In some previous post I mentioned some crazy Python projects that manipulate functions by creating a new code object (an instance of types.CodeType) based on the original one, with a modified version of its bytecodes (adding extra instructions, whatever), and assigning it to the function. How does this play with the adaptive version of the code? Well, thanks to ChatGPT we learn that the adaptation process starts again:

  • The quickened state of the bytecode (the co_code_adaptive copy) is not precomputed; it only gets specialized as the interpreter executes that particular CodeType and warms it up.
  • Each CodeType carries its own adaptive copy of the instructions; none of the warmed-up, specialized state is shared with other code objects.
  • If you assign a different code object to a function (func.__code__ = new_code), that's a new CodeType with its own adaptive copy, initially un-specialized.
  • Therefore, execution will start again with baseline opcodes and cold caches, and the specializing interpreter will re-warm and re-specialize.
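We can check that last point quite easily. In the sketch below I use CodeType.replace() with no arguments just to get a freshly built (but otherwise identical) code object, standing in for the modified code objects those projects create:

import dis

def f(a, b):
    return a + b

# warm it up so the addition specializes (to BINARY_OP_ADD_INT, as in the sample above)
for _ in range(10_000):
    f(1, 2)
dis.dis(f, adaptive=True)

# assign a freshly built code object to the function; replace() with no arguments
# simply copies the original one
f.__code__ = f.__code__.replace()

# the new code object starts with its own, cold adaptive copy, so we are back
# to the generic BINARY_OP until it warms up and specializes again
dis.dis(f, adaptive=True)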

Thursday, 7 August 2025

Python Decorators Implementation

There's something in the inner workings of the amazing multimethod library that we saw in my previous post that has had me pretty confused. In that post I outline how the multidispatch decorator works internally, but for the multimethod decorator, things are more complex. From my previous post we know we use it like this:


class Formatter:
    def __init__(self, wrapper: str):
        self.wrapper = wrapper

    @multimethod
    def format(self, item: str, starter: str):
        return f"{starter}{self.wrapper}{item}{self.wrapper}"
    
    @multimethod
    def format(self, item: int, starter: str):
        return f"{starter}{self.wrapper * 2}{item}{self.wrapper * 2}"   

multimethod is a class-based decorator. On the first invocation it returns an instance of the multimethod class, which will store the format function and its signature. For each new invocation of the decorator it has to store each of those additional overload functions and signatures in that already existing instance, rather than creating a new instance each time. To do that, the decorator checks whether the current scope (the class declaration scope) already contains a variable with the name of the function being decorated that points to an instance of multimethod. For that it inspects the previous frame in the stack, like this (taken from its source code):


    def __new__(cls, func):
        homonym = inspect.currentframe().f_back.f_locals.get(func.__name__)
        if isinstance(homonym, multimethod):
            return homonym
        ...
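To see that frame-inspection trick in isolation, here's a tiny standalone example (peek is a made-up decorator, not part of the library): during class-body execution, f_back.f_locals is the namespace that the class body has filled in so far.

import inspect

def peek(func):
    caller_locals = inspect.currentframe().f_back.f_locals
    print(sorted(name for name in caller_locals if not name.startswith("__")))
    return func

class Demo:
    version = 1

    @peek          # prints ['version'], the class namespace built so far
    def action(self):
        pass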

That frame-inspection trick is really, really nice code, but it's another thing that I could not grasp. We know that Python decorators are just callables (functions or classes) that are invoked receiving the function (or class) being decorated as a parameter. So they are normally explained like this:


@my_deco
def fn():
    ...  # whatever

Is (in principle) just equivalent (syntactic sugar) to this:


def fn():
    ...  # whatever
fn = my_deco(fn)

The problem is that it would mean that our example above is indeed translated by the Python compiler into something like this:



    def format(self, item: str, starter: str):
        return f"{starter}{self.wrapper}{item}{self.wrapper}"
    format = multimethod(format)
    # here format is pointing to a multimethod instance, good
    
   
    def format(self, item: int, starter: str):
        return f"{starter}{self.wrapper * 2}{item}{self.wrapper * 2}" 
    # but here format is pointing to the function that we've just defined, so when
    # we apply the decorator again and it does the isinstance(homonym, multimethod)
    # check, it will be False! So this cannot work
    format = multimethod(format)

I've explained the problem in the comments. When defining the second format function, the format variable in the current scope is set to that second function, so it's no longer pointing to the multimethod instance previously created; the isinstance(homonym, multimethod) check will be False, a new multimethod instance will be created, and the whole thing cannot work!

Well, after discussing this with Claude AI we've found an explanation. When applying a decorator to a function, the function definition doesn't immediately overwrite the namespace (that is, it doesn't immediately bind the function's name to the new function object). What really happens is this:

  1. def format(...) creates a temporary function object
  2. @multimethod is applied to that temporary function object
  3. The decorator returns an object (which could be the existing multimethod)
  4. Only then does format = assignment happen

So indeed a decorator works more like this:


format = my_deco(
    def format():
        # whatever
)

But since Python lacks statement lambdas, the above code is not valid, and hence the different articles explaining decorators cannot use it as pseudo-code for what the runtime really does, so they use an inaccurate approximation. Usually that's not a problem, but in this very specific case that approximation prevented me from envisioning how this particular decorator works.
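We can actually watch this ordering in the compiled bytecode. The exact opcodes differ between Python versions, but the shape is always the same (my_deco is just a placeholder name here; the snippet is only compiled, never executed, so the decorator doesn't need to exist):

import dis

source = """
@my_deco
def fn():
    pass
"""

dis.dis(compile(source, "<demo>", "exec"))
# roughly: LOAD_NAME (my_deco) ... MAKE_FUNCTION ... CALL ... and only then STORE_NAME (fn),
# i.e. the name fn is bound after the decorator has already returned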

Friday, 1 August 2025

Visitor vs Multiple Dispatch

The Visitor pattern has always felt a bit odd to me. The thing is that, more than a good practice, it was born to make up for a limitation: the lack of multiple dispatch (aka multimethods) in most programming languages. Being forced to add an accept() method to the classes that are going to be visited feels terribly intrusive to me; it violates the separation of concerns principle. So, when we have a language that supports multiple dispatch (either directly in the language or through some library), we should favor that and forget about Visitors.
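To make that intrusiveness concrete, here is a minimal sketch of the classic Visitor shape (Circle, Square and AreaReporter are just made-up names for the example):

# every class that can be visited has to carry an accept() method whose only job
# is to bounce the call back to the right visit_* method of the visitor
class Circle:
    def accept(self, visitor):
        return visitor.visit_circle(self)

class Square:
    def accept(self, visitor):
        return visitor.visit_square(self)

# the operation itself lives in the visitor, one visit_* method per visited type
class AreaReporter:
    def visit_circle(self, circle):
        return "reporting on a circle"

    def visit_square(self, square):
        return "reporting on a square"

reporter = AreaReporter()
print([shape.accept(reporter) for shape in [Circle(), Square()]])
# ['reporting on a circle', 'reporting on a square']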

That said, Python has no support for multiple dispatch at the language level, but has at least 2 interesting libraries that leverage Python's dynamism and introspection to provide us with the feature. I have only tried multimethod, and it's pretty amazing. I'll show some examples.


from multimethod import multimethod

# multimethod does not support keyword arguments

class Formatter:
    def __init__(self, wrapper: str):
        self.wrapper = wrapper

    @multimethod
    def format(self, item: str, starter: str):
        return f"{starter}{self.wrapper}{item}{self.wrapper}"
    
    @multimethod
    def format(self, item: int, starter: str):
        return f"{starter}{self.wrapper * 2}{item}{self.wrapper * 2}"
    
f1 = Formatter("|")
print(f1.format("aaa", "- "))

print(f1.format(25, "- "))

# I can even register new overloads to an existing one
@Formatter.format.register
def _(self, item: bool):
    return f"{self.wrapper * 3}{item}{self.wrapper * 3}"

print(f1.format(True))

# and I can even overwrite an existing overload!
@Formatter.format.register
def _(self, item: int, starter: str):
    return f"{starter}{self.wrapper * 5}{item}{self.wrapper * 5}"

print(f1.format(25, "- "))

# - |aaa|
# - ||25||
# |||True|||
# - |||||25|||||

# but there's one limitation, keyword arguments do not work
#print(f1.format(item="bbb", starter="- "))
#print(f1.format(starter="- ", item="bbb"))
# multimethod.DispatchError

Well, the code is self-explanatory. I can define different overloads of the same method using the multimethod decorator, and the method invocation will be redirected to the right overload based on the parameters. There's one limitation though: this redirection only works when using positional parameters. If we use named parameters, it fails with a multimethod.DispatchError. To work with named parameters we have to use the multidispatch decorator.


from multimethod import multidispatch 

# multidispatch supports keyword arguments  
class Formatter:
    def __init__(self, wrapper: str):
        self.wrapper = wrapper

    @multidispatch
    def format(self, item: str, starter: str):
        return f"{starter}{self.wrapper}{item}{self.wrapper}"
    
    # @multidispatch
    # def format(self, item: int, starter: str):
    #     return f"{starter}{self.wrapper * 2}{item}{self.wrapper * 2}"

    @format.register
    def _(self, item: int, starter: str):
        return f"{starter}{self.wrapper * 2}{item}{self.wrapper * 2}"

    
f1 = Formatter("|")
print(f1.format("aaa", "- "))

print(f1.format(25, "- "))

print(f1.format(starter="- ", item="bbb"))
print(f1.format(item=25, starter="- "))

# I can register new overloads to an existing class!
@Formatter.format.register
def _(self, item: bool):
    return f"{self.wrapper * 3}{item}{self.wrapper * 3}"

print(f1.format(True))
 
# and I can even overwrite an existing overload!
@Formatter.format.register
def _(self, item: int, starter: str):
    return f"{starter}{self.wrapper * 5}{item}{self.wrapper * 5}"

print(f1.format(item=25, starter="- "))

# - |aaa|
# - ||25||
# - |bbb|
# - ||25||
# |||True|||
# - |||||25|||||


This multidispatch decorator is implemented quite differently from the multimethod decorator, so apart from supporting named parameters, you also use it differently: additional overloads are added via the register method. Let's see how the whole thing works.

First of all, let's revisit how class creation works in Python. When the runtime comes across a class declaration it executes the statements inside that declaration (normally we mainly have function definitions, but we can also have assignments, conditionals...) and the different elements (methods and attributes) defined there are added to a namespace object (a sort of dictionary). Then, the class object is created by invoking type(classname, superclasses, namespace) (or, if the class is using a metaclass other than type, MyMeta(classname, superclasses, namespace, **kwargs)). You can read further about this in these 2 previous posts [1] and [2].
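As a tiny refresher, this is the same mechanism done by hand (Greeter is just a made-up example class):

# the class body is really just code that fills a namespace dict...
def greet(self, name: str):
    return f"{self.prefix} {name}"

namespace = {"prefix": "Hello,", "greet": greet}

# ...and the class object itself is built by calling type(name, bases, namespace),
# which is exactly what the runtime does after executing the class body
Greeter = type("Greeter", (object,), namespace)

g = Greeter()
print(g.greet("world"))   # Hello, world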

multidispatch (same as multimethod) is a class-based decorator, so when you write "@multidispatch def format()..." you are creating an instance of the multidispatch class and adding a "format" entry in the class namespace pointing to it. Then you decorate the other "overloads" of the function with calls to the register() method of that multidispatch instance, so the instance will store the different overloads with their corresponding signatures. Notice that because of this different internal implementation you cannot name the overload functions with the same name as the first one (which becomes the "public" name for this multiple-dispatch method); we use a placeholder name like "_". Finally, the class object will be created by invoking type with a namespace object that contains this instance of the multidispatch class. Obviously the multidispatch class is callable, so it has a __call__ method that, when invoked, searches through its registered overloads based on the arguments' types.
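To make those mechanics concrete, here's a stripped-down toy dispatcher (this is not the library's actual code: it ignores subclasses, keyword arguments, caching and bound methods, and dispatches on exact types only):

import inspect

class simpledispatch:
    # a registry of overloads keyed by their annotated parameter types,
    # plus a __call__ that dispatches on the runtime types of the arguments
    def __init__(self, func):
        self.registry = {}
        self.register(func)

    def register(self, func):
        key = tuple(p.annotation for p in inspect.signature(func).parameters.values())
        self.registry[key] = func
        return self  # the decorated name keeps pointing to the dispatcher

    def __call__(self, *args):
        return self.registry[tuple(type(a) for a in args)](*args)

@simpledispatch
def describe(item: str):
    return f"a string: {item}"

@describe.register
def _(item: int):
    return f"an int: {item}"

print(describe("hi"))   # a string: hi
print(describe(3))      # an int: 3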

Notice in both of the real samples above that after having declared the class we can still add additional overloads to it using the register method (we can even overwrite an existing overload). This is a particularly nice feature, as your class remains open to extension.