Thursday 18 July 2024

Leveraging Python Bytecode

There's a pretty nice Python feature that appeals to those of us freaks who have occasionally found ourselves reading bytecode (some 20 years ago I did some programming in .NET bytecode, and in the last 2 years I've looked into Java bytecode quite a few times to understand how the Kotlin compiler implements certain things). The thing is that in Python we can easily get at the bytecode of any function at runtime! We know that functions are objects, and that these objects have many attributes, like __name__, __defaults__, __closure__ or __code__. __code__ points to a code object with attributes such as co_freevars (the names of the variables captured by the closure, if any) and co_code, a bytes object containing that function's bytecode. We can see that bytecode in a readable format with dis.dis():


def format():
    print("hi")

format.__code__
Out[6]: <code object format at 0x70cd52d5ead0, file "/tmp/ipykernel_372789/3053649360.py", line 1>

format.__code__.co_code
Out[7]: b't\x00d\x01\x83\x01\x01\x00d\x00S\x00'

import dis
dis.dis(format.__code__.co_code)
          0 LOAD_GLOBAL              0 (0)
          2 LOAD_CONST               1 (1)
          4 CALL_FUNCTION            1
          6 POP_TOP
          8 LOAD_CONST               0 (0)
         10 RETURN_VALUE
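
Note that when dis only gets the raw bytes it can show nothing but indices (the `(0)` and `(1)` above), because the name and constant tables live in the code object, not in co_code. Here is a small sketch of the same exploration done on the function itself; `make_counter` and `increment` are names I made up for the example, and the exact opcodes you get depend on your Python version, so I don't list them:

```python
import dis

def make_counter():
    count = 0
    def increment():
        nonlocal count      # 'count' is captured from the enclosing scope
        count += 1
        return count
    return increment

increment = make_counter()
code = increment.__code__

freevars = code.co_freevars   # names of the captured variables: ('count',)
raw = code.co_code            # the raw bytecode, a bytes object

# get_instructions() yields one Instruction per opcode, with names and
# constants already resolved (unlike disassembling the bare bytes).
opnames = [ins.opname for ins in dis.get_instructions(increment)]

# dis.dis(increment) prints the same listing in human-readable form.
dis.dis(increment)
```

Passing the function (or its code object) rather than `co_code` is usually what you want when reading bytecode by eye.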


I've come across several projects that take advantage of this runtime access to Python bytecode. Basically they copy that bytes object, modify the bytecode as needed, build a new code object from it, and create a new function (with types.FunctionType) from that new code object plus the other attributes of the original function. One surprising example of this technique is a project that adds gotos to Python. For that, it defines goto() and label() functions that act purely as markers. A function that uses them to declare its gotos and labels has to be wrapped with a decorator, which takes the function's bytecode and rewrites it, adding the corresponding jumps (which do exist at the bytecode level).
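
To make the "copy, modify, rebuild" idea concrete, here is a minimal sketch. It is not how the goto project works: instead of rewriting opcodes (which change from one Python version to the next) it only swaps an entry in the constants table, using `code.replace()` (available from Python 3.8). The helper name `with_replaced_const` is my own invention for the example.

```python
import types

def greet():
    return "hi"

def with_replaced_const(func, old, new):
    """Return a new function like `func`, with constant `old` swapped for `new`."""
    code = func.__code__
    # Code objects are immutable, so build a modified copy.
    new_consts = tuple(new if c == old else c for c in code.co_consts)
    new_code = code.replace(co_consts=new_consts)
    # Rebuild a function around the new code object, reusing the
    # other attributes of the original function.
    return types.FunctionType(
        new_code,
        func.__globals__,
        func.__name__,
        func.__defaults__,
        func.__closure__,
    )

patched = with_replaced_const(greet, "hi", "bye")
print(greet())    # hi
print(patched())  # bye
```

The original function is left untouched; a decorator built on this idea would simply return the rebuilt function in its place.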

Another project that leverages access to Python bytecode is PonyORM. In this case it does not modify the bytecode to generate a new function; it translates it to SQL! It was clear to me that generator functions create generator objects, but I had never thought about the fact that generator expressions create generator objects too. Generator objects have an associated code object (accessible through the gi_code attribute). Among other things, PonyORM provides a select function that can receive a generator expression and translate it to SQL. The process is explained in this interesting video, and it basically consists of these 3 main steps:

  • Decompile bytecode and restore AST
  • Translate AST to "abstract SQL"
  • Translate "abstract SQL" to a specific SQL dialect

So my understanding is that it decompiles the Python bytecode into Python source code (using its own decompiler), loads that source into an AST (using the standard ast module), and transforms that AST into SQL.
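
To get a feel for the idea, here is a toy sketch. It is in no way PonyORM's code: the first part only shows that a generator expression carries a code object in gi_code (what a decompiler would start from), and the second part skips decompilation altogether, parsing a hand-written condition with the ast module and turning it into a SQL-like WHERE clause. The `to_sql` function, the `_OPS` table and the sample condition are all made up for the example, and the literal quoting is naive.

```python
import ast

# Part 1: a generator expression is a generator object with its own code object.
numbers = [1, 2, 3]
gen = (n * 2 for n in numbers if n > 1)
genexpr_code = gen.gi_code
genexpr_name = genexpr_code.co_name   # '<genexpr>'

# Part 2: translate a tiny subset of Python expressions to a SQL-like string.
_OPS = {
    ast.Gt: ">", ast.GtE: ">=", ast.Lt: "<", ast.LtE: "<=",
    ast.Eq: "=", ast.NotEq: "<>",
}

def to_sql(node):
    if isinstance(node, ast.Expression):
        return to_sql(node.body)
    if isinstance(node, ast.BoolOp):
        joiner = " AND " if isinstance(node.op, ast.And) else " OR "
        return "(" + joiner.join(to_sql(v) for v in node.values) + ")"
    if isinstance(node, ast.Compare) and len(node.ops) == 1:
        op = _OPS[type(node.ops[0])]
        return f"{to_sql(node.left)} {op} {to_sql(node.comparators[0])}"
    if isinstance(node, ast.Attribute):
        return node.attr              # p.age -> column "age"
    if isinstance(node, ast.Constant):
        return repr(node.value)       # naive quoting, fine for a toy only
    raise NotImplementedError(type(node).__name__)

tree = ast.parse("p.age > 18 and p.name == 'Ann'", mode="eval")
where_clause = to_sql(tree)
print(where_clause)   # (age > 18 AND name = 'Ann')
```

A real translator would also have to handle joins, parameters, function calls and the differences between SQL dialects, which is where the "abstract SQL" intermediate step comes in.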
