Tuesday 30 July 2024

Python Self-Referencing Functions

Some time ago I published a post about creating self-referencing functions in JavaScript. My approach there was a combination of closures and eval(). When recently thinking about having the same functionality in Python, I remembered that, as I explain in this post, Python closures do not play well with exec-eval: if we use these functions to dynamically create a new function, that function will not trap variables in the surrounding scope (it won't be a closure trapping freevars). I was considering several alternatives when I realised that there is a much simpler approach (one that would also work fine in JavaScript).

It's an obvious approach once you have internalized how methods are defined in Python and extrapolate from it. Methods do not receive a "this" implicitly as in most languages; you have to declare that "this" (which we tend to call "self" rather than "this") in the method signature. Well, let's apply a similar logic to self-referencing functions: rather than giving them access to themselves through a variable trapped by a closure (as I was doing in JavaScript), let's just assume that the function signature has to include a parameter that gives the function access to itself. I'll call that parameter me.

I've defined a decorator function that receives a function and returns a self-referencing function. We want our self-referencing function to have access to itself, but we also want external code to be able to get and set attributes on the function. That way, the function has state that is accessible not only from the function itself but also from outside, which makes it more powerful than a standard closure. Since the decorator returns a function that wraps the original one, I'm adding get_attr and set_attr methods to the wrapper function that operate on the original, wrapped one. I'm using functools.wraps so that the wrapper function looks like the wrapped one (same __name__, __doc__...). OK, so this is my self_referencing decorator:


import functools
from typing import Callable

# decorator to create self-referencing functions
def self_referencing(fn: Callable) -> Callable:
    @functools.wraps(fn)
    def new_fn(*args, **kwargs):
        return fn(fn, *args, **kwargs)
    # these functions on the wrapper let external code get and set
    # attributes on the original/internal function
    new_fn.set_attr = lambda key, val: setattr(fn, key, val)
    new_fn.get_attr = lambda key: getattr(fn, key)
    return new_fn


Given a function (e.g. format()), we use it like this to convert that function into a self-referencing one:


@self_referencing
def format(me: Callable, txt: str):
    # me is the function itself
    print(f"{me.__name__} invoked")
    me.invoked_times = 1 if not hasattr(me, "invoked_times") else me.invoked_times + 1
    return f"{me.prefix}{txt}{me.suffix}"
	
format.set_attr("prefix", "pre_")
format.set_attr("suffix", "_post")

print(format("aa"))
print(format("bb"))

print(f"invoked {format.get_attr('invoked_times')} times")
print(f"function name: {format.__name__}")

#format invoked
#pre_aa_post
#format invoked
#pre_bb_post
#invoked 2 times
#function name: format


When using the functools.wraps decorator, a __wrapped__ attribute pointing to the wrapped function is added to the wrapper function. This means that I could have skipped adding the get_attr and set_attr "methods" to the wrapper and just used wrapper.__wrapped__.xxx, but I think it's clearer to have these 2 methods.


print(f"prefix: {format.__wrapped__.prefix}")
print(f"prefix: {format.get_attr('prefix')}")
#prefix: pre_
#prefix: pre_

Wednesday 24 July 2024

Extending Python

Last week I wrote about how Python gives us access to the bytecode of a function and how several projects leverage this to create new code from that bytecode. Python also gives us access to the Python source code of functions, which is leveraged in interesting ways too. To obtain the source code of a function (it also works for classes) we use the inspect.getsource() function. This cute function works well even in "advanced" cases, like factory functions, but it won't work, for example, for code generated with exec/eval/compile (I'll probably write a short, separate post about that).
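As a minimal illustration of the factory-function case (make_adder is my own invention for this demo), inspect.getsource() recovers the source of the inner function even after the factory has returned:

```python
import inspect

# a factory function: returns a closure built inside it
def make_adder(n):
    def adder(x):
        return x + n
    return adder

add5 = make_adder(5)
# getsource locates the source of the inner function through its code object
print(inspect.getsource(add5))
```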

Looking into different ways to pipe/compose functions I've found a very interesting project that makes use of inspect.getsource(). The general idea behind it is to write a function using valid Python syntax but with the intent of giving that syntax a different meaning (in this particular case, composing/piping functions). As the syntax is correct, the function is compiled by Python without a problem, but we then want to rewrite it at runtime, giving that syntax its new meaning. So this smart guy wrote a decorator for that. The decorator receives the original function, gets its Python source code (via inspect.getsource()), parses it into an AST (using the standard ast module), transforms that AST (converting the fake-pipes syntax into real function piping/composition), and compiles the transformed AST into a new code object that is then passed to exec() to obtain the new function. You should check the source code.
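A minimal sketch of that source → AST → transform → compile → exec pipeline (the real project rewrites fake-pipe syntax; here, just to keep the example tiny, I use an invented toy transform that rewrites + into *):

```python
import ast
import inspect
import textwrap

# toy AST transform: every binary + becomes a *
class AddToMult(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            node.op = ast.Mult()
        return node

def rewrite(fn):
    src = textwrap.dedent(inspect.getsource(fn))
    tree = ast.parse(src)
    # drop the decorator from the AST so exec doesn't re-apply it recursively
    tree.body[0].decorator_list = []
    tree = ast.fix_missing_locations(AddToMult().visit(tree))
    namespace = {}
    exec(compile(tree, filename="<rewritten>", mode="exec"), namespace)
    return namespace[fn.__name__]

@rewrite
def calc(a, b):
    return a + b

print(calc(3, 4))  # 12, not 7: the + was rewritten to *
```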

I've found 2 projects [1] and [2] that I think are doing something similar to allow for concise/improved lambdas, but honestly I have not looked enough into their source code.

So what we've seen in the previous post and in this post so far is the ability to modify an existing function at runtime. For that existing function to exist, the Python interpreter has had to get its source and convert it into an object, and for that the syntax of that function has to be valid. We can use tricks like marker functions, or use a given syntax with a different meaning in mind, but we cannot use a new, expanded (and invalid) syntax. Well, the amazing news is that indeed we can, but for that, such syntax has to be processed and transformed into standard Python code before the Python interpreter fails to process it, and that can be done at module loading time, by means of import hooks.

Import hooks allow us to modify a file being imported as a module (it can be any file, not a real Python source code file) before Python tries to compile it and finds that its syntax is not supported. So we can write a hook that takes a pseudo-module containing some extra Python syntax and transforms it into standard Python code. It's the amazing Coconut language that made me aware of this.
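I'm not going to reproduce Coconut's hook, but a minimal sketch of the idea fits in a few lines with a meta path finder/loader. The .pyq extension and the fake "let" syntax below are pure inventions for this demo; the "transpiler" is a naive string replace:

```python
import importlib.abc
import importlib.util
import sys
from pathlib import Path

# finder + loader that "transpiles" pseudo-Python files before compiling them
class PseudoFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    def find_spec(self, name, path=None, target=None):
        candidate = Path(f"{name}.pyq")
        if candidate.exists():
            return importlib.util.spec_from_loader(name, self, origin=str(candidate))
        return None

    def create_module(self, spec):
        return None  # fall back to the default module creation

    def exec_module(self, module):
        source = Path(module.__spec__.origin).read_text()
        transpiled = source.replace("let ", "")  # fake syntax -> real Python
        exec(compile(transpiled, module.__spec__.origin, "exec"), module.__dict__)

sys.meta_path.insert(0, PseudoFinder())

# demo: write a pseudo-module with invalid Python syntax and import it
Path("greet.pyq").write_text('let message = "hello from fake syntax"\n')
import greet
print(greet.message)
```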

Coconut is a variant of Python built for simple, elegant, Pythonic functional programming. Coconut syntax is a strict superset of the latest Python 3 syntax.

So the Coconut language is a superset of Python that allows us to write code in a much more functional style (and yes, it brings (multi-)statement lambdas into Python!!!). Coconut transpiles the Coconut code to standard Python. You can do this ahead of time, but you can also just let it happen at runtime (automatic compilation), which feels much more natural to me. You can have a project with some modules written in standard Python and others in Coconut (with its .coco extension) and run it, without having to remember to preprocess the .coco files first. It's import hooks that allow us to do that.

I have no knowledge at all about implementing import hooks, so I've taken a quick look into the source code out of curiosity. Importing the coconut.api module in our code will enable automatic compilation by adding a finder object (an instance of CoconutImporter) to the list of meta path finders:



if coconut_importer not in sys.meta_path:
    sys.meta_path.insert(0, coconut_importer)
    ...

class CoconutImporter(object):
    """Finder and loader for compiling Coconut files at import time."""
    ...
	

I've found a similar project, the Hy language, a Lisp dialect embedded in Python. Like Coconut, it comes with an import hook, so that you can import the Lisp-like modules with no need to compile ahead of time.

Finally, this project, which makes writing import hooks almost trivial, looks well worth a mention.

Thursday 18 July 2024

Leveraging Python Bytecode

There's a pretty nice Python feature that appeals to those freaks among us who on occasion have found ourselves looking into bytecode (like 20 years ago, when I did some programming in .NET bytecode, and in the last 2 years, when I've looked into Java bytecode quite a few times to understand how the Kotlin compiler implements some stuff). The thing is that in Python we easily get access to the bytecode of any function at runtime! Of course we know that functions are objects, and these objects have many attributes, like __name__, __defaults__, __closure__ or __code__. __code__ points to a code object that has attributes like co_freevars (with the names of the variables trapped by the closure, if any) and co_code, which points to a bytes object containing that function's bytecode. We can see that bytecode in a readable format with dis.dis():


def format():
    print("hi")

format.__code__
Out[6]: <code object format at 0x70cd52d5ead0, file "/tmp/ipykernel_372789/3053649360.py", line 1>

format.__code__.co_code
Out[7]: b't\x00d\x01\x83\x01\x01\x00d\x00S\x00'

import dis
dis.dis(format.__code__.co_code)
          0 LOAD_GLOBAL              0 (0)
          2 LOAD_CONST               1 (1)
          4 CALL_FUNCTION            1
          6 POP_TOP
          8 LOAD_CONST               0 (0)
         10 RETURN_VALUE


I've come across several projects that take advantage of this runtime access to Python bytecode. Basically, they create a copy of that bytes object, modify the bytecodes as needed, and create a new function (with FunctionType) based on that new code object and other attributes of the original function. One example of a surprising project using this technique is this one that adds goto's to Python. For that, it defines goto() and label() functions that act just as markers. A function using these goto and label functions to define its goto's and labels has to be decorated with a decorator that takes care of getting the bytecode of the function and rewriting it, adding those goto's (which exist at the bytecode level).
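Patching raw bytecode is very version-dependent (opcodes change between Python releases), so as a safer sketch of the "new code object + FunctionType" part of the technique, here I just swap a constant with CodeType.replace and rebuild the function (greet and patched are invented for the demo):

```python
import types

def greet():
    return "hello"

# build a patched code object: same bytecode, different co_consts
new_code = greet.__code__.replace(
    co_consts=tuple(
        "goodbye" if c == "hello" else c for c in greet.__code__.co_consts
    )
)
# create a brand new function from the patched code object
patched = types.FunctionType(new_code, greet.__globals__, "patched")

print(greet())    # hello
print(patched())  # goodbye
```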

Another project leveraging access to Python bytecode is PonyORM. In this case it does not modify the bytecodes to generate a new function, but translates them to SQL!!! It was clear to me that generator functions create generator objects, but I had never thought about the fact that generator expressions also create generator objects. Generator objects have an associated code object (accessible through the gi_code attribute). Among other things, PonyORM provides a select function that can receive a generator expression and translate it to SQL. The process is explained in this interesting video and basically consists of these 3 main steps:

  • Decompile bytecode and restore AST
  • Translate AST to "abstract SQL"
  • Translate "abstract SQL" to a specific SQL dialect

So my understanding is that it decompiles the Python bytecodes into Python source code (using its own decompiler), loads that Python source into an AST (using the standard ast module), and transforms that AST into SQL.
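We can easily check that gi_code attribute ourselves:

```python
import dis

# a generator expression creates a generator object with a code object attached
gen = (x * 2 for x in range(3))
print(type(gen).__name__)    # generator
print(gen.gi_code.co_name)   # <genexpr>
dis.dis(gen.gi_code)         # the bytecode that PonyORM-style tools decompile
```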

Saturday 13 July 2024

Some Python Tricks 2024

This post is not much related to another post with a similar title from 2 years ago, but honestly I could not think of any other distinctive title.

1) Over time I've been using 3 different approaches for a common, simple task. I've got a list of lines that I've read from a basic CSV file. To split these lines by a separator and extract some of the columns (obtaining a list of tuples) I was initially using this:


    lines: list[str] = Path(file_path).read_text().splitlines()

    pairs = [(items[0], items[2])
        for items in
        [line.split(separator) for line in lines]
    ]

Then I realized I could leverage the assignment expressions (aka walrus) feature like this:


    pairs = [((items := line.split(separator))[0], items[2])
        for line in lines
    ]

And recently I've learnt about the convenience of using operator.itemgetter:


    pairs = [(operator.itemgetter(0, 2)(line.split(separator)))
        for line in lines
    ]
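A small variation I find slightly cleaner is creating the itemgetter once, outside the comprehension, so it isn't rebuilt on every iteration (the rows and the ";" separator are made up for the example):

```python
import operator

# build the getter once; it picks columns 0 and 2 as a tuple
get_cols = operator.itemgetter(0, 2)

rows = ["a;b;c", "d;e;f"]
pairs = [get_cols(line.split(";")) for line in rows]
print(pairs)  # [('a', 'c'), ('d', 'f')]
```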

2) Nested list comprehensions can be confusing if written in a single line. Given this data:


# Nested list comprehensions
@dataclass
class City:
    name: str
    population: int

@dataclass
class Country:
    name: str
    population: int
    cities: list[City]

countries = [
    Country("France", 68_000_000, 
        cities = [
            City("Paris", 12_000_000),
            City("Marseille", 1_800_000),
        ]        
    ),
    Country("Portugal", 10_000_000, 
        cities = [
            City("Lisbon", 3_000_000),
            City("Porto", 1_000_000),
        ]        
    ),   
]

This list comprehension:


city_country_pairs = [(city.name, country.name) for country in countries for city in country.cities]

Is quite less evident than this one:


city_country_pairs = [
    (city.name, country.name)
    for country in countries
    for city in country.cities
]

That is equivalent to this:


city_country_pairs = []
for country in countries:
    for city in country.cities:
        city_country_pairs.append((city.name, country.name))

3) The other day I came across a nice trick for setting an attribute on each object of a list and getting back the list, all in a single statement (so using a list comprehension):


cities = [
    City("Paris", 12_000_000),
    City("Marseille", 1_800_000),
]        
cleared = [
    setattr(city, "population", None) or city
    for city in cities 
]


The trick there is that setattr returns None, so or-ing it with the object on which we've just set the attribute returns that object.

Tuesday 2 July 2024

Python Modules and Inheritance

We can say that Python is a "very object oriented language" because not only are functions objects, but modules are objects too. Each module is an instance of the module class. That's one of those builtin types that can be referenced through the types module, using types.ModuleType in this case.


import sys
import types

# get a reference to the current module
m1 = sys.modules[__name__]

print(f"module type: {type(m1)}, name: {__name__}")
#module type: <class 'module'>, name: module1


print(type(m1) is types.ModuleType) #True
print(m1.__class__ is types.ModuleType) #True


With modules being objects, one could think about giving them features that we have in "normal objects", like intercepting attribute access, being callable... These features are based on dunder methods that are defined not on the object itself but on its class (__getattr__ for the "missing attribute" functionality, __setattr__ and __getattribute__ for intercepting attribute setting or getting, __call__ for callables). But modules are just instances of types.ModuleType, not of custom classes where we can define those dunder methods, so it seems like there's not much we can do...

Well, indeed there is. First, PEP 562 was implemented some years ago to provide the __getattr__ and __dir__ functionality to modules. You can define these functions directly in your module and it will work. So the standard behaviour of these two dunder methods has been modified so that for module objects they work when defined on the object (the module) itself rather than on its class.
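A quick way to see PEP 562 in action in a self-contained script is writing the module to disk first and then importing it (mod562 and its answer attribute are invented for this demo):

```python
import importlib
from pathlib import Path

# a module with a module-level __getattr__, as allowed by PEP 562
Path("mod562.py").write_text(
    'def __getattr__(name):\n'
    '    if name == "answer":\n'
    '        return 42\n'
    '    raise AttributeError(name)\n'
)
importlib.invalidate_caches()  # make sure the freshly written file is seen

import mod562
print(mod562.answer)  # 42, served by the module-level __getattr__
```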

There was a proposal, PEP 726, to add the same support for __setattr__ and __delattr__, but it was rejected. Fortunately, reading the PEP itself we find a simple alternative that leverages the beautiful Python dynamism. If we want a module with "super-powers", we can just define in our module a class that inherits from types.ModuleType and provides those dunder methods we need, and make our module inherit from it just by setting the __class__ attribute. Cool! We can use this technique not only for dunder methods, but also for adding getters-setters (descriptors) to our module. Let's see an example of a callable module that also has a get descriptor:


# module2.py
import sys
import types

def format_msg(msg: str) -> str:
    return f"[[{msg.upper()}]]"

# define a new Module type (inheriting from the standard Module)
class MyModule(types.ModuleType):
    # make our module callable
    def __call__(self, msg):
        return self.format_msg(msg)
    
    # define a property
    @property
    def formatted_name(self):
        return f"-||{self.__name__}||-"

# set the class of the current module to the new MyModule Module type    
sys.modules[__name__].__class__ = MyModule


###########################

# main.py
import module2

print(module2.format_msg("aaa"))

# our module is callable:
print(module2("aaa"))

# property (getter)
print(module2.formatted_name)


It's important to notice something mentioned in the PEP regarding the performance of this technique:

But this variant is slower (~2x) than the proposed solution. More importantly, it also brings a noticeable speed regression (~2-3x) for attribute access.