Monday, 28 April 2025

Passing parameters to Metaclass (method_added 2)

In my previous post I showed how to simulate Ruby's method_added hook in Python. The most interesting part is making the hook work with methods added to the class after the class is defined (expandos/monkey-patching). If you remember, I use a metaclass defining __setattr__, create a class inheriting from our main class with that metaclass as its metaclass, and set in our class the function to execute when a new method gets added (wrapper_creator).



import inspect
from typing import Callable


class WrapMeta(type):
    def __setattr__(cls, name, value):
        if inspect.isfunction(value):
            value = cls._wrapper_creator(value)
        return super().__setattr__(name, value)

def wrap_methods_dynamic(wrapper_creator: Callable):
    def _wrap_methods(cls):
        # wrap the methods that exist when the class is provided
        members = inspect.getmembers(cls, predicate=inspect.isfunction)
        for name, fn in members:
            setattr(cls, name, wrapper_creator(fn))
        # and use a metaclass for methods that will be added in the future:
        # create a new class extending the main one and using WrapMeta as metaclass
        class A(cls, metaclass=WrapMeta):
            pass
        type.__setattr__(A, "_wrapper_creator", wrapper_creator)
        A.__name__ = cls.__name__
        return A
    return _wrap_methods


There's a "dark" metaclass feature (well, everything related to metaclasses can feel a bit gloomy sometimes) that can make the above code more elegant (and maybe also harder to grasp): metaclasses accept parameters. Those parameters are passed to the metaclass when a class using that metaclass is defined (as keyword arguments that we place in the class declaration after the metaclass keyword argument), and the metaclass can make use of them in its __init__ method. We can leverage this to pass the wrapper_creator function to our metaclass and move into the metaclass the code that sets that function as a "private" attribute of the class. So the code looks like this:



class WrapMeta(type):
    # when a class of this metaclass is created, we provide the additional "wrapper_creator" argument
    # as explained in https://stackoverflow.com/questions/13762231/how-to-pass-arguments-to-the-metaclass-from-the-class-definition
    # we have to declare this custom __new__ to prevent an "__init_subclass__() takes no keyword arguments" error
    def __new__(metacls, name, bases, namespace, **kwargs):
        return super().__new__(metacls, name, bases, namespace)

    def __init__(cls, name, bases, namespace, wrapper_creator):
        super().__init__(name, bases, namespace)
        # this fails because it invokes the __setattr__ below, which tries to access
        # the wrapper_creator that we are trying to set:
        # cls.wrapper_creator = wrapper_creator
        super().__setattr__("_wrapper_creator", wrapper_creator)
        # the above is just equivalent to:
        # type.__setattr__(cls, "_wrapper_creator", wrapper_creator)

    def __setattr__(cls, name, value):
        if inspect.isfunction(value):
            value = cls._wrapper_creator(value)
        return super().__setattr__(name, value)

def wrap_methods_dynamic(wrapper_creator: Callable):
    def _wrap_methods(cls):
        # wrap the methods that exist when the class is provided
        members = inspect.getmembers(cls, predicate=inspect.isfunction)
        for name, fn in members:
            setattr(cls, name, wrapper_creator(fn))
        # and use a metaclass for methods that will be added in the future:
        # create a new class extending the main one and using WrapMeta as metaclass
        # furthermore, we pass an argument to the class
        class A(cls, metaclass=WrapMeta, wrapper_creator=wrapper_creator):
            pass
        A.__name__ = cls.__name__
        return A
    return _wrap_methods


@wrap_methods_dynamic(log_wrapper_creator)
class User:
    def __init__(self, name):
        self.name = name

    def say_hi(self, sm: str):
        return f"{self.name} says hi to {sm}"

    def walk(self):
        return f"{self.name} is walking"


u1 = User("Iyan")
print(u1.say_hi("Francois"))
print(u1.walk())
User.do_work = lambda self, a, b: f"{self.name} is working on {a} and {b}"
print(u1.do_work("aaa", "bbb"))

# __init__ started
# __init__ finished
# say_hi started
# say_hi finished
# Iyan says hi to Francois
# walk started
# walk finished
# Iyan is walking
# <lambda> started
# <lambda> finished
# Iyan is working on aaa and bbb


These custom parameters, which I declare in __new__ just as **kwargs (I don't use them there other than to discard them in the call to the parent __new__) and in __init__ as wrapper_creator, correspond to the **kwargs in the second signature of type.__call__ that we saw in this previous post.



#when constructing a class (an instance of a metaclass)
#type.__call__ would be:
def __call__(metacls, name, bases, namespace, **kwargs):
    cls = metacls.__new__(metacls, name, bases, namespace, **kwargs)
    if isinstance(cls, metacls):
        metacls.__init__(cls, name, bases, namespace, **kwargs)
    return cls
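
As a minimal illustration of this kwargs plumbing (with made-up Meta/Demo names, just a sketch):


class Meta(type):
    def __new__(metacls, name, bases, namespace, **kwargs):
        # swallow the custom kwargs so they don't reach __init_subclass__
        return super().__new__(metacls, name, bases, namespace)

    def __init__(cls, name, bases, namespace, greeting="hi"):
        super().__init__(name, bases, namespace)
        print(f"creating {name} with greeting={greeting}")

class Demo(metaclass=Meta, greeting="hello"):
    pass

# creating Demo with greeting=hello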

Notice that the three statements for creating the auxiliary class that inherits from cls and has WrapMeta as metaclass, setting its name, and returning it:


        class A(cls, metaclass=WrapMeta, wrapper_creator=wrapper_creator):
            pass
        A.__name__ = cls.__name__
        return A

can be condensed into a single one:


        return WrapMeta(cls.__name__, (cls,), {}, wrapper_creator=wrapper_creator)

Wednesday, 23 April 2025

Simulate method_added Hook in Python

Recently, for whatever reason, I've ended up reading some stuff about the amazing Ruby metaprogramming capabilities (I have no Ruby experience, I just toyed with it more than 15 years ago, and since then I've occasionally taken a quick peek to compare it with the languages I know). Ruby hooks seem pretty amazing to me. I'd never seen anything similar in other languages, and they are one of those language features that, after the initial amazement, make you wonder: how would I use them, and how could I simulate them in other languages?

I'm particularly interested in the method_added hook. It lets you hook to your class a function that will be executed each time a method is added to that class (whether the method is defined as part of the class or added to the class later (aka expando methods)). Amazing, but what's a practical use for that? Well, the main thing that comes to my mind is AOP. Let's say we want to add to all methods in one class a call to log("method started") and log("method finished"). We would hook to method_added a function that receives the original function and creates a new function that wraps the call to the original function between those calls to the logger. And now the big question: how could we achieve something similar in Python? Well, creating wrapper functions... that's what decorators are all about. So first let's create a function that, when invoked, creates a new wrapper function that performs this extra logging.


from functools import wraps
from typing import Callable


# wraps a function in a new function that performs logging before and after the invocation
def log_wrapper_creator(fn: Callable) -> Callable:
    @wraps(fn)
    def log_added(*args, **kwargs):
        print(f"{fn.__name__} started")
        res = fn(*args, **kwargs)
        print(f"{fn.__name__} finished")
        return res
    return log_added
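
Just to see the wrapper in isolation (greet is a throwaway function for illustration):


def greet(name: str) -> str:
    return f"hello {name}"

greet = log_wrapper_creator(greet)
print(greet("Iyan"))
# greet started
# greet finished
# hello Iyan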

And now let's create a decorator that, when applied to a class, applies the above decorator to all methods in the class.



import inspect


# class decorator that wraps all the methods of a class using the provided wrapper_creator function
def wrap_methods(wrapper_creator: Callable):
    def _wrap_methods(cls):
        members = inspect.getmembers(cls, predicate=inspect.isfunction)
        for name, fn in members:
            setattr(cls, name, wrapper_creator(fn))
        return cls
    return _wrap_methods
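
A quick usage sketch (Greeter is a made-up class just for the demo):


@wrap_methods(log_wrapper_creator)
class Greeter:
    def hello(self, name: str):
        return f"hello {name}"

print(Greeter().hello("Iyan"))
# hello started
# hello finished
# hello Iyan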

That's nice, but it's only one part of what we get with Ruby's method_added. If after defining our class we decide to add a new method to it (we monkey-patch or expand the class with new methods), the method_added hook will also handle that. So how can we emulate that in Python? __setattr__ is a sort of "generic hook" that we can use in Python to react to the setting of an attribute on an object. If we want to react to setting attributes on a class, we'll need __setattr__ defined in its metaclass. We can add extra logic to our existing decorator so that, after wrapping the existing methods, it sets the metaclass of our class to a metaclass with __setattr__ powers. Important point: as I explain at the end of this post, the metaclass of a class can not be changed (__class__ is readonly in instances of type), so we have to use an extra trick. We can define a new class that extends the existing class and has our custom metaclass as metaclass. Yeah, much talk, let's see in action what I mean:



# Metaclass
class WrapMeta(type):
    def __setattr__(cls, name, value):
        if inspect.isfunction(value):
            value = cls._wrapper_creator(value)
        return super().__setattr__(name, value)

def wrap_methods_dynamic(wrapper_creator: Callable):
    def _wrap_methods(cls):
        # wrap the methods that exist when the class is provided
        members = inspect.getmembers(cls, predicate=inspect.isfunction)
        for name, fn in members:
            setattr(cls, name, wrapper_creator(fn))
        # and use a metaclass for methods that will be added in the future:
        # create a new class extending the main one and using WrapMeta as metaclass
        class A(cls, metaclass=WrapMeta):
            pass
        # this fails because it invokes the __setattr__ above, which tries to access
        # the wrapper_creator that we are trying to set:
        # A.wrapper_creator = wrapper_creator

        # so we have to bypass it. object.__setattr__ does not work with classes
        # (TypeError: can't apply this __setattr__ to WrapMeta object), we have to use type.__setattr__:
        # object.__setattr__(A, "_wrapper_creator", wrapper_creator)
        type.__setattr__(A, "_wrapper_creator", wrapper_creator)
        A.__name__ = cls.__name__
        return A
    return _wrap_methods

Notice how, for the __setattr__ in the metaclass to be able to apply our wrapper_creator function, we have to make it accessible somewhere. We do so by setting it as a "private" attribute of the class, and that's a bit tricky. Setting that attribute will trigger our custom __setattr__, so to prevent that we'll have to specify that we want to use a different __setattr__ logic. In a method of a class we would use super().__setattr__, but here we are in a function, so I tried to use object.__setattr__. That fails with a TypeError: can't apply this __setattr__ to WrapMeta object. What we have to use is type.__setattr__. It's discussed here.

type.__setattr__ is used for classes, basically instances of metaclasses. object.__setattr__, on the other hand, is used for instances of classes.
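
We can see the difference in isolation (C is a throwaway class):


class C:
    pass

type.__setattr__(C, "x", 1)        # fine: C is an instance of type
object.__setattr__(C(), "y", 2)    # fine: a C instance is an instance of a class
try:
    object.__setattr__(C, "z", 3)  # classes must go through type.__setattr__
except TypeError as e:
    print(e)
# can't apply this __setattr__ to type object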

Ah, of course, let's see an example:



@wrap_methods_dynamic(log_wrapper_creator)
class User:
    def __init__(self, name):
        self.name = name

    def say_hi(self, sm: str):
        return f"{self.name} says hi to {sm}"

    def walk(self):
        return f"{self.name} is walking"


u1 = User("Iyan")
print(u1.say_hi("Francois"))
print(u1.walk())
User.do_work = lambda self, a, b: f"{self.name} is working on {a} and {b}"
print(u1.do_work("aaa", "bbb"))

# __init__ started
# __init__ finished
# say_hi started
# say_hi finished
# Iyan says hi to Francois
# walk started
# walk finished
# Iyan is walking
# <lambda> started
# <lambda> finished
# Iyan is working on aaa and bbb


Tuesday, 15 April 2025

File Descriptors and File Deletion

In my previous post about inodes I mentioned that I would write a further post about the other big element regarding how the OS manages files: File Descriptors. So that's what this post is about. I already mentioned in that post that a process references the files that it has open by inode, not by path. It does so through File Descriptors. When a process opens a file it gets a File Descriptor, which works like this:

file descriptors index into a per-process file descriptor table maintained by the kernel, that in turn indexes into a system-wide table of files opened by all processes, called the file table. This table records the mode with which the file (or other resource) has been opened: for reading, writing, appending, and possibly other modes. It also indexes into a third table called the inode table that describes the actual underlying files.[3]

From ChatGPT we learn:

An entry in the system-wide File Table does not contain the original path that was used to open the file. Instead, it primarily contains:

A reference to the inode (which uniquely identifies the file on disk).
The file access mode (read, write, etc.).
The file offset (current read/write position in the file).
A reference count (number of processes using the same file table entry).

And if you are wondering why the path is not stored in the File Table:

Why is the original path not stored?

The kernel tracks it by its inode, not by the path. This is because:

Files can have multiple hard links (multiple paths referring to the same inode).
A file can be renamed or deleted while it is still open, making the original path potentially invalid.
Efficiency—storing the path for every open file would add overhead.
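
By the way, the shared "file offset" mentioned above is easy to observe from Python: duplicated descriptors point to the same File Table entry, so they share one read/write position. A minimal sketch (POSIX only, using a throwaway /tmp path):


import os

path = "/tmp/offset_demo.txt"
with open(path, "w") as f:
    f.write("0123456789")

fd1 = os.open(path, os.O_RDONLY)
fd2 = os.dup(fd1)           # same File Table entry, hence same offset
os.read(fd1, 4)             # advances the shared offset to 4
print(os.read(fd2, 4))      # b'4567', not b'0123'
os.close(fd1)
os.close(fd2)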

We know also that the inode table does not contain information about the paths that point to an inode (but it contains a counter of how many paths (hard links) reference that inode, the link count). Notice that the inode table is stored on disk (otherwise filesystems would be ephemeral), typically at the beginning of its corresponding partition, and also in memory: when the filesystem is mounted, the inode table is loaded into memory to facilitate faster access and manipulation of file metadata. The thing is that when we use the lsof command to see the files open by a process, we can see paths to those files (not just inode numbers), so how does that work? ChatGPT says:

When you run lsof -p PID, it reconstructs the paths based on available information by:

Checking symbolic links in /proc/PID/fd/ (e.g., /proc/1234/fd/3 might show a path like /home/user/file.txt).
Looking at /proc/PID/maps (for memory-mapped files).
Using /proc/PID/cwd (for relative paths).
In some cases, traversing /proc/mounts to help resolve paths.
However, this is an approximation because the exact original path used by open() may not always be retrievable.
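
We can mimic the first of those lookups ourselves. A small sketch (Linux only, with a throwaway /tmp file; the exact descriptors in the output will vary):


import os

f = open("/tmp/fd_demo.txt", "w")
for fd in os.listdir("/proc/self/fd"):
    try:
        print(fd, "->", os.readlink(f"/proc/self/fd/{fd}"))
    except OSError:
        pass  # the descriptor used to list the directory may be gone already
f.close()
# 0 -> /dev/pts/0
# 1 -> /dev/pts/0
# 2 -> /dev/pts/0
# 3 -> /tmp/fd_demo.txt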

When we delete a file (I mean: rm myFile.txt) that hard link will be removed, but there are 2 things that will prevent or postpone the deletion of the corresponding inode (and the data blocks referenced from it). Obviously, if there are other hard links for that file the inode will not be deleted. If no other hard links exist, the OS will check if any process has the file open (by searching the File Table for any entry pointing to that inode), and in that case it will wait (see the sketch after the list below). This is nicely explained again by ChatGPT:

When a file is deleted (unlink() in Linux/Unix), the OS follows this process:

Check the link count in the inode:
If there are other hard links (i.e., link count > 1), the directory entry is removed, but the inode remains because other names still reference it.
If there are no other hard links (link count == 1), then the inode might be deleted—but not yet.
Check the system-wide File Table:
If any process still has the file open, the inode and data remain in use until all processes close it.
The file is effectively invisible to new processes (since its directory entry is gone), but existing processes can still read/write to it.
Final cleanup (when the last process closes the file):
If the link count was already 0 and no file descriptors remain open, the inode and disk blocks are finally freed.
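
The "deleted but still open" situation is easy to reproduce from Python. A minimal sketch (POSIX semantics, throwaway /tmp path):


import os

path = "/tmp/unlink_demo.txt"
with open(path, "w") as f:
    f.write("still here")

f = open(path)               # we now hold a File Descriptor for the inode
os.unlink(path)              # removes the last hard link (what rm does)
print(os.path.exists(path))  # False: no path leads to the inode anymore
print(f.read())              # still here: the inode and data blocks survive
f.close()                    # last descriptor closed: the kernel frees them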

Thursday, 10 April 2025

Python, functions, codeobjects and closures

In this post I mentioned that when a Python function traps variables from an outer scope (free vars), that is, when the function is a closure, it gets a __closure__ attribute that points to a tuple containing a cellvar object for each trapped variable. Each cellvar object is a wrapper for the trapped value, adding one level of indirection. That's how multiple closures that have trapped the same value can share it, and modifications done by one of them are seen by all the others, including the original function where the value was defined. That's important, because it means that once a value is trapped by a closure (and wrapped in a cellvar object), the function where the value was defined no longer accesses it directly, but also through the cellvar object. I've been investigating a bit how the compiler and the runtime manage all this via code objects, __closure__, co_freevars, co_cellvars... and here it goes:

In dynamic, interpreted languages the border between compilation (if any) and execution can feel a bit blurry. In Python, when the runtime is executing a file, every time it finds a class declaration or a function declaration it creates an object for it. If that declaration happens inside a loop (we can put a class or function declaration wherever we want), a new object for that class or function gets created in each iteration. OK, that's for objects, but what about their code? On one side we know that CPython compiles each Python module to bytecodes before running it. On the other hand we probably know that a function object has a __code__ attribute that points to its code object. A code object has a bunch of attributes with information about the code, and the code itself (an array of bytes referenced by the co_code attribute). This article contains tons of information about codeobjects.

When I said that compilation and execution are a bit blurred, it's because both of them happen at runtime. In a first stage, code objects are created when the runtime imports a module and compiles it to bytecodes before executing it (those bytecodes will be saved to the __pycache__ for future reuse). Code objects are created not only for functions but also for modules (containing the top-level code in the module) and for classes (for the class-level code, that is, the code that defines class fields and member functions, which in turn will have their own code objects). Once this compilation-codeobject creation is done, we enter the second stage, that is, execution proper. Execution starts by running the code in the codeobject for a module; running this code will find class declarations and function declarations. Each of these declarations creates objects for those classes and functions (and for the member functions contained in the classes). It's interesting that neither modules nor classes have a __code__ attribute. I guess this is so because the code in a module is executed only once (when it's imported), and the code in a class (class-level code) is executed each time the class is defined, while for functions, when the function is defined its codeobject is not executed but attached to the function object created at that time, so that it can be executed later each time the function object is invoked.

Let's dive deeper now into this relation between function objects, codeobjects and closures. It's all better explained here, but I had managed to understand the process myself, so I'll write it here. When compiling a function, a codeobject is created for that function and also for each function nested inside it. When compiling the outer function (let's say outer_fn), the compiler takes note of variable declarations, and when an internal function is found (let's say inner_fn), that function is in turn compiled. The compiler takes note of variable declarations in inner_fn, and those variables used in inner_fn but not declared in it (nor received as parameters) are candidates to be free vars. It checks the outer functions to see if these variables are declared there, and if so, they are confirmed as free vars and set in the inner_fn.__code__.co_freevars tuple. Furthermore, these variables will be set in the co_cellvars of the outer function where they are defined (outer_fn.__code__.co_cellvars in this case). So at compilation time, information about the variable names ("normal", freevar, cellvar) is stored in the codeobjects so that at execution time, when functions are created, they can assist the runtime in creating closures. What is very, very important is that the compiler uses different bytecode instructions for accessing variables defined as co_cellvars or co_freevars than for accessing "normal" variables. The values of those variables are wrapped in cellvar objects, and as accessing their values involves an extra indirection, specific bytecode instructions are emitted (this is so both for the outer function where the variable has been defined and for the inner function that traps it in its closure), instructions like LOAD_DEREF and STORE_DEREF.

At execution time, when a function declaration is found, a function object is created with its __code__ attribute pointing to its codeobject. In outer_fn, as its codeobject has co_cellvars, the corresponding variables are wrapped in cellvar objects, and when the inner_fn declaration is found, as its codeobject has co_freevars, a __closure__ attribute is set in the inner_fn object, with references to the cellvar objects that we created in outer_fn.
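
All of this is easy to inspect. A small sketch (made-up outer_fn/inner_fn, plus dis to see the dedicated bytecode instructions):


import dis

def outer_fn():
    dec = "||"                # becomes a cellvar: inner_fn traps it
    def inner_fn(msg):
        return f"{dec}{msg}{dec}"
    return inner_fn

fn = outer_fn()
print(outer_fn.__code__.co_cellvars)    # ('dec',)
print(fn.__code__.co_freevars)          # ('dec',)
print(fn.__closure__[0].cell_contents)  # ||
dis.dis(fn)                             # shows LOAD_DEREF for dec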

I think all this partially explains why code blocks executed by the compile()/exec() functions can not create closures. Let's see an example:



def test_closure_1():
    fn_st = "\n".join([
    "def format(msg):", 
    "    print(f'{dec}msg{dec}')"
    ])

    # this does not work. The function that I define inside exec can not find the "dec" variable
    d = {}
    dec = "||"
    exec(fn_st + "\n" + "d['fn'] = " + "format")
    format = d["fn"]
    format("a")

test_closure_1()
# name 'dec' is not defined

When test_closure_1 is compiled, the compiler can not see that sometime in the future we'll dynamically create a nested function, because that nested function is defined in a string and will be compiled in the future, when test_closure_1 is executed, not now while test_closure_1 is being compiled. This means that it can not know that "format" would want to have "dec" as a freevar, so "dec" in test_closure_1 will be compiled as a normal variable, not as a cellvar. Because of that, format can not trap it as a freevar, because it would be working on a cellvar while test_closure_1 is working on a normal variable.

If the variable to be trapped were declared in the block itself, it would be possible for the compiler, when invoked by exec(), to treat it as a cellvar and trap it, but for whatever reason Python does not support that, as we can see in this code:


def test_closure_2():
    fn_st = "\n".join([
    "dec = '||'",
    "def format(msg):", 
    "    print(f'{dec}msg{dec}')"
    ])

    # this does not work. The function that I define inside exec can not find the "dec" variable
    d = {}
    exec(fn_st + "\n" + "d['fn'] = " + "format")
    format = d["fn"]
    format("a")

test_closure_2()
# name 'dec' is not defined

There's another case that could be supported, but it's not: when we have a variable trapped by a "normal" closure (format1) and we try to trap it in a function (format2) defined inside an exec block. That variable is already managed as a cellvar by the surrounding function and the normal closure, so exec() could see that and add it to format2.__closure__, but it does not.


def test_closure_3():
    # create a normal closure with dec freevar
    dec = "||"
    def format1(msg):
        print(f'{dec}msg{dec}')
    
    # and try to create another closure in an exec block
    fn_st = "\n".join([
        "def format2(msg):", 
        "    print(f'{dec}msg{dec}')"
    ])
    d = {}
    exec(fn_st + "\n" + "d['fn'] = " + "format2")
    format2 = d["fn"]
    
    format1("a")
    format2("b")
    
test_closure_3()
# ||msg||
# name 'dec' is not defined

Thursday, 3 April 2025

Python, leverage f_locals with exec()

In this post from 2 years ago I talked about how to create a function from a string in Python. As I explain there, given that eval() only works with expressions and exec() returns nothing, to "return" the function we had to do something a bit tricky with an assignment, particularly tricky due to the limitations of how exec() can interact with variables in its surrounding scope. I'll replicate here the code sample from that post:


fn_st = (
"def multiply(num1, num2):\n"
"    print('multiplying numbers')\n"
"    return num1 * num2\n"
)

def create_function(fn_st, fn_name):
    d = {}
    exec(fn_st + "\n" + "d['fn'] = " + fn_name)
    return d["fn"]


new_fn = create_function(fn_st, "multiply")
print("created")
print(new_fn(2, 3))
# multiplying numbers
# 6

As I explain in that post, code compiled/executed by exec() or eval() can read variables from the surrounding scope, but if it writes to them or creates new variables, those changes won't be available in the surrounding scope. To circumvent that we set the function as an entry in a dictionary rather than directly in a variable, and with that extra indirection level it works. After writing that post I found somewhere another technique that is a bit cleaner, let's see:


def create_function4(fn_st, fn_name):
    exec(fn_st, scope:={})
    return scope[fn_name]

print("started option4")
new_fn = create_function4(fn_st, "multiply")
print("created")
print(new_fn(2, 3))
# multiplying numbers
# 6

We can pass to exec() dictionaries representing the global and local variables to use in the block to execute. In this case we pass a single dictionary that will be used as both the local and the global scope of the block, so the function that we define in exec() gets defined in that dictionary, and we can retrieve it from the dictionary in the outer scope. This technique corresponds to this in the documentation:

Pass an explicit locals dictionary if you need to see effects of the code on locals after function exec() returns.

On the other hand, the "unable to modify variables in the outer scope" behavior that we experience when we don't explicitly provide the globals and locals arguments corresponds to this in the documentation:

In an optimized scope (including functions, generators, and coroutines), each call to locals() instead returns a fresh dictionary containing the current bindings of the function’s local variables and any nonlocal cell references. In this case, name binding changes made via the returned dict are not written back to the corresponding local variables or nonlocal cell references, and assigning, reassigning, or deleting local variables and nonlocal cell references does not affect the contents of previously returned dictionaries.
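
That's easy to verify (a throwaway demo function):


def demo():
    a1 = "aaa"
    exec("a1 += 'bbb'")  # runs against a snapshot of the locals
    print(a1)            # the real local is untouched

demo()
# aaa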

This brings me back to my post from late December about f_locals

FrameType.f_locals now returns a write-through proxy to the frame’s local and locally referenced nonlocal variables in these scopes.

This means that we can also write the above function this way, passing the f_locals of the current frame:


import inspect


def create_function5(fn_st, fn_name):
    # I can not pass f_locals as globals: that raises TypeError: exec() globals must be a dict, not FrameLocalsProxy
    # but I can pass it as locals:
    # exec(compile(fn_st, "", "exec"), locals=inspect.stack()[0].frame.f_locals)
    exec(fn_st, locals=inspect.stack()[0].frame.f_locals)
    return locals()[fn_name]

print("started option5")
new_fn = create_function5(fn_st, "multiply")
print("created")
print(new_fn(2, 3))
# multiplying numbers
# 6

Notice that we have to pass f_locals as the locals parameter rather than as globals, because passing it as globals we get a TypeError: exec() globals must be a dict, not FrameLocalsProxy

For this "function creation" case, where we just want to retrieve the function, passing f_locals as a parameter instead of a new dictionary does not make a particular difference (indeed it's way more verbose), but for cases where we want to modify local variables of the surrounding scope, f_locals is a game changer!


def another_test():
    print("another_test")
    a1 = "aaa"
    exec("a1 += 'bbb'", locals=inspect.stack()[0].frame.f_locals)
    print(f"a1: {a1}")

another_test()
# a1: aaabbb