Monday, 30 December 2024

Python altinstall and WSL2

When working with multiple Python versions on the same machine, I use pyenv on Windows, but on Linux I prefer to do an altinstall. This means downloading the Python source code and building it. It may sound scary, but it's pretty simple. Once done you'll have a new Python installation in /usr/local/bin and /usr/local/lib that will not interfere with the system/default one installed in /usr/bin and /usr/lib (the one that came "from factory" with your Linux distribution, that is used by the Operating System itself and that should not be modified). Just use python3.xx to invoke the newly installed version, and python3 to invoke the system Python.

What feels odd is that this is not explained in many places. The official Python documentation just mentions this. The more detailed instructions that I've always followed are here, and as they explain, installation is as simple as this:

sudo apt install build-essential zlib1g-dev \
libncurses5-dev libgdbm-dev libnss3-dev \
libssl-dev libreadline-dev libffi-dev

wget https://www.python.org/ftp/python/3.xx.x/Python-3.xx.x.tar.xz
tar xf Python-3.xx.x.tar.xz
cd Python-3.xx.x
./configure
make
sudo make altinstall

The first step is particularly important. Python ships with standard library modules that rely on native extension modules, and compiling those extension modules requires some -dev packages to be installed on your system (these development packages contain mainly C header files). Otherwise the compilation of those modules will fail and you'll end up with an incomplete installation that raises errors when you try to import the missing modules. If you plan to use sqlite, note that the sqlite3 module in the Python distribution depends on a native module, so to compile it you must add libsqlite3-dev to the list of dependencies above.
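A quick way to verify that the build did not silently skip any of these native modules is to try importing them with the new interpreter. The sketch below is only an illustration: the helper name check_native_modules and the list of module names are my own choice (not an official list), and it simply reports which imports fail.

```python
import importlib

# Extension modules that depend on the -dev packages listed above.
# This list is an assumption for illustration; adjust it to your needs.
NATIVE_MODULES = ["_ctypes", "_sqlite3", "_ssl", "zlib", "readline", "_curses"]


def check_native_modules(names=NATIVE_MODULES):
    """Return a dict mapping each module name to True if it can be imported."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = True
        except ImportError:
            results[name] = False
    return results


if __name__ == "__main__":
    for name, ok in check_native_modules().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Run it with python3.xx (the altinstall), not with python3, since each interpreter has its own set of compiled extension modules.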

My work laptop (the one provided by my employer, I mean) is still a Windows one. I have no problem with that: I used to have good knowledge of Windows internals, and even now that I'm more of a Linux person (all my personal computers are Linux based) I still consider the Windows architecture to be really good (though I've come to dislike the update system, the restore points, the UI...). That said, I'm using WSL2 more and more these days. I have Python 3.13 installed as an altinstall on it and it's been working perfectly fine for testing on Linux the stuff that I develop on Windows. The other day I went one step further and wanted to debug that code on Linux. Your Windows VS Code can work with folders on your WSL2 installation in just the same way it works with code on a remote Linux machine. The WSL extension works much like the Remote SSH extension, installing in your $HOME/.vscode-server/ folder in WSL2 the code it needs on the Linux side (same as when working with any remote Linux server). I think all this remote development is something one could not have dreamed of a few years back.

With VS Code and the WSL extension combined, VS Code’s UI runs on Windows, and all your commands, extensions, and even the terminal, run on Linux. You get the full VS Code experience, including autocomplete and debugging, powered by the tools and compilers installed on Linux.

The thing is that when trying to run my code under the debugger on WSL2 I was confronted with this:

debugpy/launcher/debuggee.py", line 6, in <module>
    import ctypes
  File "/usr/local/lib/python3.10/ctypes/__init__.py", line 8, in <module>
    from _ctypes import Union, Structure, Array
ModuleNotFoundError: No module named '_ctypes'

Initially I thought it was some problem with the debugger itself, some issue with the amazing "remote development experience" that was making it fail to find that module, but just jumping into a WSL2 terminal, opening a Python 3.13 REPL and trying to import _ctypes caused the same error. So the _ctypes module was really missing from my Python 3.13 altinstall on WSL2.

Jumping to my main personal Ubuntu laptop, which also has a Python 3.13 altinstall, and importing _ctypes I got:

$ python3.13
Python 3.13.0 (main, Nov  9 2024, 16:10:52) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import _ctypes
>>> _ctypes.__file__
'/usr/local/lib/python3.13/lib-dynload/_ctypes.cpython-313-x86_64-linux-gnu.so'

lib-dynload seems to be where native modules get installed (I can also see the sqlite3 .so there, for example), so if the _ctypes .so is missing, it's because some necessary -dev package was missing when I did my altinstall on WSL2. For the sake of experiment I decided to just copy the _ctypes .so file from my laptop to the Windows WSL2 laptop. Doing that, I got another error about a missing library, libffi, that _ctypes depends on. Running sudo apt list --installed | grep libffi I saw that there was no libffi-dev package installed on my WSL2, so somehow, when I installed the different -dev packages needed to compile Python some time ago, I forgot to install it (so the Python build could not create the _ctypes .so in lib-dynload), and the issue had not hit me until now. To fix the problem I installed libffi-dev, uninstalled python3.13 and did a new altinstall. It works like a charm now.
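To see which extension modules a given interpreter actually got, one option is to list the files in its lib-dynload directory. This is a minimal sketch under the assumption that the directory appears in sys.path under the name lib-dynload (as it does in the path shown above); the function names are hypothetical.

```python
import sys
from importlib.machinery import EXTENSION_SUFFIXES
from pathlib import Path


def dynload_dirs():
    """Return the lib-dynload directories found in sys.path."""
    return [Path(p) for p in sys.path if Path(p).name == "lib-dynload"]


def compiled_extensions():
    """Return the sorted module names of the extension files in those directories."""
    names = set()
    for directory in dynload_dirs():
        if not directory.is_dir():
            continue
        for entry in directory.iterdir():
            # EXTENSION_SUFFIXES holds suffixes such as ".so" for this interpreter
            for suffix in EXTENSION_SUFFIXES:
                if entry.name.endswith(suffix):
                    names.add(entry.name[: -len(suffix)])
                    break
    return sorted(names)


if __name__ == "__main__":
    print("_ctypes" in compiled_extensions())
```

Comparing the output between a working installation and a broken one shows which modules were skipped during the build.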

There does not seem to be an automatic mechanism to uninstall a Python version installed as an altinstall (a Python install takes little space, and indeed I assume that I could just have done a new install without removing the existing one and it would have been updated correctly), but anyway, as explained here, I removed this list of folders/files:

    directory /usr/local/lib/python3.13
    directory /usr/local/include/python3.13
    file /usr/local/lib/libpython3.13.a
    file /usr/local/lib/pkgconfig/python-3.13-embed.pc
    6 files /usr/local/bin/*3.13*
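Before deleting anything by hand, it can help to list what is actually there. The following is a dry-run sketch that only reports the candidates from the list above and never deletes anything; the function name altinstall_paths and its parameters are hypothetical.

```python
from pathlib import Path


def altinstall_paths(version, prefix="/usr/local"):
    """List existing files/directories that belong to an altinstalled Python.

    Nothing is deleted: this is a dry run that only reports candidates.
    """
    root = Path(prefix)
    candidates = [
        root / "lib" / f"python{version}",
        root / "include" / f"python{version}",
        root / "lib" / f"libpython{version}.a",
        root / "lib" / "pkgconfig" / f"python-{version}-embed.pc",
    ]
    bin_dir = root / "bin"
    if bin_dir.is_dir():
        # same idea as the /usr/local/bin/*3.13* pattern in the list above
        candidates.extend(sorted(bin_dir.glob(f"*{version}*")))
    return [path for path in candidates if path.exists()]


if __name__ == "__main__":
    for path in altinstall_paths("3.13"):
        print(path)
```

Review the printed list before removing anything: the glob on the bin directory matches any file name containing the version string.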

While investigating this missing native module (.so) I also used these commands:
lsof -p [PID] | grep .so to see the shared objects loaded by a process (lsof was an old friend of mine)
readelf -d (this was new to me). It gives you information about an ELF binary file (executable or shared object, the equivalent of a Windows PE file), and among that information you can see the shared objects needed by that binary, e.g.:

$ readelf -d _ctypes.cpython-313-x86_64-linux-gnu.so

Dynamic section at offset 0x21cf8 contains 25 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libffi.so.8]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000c (INIT)               0x7000
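If you want to check these dependencies from a script, the NEEDED entries can be pulled out of the readelf output with a regular expression. This is just a sketch: the function names are hypothetical, and the second helper assumes readelf (binutils) is installed.

```python
import re
import subprocess

NEEDED_RE = re.compile(r"\(NEEDED\)\s+Shared library: \[(?P<name>[^\]]+)\]")


def needed_libraries(readelf_output):
    """Extract the names of the NEEDED shared libraries from `readelf -d` output."""
    return [m.group("name") for m in NEEDED_RE.finditer(readelf_output)]


def needed_libraries_of(path):
    """Run `readelf -d` on a file and return its NEEDED libraries."""
    completed = subprocess.run(
        ["readelf", "-d", path], capture_output=True, text=True, check=True
    )
    return needed_libraries(completed.stdout)


# The output shown above, used here as sample input
sample = """\
Dynamic section at offset 0x21cf8 contains 25 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libffi.so.8]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000c (INIT)               0x7000
"""
print(needed_libraries(sample))  # ['libffi.so.8', 'libc.so.6']
```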

Saturday, 21 December 2024

Python Generators Close and Return Enhancement

In my previous post I mentioned how I find it strange that the value returned by generator.close() (if it returns a value) is not made available in the next call to next(generator) (or generator.send()) as the value of the StopIteration exception. A few months ago I wrote about the problems that for loops pose for getting access to the value returned by a generator, and provided a solution. Well, I've improved that solution to have a generator object that wraps the original generator and addresses both issues. If we close the generator and it returns a value, the next time we call next() or send() that value is available in StopIteration.value. Additionally, if a generator returns a value (either because it's closed or because it finishes normally) that return value is made accessible in a result attribute of our generator wrapper. OK, enough talk, show me the code:


import inspect
from typing import Generator, Any

def cities_gen_fn():
    try:
        yield "Xixon"
        yield "Paris"
        yield "Lisbon"
        yield "Bilbao"
    except GeneratorExit:
        pass    
    # return this value both if closed or in a normal execution
    return "XPLB"


# wraps a generator in an "extended generator" that stores the value returned by close() to "return it" in the next call to next() or .send()
# it also stores the returned value if the original generator returns something
# that stored return/close value is only exposed as StopIteration.value in the first call to next()/.send(); ensuing calls raise StopIteration with value None

class GeneratorWithEnhancedReturnAndClose:
    def __init__(self, generator_ob: Generator[Any, Any, Any]):
        self.generator_ob = generator_ob
        self.result = None
        self.just_closed = False
        self.closed = False
        
    def __iter__(self):
        return self
    
    def _do_iterate(self, caller: str, val: Any) -> Any:
        if self.just_closed:
            self.just_closed = False
            ex = StopIteration()
            ex.value = self.result
            raise ex
        try:
            if caller == "__next__":
                return next(self.generator_ob)
            else:
                return self.generator_ob.send(val)
        except StopIteration as ex:
            if self.result is None:
                self.result = ex.value
            raise ex
            
    def __next__(self):
        return self._do_iterate("__next__", None)

    def send(self, val):
        return self._do_iterate("send", val)       

    def close(self):
        if not self.closed:
            self.closed = True
            self.just_closed = True 
            self.result = self.generator_ob.close()
            return self.result
        
    def throw(self, ex):
        return self.generator_ob.throw(ex)

print("- getting return value after for-loop")
cities = GeneratorWithEnhancedReturnAndClose(cities_gen_fn())
for city in cities:
    print(city)
print(f"return value: {cities.result}")

print("------------------------")
print("- using next() and close()")

cities = GeneratorWithEnhancedReturnAndClose(cities_gen_fn())
print(next(cities))
print(next(cities))
print(f"closing generator: {cities.close()}")
# first iteration after closing it returns the close-value in the StopIteration.value
try:
    next(cities)
except Exception as ex:
    print(f"generator finished {ex.value}")

# next iteration returns StopIteration with value = None
try:
    next(cities)
except Exception as ex:
    print(f"generator finished {ex.value}")

print(f"return value: {cities.result}")

print("------------------------")
print("- using send() and close()")
# test now that send() also works OK

def freak_cities_gen():
    try:
        w = yield "Xixon"
        w = yield f"{w}Paris{w}"
        w = yield f"{w}Lisbon{w}"
        yield f"{w}Bilbao{w}"
    except BaseException: #GeneratorExit:
        pass    
    # return this value both if closed or in a normal execution
    return "XPLB"
 
cities = GeneratorWithEnhancedReturnAndClose(freak_cities_gen())
print(next(cities))
print(cities.send("||"))
print(f"closing generator: {cities.close()}")
# first iteration after closing it returns the close-value in the StopIteration.value
try:
    next(cities) #it's the same using next or send
except Exception as ex:
    print(f"generator finished {ex.value}")

# next iteration returns StopIteration with value = None
try:
    cities.send("|") #it's the same using next or send
except Exception as ex:
    print(f"generator finished {ex.value}")

print(f"return value: {cities.result}")



# - getting return value after for-loop
# Xixon
# Paris
# Lisbon
# Bilbao
# return value: XPLB
# ------------------------
# - using next() and close()
# Xixon
# Paris
# closing generator: XPLB
# generator finished XPLB
# generator finished None
# return value: XPLB
# ------------------------
# - using send() and close()
# Xixon
# ||Paris||
# closing generator: XPLB
# generator finished XPLB
# generator finished None
# return value: XPLB



Notice that we could inspect the call stack to find out which method is calling us and rewrite the code this way:


    def _do_iterate_slow(self, val: Any) -> Any:
        # this is very cool with the use of introspection to check the caller name, but that's pretty slow
        if self.just_closed:
            self.just_closed = False
            ex = StopIteration()
            ex.value = self.result
            raise ex
        try:
            if inspect.stack()[1][3] == "__next__":
                return next(self.generator_ob)
            else:
                return self.generator_ob.send(val)
        except StopIteration as ex:
            if self.result is None:
                self.result = ex.value
            raise ex
           
    def __next__(self):
        return self._do_iterate_slow(None)

    def send(self, val):
        return self._do_iterate_slow(val)


But other than being very cool, that stack access is rather slow, so we'd better avoid that technique.
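Another way to avoid both the caller-name string and the stack inspection is to pass the "advance" operation itself as a callable. This is only a sketch of that idea, not the wrapper shown above: the class name GeneratorWithResult and its private attributes are hypothetical, and the value returned on close is only available on Python versions where generator.close() returns it.

```python
class GeneratorWithResult:
    """Wraps a generator, keeping its return value in .result."""

    def __init__(self, generator_ob):
        self._gen = generator_ob
        self.result = None
        self._pending_close_value = False
        self._closed = False

    def __iter__(self):
        return self

    def _step(self, advance):
        # first call after close(): expose the stored value once
        if self._pending_close_value:
            self._pending_close_value = False
            raise StopIteration(self.result)
        try:
            return advance()
        except StopIteration as ex:
            if self.result is None:
                self.result = ex.value
            raise

    def __next__(self):
        return self._step(lambda: next(self._gen))

    def send(self, val):
        return self._step(lambda: self._gen.send(val))

    def close(self):
        if not self._closed:
            self._closed = True
            self._pending_close_value = True
            self.result = self._gen.close()
            return self.result


def cities_gen_fn():
    try:
        yield "Xixon"
        yield "Paris"
    except GeneratorExit:
        pass
    return "XP"


cities = GeneratorWithResult(cities_gen_fn())
for city in cities:
    print(city)
print(cities.result)  # XP
```

Each public method states what "advancing" means for it, so the shared logic no longer needs to know who called it.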

Tuesday, 17 December 2024

Closing Python Generators

This post is about some rarely used features of Python generators (JavaScript generators are pretty similar, but with some differences that would deserve their own post).

First of all, the internals of Python generators are pretty interesting. They are quite different from C# generators or Kotlin suspend functions, where the function is converted by the compiler into a class with a "state machine method" with labels for each "suspension point" and properties for the current label and local variables. In Python, the generator object created from a generator function points to that function as such, and holds a frame object with the variables and the position from which to resume. Each time the generator function is resumed it gets this frame object (gi_frame) (rather than starting with an uninitialized one) containing its state and the position of the last executed instruction (gi_frame.f_lasti), from which execution continues. It's very nicely explained here. We can see with this simple code that the gi_frame and the frame taken (via inspect) from the stack during the execution of the generator function are indeed the same object, not a copy:


import inspect

def cities_gen_fn():
    print(f"frame id: {id(inspect.stack()[0].frame)}")    
    yield "Xixon"
    print(f"frame id: {id(inspect.stack()[0].frame)}")
    yield "Paris"
    yield "Lisbon"
    yield "Bilbao"

cities = cities_gen_fn()
print(next(cities))
print(f"gi_frame id: {id(cities.gi_frame)}")
print(next(cities))

# frame id: 2550405375184
# Xixon
# gi_frame id: 2550405375184
# frame id: 2550405375184


Python generator objects have a close() method that allows us to mark the generator as finished. One common use case is when looping over a generator and at some point a condition tells us to stop. Of course you can leave the loop using the break statement, but that's a bit different: break leaves the loop immediately, not in the next iteration, and as the generator has not been finished, we can still continue to iterate it after the loop.



def cities_gen_fn():
    yield "Xixon"
    yield "Paris"
    yield "Lisbon"
    yield "Bilbao"

print("- using break")
cities = cities_gen_fn()
for city in cities:
    if (city := city.upper())[0] == "L":
        break
    print(city)
print(next(cities))

print("- using .close()")
cities = cities_gen_fn()
for city in cities:
    if (city := city.upper())[0] != "L":
        cities.close()
        print(city)
try:
    print(next(cities))
except StopIteration as ex:
    print("generator is finished")

# - using break
# XIXON
# PARIS
# Bilbao

# - using .close()
# XIXON
# generator is finished


I can think of some situations in the past where this .close() method would have come in handy. Let's say we have a main function that creates a generator and delegates to other functions certain tasks that involve iterating that generator. Each of those functions could determine, based on its own logic, that the generator is finished, so it would close it, and then the main function would no longer invoke the remaining functions with it. Unaware of this .close() functionality, I was returning an "is_finished" boolean from each of those functions.
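That scenario could look roughly like this. It's a sketch with made-up worker functions (take_until_paris and take_rest are hypothetical), and it assumes the main function checks the generator state with inspect.getgeneratorstate() instead of receiving a boolean from each worker.

```python
import inspect


def cities_gen_fn():
    yield "Xixon"
    yield "Paris"
    yield "Lisbon"
    yield "Bilbao"


def take_until_paris(cities, seen):
    # hypothetical worker: decides the stream is over once it sees "Paris"
    for city in cities:
        seen.append(city)
        if city == "Paris":
            cities.close()


def take_rest(cities, seen):
    # hypothetical worker: consumes whatever is left
    for city in cities:
        seen.append(city.upper())


def main():
    cities = cities_gen_fn()
    seen = []
    for worker in (take_until_paris, take_rest):
        if inspect.getgeneratorstate(cities) == inspect.GEN_CLOSED:
            break  # a previous worker closed the generator
        worker(cities, seen)
    return seen


print(main())  # ['Xixon', 'Paris']
```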

The documentation on .close() shows that it's quite an interesting and complex beast: it raises a GeneratorExit at the point where the generator function was paused. Wow, that's pretty mind-blowing. So it's as if, when the generator function is resumed, the interpreter somehow injects a raise GeneratorExit() statement at the place gi_frame.f_lasti is pointing to! If the generator does not catch the exception, the generator finishes (the next iteration attempt will raise a StopIteration) and the close() call returns None (that's the behaviour in the examples above). Python 3.13 has introduced a new feature: the generator can catch the exception and return a value to the close() method. The main effect, finishing the generator, is the same, but we have this extra of returning a value to the caller. Let's see:



def cities_gen_fn():
    yield "Xixon"
    yield "Paris"
    yield "Lisbon"
    yield "Bilbao"


def cities2_gen_fn():
    try:
        yield "Xixon"
        yield "Paris"
        yield "Lisbon"
        yield "Bilbao"
    except BaseException: #GeneratorExit:
        return "No City"
        #this returned value in case of a close() is returned by close(), but not as return value of the generator (StopIteration.value is None)


for cities_gen in [cities_gen_fn(), cities2_gen_fn()]:
    print(next(cities_gen))
    print(f"close result: {cities_gen.close()}")
    print("generator has been closed")
    try:
        next(cities_gen)
    except Exception as ex:
        print(f"Exception: {type(ex)}, value: {ex.value}")
    print("--------------------------")

# Xixon
# close result: None
# generator has been closed
# Exception: <class 'StopIteration'>, value: None
# --------------------------
# Xixon
# close result: No City
# generator has been closed
# Exception: <class 'StopIteration'>, value: None
# --------------------------

What feels a bit odd to me is that the value returned by the generator to .close() is not considered as a generator return value and made available as the .value property of the next StopIteration exception.

We have another related method, generator.throw(). It can also be used to finish a generator, but by throwing an exception into it, for which I don't see any clear use case.

Raises an exception at the point where the generator was paused, and returns the next value yielded by the generator function. If the generator exits without yielding another value, a StopIteration exception is raised. If the generator function does not catch the passed-in exception, or raises a different exception, then that exception propagates to the caller.

I'll show some examples, but honestly I don't see when this method can be useful.



def cities_gen_fn():
    yield "Xixon"
    yield "Paris"
    yield "Lisbon"
    yield "Bilbao"

cities_gen = cities_gen_fn()
print(next(cities_gen))
try:
    print(f"throw result: {cities_gen.throw(Exception())}")
    print("after generator throw")
except Exception as ex:
    print(f"Exception: {ex}")
try:
    print("next iteration attempt")
    next(cities_gen)
except Exception as ex:
    print(f"Exception in next() call: {type(ex)}, value: {ex.value}")

# Xixon
# Exception: 
# next iteration attempt
# Exception in next() call: <class 'StopIteration'>, value: None


print("--------------------------")


def cities2_gen_fn():
    try:
        yield "Xixon"
        yield "Paris"
        yield "Lisbon"
        yield "Bilbao"
    except Exception: 
        yield "Except City"


cities_gen = cities2_gen_fn()

print(next(cities_gen))
print(f"throw result: {cities_gen.throw(Exception())}")
print("after generator throw")
try:
    print("next iteration attempt")
    next(cities_gen)
except Exception as ex:
    print(f"Exception in next() call: {type(ex)}, value: {ex.value}")


# Xixon
# throw result: Except City
# after generator throw
# next iteration attempt
# Exception in next() call: <class 'StopIteration'>, value: None


Monday, 2 December 2024

Python locals(), f_locals, local namespace

The freshly released Python 3.13 mentions some updates to the behaviour of locals(). Reading those notes confirms to me (as I have outlined here) that trying to create new variables in exec()/compile() will have no effect outside of the "block" executed by exec()/compile() itself (reassigning an "external" variable will have no effect either): the code will always run against an independent snapshot of the local variables in optimized scopes, and hence the changes will never be visible in subsequent calls to locals(). The notes also open the door to some really interesting stuff: FrameType.f_locals now returns a write-through proxy to the frame’s local and locally referenced nonlocal variables in these scopes.

Let's go a bit deeper into the above statements. Each time we execute a function, a "local namespace" object is created for that function (it's a sort of dictionary), where local variables and parameters are stored (and also free vars if the function is a closure). I guess we can think of this local namespace object as JavaScript's Activation Object. Let's see:


def create_fn():
    trapped = "aaa"
    def fn(param1):
        nonlocal trapped
        trapped = "AAAA"
        local1 = "bbb"
        print(f"fn local namespace: {locals()}")
    return fn

fn1 = create_fn()
fn1("ppppppp")

# fn local namespace: {'param1': 'ppppppp', 'local1': 'bbb', 'trapped': 'AAAA'}


As aforementioned, code executed by the exec()/compile() functions receives a snapshot of the local namespace of the invoking function, meaning that adding a variable or reassigning a variable in that snapshot will not have effect outside the exec() itself. I mean:


def declare_new_variable(param1):
    # creating a new variable or setting an existing variable in exec will not crash, but it only happens in the local namespace snapshot that exec receives
    # and has no effect on the original local namespace
    print(f"- {declare_new_variable.__name__}")
    # create new variable
    exec(
        "a = 'Bonjour'\n"
        "print('a inside exec: ' + a)\n"
    )
    # a inside exec: Bonjour

    p_v = "bbb"
    # assign to existing variable
    exec(
        "p_v = 'cccc'\n"
        "print('pv inside exec: ' + p_v)\n"
    )
    # pv inside exec: cccc
    
    print(f"locals: {locals()}")
    # locals: {'param1': '11111', 'p_v': 'bbb'}
    # the new variable "a" has not been created in the local namespace, and p_v has not been updated
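If the goal is to get values back out of exec(), the usual approach is to give it an explicit namespace dictionary and read the results from there. A minimal sketch (the helper name run_snippet is hypothetical):

```python
def run_snippet(source, **initial_vars):
    """Run source with exec() in its own namespace and return that namespace."""
    namespace = dict(initial_vars)
    # the second argument is the globals dict, the third is the locals mapping
    exec(source, {}, namespace)
    return namespace


result = run_snippet("a = 'Bonjour'\np_v = p_v + '_changed'\n", p_v="bbb")
print(result["a"])    # Bonjour
print(result["p_v"])  # bbb_changed
```

The calling function's own local variables stay untouched; everything the snippet creates or reassigns lives in the dictionary that was passed in.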
	

And now the second part of the first paragraph, FrameType.f_locals. I've been playing with it and learned that from a Python function we can traverse the call stack, getting references to a write-through proxy of the local namespace of each stack frame. This means that from one function we have access (read and write) to any variable in any of its calling functions (any function further down the stack), and can even "sort of" add new variables. I'm using inspect.stack() to get access to the stack chain, then freely move through it, get the stack frame I want, and use f_locals to get that "write-through proxy" to its local namespace.



import inspect


def child2():
    print("- enter child2")
    c2_v1 = "child2 v1"
    c2_v2 = 200
    print("child2")
    parent_locals = inspect.stack()[2].frame.f_locals
    print(f"parent_locals viewed from child2: {parent_locals}")
    print("modify existing parent variable, p_v1")
    parent_locals["p_v1"] = parent_locals["p_v1"] + "_modified"
    print("add variable p_v3 to parent")   
    parent_locals["p_v3"] = "extra var"   
    # remove variable this way fails:
    #del parent_locals["p_v2"] 
    # TypeError: cannot remove variables from FrameLocalsProxy
    print("- exit child2")


def child1():
    print("- enter child1")
    c1_v1 = "child1 v1"
    c1_v2 = 20
    child2()
    print("- exit child1")


def parent():
    p_v1 = "parent v1"
    p_v2 = 2
    print("before calling child")
    print(f"parent: {locals()}")
    child1()
    print("after calling child")
    # p_v1 has been updated and p_v3 has been added:
    print(f"parent: {locals()}")

    # I can see the updated value of this var
    print(f"p_v1: {p_v1}")

    #but trying to access the new variable like this will fail:
    try:
        print(f"p_v3: {p_v3}")
    except Exception as ex:
        print(f"Exception: {ex}")


parent()

# before calling child
# parent: {'p_v1': 'parent v1', 'p_v2': 2}
# - enter child1
# - enter child2
# child2
# parent_locals viewed from child2: {'p_v1': 'parent v1', 'p_v2': 2}
# modify existing parent variable, p_v1
# add variable p_v3 to parent
# - exit child2
# - exit child1
# after calling child
# parent: {'p_v1': 'parent v1_modified', 'p_v2': 2, 'p_v3': 'extra var'}
# p_v1: parent v1_modified
# Exception: name 'p_v3' is not defined


As you can see at the end of the above code, adding new variables to a function through the f_locals has an odd behaviour. A new entry is created in the local namespace corresponding to that f_locals. We can see the variable with locals() (regardless of whether it was added by code deeper in the stack chain) but trying to access it directly by its name will fail. The new variable exists in the local namespace, but it seems as if the variable name does not exist, and yes it's just that, as explained by this post:

Functions are special, because they introduce a separate local scope. The variables inside that scope are fixed when the function is compiled (the number and names of the variables, not their values). You can see that by inspecting a function's .__code__.co_varnames attribute.
That fixed registry of variable names is what is used when names are looked up from inside the function. And that registry is not updated when you're calling exec.
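The fixed registry mentioned in the quote can be inspected directly. In this small sketch (the function reader is hypothetical), a name that is never assigned inside the function does not appear in co_varnames, because it is compiled as a global lookup and listed in co_names instead:

```python
def parent():
    p_v1 = "parent v1"
    p_v2 = 2
    return p_v1, p_v2


def reader():
    # p_v3 is never assigned here, so it is compiled as a global lookup
    return p_v3


print(parent.__code__.co_varnames)  # ('p_v1', 'p_v2')
print(reader.__code__.co_varnames)  # ()
print(reader.__code__.co_names)     # ('p_v3',)
```

That is why adding p_v3 through f_locals makes it show up in locals() but does not make the bare name p_v3 resolvable inside the function.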