Thursday, 29 August 2024

Getting Source Code at Runtime

One surprising feature present both in JavaScript and Python is that we can get access at runtime to the source code of a function. The internals are a bit different, and from that stems the fact that in Python this feature is slightly more limited than in JavaScript. Let's see.

In JavaScript user defined functions have a toString() method that returns a function's source code (including comments).


function sayHi(name) {
    name = name.toUpperCase();
    console.log(`Hi ${name}`)
}

console.log(sayHi.toString())
// function sayHi(name) {
//     name = name.toUpperCase();
//     console.log(`Hi ${name}`)
// }


For "native" functions or bound functions (obtained with .bind()) toString will just return: function xxx() { [native code] }.


function prepend(pr, msg) {
    return `${pr}${msg}${pr}`;
}
let fn = prepend.bind(null, "||");

console.log(`- bound function: ${fn.toString()}`);
//- bound function: function () { [native code] }

Notice that toString() works perfectly fine for functions created at runtime via eval():


let fn4 = eval("() => console.log('hi');");

console.log(fn4.toString());
//() => console.log('hi')


toString() also works for classes, returning the whole class. It feels a bit surprising to me, because under the covers classes are syntactic sugar, and for a class Person what we really have is a Person function that corresponds to the constructor. So I don't know how we would get just the constructor code (other than extracting its substring from Person.toString()).


class Person {
    constructor(name) {
        this.name = name;
    }

    walk() {
        console.log("I'm walking")
    }
}

console.log(Person.toString())
// class Person {
//     constructor(name) {
//         this.name = name;
//     }

//     walk() {
//         console.log("I'm walking")
//     }
// }

console.log(typeof Person); 
//function


So it seems that when we define a function its source code is stored in some internal property of the corresponding function object. Checking the Language Specification, it seems function objects have a [[SourceText]] internal slot for that.

Things in Python are a bit different. We can obtain the source code of non-native functions, classes and modules, but with one basic limitation: it only works if the file containing the source code can be opened. The functionality is provided by the inspect.getsource() function. inspect is a standard Python module, but the implementation feels a bit "hackerish", like a functionality that was not initially intended and was added by leveraging some low-level details. I've just said that in JavaScript functions have a slot holding their source code. This is not so straightforward in Python.

In Python a function object has an associated code object (the __code__ attribute) that gives us access to the code of that function (through its co_code attribute). But that's a bytes object containing the Python bytecode, not the Python source code. The code object has 2 other attributes that matter here: co_filename (the full path to the Python module where the function is defined) and co_firstlineno (the line in that file where the function starts) (this is well explained here). So if the file where that function was defined is available, inspect.getsource can extract its source code, like this:


import inspect

def format(txt: str):
    return f"[[{txt.upper()}]]"

print(inspect.getsource(format))
print(f"filename: {format.__code__.co_filename}")
print(f"firstlineno: {format.__code__.co_firstlineno}")

# def format(txt: str):
#     return f"[[{txt.upper()}]]"

# filename: /media/ntfsData/@MyProjects/MyPython_3.10_Playground/inspect/inspect_tests.py
# firstlineno: 27
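
To make the role of co_filename and co_firstlineno more concrete, here is a minimal sketch of how the source could be rebuilt by hand from those 2 attributes. It is not the actual implementation in inspect.py, just an approximation that reuses linecache.getlines and inspect.getblock. The demo module, its shout function and the helper names (load_module, source_from_code_object) are made up for the example.

```python
import importlib.util
import inspect
import linecache
import tempfile
from pathlib import Path

# Hypothetical module contents, written to disk only for this demo.
SOURCE = '''\
# demo module
def shout(txt):
    return txt.upper() + "!"
'''


def load_module(path, name="demo_mod"):
    """Import a module straight from a file path."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def source_from_code_object(fn):
    """Rebuild a function's source from co_filename and co_firstlineno."""
    code = fn.__code__
    # all the lines of the file where the function was defined
    lines = linecache.getlines(code.co_filename)
    # keep only the indented block that starts at the function's first line
    block = inspect.getblock(lines[code.co_firstlineno - 1:])
    return "".join(block)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo_mod.py"
    path.write_text(SOURCE)
    demo = load_module(str(path))
    manual = source_from_code_object(demo.shout)
    official = inspect.getsource(demo.shout)

print(manual)
print(manual == official)
```

Both lookups are done while the temporary file still exists, which is the whole point: without the file on disk there is nothing to read the lines from.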

This technique won't work for functions defined dynamically with exec/eval. There is no file from which to get the source, so we'll get an exception: OSError: could not get source code.


format2_st = """
def format2(txt: str):
    return f'[[{txt.upper()}]]'
"""

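def create_function(fn_st, fn_name):
    # run the definition in an explicit namespace; relying on the implicit
    # locals() of this function stops working in Python 3.13+ (PEP 667)
    namespace = {}
    exec(fn_st, namespace)
    return namespace[fn_name]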

format2 = create_function(format2_st, "format2")
print(format2("aaa"))
# [[AAA]]

try:
    print(inspect.getsource(format2))
except Exception as ex:
    print(ex)
    # OSError: could not get source code

print(f"filename: {format2.__code__.co_filename}")
print(f"firstlineno: {format2.__code__.co_firstlineno}")

# [[AAA]]
# could not get source code
# filename: <string>
# firstlineno: 2
-----------------
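
If we really need getsource to work for a dynamically created function, one possible workaround is to compile the code under a made-up filename and register its lines in linecache ourselves. This is only a sketch: the helper name and the "<dynamic:format2>" filename are invented for the example, and it relies on the (size, mtime, lines, fullname) layout of linecache.cache entries, which is an implementation detail of CPython rather than a documented API.

```python
import inspect
import linecache


def create_function_with_source(src, name, filename="<dynamic:format2>"):
    """Exec src and make its text available to inspect.getsource."""
    # Assumed entry layout: (size, mtime, lines, fullname).
    # mtime=None tells linecache.checkcache not to discard the entry.
    linecache.cache[filename] = (
        len(src),
        None,
        src.splitlines(keepends=True),
        filename,
    )
    namespace = {}
    # compile with the same fake filename so co_filename points to our entry
    exec(compile(src, filename, "exec"), namespace)
    return namespace[name]


FORMAT2_SRC = "def format2(txt: str):\n    return f'[[{txt.upper()}]]'\n"

format2 = create_function_with_source(FORMAT2_SRC, "format2")
print(format2("aaa"))
print(format2.__code__.co_filename)
print(inspect.getsource(format2))
```

Each dynamically created function needs its own fake filename, otherwise a later registration would overwrite the lines of an earlier one.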

inspect.getsource can also get the source code of a class. Classes do not have an associated code object, so the technique used has to be a bit different. You can check the inspect.py source code if you are curious about it.


class Person:
    def __init__(self):
        super().__init__()
        print("Person.__init__")

    def say_hi_to(self, to: str):
        return f"{self.name} says Hi to {to}"
        
print(inspect.getsource(Person))

# class Person:
#     def __init__(self):
#         super().__init__()
#         print("Person.__init__")

#     def say_hi_to(self, to: str):
#         return f"{self.name} says Hi to {to}"

By the way, inspect.getsource() can retrieve its own source code! nice :-)


inspect.getsource(inspect.getsource)
Out[14]: 'def getsource(object):\n    """Return the text of the source code for an object.\n\n    The argument may be a module, class, method, function, traceback, frame,\n    or code object.  The source code is returned as a single string.  An\n    OSError is raised if the source code cannot be retrieved."""\n    lines, lnum = getsourcelines(object)\n    return \'\'.join(lines)\n'

Wednesday, 21 August 2024

Python Pipes and Infix Notation

I recently came across a discussion about adding support for UFCS to Python. As expected, such an idea was dismissed, but it gave me some good food for thought, as I had never heard of this Uniform Function Call Syntax thing before. The idea is:

Allows any function to be called using the syntax for method calls (as in object-oriented programming), by using the receiver as the first parameter and the given arguments as the remaining parameters.

No major language implements this feature. In Kotlin we have a couple of things slightly related: Extension Functions and Function Types with Receiver. I can get a reference to an existing function and type it as a Function Type with Receiver, and in that case I can invoke that function reference both through a receiver, or passing it over as the first parameter.

The main interest of UFCS for me is chaining function calls (aka pipes), and for that I would prefer a pipe operator. In this previous post I mentioned a very interesting project where they use a combination of 2 operators, |>, as a pipe operator. For that to work, functions using that new operator have to be decorated with a decorator that rewrites their source. That's a pretty amazing and crazy approach, but there's a simpler one, using a wrapper object (to wrap the functions being piped) that overloads the binary operator | (through __ror__). There's an amazing project that does that, and more! The main Pipe class is just like this:


class B:
    def __init__(self, f=None, *args, **kw):
        self.f = f
        self.args = args
        self.kw = kw


class Pipe         (B): __ror__ = lambda self, x: self.f(x, *self.args, **self.kw)


And we use it like this:


def transform(st: str) -> str:
    return st[0].upper() + st[1:]

def clean(st: str, to_remove: list[str]) -> str:
    for it in to_remove:
        st = st.replace(it, "")
    return st

def wrap(st, wrapper: str) -> str:
    return f"{wrapper}{st}{wrapper}"

print("this is not Asturies" 
    | Pipe(transform)
    | Pipe(clean, ["not"])
    | Pipe(wrap, "--")
)

# --This is  Asturies--
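
The trick relies on Python's reflected operators: when the left operand does not know how to handle |, Python gives the right operand a chance by calling its __ror__ method with the left operand as argument. A minimal sketch to see the mechanism in isolation (this Pipe is a reduced stand-in written for the example, not the pipe21 class itself):

```python
class Pipe:
    """Reduced stand-in for the wrapper class, just to show the mechanism."""

    def __init__(self, f, *args, **kw):
        self.f, self.args, self.kw = f, args, kw

    def __ror__(self, left):
        # called for `left | self` when `left` cannot handle the | itself
        return self.f(left, *self.args, **self.kw)


# str does not implement |, so Python falls back to Pipe.__ror__
piped = "abc" | Pipe(str.upper) | Pipe(str.center, 7, "*")

# the chain above is just a flatter way of writing nested calls
nested = str.center(str.upper("abc"), 7, "*")
print(piped, piped == nested)

# set does implement |, but only for other sets, so the fallback still happens
set_size = {1, 2} | Pipe(len)
print(set_size)
```

Since | is left-associative, each step receives the result of the previous one, which is what makes the chain read top to bottom.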

Let's see a typical use case with iterables, mapping, filtering...



cities = ["Paris", "Prague", "Lisbon", "Porto"]
print(cities 
    | Pipe(lambda items: map(str.upper, items))
    | Pipe(lambda items: filter(lambda x: x.startswith("P"), items))
    | Pipe(list)
)

# ['PARIS', 'PRAGUE', 'PORTO']

That's a bit verbose, so to improve that the (very) smart guy behind pipe21 added extra classes for all the common use cases (Filter, Map and many more). With that, we can rewrite the above code like this:


cities = ["Paris", "Prague", "Lisbon", "Porto"]
print(cities 
    | Map(str.upper)
    | Filter(lambda x: x.startswith("P"))
    | Pipe(list)
)

# ['PARIS', 'PRAGUE', 'PORTO']
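
To get a feeling for how little is needed for those helpers, this is a sketch of how Map and Filter could be written on top of the same base class. The real pipe21 classes may differ in the details; the versions below are just my approximation for illustration.

```python
class B:
    def __init__(self, f=None, *args, **kw):
        self.f = f
        self.args = args
        self.kw = kw


class Pipe(B):
    def __ror__(self, x):
        return self.f(x, *self.args, **self.kw)


class Map(B):
    # sketch: apply f to each item of the piped iterable (lazily)
    def __ror__(self, items):
        return map(self.f, items)


class Filter(B):
    # sketch: keep only the items of the piped iterable for which f is truthy
    def __ror__(self, items):
        return filter(self.f, items)


cities = ["Paris", "Prague", "Lisbon", "Porto"]
result = (cities
    | Map(str.upper)
    | Filter(lambda x: x.startswith("P"))
    | Pipe(list)
)
print(result)
```

Map and Filter return lazy iterators, so nothing is computed until Pipe(list) consumes the chain.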

I should mention that the almighty coconut language comes with support for pipes (indeed it supports multiple pipe operators).

Kotlin supports operator overloading, but it has a more limited set of operators that can be overloaded, and "|" is not among them, so it seems we cannot port this approach.

Related to this I've come across another pretty amazing use of operator overloading (and the callable concept), enabling infix notation for function calls. I copy-paste below the code:


from functools import partial

class Infix(object):
    def __init__(self, func):
        self.func = func
        
    def __or__(self, other):
        return self.func(other)
        
    def __ror__(self, other):
        return Infix(partial(self.func, other))
        
    def __call__(self, v1, v2):
        return self.func(v1, v2)

>>> @Infix
... def add(x, y):
...     return x + y
...
>>> 5 |add| 6
11

>>> instanceof = Infix(isinstance)
>>>
>>> if 5 |instanceof| int:
...     print("yes")
...
yes

>>> curry = Infix(partial)
>>>
>>> def f(x, y, z):
...     return x + y + z
...
>>> f |curry| 3

>>> g = f |curry| 3 |curry| 4 |curry| 5
>>> g()
12        
        

I find the isinstance example particularly appealing. We are converting the isinstance function into a sort of instanceof operator which, being used to JavaScript's instanceof operator and Kotlin's is operator (in Kotlin, as in JavaScript, reference equality is checked with ===), seems more natural to me. This idea is pretty amazing because we are using operator overloading (that can only be applied to a predefined set of operators) to create sort of new operators!!!
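
To see why 5 |instanceof| int works, it helps to evaluate it in 2 steps. Below is a Python 3 sketch of the same recipe (without the __call__ method, to keep it short). The mul operator and the step1 variable are names added just for this example. Notice the last lines: our "new operators" inherit the precedence of |, which is lower than that of the arithmetic operators.

```python
from functools import partial


class Infix:
    def __init__(self, func):
        self.func = func

    def __ror__(self, left):
        # first half, `left |op`: remember the left operand
        return Infix(partial(self.func, left))

    def __or__(self, right):
        # second half, `... | right`: call the function with both operands
        return self.func(right)


instanceof = Infix(isinstance)
mul = Infix(lambda x, y: x * y)

is_int = 5 |instanceof| int

# the same expression, one | at a time
step1 = 5 | instanceof      # an Infix wrapping partial(isinstance, 5)
is_int_2 = step1 | int      # runs isinstance(5, int)

# careful with precedence: + binds tighter than |
surprising = 2 |mul| 3 + 4      # parsed as 2 |mul| (3 + 4)
explicit = (2 |mul| 3) + 4

print(is_int, is_int_2, surprising, explicit)
```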

Monday, 12 August 2024

Python Return From Generator

In my previous post I talked about returning values (along with yielding) from generators and said that it also applied to Python. While writing the Python equivalent to the code in that post I've realised that there are more differences than I expected.

Let's start with "normal" (synchronous) iterators. The most important difference is that Python iterators return a value in each iteration (rather than a done-value pair as JavaScript does) and raise a StopIteration exception when they reach their end (while JavaScript returns a {done: true, value: undefined}). If we return a value from the generator it's set in the value attribute of the exception. We don't have access to that exception in a for-in loop, and if we do an extra next() call once outside the loop, it raises the exception again, but this time with value set to None (so this behaviour is equivalent to the JavaScript one). This means that if we want access to that value, we either use a while loop, or use a class (that, as in my previous post, I called GeneratorWithReturn) that wraps that loop for us. So we end up with this code (notice how in __iter2__ we leverage the yield from construct):


from typing import Any, Callable

def citiesGen():
    yield "Paris"
    yield "Porto"
    return "Europe"


print("- for of loop")
cities = citiesGen()
for city in cities:
    print(city)

try:
    print(next(cities))
except StopIteration as ex:
    print(f"return value: {ex.value}")
    
# Paris
# Porto
# return value: None

#that's the same behaviour as in JavaScript, once it has generated the "end" (first StopIteration Exception with the value attribute set)
#the next calls raise StopIteration with value as None

print("- while loop")
#this works fine:
cities = citiesGen()
while True:
    try:
        city = next(cities)
        print(city)
    except StopIteration as ex:
        print(f"return value: {ex.value}")
        break


print("- GeneratorWithReturn")

class GeneratorWithReturn:
    def __init__(self, generator_fn: Callable, strategy: int):
        self.generator_fn = generator_fn
        self.result = None
        self.strategy = strategy

    # this works fine
    def __iter1__(self):
        cities = self.generator_fn()
        while True:
            try:
                yield next(cities)
            except StopIteration as ex:
                self.result = ex.value
                break

    # but it can be refactored to just one line!
    def __iter2__(self):
        self.result = yield from self.generator_fn()        


    def __iter__(self):
        return getattr(self, f"__iter{self.strategy}__")()    


for strategy in (1, 2):
    cities = GeneratorWithReturn(citiesGen, strategy)
    for city in cities:
        print(city)
    print(f"return value: {cities.result}")
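
A more compact way to look at the same 2 mechanisms, as a small sketch (the function and variable names are chosen just for this example): the returned value travels in StopIteration.value, and it is also the value that a yield from expression evaluates to in the delegating generator.

```python
def cities_gen():
    yield "Paris"
    yield "Porto"
    return "Europe"


# 1. manual iteration: the return value is in the exception
it = cities_gen()
yielded = [next(it), next(it)]
try:
    next(it)
    returned = None
except StopIteration as ex:
    returned = ex.value


# 2. delegation: `yield from` re-yields the items and evaluates to the return value
def delegating():
    result = yield from cities_gen()
    yield f"return value: {result}"


print(yielded, returned)
print(list(delegating()))
```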


I said in my previous post that this idea of yielding and finally returning a value seemed mainly interesting to me for async iteration. We know that, same as JavaScript, Python features async generators (and async for loops). The interesting thing is that they come with a limitation: we cannot return values from them (we get a SyntaxError: 'return' with value in async generator) and we cannot use yield from (the equivalent to JavaScript's yield*). However, this is not a big deal. We can easily work around the limitation by yielding an object of a particular class (that I've called GeneratorReturn) that wraps the return value, and wrapping the iteration in an AsyncGeneratorWithReturn class that takes care of this return value (rather than taking care of a StopAsyncIteration exception).



import asyncio

class GeneratorReturn:
    def __init__(self, value):
        self.value = value

class AsyncGeneratorWithReturn:
    def __init__(self, agen_fn):
        self.agen_fn = agen_fn
        self.result = None

    async def __aiter__(self):
        agen = self.agen_fn()
        while True:
            if isinstance(res := await anext(agen), GeneratorReturn):
                self.result = res.value
                return
            else:
                yield res


async def designVacations():
    destinations = []
    await asyncio.sleep(1)
    destinations.append("Paris")
    yield "destination found: Paris"
    await asyncio.sleep(1)
    destinations.append("Porto")
    yield "destination found: Porto"
    # return (" -> ").join(destinations)
    # SyntaxError: 'return' with value in async generator
    yield GeneratorReturn((" -> ").join(destinations))


async def async_main():
    print("- AsyncGeneratorWithReturn")
    vacationsProcess = AsyncGeneratorWithReturn(designVacations)
    async for city in vacationsProcess:
        print(city)
    print(f"return value (travelPlan): {vacationsProcess.result}")

    # destination found: Paris
    # destination found: Porto
    # return value (travelPlan): Paris -> Porto

if __name__ == "__main__":
    asyncio.run(async_main())
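
One detail about the wrapper above: if the wrapped generator finishes without yielding a GeneratorReturn, the anext() call raises StopAsyncIteration inside our async generator, and Python turns that into a RuntimeError. A possible variant, shown here just as a sketch, iterates with async for, so a generator without a "return value" simply ends with result set to None. The with_return, without_return and collect names are made up for the example.

```python
import asyncio


class GeneratorReturn:
    def __init__(self, value):
        self.value = value


class AsyncGeneratorWithReturn:
    def __init__(self, agen_fn):
        self.agen_fn = agen_fn
        self.result = None

    async def __aiter__(self):
        # async for stops cleanly when the wrapped generator is exhausted
        async for item in self.agen_fn():
            if isinstance(item, GeneratorReturn):
                self.result = item.value
                return
            yield item


async def with_return():
    yield "destination found: Paris"
    yield GeneratorReturn("Paris")


async def without_return():
    yield "destination found: Porto"


async def collect(agen_fn):
    wrapper = AsyncGeneratorWithReturn(agen_fn)
    items = [item async for item in wrapper]
    return items, wrapper.result


with_result = asyncio.run(collect(with_return))
without_result = asyncio.run(collect(without_return))
print(with_result)
print(without_result)
```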


It's interesting to notice that while JavaScript's for await can be used with both asynchronous and synchronous iterators, Python's async for only works with asynchronous iterators. In JavaScript the iteration method of both asynchronous and synchronous iterators is called next, while in Python we have 2 different methods: __anext__() and __next__() (called through the anext() and next() built-ins). All this is related to the fact that while in JavaScript we can await both a Promise and a normal value, in Python we can only await an awaitable (coroutine, Future or Task).
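
A small sketch to check both limitations and one way around the first one. The as_async adapter is a helper written for this example, not something provided by asyncio.

```python
import asyncio


async def as_async(iterable):
    """Hypothetical adapter: expose a synchronous iterable as an async generator."""
    for item in iterable:
        yield item


async def main():
    errors = []

    # async for needs an object with __aiter__, and a list does not have it
    try:
        async for _ in ["Paris", "Porto"]:
            pass
    except TypeError as ex:
        errors.append(f"async for over a list: {type(ex).__name__}")

    # await needs an awaitable, and a plain value is not one
    try:
        await 42
    except TypeError as ex:
        errors.append(f"await 42: {type(ex).__name__}")

    # with the adapter, the same list can be consumed with async for
    cities = [city async for city in as_async(["Paris", "Porto"])]
    return errors, cities


errors, cities = asyncio.run(main())
print(errors)
print(cities)
```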

Monday, 5 August 2024

JavaScript Return From Generator

Recently I came across a discussion about using the return value from a generator. This applies both to Python and JavaScript, but in this post I'll focus on JavaScript. We know that generators yield values, but they can also return a value. So far I had only used return statements in generators to finish/close on a condition, just a "return;" without a value. But you can do a "return x;" and that x will show up in the object returned in the last iteration (the one that indicates that the iteration has finished). That means that rather than the typical {done: true, value: undefined} you'll get a {done: true, value: x}. The thing is that the most common way to iterate a generator, the for-of loop, won't give us access to that value. Let's see an example:


function* citiesGen() {
    yield "Paris";
    yield "Porto";
    return "Europe";
}

console.log("- for of loop");

let cities = citiesGen();
for (let city of cities) {
    console.log(city);
} 
console.log(cities.next())
// Paris
// Porto
// { value: undefined, done: true }

We iterate the yielded values, and the loop stops when the generator returns a {done: true, value: "Europe"}, but the loop does not give us access to that value. If after the loop we invoke next() again, it will return this object: {done: true, value: undefined}. The value is undefined, no longer "Europe", so we've lost it.

To circumvent this problem we can use a less elegant while loop rather than a for-of, like this:


let city;
cities = citiesGen();
while (!city?.done) {
    city = cities.next();
    if (!city.done) {
        console.log(city.value);
    }
}
console.log(`return value: ${city.value}`) 

That works fine, but that while loop looks a bit ugly. We can wrap that logic into a class implementing the Iteration Protocols and providing access to the returned value. That way we can use a for-of loop, like this:


class GeneratorWithReturn {
    constructor(generatorFn) {
        this.generatorFn = generatorFn;
        this.result = undefined;
    }

    *[Symbol.iterator]() {
        let item;
        let generatorOb = this.generatorFn();       
        while (!item?.done) {
            item = generatorOb.next();
            if (!item.done) {
                yield item.value;
            }
        }
        this.result = item.value;        
    }
}

cities = new GeneratorWithReturn(citiesGen);
for (let city of cities) {
    console.log(city);
} 
console.log(`return value: ${cities.result}`);

//Paris
//Porto
//return value: Europe 

That looks much better, but we can rewrite our iterator in a very concise way by leveraging the yield* operator, which delegates to another iterable object. The very interesting thing is that with yield* we get access to the yielded values, and also to the returned value, like this:


class GeneratorWithReturn {
    constructor(generatorFn) {
        this.generatorFn = generatorFn;
        this.result = undefined;
    }

    *[Symbol.iterator]() {
        this.result = yield* this.generatorFn();        
    } 
}

cities = new GeneratorWithReturn(citiesGen);
for (let city of cities) {
    console.log(city);
} 
console.log(`return value: ${cities.result}`);

//Paris
//Porto
//return value: Europe

This feature of returning a value from a generator applies equally to async iterators. And this is really useful, because while I don't see particular real use cases for returning a value in normal iterators, in async iterators it allows us to express very nicely the idea of an async function that produces (yields) intermediate values (for example completion percentages) and a final return value. Imagine we have an asynchronous process that calculates several destinations to return a travel plan. Apart from that final "travel plan" we want to get each destination as it gets "calculated". We can express it with an async generator like this:


function aSleep(delay) {
    return new Promise(resolve => setTimeout(resolve, delay))
}

async function* designVacations(){
    let destinations = []
    await aSleep(1000);
    destinations.push("Paris");
    yield "destination found: Paris";
    await aSleep(1000);
    destinations.push("Porto");
    yield "destination found: Porto";
    return destinations.join(" -> ");
}


We'll use it with an adapted version of the class that we've just seen, this time implementing the asynchronous iteration protocol:


class AsyncGeneratorWithReturn {
    constructor(generatorFn) {
        this.generatorFn = generatorFn;
        this.result = undefined;
    }

    // notice that though I'm not using await inside the function I have to mark it as async
    // otherwise I get an error: TypeError: yield* (intermediate value) is not iterable
    async *[Symbol.asyncIterator]() {
        this.result = yield* this.generatorFn();        
    } 
}


(async function asyncMain() {
    const vacationsProcess = new AsyncGeneratorWithReturn(designVacations);
    for await (let city of vacationsProcess) {
        console.log(city);
    } 
    console.log(`return value (travelPlan): ${vacationsProcess.result}`);
})();

// destination found: Paris
// destination found: Porto
// return value (travelPlan): Paris -> Porto


Notice that we use yield* to delegate to another async iterator; there is no await yield* kind of operator. This is pretty nice as the same operator works for delegating to normal or async iterators. Another point to notice is that though we don't have any await in our async generator method, it has to be marked as async: async *[Symbol.asyncIterator]().