Sunday 20 February 2022

itertools.tee implemented in JavaScript

Generator functions and generator objects, iterators and iterables work in quite a similar way in Python and JavaScript, though there are some differences that maybe I will address in a separate post. Both in JavaScript and Python, generator functions return generator objects that are both iterables and iterators (the generator object iterates itself). Because of this, you can iterate a generator object only once. As I explained some time ago, this works differently in C#.
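A quick Python check of that last point: a generator object is its own iterator (iter() returns the very same object), so once it has been exhausted, a second loop over it produces nothing. The cities function is just an illustrative name.

```python
def cities():
    # a plain generator function: each call returns a new generator object
    yield "Toulouse"
    yield "Lyon"
    yield "Xixon"

gen = cities()

# the generator object is both an iterable and an iterator:
# asking it for an iterator returns the object itself
print(iter(gen) is gen)  # True

first_pass = list(gen)
second_pass = list(gen)  # already exhausted, nothing left to produce
print(first_pass)   # ['Toulouse', 'Lyon', 'Xixon']
print(second_pass)  # []
```

To iterate the same values again we have to call cities() again and get a fresh generator object.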

Looking into the functionality provided by the Python itertools module I came across an interesting function, tee, that creates multiple independent iterators over the same iterable. These iterators read from a shared buffer wrapped around the original iterator. It's a nice idea, so I've implemented it in JavaScript:



function tee(iterable, num){
    let internalIterator = iterable[Symbol.iterator]();
    let buffer = [];

    //it's very interesting that a generator function can trap the outer variables in a closure, though
    //in the end what is being generated by the compiler is an object with a next() method, not a function that traps the activation object
    function* generatorFn(){
        let pos = 0;
        let finished = false;
        while (!finished){
            if (pos < buffer.length){
                yield buffer[pos];
            }
            else{
                let it = internalIterator.next();
                if (it.done){
                    finished = true;
                }
                else{
                    buffer.push(it.value);
                    yield it.value;
                }
            }
            pos++;
        }
    };
    
    let generatorObjs = [];
    for (let i=0; i<num; i++){
        generatorObjs.push(generatorFn());
    }
    return generatorObjs;
}


let generatorObj = (function*(){
    yield "a";
    yield "b";
    yield "c";
})()

let [iter1, iter2, iter3, iter4] = tee(generatorObj, 4);

console.log(iter1.next().value);

console.log(iter2.next().value);
console.log(iter2.next().value);
console.log(iter2.next().value);
console.log(iter2.next().value);

console.log(iter1.next().value);
console.log(iter1.next().value);
console.log(iter1.next().value);
for (let it of iter3){
    console.log(it);
}

for (let it of iter4){
    console.log(it);
}
/*
a
a
b
c
undefined
b
c
undefined
a
b
c
a
b
c
*/

After coding it (and uploading it to a gist) I realized how the generator and closure machinery is even more impressive than I already knew. In the above code we have a generator function that captures in a closure the "internalIterator" and "buffer" variables from its lexical scope (well, it gets them through the [[scope]] property...). The amazing thing is that the generator function creates generator objects on which we invoke the .next() method, and it's really that .next() method that accesses the captured variables. So somehow the compiler has to translate the "closure behaviour" (in this case a sort of "virtual closure") into a normal object, I guess by adding references to the trapped variables to each of the generator objects it creates. Equally impressive is the fact that the Node.js debugger shows us everything at runtime as if we were just dealing with a normal closure.
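To compare with the original, here is a rough Python port of the same buffered approach, checked against itertools.tee. The name buffered_tee is my own for this sketch, not part of itertools. As in the JavaScript version, every generator object created by the inner generator function shares the captured buffer and source iterator.

```python
import itertools

def buffered_tee(iterable, num):
    """Hypothetical port of the JavaScript tee: num independent iterators
    reading from one shared, ever-growing buffer."""
    internal_iterator = iter(iterable)
    buffer = []  # shared by every generator object through the closure

    def generator_fn():
        pos = 0
        while True:
            if pos < len(buffer):
                # another iterator already pulled this item, reuse it
                yield buffer[pos]
            else:
                try:
                    value = next(internal_iterator)
                except StopIteration:
                    return  # the source is exhausted
                buffer.append(value)
                yield value
            pos += 1

    return [generator_fn() for _ in range(num)]


def letters():
    yield "a"
    yield "b"
    yield "c"

# same access pattern for both versions: one item from the first iterator,
# then drain the second one, then drain the rest of the first one
it1, it2 = buffered_tee(letters(), 2)
mine = (next(it1), list(it2), list(it1))

std1, std2 = itertools.tee(letters(), 2)
theirs = (next(std1), list(std2), list(std1))

print(mine)            # ('a', ['a', 'b', 'c'], ['b', 'c'])
print(mine == theirs)  # True
```

One design difference worth noting: like the JavaScript version, this sketch never trims its buffer, whereas CPython's itertools.tee can release items once every iterator has moved past them.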

Sunday 13 February 2022

async for vs for await

There's an important difference between how the async iteration constructs work in JavaScript (for await) and Python (async for): whether or not they can also iterate a synchronous iterable/iterator.

Building on the idea I mentioned in my previous post, the "sloppy promise semantics", in JavaScript a for await loop can iterate both asynchronous and synchronous iterables. This is pretty coherent with the fact that we can also await a non-Promise value. The Iteration Protocols dictate that an Iterable object must have a method available under the Symbol.iterator key that returns an Iterator object, and this Iterator object must have a next method. For an Async Iterable we must have a Symbol.asyncIterator method, and the returned Async Iterator must also have a next method, one that returns a Promise.

A for await first tries to obtain an async iterator by checking whether the object has a Symbol.asyncIterator method; if that's not the case, it falls back to a normal iterator by checking for Symbol.iterator. Then, as both sync and async iterators have a next method, and as we know awaiting a non-Promise value is perfectly fine, the polymorphic behaviour is all set (in spec terms, the sync iterator gets wrapped in an "async-from-sync" iterator that awaits each value). Beautiful!


let cities = ["Toulouse", "Lyon", "Xixon"];

async function* asyncCities(){
	for (let city of cities){
		yield await new Promise(res => setTimeout(() => res(city), 700));
	}
}


async function print(items){
	for await (let it of items){
		console.log(it.toUpperCase());
	}
}

(async () => {
	await print(cities);
	console.log("------");
	await print(asyncCities());
})();


That's not the case in Python. An async for obtains an async iterator by calling the object's __aiter__ method (the same thing the aiter() built-in, added in Python 3.10, does). If the object lacks that __aiter__ method, we get:
#TypeError: 'async for' requires an object with an __aiter__ method.
No attempt is made to fall back to iter()/__iter__, so writing code that supports both sync and async iterables is not so elegant:



import asyncio

async def getCitiesAsync():
    print('getCitiesAsync')
    await asyncio.sleep(0.5)
    yield "Xixon"
    await asyncio.sleep(0.5)
    yield "Toulouse"
    await asyncio.sleep(0.5)
    yield "Paris"

def getCities():
    print('getCities')
    yield "Xixon"
    yield "Toulouse"
    yield "Paris"



async def printCities(cities):
    if hasattr(cities, "__aiter__"):
        async for city in cities:
            print(city)
    else:
        for city in cities:
            print(city)


async def main():
    await printCities(getCitiesAsync())
    await printCities(getCities())

asyncio.run(main())
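Another option is to mimic what JavaScript does: look for __aiter__ first and, if it's missing, wrap the sync iterator in a small object that follows the async iterator protocol (__aiter__ plus an async __anext__ that raises StopAsyncIteration at the end). SyncToAsync and as_async_iterable are hypothetical names for this sketch, not part of asyncio.

```python
import asyncio

class SyncToAsync:
    """Wraps a synchronous iterable so it follows the async iterator protocol."""
    def __init__(self, iterable):
        self._it = iter(iterable)

    def __aiter__(self):
        return self

    async def __anext__(self):
        try:
            return next(self._it)
        except StopIteration:
            # the async protocol signals the end with StopAsyncIteration
            raise StopAsyncIteration from None

def as_async_iterable(items):
    # prefer the async protocol, fall back to the sync one (like for await)
    if hasattr(items, "__aiter__"):
        return items
    return SyncToAsync(items)

async def print_upper(items):
    # a single async for now handles both kinds of iterables
    collected = []
    async for item in as_async_iterable(items):
        print(item.upper())
        collected.append(item.upper())
    return collected

async def async_cities():
    for city in ["Toulouse", "Lyon", "Xixon"]:
        await asyncio.sleep(0.1)
        yield city

async def main():
    sync_result = await print_upper(["Toulouse", "Lyon", "Xixon"])
    async_result = await print_upper(async_cities())
    return sync_result, async_result

sync_result, async_result = asyncio.run(main())
```

The branching hasn't disappeared, it has just moved into one reusable helper, which is more or less what the JavaScript engine does for us.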



Sunday 6 February 2022

async/await JavaScript - Python differences

I've been working a bit with async/await in Python lately, and there are some slight differences from JavaScript that I will document here.

Python's asynchronous machinery is provided by the asyncio module. We have coroutines, Tasks and Futures. I've already talked about them and I won't copy/paste the documentation; I'll just say that most of the time you will work with coroutines (coroutine functions are those declared with async def, and they return a coroutine object). We can think of coroutine objects as the counterpart of JavaScript Promises.

JavaScript runtimes start an event loop on their own (nothing runs in JavaScript without an event loop), but in Python you have to start it yourself, typically by calling asyncio.run(). Python's event loop and coroutines are based on generators, and it does not use a thread pool like .NET does. This is well explained here and here.
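A small check of the "you have to start it yourself" part: outside asyncio.run() there is no running event loop, so asyncio.get_running_loop() raises RuntimeError, while inside a coroutine driven by asyncio.run() it returns the loop that asyncio.run() created. The has_running_loop helper is just for illustration.

```python
import asyncio

def has_running_loop():
    try:
        asyncio.get_running_loop()
        return True
    except RuntimeError:
        # raised when no event loop is running in the current thread
        return False

async def main():
    return has_running_loop()

outside = has_running_loop()     # plain script code, no loop yet
inside = asyncio.run(main())     # asyncio.run creates and runs the loop
print(outside, inside)           # False True
```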

OK, so let's move on to the practical stuff, those minor differences from JavaScript.

The most important difference is that coroutines are lazy. Invoking a coroutine function returns a coroutine object, but the code we wrote inside the function won't be executed until that coroutine is awaited (or wrapped in a Task). In JavaScript, the code inside an async function starts to run as soon as we invoke the function, synchronously up to its first await.


import asyncio
import time

async def test():
    print("inside test")
    aux = await asyncio.sleep(1)
    return "test result"

async def main():
    print('inside main')
    aux = asyncio.sleep(1)
    print(type(aux)) #<class 'coroutine'>
    await aux
    print('next')
    aux = test()
    print(type(aux)) #<class 'coroutine'>
    #if we don't do an await aux, "test" is never executed
    #and we get a warning:
    #RuntimeWarning: coroutine 'test' was never awaited


#very interesting, the code in the "main" coroutine is not started until we call asyncio.run
#(I assume inside asyncio.run there is an await)
#it's different from JavaScript
cr = main()
print(type(cr))
time.sleep(3)
asyncio.run(cr)
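If we want something closer to JavaScript's eager behaviour, we can wrap the coroutine in a Task with asyncio.create_task(). The Task is scheduled on the event loop and starts running as soon as the current coroutine yields control to the loop, even if we never await it at that point. Note that this is still not identical to JavaScript, where the body runs synchronously right at the call. The events list is just instrumentation for this sketch.

```python
import asyncio

events = []

async def test():
    events.append("inside test")
    await asyncio.sleep(0.1)
    events.append("test finished")
    return "test result"

async def main():
    task = asyncio.create_task(test())
    # the Task is only scheduled here, "test" hasn't run a single line yet
    events.append("task created")
    # any suspension point lets the event loop start the Task
    await asyncio.sleep(0)
    events.append("after sleep(0)")
    return await task

result = asyncio.run(main())
print(events)
# ['task created', 'inside test', 'after sleep(0)', 'test finished']
print(result)  # test result
```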

In JavaScript we can await a non-Promise value. I mean, we can do:
let a = await "yyy";
What happens here is that the JavaScript engine does the equivalent of calling Promise.resolve(x) for each "await x" expression it finds. If "x" is a Promise, it just returns that Promise; otherwise, it returns a Promise that resolves to x.
This is not the case in Python. Awaiting a non-awaitable value (awaitables being coroutines, Tasks and Futures) throws an exception:


a = await "a"
#TypeError: object str can't be used in 'await' expression
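If we wanted JavaScript's lenient behaviour in Python, we could write a small helper that only awaits its argument when it is awaitable. await_value is a hypothetical helper for this sketch; inspect.isawaitable() is the standard library check, which returns True for coroutines, Tasks, Futures and any other object with an __await__ method.

```python
import asyncio
import inspect

async def await_value(x):
    # roughly what JavaScript's "await x" does for non-Promise values:
    # await real awaitables, pass anything else through untouched
    if inspect.isawaitable(x):
        return await x
    return x

async def get_city():
    await asyncio.sleep(0.1)
    return "Paris"

async def main():
    plain = await await_value("a")            # no TypeError this time
    awaited = await await_value(get_city())   # a real coroutine gets awaited
    return plain, awaited

result = asyncio.run(main())
print(result)  # ('a', 'Paris')
```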

In JavaScript, when we await a Promise that is resolved with another Promise, the outer Promise adopts the state of the inner one, so the await ends up waiting for that inner Promise, and so on...


(async () => {
	let pr = new Promise(resFn => setTimeout(() => {
		console.log("resolving first Promise");
		resFn(new Promise(resFn2 => setTimeout(() => {
				console.log("resolving second Promise");
				resFn2("Bonjour!");
		}, 3000)));
	}, 2000));
	let txt = await pr;
	console.log(txt);
	//after 5 seconds Bonjour gets printed
})();

This is not the case in Python. If a coroutine returns another coroutine, await just gives us that second coroutine object, without awaiting it:


import asyncio

async def __getCity__():
    await asyncio.sleep(1)
    return "Paris"


async def getCity():
    await asyncio.sleep(0.5)
    print("after first sleep")
    #if the coroutine resolves to another coroutine, the "await" in "main" calling "getCity" does not wait for the internal one (contrary to JavaScript)
    return __getCity__()
    
    #so we should write this line rather than the one above
    #return await __getCity__()


async def main():
    
    city = await getCity()
    print(city)

asyncio.run(main())

#after first sleep
#<coroutine object __getCity__ at 0x7fad74b34540>
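Going the other way, if we wanted JavaScript-like flattening in Python, we could keep awaiting while the result is still awaitable. await_deep is a hypothetical helper written for this sketch, not an asyncio function, and the coroutine names are illustrative.

```python
import asyncio
import inspect

async def await_deep(awaitable):
    # keep awaiting while the result is itself awaitable,
    # similar to how a JavaScript Promise adopts an inner Promise
    result = await awaitable
    while inspect.isawaitable(result):
        result = await result
    return result

async def get_inner_city():
    await asyncio.sleep(0.1)
    return "Paris"

async def get_city():
    await asyncio.sleep(0.05)
    # returns a coroutine object without awaiting it, as in the example above
    return get_inner_city()

async def main():
    return await await_deep(get_city())

city = asyncio.run(main())
print(city)  # Paris
```

In practice the simpler fix is the one in the comment above: return await __getCity__() and let each coroutine finish its own work.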


We can say that in the two previous cases JavaScript works differently because of what seems to be known as the "sloppy promise semantics".