Last year I wrote a post explaining how we can avoid stack overflows in javascript deep recursive functions by making the function async and placing an "await 'nothing';" before the recursive call (just read that post if you want a reminder). That works fine in JavaScript because you can await anything, not just a Promise (well indeed it will be converted in a Promise.resolve("anything")). Awaiting that already resolved promise will anyway suspend the current function, putting in the microtask queue the "continuation" function for that promise, that is, the call that will resume that function in the point after the await.
Things are a bit different in Python asyncio. You now that we have several types of awaitables (coroutines, Tasks, Futures...). In this post, when I say coroutines I'm referring to native coroutines, the ones that you define with the async keyword. I'm not talking about the coroutines that you could define in older versions of python by means of some decorator (honestly I have never used them). Native Coroutines return a Coroutine object, and its code will not run until you await for it. Additionally, they don't return a Future, they just get suspended/resumed in their await calls, and end up returning a value, but not as the result attribute of a Future created by the runtime. This is pretty different from JavaScript async functions, that always create a Promise that gets resolved to the value returned by the functions after its different suspensions/resumes. This is pretty important because a Python coroutine will suspend if the last coroutine in the stack of coroutine calls explicitely yields the control to the event loop. Coroutines are implemented as generators, so this "yield control" means yielding a Future or just None. When the future reaches the event-loop it will set a callback in the future (via add_done_callback) to resume the coroutine. This answer explains it nicely:
This part is a common misconception. Python's await doesn't mean "yield control to the event loop", it means "start executing the awaitable, allowing it to suspend us along with it". So yes, if the awaited object chooses to suspend, the current coroutine will suspend as well, and so will the coroutine that awaits it and so on, all the way to the event loop. But if the awaited object doesn't choose to suspend, as is the case with aprint, neither will the coroutine that awaits it.
There are several deep-dive articles that have helped me understand (maybe not fully, but at least to a "that's enough level") how coroutines are implemented by means of generators, yield-from, tasks and futures. This article is probably the most complete thing you'll find around. And this and this are also a good read. Bear in mind that as far as I've understood, for a chain of coroutine calls to run you will always have a Task that is controlling it, either directly created by you invoking asyncio.create_task, or created by the event-loop itself. When I mentioned above that coroutines yield futures to the eventloop, well, indeed the Future reaches the Task (that invoked the coroutine with task._step -> coroutine.send), so it's the Task who adds a callback to the Future. From the tenthousandmeters article:
What does task.__step() do with a future? It calls future.add_done_callback() to add to the future a callback that reschedules task.__step(). If the result is already available, the callback is invoked immediately. Otherwise, it's invoked when someone/something sets the result by calling future.set_result().
Being aware of those differences between Python and JavaScript, I thought I would not have problems to implement the "prevent stackoverflow" trick in Python as I did in JavaScript. Basically, instead of a simple "await 'whatever'" I would have to do an await asyncio.sleep(0)" that is the official way to return control to the eventloop. That way the current function gets suspended, meaning that we temporarily exit it, and when resuming it, I was assuming that the stack would be clean, meaning that the stack would not keep growing to an overflow. I was wrong. Given a simple recursive function:
def build_message(msg, counter):
"""
Normal recursion. Will overflow the stack at around iteration 1000
"""
if counter == 0:
return msg
# print(f"{counter}")
msg += "a"
return build_message(msg, counter-1)
If I write it in an async way like this, and invoke it with counter = 1000, it fails:
async def async_build_message_KO(msg, counter):
"""
Try to prevent the stackoverflow by making it a coroutine and suspending it via asyncio.sleep
It does not work
"""
if counter == 0:
return msg
sleeper = asyncio.sleep(0.001)
print(f"{msg[0]} - {counter}")
# print(f"{counter} - {type(sleeper).__name__}")
await sleeper
msg += "a"
return await async_build_message_KO(msg, counter-1)
The above code causes a stackoverflow once the stack reaches the 1000 items limit. I tried sleep(0.001) (not just with sleep(0)) to make sure that we are really suspending the execution for a (minimum) amount of time. I was quite confused, and googling around I found a solution, invoking the recursion via create_task. With the above I can invoke it with counter=5000 and it will work nicely. Notice that it works perfectly fine both if using tail recursion, as in the original sample, and if I move the "msg += a" operation after the recursive call. (notice that python does not feature tail-call optimizations)
async def async_build_message_OK(msg, counter):
if counter == 0:
return msg
print(msg[0])
# It works fine, both if implemented as Tail recursion:
# msg += "a"
# return await asyncio.create_task(async_build_message_2(msg, counter-1))
# or as Non Tail recursion:
msg = await asyncio.create_task(async_build_message_OK(msg, counter-1))
msg += "a"
return msg
OK, that works, nice, but why does my first approach fail? Well, it took me time to understand it, but it makes sense now. First, the stack managed by the python interpreter representing the python function calls is kept in the heap as a list of stackframe objects (then for sure the native CPython code of interpreter uses a "normal stack" but it manages the python code list of stackframes as a stack, with a limit of how many frames we can have in it). Second, generators (and as I've just said, native coroutines are based on generators) are implemented in python in a very interesting way:
the key thing to remember is that python generators encapsulate a stack frame and a code object. The stack frame is allocated on heap memory and holds a pointer to the last bytecode instruction that was run on a code object within the context of the stack frame. It is the last instruction pointer which tells the CPython interpreter which line to execute next when a generator function is resumed. These are the core building blocks over which generator functions thrive.
So each native coroutine is linked to a stack frame, and even if I'm suspending the coroutine first with await asyncio.sleep() as I resume it and do the recursive call, the coroutines stack up, along with their stack frames. When I stack up 1000 of them, the stack crashes.
Once we have managed to understand why the first attempt was failling, we shoud also try to understand why the second attempt, using create_task works! The thing is that in that case we have coroutines, but they dont form a stack of coroutine calls, because each of them has been started by a task, so in the end up with a chain of Tasks in the heap (with each task managing a coroutine), not with a chain/stack of couroutines. Those chained tasks (or Futures) are chained with callbacks set by add_done_callback and that will be invoked by the eventloop as the corresponding set_result is invoked. So the call to asyncio.create_task to execute the recursive call prevents the creation of a stack of coroutines. Then, when the last coroutines ends and returns a value it sets the result of its controlling task, so the eventloop schedules the Task that was controlling the coroutine that created that task (its task._step() method as I previously mentioned), and it will resume the coroutine, executing the "post-recursion code". This in turns finishes the coroutine, that sets the result of its controlling task, and so on.
Thinking it twice, this python solution is doing just what my JavaScript code from last year does implicitly. As JavaScript async functions always return a Promise, the recursive calls are not directly linked, but linked through those returned Promises, so no stack of calls and no stack overflow.
By the way, Javascript generators seem to be implemented in a very similar way to Python ones, as explained here. The data is kept between the different invokations (pause/resume) of the generator by means of maintaining the ExecutionContext object between calls, and a pointer to the current position in the code is also kept. Additionally, async-await in JavaScript seems also to be based on generators
This iterator object is used to control the generator function and keeps alive the execution context of the generator function. The iterator object exposes a [[GeneratorLocation]] property which I assume keeps track of where the program execution of the generator is paused.
The Python and JavaScript implementations are pretty interesting when you come from a C# background, because in .Net generators (oddly called iterators...) are implemented through tons of compiler magic that ends up creating sort of state machines.
No comments:
Post a Comment