Sunday 10 May 2020

Prevent Overfilling the Stack

Recursion can be terribly useful, but there is a problem with it. A deep recursion involves having many function calls "alive" (waiting for the last one to complete and do the return path). Each of these calls is a new frame pushed on the stack, and programs limit the size a thread stack can reach, meaning that occasionally we can reach that limit before completing, crashing our program. When our recursive function has nothing pending to do once the child call returns (the recursive call is the last thing done in our function) an optimization can be done, reusing the same stack for all calls, I'm talking about Tail Call Optimizations. Indeed, as you can read here there are 2 related options: Tail Call Optimizations (TCO) and Proper Tail Calls (PTC). Current JavaScript engines do not implement any of them, so if we run into a deep recursion we are fucked, our code will throw a Maximum call stack size exceeded exception (at least in node.js).

To avoid this we need a way to release our current stack frame before doing the recursive call, so that its stack frame replaces the current frame rather than being pushed on top. We have an obvious way to invoke a function after the current function ends, delay its invokation with setTimeout. So let's see an example of how to transform a recursive function to use setTimeout and avoid overfilling the stack.

We can calculate the factorial of a number in 2 ways, the most common one being:

function factorial(n) {
 return (n !== 0) ? n * factorial(n - 1) : 1;
}

In the above, we leave an operation to be done after the recursive call is done. We can transform it so that nothing is left for doing after ther recursive call.

function tailFactorial(n, total = 1) {
    //console.trace();
    if (n === 0) {
        return total;
    }    // proper tail call
    return tailFactorial(n - 1, n * total);
}

If the JavaScript runtime implemented PTC, using the second factorial function the stack would not crash regardless of the number of calls, but we can see that this is not the case and when using a big enough number of recursions (largeValue) we crash the stack in both cases:

const smallValue = 100;
const largeValue = 50000;

function runFactorial(factorialFn){
 let res = factorialFn(smallValue);
 console.log("factorial: " + res);

 //stack overflow:
 try {
  res = factorialFn(largeValue);
  console.log("factorial: " + res);
 }
 catch (ex){
  console.log("Exception: " + ex.message);
  //Maximum call stack size exceeded
 }
}

[factorial, tailFactorial].forEach(fn => runFactorial(fn));

/* Output:
factorial: 9.33262154439441e+157
Exception: Maximum call stack size exceeded
factorial: 9.332621544394418e+157
Exception: Maximum call stack size exceeded
*/

As I mentioned before, we can fix this by delaying the function call with setTimeout (so that the current stack frame gets freed). I'm returning a Promise that will get resolved once the last recursion is done:

function factorialWithTimeoutAndPromise(n, total = 1, resFn){
 //initial call 
 if (resFn === undefined){
  var pr = new Promise((res, rej) => resFn = res);
 }
 if (n === 0){
  resFn(total);
 }
 else{
  setTimeout(() => factorialWithTimeoutAndPromise(n-1, n * total, resFn), 0);
 }
 return pr;
}

The above works fine, but even using a timeout of 0, the operation takes quite time (almost 1 minute on my laptop). From what I've read this setTimeout(0) can cause up to 4ms delay:
someone told me that setTimeout(0) creates, on average, a 4 ms delay. They claimed that’s the time it takes to pop the callback off the stack, onto the callback queue and back on the stack again.

So we better find a balance between normal recursion (to keep things fast), and setTimeout (to avoid overfilling the stack). In this version below I do normal recursive calls up to 50, then schedule the next call and clean the stack, and so on.

function factorialWithTimeoutAndPromiseImproved(n, total = 1, counter = 0, resFn){
 //initial call 
 if (resFn === undefined){
  var pr = new Promise((res, rej) => resFn = res);
 }
 if (n === 0){
  resFn(total);
 }
 else{
  if (counter < 50){
   factorialWithTimeoutAndPromiseImproved(n-1, n * total, ++counter, resFn);
  }
  else{
   setTimeout(() => factorialWithTimeoutAndPromiseImproved(n-1, n * total, 0, resFn), 0);
  }
 }
 return pr;
}

(async() => {
 console.log("-- using Timeout");

 let options = [
  { 
   name: "Not optimized",
   value: smallValue,
   fn: factorialWithTimeoutAndPromise
  },
  { 
   name: "Not optimized",
   value: largeValue,
   fn: factorialWithTimeoutAndPromise
  },
  { 
   name: "Optimized",
   value: smallValue,
   fn: factorialWithTimeoutAndPromiseImproved
  },
  { 
   name: "Optimized",
   value: largeValue,
   fn: factorialWithTimeoutAndPromiseImproved
  }
 ];
 
 for (let option of options) {
  console.log(option.name + " factorial for: " + option.value);
  let start = Date.now();
  let result = await option.fn(option.value);
  console.log("result: " + result);
  console.log("it took " + (Date.now() - start) + "\n");
 }

})();

/* Output:
Not optimized factorial for: 100
result: 9.332621544394418e+157
it took 132

Not optimized factorial for: 50000
result: Infinity
it took 57441

Optimized factorial for: 100
result: 9.332621544394418e+157
it took 2

Optimized factorial for: 50000
result: Infinity
it took 1187

*/

I've uploaded all the code into a gist

No comments:

Post a Comment