Thursday 23 May 2013

Critical Exceptions

Last year I wrote a couple of posts, [1] and [2], about exception handling and process crashes. I've recently learnt something new that complements those posts, so here it goes.

For a new project I'm involved in at work, one of the requirements for the .Net based server side is that errors in certain parts of the system (let's call them plugins) should never bring down the whole process; let's call this "failure isolation". OK, .Net + units of isolation... Application Domains rapidly spring to mind. Well, such a solution is wrong for this case.

With Application Domains you get quite a few things, like access isolation (code running in one Application Domain can run with restricted permissions) and the chance to unload the Application Domain (and all its assemblies) from memory... but you don't get the kind of failure isolation we need here, as an Application Domain is not an "execution unit" like a process or a thread. The code in the different Application Domains in your process is not tied to specific threads: one thread can run code in different Application Domains (it starts in one method, then calls a method of an object residing in a different Application Domain... and so on). So an exception thrown in an Application Domain will bubble up the thread's stack and out of that Application Domain. That is, exceptions propagate across Application Domains, and an uncaught one will have effects beyond the Application Domain where it was thrown. This holds even if we run the code in that Application Domain in a new thread, because, as I already explained here, an uncaught exception in any thread of a .Net application (not just the main one) will bring down the whole process. To sum up, an unhandled exception in one Application Domain can crash the whole process.
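The propagation described above can be seen in a minimal sketch. It assumes the .NET Framework (AppDomain.CreateDomain is not supported on .NET Core and later), and the names Plugin, PluginHost and RunIsolated are made up for illustration. The exception is thrown while the thread is running inside the plugin's Application Domain and is caught by the host code in the default one:

```csharp
using System;

// Hypothetical plugin type. Deriving from MarshalByRefObject makes calls
// on it execute inside the Application Domain where it was created.
public class Plugin : MarshalByRefObject
{
    public void Run()
    {
        throw new InvalidOperationException(
            "failure inside " + AppDomain.CurrentDomain.FriendlyName);
    }
}

public static class PluginHost
{
    // Returns the message of the exception that escaped the plugin's
    // Application Domain, or null if nothing was thrown.
    public static string RunIsolated()
    {
        AppDomain domain = AppDomain.CreateDomain("PluginDomain");
        try
        {
            var plugin = (Plugin)domain.CreateInstanceAndUnwrap(
                typeof(Plugin).Assembly.FullName, typeof(Plugin).FullName);
            plugin.Run(); // same thread, now executing in PluginDomain
            return null;
        }
        catch (InvalidOperationException ex)
        {
            // The exception crossed the Application Domain boundary.
            return ex.Message;
        }
        finally
        {
            AppDomain.Unload(domain);
        }
    }

    static void Main()
    {
        Console.WriteLine("host domain caught: " + RunIsolated());
    }
}
```

Without the catch block in RunIsolated the exception would keep travelling up the thread's stack and end up as an unhandled exception for the whole process.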

OK, so in the end, regardless of whether we need Application Domains for isolating groups of objects and granting different permissions, and regardless of whether we're creating new threads... we'll need to make sure that any time we invoke code in those plugins we do it wrapped in a try-catch block. Hmm, it looks like the whole thing is much simpler than I had first reckoned. Well, this is not the case, as something so far unknown to me enters the scene: there are exceptions that cannot be caught; they will just skip your catch block.

These exceptions are called Corrupted State Exceptions (CSEs) and are pretty well explained here. The CLR marks certain exceptions as CSEs (memory access violations, stack overflows), and your normal catch blocks won't be able to catch them. You can test it easily by running something like this:

using System;

class Program
{
    // Recurses without a base case until the CLR raises a real stack overflow.
    static void ThrowCriticalException()
    {
        int i = 0;
        Action ac = null;
        ac = () =>
        {
            Console.WriteLine("call: " + i++);
            ac();
        };
        ac();
    }

    static void Main(string[] args)
    {
        Console.WriteLine("started");
        try
        {
            ThrowCriticalException();
        }
        catch (Exception)
        {
            // Never reached: the process is terminated instead.
            Console.WriteLine("Exception caught");
        }

        // Never reached either, for the same reason.
        Console.WriteLine("Press key to exit");
        Console.ReadKey();
    }
}

Notice that if you threw the StackOverflowException yourself with throw new StackOverflowException();, it could be caught. This is, as the article explains, because it's the system that marks an exception as a CSE, and if it finds that the exception has been thrown by user code, it won't consider it critical. Also, from the article it would seem that decorating your method with the [HandleProcessCorruptedStateExceptions] attribute would allow you to catch it, but I've tested it and it's still not possible. That matches the documented behaviour of StackOverflowException: since .NET 2.0 a stack overflow raised by the runtime terminates the process by default and cannot be caught by any try-catch block, with or without the attribute.
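The first point is easy to check with a minimal sketch. The names UserThrownDemo and CatchesUserThrownStackOverflow are made up for illustration; the exception here is created and thrown by user code, so the runtime does not treat it as critical:

```csharp
using System;

static class UserThrownDemo
{
    // Returns true if a StackOverflowException thrown by user code
    // reaches an ordinary catch block.
    public static bool CatchesUserThrownStackOverflow()
    {
        try
        {
            throw new StackOverflowException("thrown by user code, not by the runtime");
        }
        catch (StackOverflowException)
        {
            return true;
        }
    }

    static void Main()
    {
        Console.WriteLine("caught: " + CatchesUserThrownStackOverflow()); // prints "caught: True"
    }
}
```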

We should realize that not being able to catch these exceptions is a feature rather than a limitation, because a program where invalid memory accesses have happened should not be allowed to continue running. Notice also that using a generic catch (Exception ex) is usually a dangerous practice.

By the way, that MSDN article corroborates what I said in the first paragraphs:

Exceptions raised on a thread of execution follow the thread through native and managed code, across AppDomains, and, if not handled by the program, are treated as unhandled exceptions by the operating system.

A single thread in one AppDomain can bring down an entire CLR instance by not handling an exception

Also, though this should be obvious, I'll put it here for future reference: Application Domains have no effect on Garbage Collection. Garbage Collection works at the process level, going across Application Domains.
