Daniel Irvine on building software
Our .NET AppDomain nightmare
15 January 2013
My team has been struggling with AppDomain support over the past couple of days and I wanted to tell our story.
AppDomain support in .NET is very interesting and has a variety of uses. In our case, we have a program that acts as a host for user-written modules. These modules can be “AppDomain isolated”, meaning hosted in an AppDomain inside the host process, or “process isolated”, meaning hosted in their own standard Windows processes. The user (i.e. the person writing the module) gets to choose. The advantages of AppDomain isolation over process isolation are faster initialization and higher throughput for communication between modules.
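To make the two hosting modes concrete, here is a minimal sketch of how a host might launch a module either way. It is not our actual host code: `ModuleLauncher`, its method names, and the `modulePath` parameter are hypothetical, and the sketch assumes the module is an ordinary .NET Framework executable.

```csharp
using System;
using System.Diagnostics;

// Hypothetical launcher, for illustration only.
public static class ModuleLauncher
{
    // AppDomain isolation: the module runs inside the host process.
    public static AppDomain RunInAppDomain(string modulePath)
    {
        AppDomain domain = AppDomain.CreateDomain("Module: " + modulePath);

        // Runs the module's entry point on the calling thread and
        // returns once that entry point returns.
        domain.ExecuteAssembly(modulePath);
        return domain;
    }

    // Process isolation: the module runs in its own Windows process.
    public static Process RunInProcess(string modulePath)
    {
        return Process.Start(modulePath);
    }
}
```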
Unfortunately I misunderstood how AppDomains work and this mistake has come back to haunt me. I had assumed that AppDomains behave like processes when their work is done: once a process has no further instructions to complete, it exits. I thought that an AppDomain would similarly unload once all its active threads had finished. This isn’t true; AppDomains only unload when you tell them to unload. So if you want a child AppDomain to unload itself once it’s finished, it must execute this line:
AppDomain.Unload(AppDomain.CurrentDomain);

The host AppDomain could also unload the child AppDomain, but the child would still need to have some API to signal to the host that its work was done.
We didn’t think this was necessary, and so our user-written modules don’t have any signal that they are finished. They are written just like any standard .NET executable, which quits when the last instruction is executed. In the past two years we’ve had numerous modules written in this way and delivered to end-users.
Everything went swimmingly until we added a feature that required the host to perform post-processing after a module had finished its job. Great, we thought, we’ll just hook on to the AppDomain.DomainUnload event and wait until it’s fired. Except it was never fired and that’s when I realized my error.
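The behaviour is easy to reproduce in isolation. The following is a minimal sketch, assuming the .NET Framework; `UnloadDemo`, `DoModuleWork`, `OnChildUnload` and the domain name are made up for illustration and are not taken from our host.

```csharp
using System;

public static class UnloadDemo
{
    public static void Main()
    {
        // Create a child domain and subscribe to its unload event.
        AppDomain child = AppDomain.CreateDomain("ChildDomain");
        child.DomainUnload += OnChildUnload;

        // Run some work in the child. DoCallBack returns as soon as
        // the callback returns, and the child stays loaded afterwards.
        child.DoCallBack(DoModuleWork);

        Console.WriteLine("Work finished, child domain is still loaded.");

        // The event handler runs only because of this explicit call.
        AppDomain.Unload(child);
    }

    public static void DoModuleWork()
    {
        Console.WriteLine("Working in " + AppDomain.CurrentDomain.FriendlyName);
    }

    public static void OnChildUnload(object sender, EventArgs e)
    {
        Console.WriteLine("DomainUnload raised for " + AppDomain.CurrentDomain.FriendlyName);
    }
}
```

If the call to `AppDomain.Unload` is removed, the handler is not invoked when the work completes, which is the situation our host found itself in.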
So how about periodically checking for an “unused” AppDomain? That’s difficult, because there’s no way to enumerate all the managed threads running in an application and no way to determine which threads hold references to the AppDomain. I even thought about counting the number of threads before any AppDomains started, but of course this is ridiculous, as the CLR has its own set of threads under its control, and their number could increase or decrease at any time. There’s also the possibility of requiring user modules to use a wrapper when creating Threads, allowing them to be tracked, but that is equally ridiculous since it’d be almost impossible to enforce.
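For the record, the thread wrapper idea would look something like the sketch below. `TrackedThreads` and its members are hypothetical names. It only sees threads that modules choose to start through it, so thread pool work, timers and plain `new Thread(...)` calls go unnoticed, which is exactly the enforcement problem.

```csharp
using System;
using System.Threading;

// Hypothetical wrapper that modules would have to use instead of
// creating threads directly. Static state is per AppDomain, so each
// module domain would get its own counter.
public static class TrackedThreads
{
    private static readonly object Gate = new object();
    private static int active;

    // Raised when the number of tracked threads drops to zero.
    public static event EventHandler AllFinished;

    public static Thread Start(ThreadStart work)
    {
        lock (Gate) { active++; }

        var thread = new Thread(() =>
        {
            try
            {
                work();
            }
            finally
            {
                bool last;
                lock (Gate) { last = --active == 0; }

                EventHandler handler = AllFinished;
                if (last && handler != null)
                {
                    handler(null, EventArgs.Empty);
                }
            }
        });

        thread.Start();
        return thread;
    }
}
```

Even with full cooperation from module authors, the count can hit zero while a module is still about to start another thread, so the event is a hint rather than proof that the module is done.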
Since this is so late in the game, I can’t change the semantics of the user module life cycle. In other words, I can’t now insist that modules signal that they have finished, since all the existing modules would break.
I’m sure we’ll find a solution soon enough.