AppDomains, Threads, CultureInfos And Paracetamol

Saturday, October 31, 2015 11:58 AM

It all started with the Umbraco installer hanging after the first application restart (due to web.config being updated), when running in the Visual Studio debugger. The install would not complete, the site would become unresponsive, and the server (IIS or IIS Express) process would use all of one CPU core until killed. When not running in the debugger, the condition would be quite rare and almost impossible to reproduce, though it would happen.

Breaking into the debugger shows a runaway thread that is trying to create a new System.Runtime.Caching.MemoryCache instance and is lost somewhere in .NET Framework code. Enabling Framework code debugging makes it possible to step into that code, and leads us to the root of the problem. The runaway thread is somewhere in the PerformanceCounter constructor, and more precisely in PerformanceCounterLib.IsCustomCategory(string, string). There, it is caught in a while loop that never ends (full code for IsCustomCategory is on the Reference Source website):

var culture = CultureInfo.CurrentCulture;
while (culture != CultureInfo.InvariantCulture)
{
  library = GetPerformanceCounterLib(machine, culture);
  if (library.IsCustomCategory(category))
    return true;
  culture = culture.Parent;
}

It is never exiting the loop because culture here is the invariant culture, but not the CultureInfo.InvariantCulture object. OK, it is generally considered bad practice to do reference comparisons of CultureInfo objects, as they are classes and not structs, but... isn't the invariant culture a special one that is supposed to be unique?

And in addition... why would it work when not debugging? Why would it fail now, even though a quick disassembly of Framework v2 code shows that the reference comparison was already there? Where is this strange invariant culture object coming from?

Looking into CultureInfo

When inspecting the culture object in the debugger, we see that a culture has an m_createdDomainID private field which contains the identifier of the domain that created the culture. Or so it seems. In reality, it contains the identifier of the domain where the culture was assigned to the current thread (see the code for Thread).

The debugger further tells us that the current domain is domain number 3, that the m_createdDomainID value of its static invariant culture is zero, meaning it has never been assigned to a thread, and that the m_createdDomainID value of the other invariant culture is... 2! That object was created and assigned to a thread in another domain!

Uh... objects flowing from domains to domains?!

Just to be sure, try...

var culture = CultureInfo.GetCultureInfo("fr-FR");
Thread.CurrentThread.CurrentCulture = culture;

while (culture != CultureInfo.InvariantCulture) // .Parent is lazy
  culture = culture.Parent;
culture = culture.Parent;

var domain = AppDomain.CreateDomain("other");
var type = typeof (Helper);
var helper = (Helper) domain.CreateInstanceAndUnwrap(type.Assembly.FullName, type.FullName);
helper.Run(); // never returns

... with ...

public class Helper : MarshalByRefObject
{
  public void Run()
  {
    var culture = Thread.CurrentThread.CurrentCulture;
    while (culture != CultureInfo.InvariantCulture)
      culture = culture.Parent;
  }
}

... and that should put you in an infinite loop. Note that because the Parent property is lazy-initialized, we want to make sure we initialize it in the first domain, else it would be initialized in the second domain and therefore would belong to that domain.

AppDomain CultureInfo Leak

What happens is called a CultureInfo leak. I am not making that name up: it is used in comments in the reference source. It is subtly documented in the "Application domains and cultures" paragraph of the MSDN page presenting Application Domains, which reads:

[...] If the culture that is associated with a thread has been explicitly set by using the Thread.CurrentCulture property, it continues to be associated with that thread when the thread crosses application domain boundaries [...]

The idea is this. A process can host several AppDomain instances, each with their own set of assemblies, static objects, etc. and you probably have been taught that each domain was completely isolated from other domains.

However, that is not entirely true. Some things, such as the ThreadPool, are cross-domain. In other words, there is only one ThreadPool per process, and a CLR Thread object is shared by all domains.

Everything that is managed by AppDomain and Thread is carefully crafted as to not leak from domains to domains. Everything that is [ThreadStatic], for example, actually is static to one thread, in one domain. Everything, except cultures.
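To make the [ThreadStatic] remark concrete, here is a minimal sketch of the per-thread half of that statement (the per-domain half would need a second AppDomain and is not shown). ThreadStaticDemo and ReadFromOtherThread are hypothetical names made up for this illustration.

```csharp
using System;
using System.Threading;

public static class ThreadStaticDemo
{
    // One storage slot per thread: each thread sees its own value.
    [ThreadStatic]
    private static string _label;

    // Sets the field on the calling thread, then reads it from a brand new
    // thread and returns what that other thread saw.
    public static string ReadFromOtherThread(string valueForCallingThread)
    {
        _label = valueForCallingThread;

        string seenByOtherThread = "unset";
        var worker = new Thread(() => seenByOtherThread = _label ?? "(null)");
        worker.Start();
        worker.Join();

        // The new thread never assigned the field, so it saw the default value.
        return seenByOtherThread;
    }
}
```

Calling ThreadStaticDemo.ReadFromOtherThread("main") returns "(null)": the value stays with the thread that set it. A culture assigned to a thread gets no such per-domain slot, which is the whole problem.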

When a culture is assigned to a thread, e.g. with ...

Thread.CurrentThread.CurrentCulture
  = CultureInfo.GetCultureInfo("fr-FR");

... the culture is actually stored in a private field within the thread. And when that thread object is used by another application domain, the culture object goes with the thread and is visible in the other domain. Both domains see the same Thread instance, and the same CultureInfo instance.

And the fun begins when that instance is compared to the static CultureInfo.InvariantCulture property: precisely because it is static, there is one static invariant culture per domain.

If, in one domain, the invariant culture is assigned to the thread, it will be equal (as in, reference-equal) to the static InvariantCulture of that domain, but not to that of the other domain.
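The difference between reference equality and value equality can be seen without a second domain. The sketch below is only a stand-in: it assumes that an invariant culture created with new CultureInfo(string.Empty) behaves, for comparison purposes, like one that leaked from another domain. InvariantCultureCheck, CreateLookalikeInvariant, ReachesInvariant and the maxHops limit are hypothetical names introduced for this example.

```csharp
using System.Globalization;

public static class InvariantCultureCheck
{
    // Stand-in for a "foreign" invariant culture: a distinct object that
    // represents the same culture as CultureInfo.InvariantCulture.
    // (Assumption: a real leak comes from another AppDomain, not from "new".)
    public static CultureInfo CreateLookalikeInvariant()
    {
        return new CultureInfo(string.Empty);
    }

    // Walks up the Parent chain using value equality (.Equals), with a hop
    // limit as a safety net so the walk can never spin forever.
    public static bool ReachesInvariant(CultureInfo culture, int maxHops = 16)
    {
        for (var i = 0; i < maxHops; i++)
        {
            if (culture.Equals(CultureInfo.InvariantCulture))
                return true;
            culture = culture.Parent;
        }
        return false;
    }
}
```

With the look-alike object, ReferenceEquals(lookalike, CultureInfo.InvariantCulture) is false while lookalike.Equals(CultureInfo.InvariantCulture) is true, so a loop written with != keeps going where one written with .Equals stops.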

And before you try: there is an interesting mechanism in there that prevents cultures from leaking if they are not original CultureInfo instances, but only inherit from CultureInfo. It looks like culture leaks cannot be abused to implement super-fast inter-domain communication. What a pity ;-)

So, the Framework is Completely Broken™?

It is one of those situations that cannot be all black or white. The culture of a thread is not purely a Framework or CLR thing, and actually is an unmanaged property of the underlying OS thread. The Framework has to deal with it, and cannot simply "do the right thing".

The Framework implements workarounds in several places, and so does ASP.NET, etc. Just to name a few:

Async continuation

When some code is awaiting the result of an asynchronous method, and the method completes, the code resumes execution on a thread pool thread, which most probably will not be the thread that started the await, and so will not have the same culture. This can be worked around e.g. by:

var result = await GetResultAsync().WithCurrentCulture();

The actual code for WithCurrentCulture was proposed by Stephen Toub back in 2011.
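For readers who do not have that code at hand, the pattern can be sketched as a custom awaiter. This is not the original code, only a minimal illustration of the idea: capture the cultures when the await suspends, re-apply them around the continuation. The names CultureAwaitExtensions and CultureAwaiter<T> are assumptions made for this sketch, and only Task<T> is covered.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

public static class CultureAwaitExtensions
{
    // Wraps a task so that the code after "await" runs with the cultures
    // that were current when the await started.
    public static CultureAwaiter<T> WithCurrentCulture<T>(this Task<T> task)
    {
        return new CultureAwaiter<T>(task);
    }
}

public struct CultureAwaiter<T> : ICriticalNotifyCompletion
{
    private readonly Task<T> _task;

    public CultureAwaiter(Task<T> task) { _task = task; }

    // The struct is both the awaitable and the awaiter.
    public CultureAwaiter<T> GetAwaiter() { return this; }

    public bool IsCompleted { get { return _task.IsCompleted; } }

    public T GetResult() { return _task.GetAwaiter().GetResult(); }

    public void OnCompleted(Action continuation) { UnsafeOnCompleted(continuation); }

    public void UnsafeOnCompleted(Action continuation)
    {
        // Captured on the thread that is about to suspend at the await.
        var culture = Thread.CurrentThread.CurrentCulture;
        var uiCulture = Thread.CurrentThread.CurrentUICulture;

        _task.ConfigureAwait(false).GetAwaiter().UnsafeOnCompleted(() =>
        {
            // Runs on whichever thread resumes the method.
            var thread = Thread.CurrentThread;
            var previousCulture = thread.CurrentCulture;
            var previousUiCulture = thread.CurrentUICulture;
            thread.CurrentCulture = culture;
            thread.CurrentUICulture = uiCulture;
            try
            {
                continuation();
            }
            finally
            {
                // Hand the thread back in the state we found it.
                thread.CurrentCulture = previousCulture;
                thread.CurrentUICulture = previousUiCulture;
            }
        });
    }
}
```

On runtimes where the culture already flows with the execution context, the wrapper should be redundant but harmless.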

Alternatively, some StackOverflow posts (e.g. this one or this one) propose variants of a SynchronizationContext that deals with culture.

ThreadPool queueing

When work is queued on the ThreadPool via a call to QueueUserWorkItem, the current culture does not flow to the thread that will be executing the callback. Again, Stack Overflow points to various solutions to work around this. See also ThreadPoolExtensions in the MSBuild source code.
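Most of those workarounds boil down to the same trick: capture the cultures on the queueing thread and apply them around the callback. The following is a minimal sketch of that trick, not the MSBuild code; ThreadPoolCultureSketch and QueueWithCulture are hypothetical names.

```csharp
using System.Globalization;
using System.Threading;

public static class ThreadPoolCultureSketch
{
    // Queues a work item that runs with the caller's cultures, then puts the
    // pool thread's own cultures back before the thread returns to the pool.
    public static bool QueueWithCulture(WaitCallback callback, object state)
    {
        // Captured here, on the thread that queues the work.
        CultureInfo culture = Thread.CurrentThread.CurrentCulture;
        CultureInfo uiCulture = Thread.CurrentThread.CurrentUICulture;

        return ThreadPool.QueueUserWorkItem(s =>
        {
            // Runs later, on a pool thread.
            var thread = Thread.CurrentThread;
            var previousCulture = thread.CurrentCulture;
            var previousUiCulture = thread.CurrentUICulture;
            thread.CurrentCulture = culture;
            thread.CurrentUICulture = uiCulture;
            try
            {
                callback(s);
            }
            finally
            {
                thread.CurrentCulture = previousCulture;
                thread.CurrentUICulture = previousUiCulture;
            }
        }, state);
    }
}
```

Restoring the previous cultures in the finally block matters here: pool threads are reused, so anything left on the thread is inherited by the next work item.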

ASP.NET

ASP.NET does a lot to make sure that cultures sort of make sense. Figuring out details is left as an exercise for the reader.

What about us?

In our case however, it seems that despite all those workarounds, a foreign culture is leaking into our domain.

Enter OWIN

The last part of the puzzle is this: the infinite loop situation happens during our OWIN Startup method. And that method runs very early during the application initialization. It runs on a background thread, but not on a ThreadPool thread. Which means that nothing is sanitizing the thread's culture.

Q.E.D.

But why would the debugger make a difference? When running without the debugger, the thread that runs the startup code keeps changing on each restart. Of course, it is possible that at some point it has a foreign culture, but that would be an exception.

On the other hand, when running in the debugger, it is always the same thread (with ManagedThreadId being 1) that runs the startup code, so it is guaranteed to have a foreign culture.

Now what?

Another workaround is needed, one that replaces the foreign culture with the equivalent culture owned by the current domain. Our startup method now begins with:

app.SanitizeThreadCulture();

And that extension method looks like:

public static void SanitizeThreadCulture(this IAppBuilder app)
{
  var currentCulture = CultureInfo.CurrentCulture;

  // the invariant culture sits at the top of every culture's parent chain;
  // look for it with an .Equals comparison (not a reference comparison) so
  // that we are sure to find it and do not loop endlessly
  var invariantCulture = currentCulture;
  while (!invariantCulture.Equals(CultureInfo.InvariantCulture))
    invariantCulture = invariantCulture.Parent;

  if (ReferenceEquals(invariantCulture, CultureInfo.InvariantCulture))
    return;

  var thread = Thread.CurrentThread;
  thread.CurrentCulture = CultureInfo.GetCultureInfo(thread.CurrentCulture.Name);
  thread.CurrentUICulture = CultureInfo.GetCultureInfo(thread.CurrentUICulture.Name);
}

But it probably should be handled by OWIN itself, as it is impacting other applications such as SignalR (see SignalR issue #3414).

And now, about that Paracetamol...
