The Invisible Load Balancing

Posted on December 21, 2016 in umbraco

The question usually goes like this:

I am running a standard, standalone, Umbraco site. Why am I seeing activity on the umbracoCacheInstruction table? Why is the database server messenger processing instructions? Why is Umbraco behaving as if it were load-balancing? I am not load-balancing anything!

The answer is simple: yes you are.

It's all about Domains

Umbraco is an ASP.NET application and as such follows the ASP.NET architecture. Essentially: the web server (IIS) hosts one w3wp.exe process per application pool. And each application pool hosts one application domain per running website. The result can be summarized as: one website, one domain.

For those interested, details about processes and domains can be found in the Umbraco log:

2016-12-21 09:12:40,639 [P15108/D41/T82] INFO ...

Here, P15108 means "the process with identifier 15108" (which you should be able to view in the Task Manager), and D41 means "the domain with identifier 41". To be complete, T82 means "the CLR thread with identifier 82". If you use the Task Manager to kill the w3wp.exe process, you should see IIS start a new process automatically, and P15108 turn into something else in the log.
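If you want to follow these identifiers with a script, the bracketed prefix is easy to pull apart with a regular expression. Here is a minimal Python sketch; the function name parse_log_ids is made up for this example, and the pattern assumes the prefix always has the [P.../D.../T...] shape shown in the log line above.

```python
import re

# Matches a prefix such as "[P15108/D41/T82]" anywhere in a log line.
_PREFIX = re.compile(r"\[P(?P<process>\d+)/D(?P<domain>\d+)/T(?P<thread>\d+)\]")


def parse_log_ids(line):
    """Return the process, domain and thread ids of a log line, or None."""
    match = _PREFIX.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


ids = parse_log_ids("2016-12-21 09:12:40,639 [P15108/D41/T82] INFO ...")
print(ids)
```

Comparing the values from one line to the next should show when the process or the domain has changed.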

What happens then, when the website needs to restart, maybe because you have modified its web.config or some dlls in the ~/bin directory? Restarting the process itself would restart all other websites running on the application pool, so it is not appropriate.

Instead, the w3wp.exe process starts a new domain for the website, and redirects all requests to that new domain, while stopping and removing the old domain. And this is where it gets interesting: although w3wp.exe guarantees that only one domain at a time receives the requests, it does not prevent the two domains from overlapping, and they may be executing code at the same time.

For example, the old domain may be processing a long-running request, while the new domain starts processing new requests.

Why should we care?

Umbraco runs some in-memory caches that must be kept up to date. For example, when Umbraco starts (as in, a new domain), it reads the entire Xml cache from the umbraco.config file. And, whenever a document is published, it (eventually) writes the changes to the file.

Now, what if the old domain is publishing some content, and writes to umbraco.config, but after the new domain has read its Xml cache? The end result would be that the new domain does not see the changes, and runs with a stale, and therefore incorrect, cache.
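The ordering problem described above can be replayed deterministically in a few lines. This is only an illustrative Python sketch, not Umbraco code: the Domain class is hypothetical, and a plain dictionary stands in for the umbraco.config file.

```python
class Domain:
    """Hypothetical stand-in for an application domain with an in-memory cache."""

    def __init__(self, name):
        self.name = name
        self.cache = {}

    def load(self, config_file):
        # Read the whole file into memory once, at startup.
        self.cache = dict(config_file)

    def publish(self, config_file, key, value):
        # Update the local cache, then persist the change to the shared file.
        self.cache[key] = value
        config_file[key] = value


config_file = {"home": "v1"}  # plays the role of umbraco.config
old, new = Domain("old"), Domain("new")
old.load(config_file)

new.load(config_file)                   # the new domain starts and reads the file
old.publish(config_file, "home", "v2")  # the old domain publishes afterwards

print(new.cache["home"], config_file["home"])
```

The new domain keeps serving what it read at startup, while the file already holds the newer version.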

In that particular case, you might imagine solutions such as watching the file for changes... but we have other, more complex situations to deal with. Such as... what happens if the two domains decide to process the scheduled publishing at the same time? Or rebuild the Examine indexes?

It is all, really, an edge case. The old domain is not receiving any new request, so it is really only finishing processing the "current" requests. The chances of a problem were slim. Which means, according to Mr Murphy, that problems could occur, and they did.

Load-Balance All The Things

In order to deal with that temporary concurrency, Umbraco relies on a mechanism that ensures that only one domain at a time can be the "main domain". When a new domain starts, it acquires the main domain status, by stealing it from the old domain. And some activities (such as processing the scheduled publishing) only take place on the main domain.
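To make the idea concrete, here is a minimal Python sketch of a "one main domain at a time" rule. It is an assumption-laden illustration rather than Umbraco's actual implementation: the MainDomainRegistry class and the run_scheduled_publishing function are hypothetical names, and a single in-process object stands in for whatever cross-domain coordination the real mechanism uses.

```python
class MainDomainRegistry:
    """Hypothetical record of which domain currently holds the main domain status."""

    def __init__(self):
        self._holder = None

    def acquire(self, domain_id):
        # A newly started domain takes the status, evicting any previous holder.
        previous = self._holder
        self._holder = domain_id
        return previous

    def is_main(self, domain_id):
        return self._holder == domain_id


def run_scheduled_publishing(registry, domain_id):
    # Main-domain-only work is skipped everywhere else.
    if not registry.is_main(domain_id):
        return "skipped"
    return "published"


registry = MainDomainRegistry()
registry.acquire(41)            # the old domain starts first
evicted = registry.acquire(42)  # the new domain starts and steals the status

print(evicted, run_scheduled_publishing(registry, 41), run_scheduled_publishing(registry, 42))
```

Once the second domain has acquired the status, the first one is still running but no longer performs the guarded work.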

Practically, the new domain acquires the main domain status, and only then reads the umbraco.config file. In the meantime, the old domain loses the main domain status, and is not authorized to write to umbraco.config anymore. But it still writes instructions into the in-database load-balancing mechanism, and these instructions are processed by the new domain, thus keeping everything up-to-date.

In a similar way, if the old domain modifies some important entity such as a content type, it uses the load-balancing mechanism to notify the new domain. And thus, the mechanism (including the database server messenger, instructions table, etc.) needs to be running even on single-server sites.
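The handoff through the database can be sketched as a tiny instruction queue. The Python example below uses an in-memory SQLite database; the table layout, the write_instruction function and the InstructionReader class are all hypothetical simplifications, and the real umbracoCacheInstruction table and database server messenger are more involved.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
# Simplified, hypothetical schema: not the real umbracoCacheInstruction layout.
db.execute(
    "CREATE TABLE instruction ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, origin TEXT, payload TEXT)"
)


def write_instruction(origin, payload):
    """Record a change so that other domains can pick it up later."""
    db.execute(
        "INSERT INTO instruction (origin, payload) VALUES (?, ?)",
        (origin, json.dumps(payload)),
    )
    db.commit()


class InstructionReader:
    """Hypothetical reader that applies instructions written by other domains."""

    def __init__(self, identity):
        self.identity = identity
        self.last_id = 0
        self.cache = {}

    def process_pending(self):
        rows = db.execute(
            "SELECT id, origin, payload FROM instruction WHERE id > ? ORDER BY id",
            (self.last_id,),
        ).fetchall()
        applied = 0
        for row_id, origin, payload in rows:
            self.last_id = row_id
            if origin == self.identity:
                continue  # our own changes were already applied locally
            change = json.loads(payload)
            self.cache[change["key"]] = change["value"]
            applied += 1
        return applied


new_domain = InstructionReader("D42")
write_instruction("D41", {"key": "home", "value": "v2"})  # old domain publishes
applied = new_domain.process_pending()
print(applied, new_domain.cache)
```

Because the reader remembers the last instruction it has seen, polling again without new writes applies nothing, which is the property that lets a domain catch up safely at its own pace.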

And this is why, even though your site runs standalone on one single server, it is indeed load-balancing, during short periods of time!
