ZpqrtBnk

Deadlocking Sql Server Compact 4.0

Posted on May 12, 2016 in dotnet , umbraco

As part of the Umbraco NuCache / v8 effort, we are cleaning up the services that manage content and media types and items. This includes introducing some database-level distributed locks, to ensure that some operations we do on eg the content tree are totally isolated from each other. Being able to delete a content type right when a content item of that type is saved... is a bad idea.

The idea was to execute our changes within a RepeatableRead transaction and to use special nodes in the umbracoNode table as locks. In a RepeatableRead transaction, you know that any table record you have read will not change, ie you can read it again and get exactly the same values. This is achieved by acquiring locks: a shared lock when reading, and an exclusive lock when writing.

So, before we would read anything in umbracoNode we would first read a special record. And before we would write anything to umbracoNode we would first update that special record. In effect, it guarantees that many can read at a time, but only one can write.

When things go wrong

And it worked quite well. Until we started to stress-test this locking mechanism in v8. And then it started to fail and throw exceptions such as:

System.Data.SqlServerCe.SqlCeLockTimeoutException (0x80004005): SQL Server Compact timed out waiting for a lock. The default lock time is 2000ms for devices and 5000ms for desktops. The default lock timeout can be increased in the connection string using the ssce: default lock timeout property. [ Session id = 27,Thread id = 52832,Process id = 224,Table name = umbracoNode,Conflict type = u lock (x blocks),Resource = RID: 1032:33 ]

This was with 20 concurrent threads, trying to lock the content tree (via the new mechanism) and then insert a content node. No obvious reason for it to fail.

Increasing the lock time-out (as described here) would not do any good. When waiting longer on locks does not help, it is time to think about deadlocks.

But how do you troubleshoots locks in Sql Compact? Contrary to Sql Server, It is kind of a black box that you cannot really inspect and analyze while it is being used. It turns out though that you can display locking information in Sql Server Compact 4.0 and so the following code was added at the beginning of the failing test:

new Thread(() =>
{
  for (var i = 0; i < 10; i++)
  {
    var db = ApplicationContext.Current.DatabaseContext.Database;
    var info = db.Query<dynamic>("SELECT * FROM sys.lock_information;");
    Console.WriteLine("LOCKS:");
    foreach (var row in info)
    {
      Console.WriteLine($"> {row.request_spid} {row.resource_type} {row.resource_description} {row.request_mode} {row.resource_table} {row.resource_table_id} {row.request_status}");
    }
    Thread.Sleep(500);
  }
}).Start();

It dumps the current lock information every 500ms while our test runs.

Aha!

And this is what came out:

LOCKS:
[...]
> 10 DB  iu   GRANT
> 10 TAB  iu umbracoNode 1029 GRANT
> 10 PAG (data) 1032 iu umbracoNode 1029 GRANT
> 10 PAG (idx) 1031 s umbracoNode 1029 GRANT <-- holding s-lock on index (due to read)
> 10 RID 1032:33 u umbracoNode 1029 WAIT <-- waiting for u-lock on the lock record
> 10 MD  Sch-s umbracoNode 1029 GRANT
[...]
> 27 DB  ix   GRANT
> 27 TAB  ix umbracoNode 1029 GRANT
> 27 PAG (idx) 1031 x umbracoNode 1029 WAIT <-- waiting for x-lock on index (due to insert)
> 27 PAG (data) 1215 x umbracoNode 1029 GRANT
> 27 PAG (data) 1032 ix umbracoNode 1029 GRANT
> 27 RID 1032:33 x umbracoNode 1029 GRANT <-- holding x-lock on the lock record
> 27 RID 1032:41 x umbracoNode 1029 GRANT <-- holding x-lock on inserted record
> 27 MD  Sch-s umbracoNode 1029 GRANT
[...]

What we see here is... a deadlock. Request #10 wants to lock the "lock record" and has acquired a lock on an index page. Request #27 has acquired an exclusive lock on the "lock record", then it has inserted a record, and now it wants to lock the index page to complete the insertion.

Obviously our locking mechanism is not going to work, because it is not taking indexes in account.

Fixing

Probably, using the umbracoNode table for locks is not such a good idea. Too many things happen in that table. A new umbracoLock table has been introduced, which exclusively contains lock records. Things are now clearly isolated, and our test is happily successful even with 20+ concurrent threads.

There used to be Disqus-powered comments here. They got very little engagement, and I am not a big fan of Disqus. So, comments are gone. If you want to discuss this article, your best bet is to ping me on Mastodon.