Optimizing CSharp Null-Coalescing Operator

Posted on November 21, 2020 in dotnet

Consider the following code:

public Func<int> F { get; set; }

private int GetValue()
{
    // return the value produced by F,
    // if F is not null, else zero
}

I naïvely implemented the function as:

if (F == null) return default;
return F();

Does the job.
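For reference, here is a minimal, self-contained sketch of that naïve version. The class name ValueHolder and the Program wrapper are hypothetical, added only to make the snippet runnable, and GetValue is made public here so the demo can call it.

```csharp
using System;

public class ValueHolder
{
    public Func<int> F { get; set; }

    // Private in the post; public here only so the demo below can call it.
    public int GetValue()
    {
        if (F == null) return default; // default(int) is 0
        return F();
    }
}

public static class Program
{
    public static void Main()
    {
        var holder = new ValueHolder();
        Console.WriteLine(holder.GetValue()); // F is null: prints 0

        holder.F = () => 42;
        Console.WriteLine(holder.GetValue()); // prints 42
    }
}
```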

Elegant Code

Yet, ReSharper squiggled the if statement, suggesting I might want to convert to:

return F == null ? default : F();

Which is nice indeed. Ah, ReSharper now squiggles the == null statement, suggesting I merge the conditional expression into:

return F?.Invoke() ?? default;

And... I started to wonder. It certainly is more compact. But is it cleaner? It reminds me of a recent thread initiated by Frans, on the topic of C# getting more and more syntax-complex, to the point that C++ looks elegant and you need ReSharper to remind you of all the new bells and whistles.

Thank you, but no. Obfuscation can sure be fun, but one day I might have to troubleshoot that code, and I do prefer the ?: pattern over the ?.Invoke ?? one. As Steve replied:

It's just obfuscating the intent of your code. Clarity over minimal keystrokes. I gotta support this code for years.

(rant) Every developer should have to spend some time doing 24/7 support on production-critical code. Once you have been woken up at 3am to spend hours trying to figure out what is wrong with some recent code changes while the entire factory waits for you to restart operations... you get a different idea of what elegant code is. (/rant)


And besides, it is not the same!

At that point I got a bit carried away by my rant, and started to think:

And so, the ?.Invoke ?? pattern has to be worse, performance-wise!

Time to turn to Sharplab.io. This amazing tool lets you type C# code on the left, and presents you with the resulting IL or assembly code on the right.

We are going to compile the following code:

public Func<int> F { get; set; }

public int M1()
{
    return F == null
        ? 0
        : F();
}

public int M2()
{
    return F?.Invoke() ?? 0;
}

For M1 the IL code (in Release mode) is:

IL_0000: ldarg.0
IL_0001: call instance class [System.Private.CoreLib]System.Func`1<int32> C::get_F()
IL_0006: brfalse.s IL_0014

IL_0008: ldarg.0
IL_0009: call instance class [System.Private.CoreLib]System.Func`1<int32> C::get_F()
IL_000e: callvirt instance !0 class [System.Private.CoreLib]System.Func`1<int32>::Invoke()
IL_0013: ret

IL_0014: ldc.i4.0
IL_0015: ret

And for M2:

IL_0000: ldarg.0
IL_0001: call instance class [System.Private.CoreLib]System.Func`1<int32> C::get_F()
IL_0006: dup
IL_0007: brtrue.s IL_000c

IL_0009: pop
IL_000a: ldc.i4.0
IL_000b: ret

IL_000c: callvirt instance !0 class [System.Private.CoreLib]System.Func`1<int32>::Invoke()
IL_0011: ret

First, note that the ?. null-conditional operator does not imply any kind of boxing, contrary to what I originally thought: the IL for M2 contains no box instruction. So it is safe. In fact, the code produced for M2 looks simpler, especially since M1 invokes get_F() twice when F is not null. However, remember this is only IL code. There is another level of optimization in the JIT, which leads to the following x86 assembly code (M1 first, then M2):

    L0000: mov eax, [ecx+4]
    L0003: test eax, eax
    L0005: je short L000e
    L0007: mov ecx, [eax+4]
    L000a: call dword ptr [eax+0xc]
    L000d: ret
    L000e: xor eax, eax
    L0010: ret

    L0000: mov edx, [ecx+4]
    L0003: test edx, edx
    L0005: jne short L000a
    L0007: xor eax, eax
    L0009: ret
    L000a: mov ecx, [edx+4]
    L000d: call dword ptr [edx+0xc]
    L0010: ret

In other words, both methods end up doing almost exactly the same, at assembly level. And benchmarks (see below, if you really are into this) show that they perform the same. This is a good thing: it means that one can use one syntax or the other, without worrying about performance.
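The one difference the IL does reveal, M1 calling get_F() twice when F is not null, only becomes observable when the getter is not a trivial auto-property. The sketch below is an illustration rather than code from the benchmark: the Reads counter and the hand-written getter are hypothetical additions that make each call to the getter visible.

```csharp
using System;

public class C
{
    private Func<int> _f = () => 42;

    // Hypothetical counter, not in the original post: counts getter calls.
    public int Reads { get; private set; }

    public Func<int> F
    {
        get { Reads++; return _f; }
        set { _f = value; }
    }

    public int M1() => F == null ? 0 : F();   // evaluates F twice when non-null
    public int M2() => F?.Invoke() ?? 0;      // evaluates F once

    // Helper for the demo: how many times did the call read F?
    public int CountReads(Func<int> call)
    {
        int before = Reads;
        call();
        return Reads - before;
    }
}

public static class Program
{
    public static void Main()
    {
        var c = new C();
        Console.WriteLine(c.CountReads(c.M1)); // prints 2
        Console.WriteLine(c.CountReads(c.M2)); // prints 1
    }
}
```

With a plain auto-property the JIT inlines the getter and the difference disappears, as the assembly above shows. Semantically, though, the two reads in M1 leave a window in which another thread could set F to null between the check and the call, which the ?. form avoids by reading F only once.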

As for code elegance, I know what I prefer.


Here are more details, for people who are into it.

Just to be sure, I have benchmarked the two methods. The BenchmarkDotNet code is quite simple (see this Gist) and I have executed it with F being null, or returning a constant, or computing a value... and, funnily enough, I never get the exact same duration for both methods, and one or the other is always randomly slightly faster than the other. I guess that is due to noise on my benchmark machine, and it means that the methods are equivalent.

Just to be super sure, I fired WinDBG at the benchmark, after it had run. First you want to locate the class:

> !name2ee Benchmarks!Benchmarks.NullCoalesce
Module:      00007ff98a5cf800
Assembly:    Benchmarks.dll
Token:       0000000002000005
MethodTable: 00007ff98a69ea80
EEClass:     00007ff98a689038
Name:        Benchmarks.NullCoalesce

Then you want to dump the method table:

> !dumpmt -md 00007ff98a69ea80
EEClass:         00007ff98a689038
Module:          00007ff98a5cf800
Name:            Benchmarks.NullCoalesce
mdToken:         0000000002000005
File:            D:\d\Benchmarks\bin\Release\netcoreapp3.1\Benchmarks.dll
BaseSize:        0x20
ComponentSize:   0x0
Slots in VTable: 11
Number of IFaces in IFaceMap: 0
MethodDesc Table
           Entry       MethodDesc    JIT Name
00007FF98A520090 00007ff98a4f0a80   NONE System.Object.Finalize()
00007FF98A520098 00007ff98a4f0a90   NONE System.Object.ToString()
00007FF98A5200A0 00007ff98a4f0aa0    JIT System.Object.Equals(System.Object)
00007FF98A5200B8 00007ff98a4f0ae0    JIT System.Object.GetHashCode()
00007FF98A538F30 00007ff98a69e9f8   NONE Benchmarks.NullCoalesce..ctor()
00007FF98A538F20 00007ff98a69e9c8    JIT Benchmarks.NullCoalesce.get_F()
00007FF98A538F28 00007ff98a69e9e0   NONE Benchmarks.NullCoalesce.set_F(System.Func`1)
00007FF98A538F38 00007ff98a69ea08    JIT Benchmarks.NullCoalesce.BenchmarkM1()
00007FF98A538F40 00007ff98a69ea20    JIT Benchmarks.NullCoalesce.M1()
00007FF98A538F48 00007ff98a69ea38    JIT Benchmarks.NullCoalesce.BenchmarkM2()
00007FF98A538F50 00007ff98a69ea50    JIT Benchmarks.NullCoalesce.M2()

And... !dumpil can show us the IL code, but we really want the assembly code:

> !u 00007ff98a69ea20
Normal JIT generated code
Begin 00007FF98AAB5F20, size 17
00007ff9`8aab5f20 488b4108        mov     rax,qword ptr [rcx+8]
00007ff9`8aab5f24 4885c0          test    rax,rax
00007ff9`8aab5f27 740b            je      00007ff9`8aab5f34
00007ff9`8aab5f29 488b4808        mov     rcx,qword ptr [rax+8]
00007ff9`8aab5f2d 488b4018        mov     rax,qword ptr [rax+18h]
00007ff9`8aab5f31 48ffe0          jmp     rax
00007ff9`8aab5f34 33c0            xor     eax,eax
00007ff9`8aab5f36 c3              ret

> !u 00007ff98a69ea50  
Normal JIT generated code
Begin 00007FF98AAE4220, size 17
00007ff9`8aae4220 488b5108        mov     rdx,qword ptr [rcx+8]
00007ff9`8aae4224 4885d2          test    rdx,rdx
00007ff9`8aae4227 7503            jne     00007ff9`8aae422c
00007ff9`8aae4229 33c0            xor     eax,eax
00007ff9`8aae422b c3              ret
00007ff9`8aae422c 488b4a08        mov     rcx,qword ptr [rdx+8]
00007ff9`8aae4230 488b4218        mov     rax,qword ptr [rdx+18h]
00007ff9`8aae4234 48ffe0          jmp     rax

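My .NET host (.NET Core 3.1.9) produces assembly code that is slightly different from what Sharplab.io shows (it is x64 rather than x86, and the delegate is invoked with a tail jmp instead of a call followed by ret), but the conclusion stands: both methods are, practically, equivalent.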
