
The Curse of NULL

Posted on June 10, 2022 in dotnet

C# has two main categories of types: value types and reference types. A variable of a value type contains an instance of the type, while a variable of a reference type contains a reference to an instance of the type. In a very non-technical way, one could say that a value type variable is the "thing" itself, while a reference type variable points to the "thing".

An integer, for instance, is a value type. An integer variable contains the value itself. It has to be a valid integer value (in the int.MinValue to int.MaxValue range), and is zero by default. And technically, anything that can fit in a value type (ignoring struct here) is valid: if an int is composed of four bytes i.e. 32 bits, any combination of these 32 bits is going to produce a valid int.

An object is of a reference type when its type is defined as a class. A class object variable contains a reference to the object, not the object itself. More precisely, it contains a number (32 bits in a 32-bit process, 64 bits in a 64-bit process) which is the address of the object in memory.
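
Here is a minimal sketch of the difference, using a tuple (a value type) and an array (a reference type) as stand-ins. The variable names are only illustrative:

```csharp
using System;

// A tuple is a value type: assignment copies the whole value.
var point = (X: 1, Y: 2);
var pointCopy = point;
pointCopy.X = 10;
Console.WriteLine(point.X);      // 1: the original is untouched

// An array is a reference type: assignment copies the reference only.
var numbers = new[] { 1, 2 };
var sameNumbers = numbers;
sameNumbers[0] = 10;
Console.WriteLine(numbers[0]);   // 10: both variables point to the same array
```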

A reference type variable has to point to a valid object of the correct type, and of course not every number points to such an object: some numbers would be invalid. Luckily C# makes sure that, by default, all reference variables are valid.

Except for one situation.

Enter NULL

C# allows reference type variables to use the underlying value zero a.k.a. null, which is not a valid memory address, to indicate that they actually do not point to an object. It can be convenient, when you think of it, and the concept is widely used in databases: if the Name column contains null, we don't know the name of the person.

At C# code level however, null is sort of cursed. And why so? Because... if the variable thing is of type Thing which is defined as a class, and the value of the variable is null, what is the value of thing.Value? It cannot be determined, and trying to access it provokes a NullReferenceException at runtime. Once null is authorized, programmers have to take great care everywhere to handle, and guard against, null.

Oh and, the default value of a reference type variable is null.
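
A small sketch of both defaults, using freshly allocated arrays because their elements start at the default value of the element type. The array names are made up for the illustration:

```csharp
using System;

var numbers = new int[2];     // value type elements default to zero
var names = new string[2];    // reference type elements default to null

Console.WriteLine(numbers[0]);          // 0
Console.WriteLine(names[0] == null);    // True

try
{
    // Dereferencing a null reference compiles, but fails at runtime.
    Console.WriteLine(names[0].Length);
}
catch (NullReferenceException)
{
    Console.WriteLine("NullReferenceException");
}
```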

Nullable Value Type

Now imagine that one wants a variable that indicates whether a user is OK with receiving spam, and also imagine that this variable is meaningless as long as the user has not explicitly indicated whether they are OK with spam or not. Unfortunately, a bool variable is true or false and nothing else. Of course the proper way to handle this would be via two separate variables:

bool isOkWithSpam;
bool hasIndicatedSpamPreference;

But, inspired by the null usage for references, C# introduced the Nullable<T> structure which can at the same time carry a value, and indicate whether it has a value or not. And, to make it even friendlier, it introduced syntactic sugar that would allow one to pretend that the variable could be null:

bool? isOkWithSpam;    // compiles as: Nullable<bool> isOkWithSpam;
isOkWithSpam = null;   // compiles as: isOkWithSpam = new Nullable<bool>(); (HasValue is false)
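
To make this a bit more concrete, here is a short sketch of the members that Nullable<T> exposes behind the sugar. It reuses the isOkWithSpam example; the unknown variable is only there for illustration:

```csharp
using System;

bool? isOkWithSpam = null;                            // no preference recorded yet
Console.WriteLine(isOkWithSpam.HasValue);             // False
Console.WriteLine(isOkWithSpam.GetValueOrDefault());  // False: default(bool)

isOkWithSpam = true;
Console.WriteLine(isOkWithSpam.HasValue);             // True
Console.WriteLine(isOkWithSpam.Value);                // True

// Reading Value when there is no value throws InvalidOperationException,
// so the ?? operator is the usual way to supply a fallback.
bool? unknown = null;
Console.WriteLine(unknown ?? false);                  // False
```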

Note that a Nullable<T> has "interesting" properties. You probably know already that the C# language can "box" an integer value type into an object. In the code below, the o variable is of type object and contains an actual System.Int32 value, obtained at the time of boxing (don't confuse it with a reference to i: such a reference does not exist, and changing i does not change o):

var i = 33;
var o = (object) i;
Console.WriteLine(o.GetType()); // "System.Int32"

Now what happens when you box a Nullable<T>? You may think that you would get a variable of type object containing an actual Nullable<T> value. But actually... boxing unwraps the Nullable<T> structure, and what you finally get depends on whether there was a value or not:

var i = (int?) 33;
var o = (object) i; // o is of type System.Int32
var ni = (int?) null;
var no = (object) ni; // no is null
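
The same behavior can be checked at runtime, including the trip back from object to int?. This is only a sketch with made-up variable names, and depending on your project settings the compiler may warn about the second cast:

```csharp
using System;

int? some = 33;
int? none = null;

var boxedSome = (object) some;   // boxes the underlying int, not the Nullable<int>
var boxedNone = (object) none;   // boxing an empty Nullable<int> yields a null reference

Console.WriteLine(boxedSome.GetType());   // System.Int32
Console.WriteLine(boxedNone == null);     // True

// Unboxing back to int? works in both cases.
var roundTrip = (int?) boxedSome;
var stillNone = (int?) boxedNone;
Console.WriteLine(roundTrip);             // 33
Console.WriteLine(stillNone.HasValue);    // False
```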

All in all, the C# language support for nullable values is pretty nice: you can even sum several int? values and get an actual result if no value is null, or null if one value is null... The language designers looked at nullable value type variables, and thought it was a very good thing.
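
As a quick illustration of these "lifted" operators, here is a sketch with arbitrary values:

```csharp
using System;

int? a = 1;
int? b = 2;
int? missing = null;

int? total = a + b;            // both operands have a value
int? unknown = a + missing;    // any null operand makes the result null

Console.WriteLine(total);              // 3
Console.WriteLine(unknown.HasValue);   // False

// Comparisons are lifted too, but they return a plain bool:
// an ordering comparison with a null operand is always false.
Console.WriteLine(a < missing);        // False
Console.WriteLine(a >= missing);       // False
```

Note the last two lines: a is neither smaller than, nor greater than or equal to, a missing value.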

Nullable Reference Type

Others found that the whole concept of a variable being null was very wrong and dangerous. They thought that a string variable should always contain an actual string, no matter what, maybe the empty string by default, but never, ever nothing.

However, imposing such a change on the CLR would be a massive breaking change. Actually, an impossible one. So they had this idea: technically, the CLR would still support null references, but the compiler would try its best to detect and prevent them.

When the nullable feature of the compiler is enabled, a string variable is assumed to be non-null. The following code would raise a warning (note: not an error).

string s;
s = null; // warning!

And, for consistency purposes, the value type nullable notation has been reused to indicate a reference variable that is authorized to be null:

string? s;
s = null; // no warning

Though initially annoying, this can quickly point to errors in code and is in fact quite nice.
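
What makes it bearable in practice is that the compiler follows null checks. Below is a minimal sketch; SafeLength is a hypothetical helper written for this example, not something that exists in the framework:

```csharp
#nullable enable
using System;

static int SafeLength(string? s)
{
    // After this check, the compiler treats s as a non-null string.
    if (s == null) return 0;
    return s.Length;    // no warning here
}

Console.WriteLine(SafeLength("hello"));   // 5
Console.WriteLine(SafeLength(null));      // 0

string? maybe = null;
Console.WriteLine(maybe?.Length ?? -1);   // -1: ?. short-circuits on null
```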

Some caveats: if you write a library that exposes a method such as:

public void WriteLength(string s)
{
    Console.WriteLine(s.Length);
}

The compiler will assume that this is safe, since the variable is of type string, which indicates it cannot be null. However, remember that this is a compiler-level check. Nothing prevents a user of your library from disabling the nullable checks and passing null to your method. Therefore, it is still good practice to check parameters for null values in publicly-exposed methods:

public void WriteLength(string s)
{
    if (s == null) throw new ArgumentNullException(nameof(s));
    Console.WriteLine(s.Length);
}

But for everything internal, it should be safe to trust the compiler.
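
If you target .NET 6 or later (an assumption, as older frameworks do not have it), the same guard can be written with ArgumentNullException.ThrowIfNull. A sketch, where null! stands in for a caller who bypasses the nullable checks:

```csharp
#nullable enable
using System;

static void WriteLength(string s)
{
    // Throws ArgumentNullException, with the parameter name filled in, when s is null.
    ArgumentNullException.ThrowIfNull(s);
    Console.WriteLine(s.Length);
}

WriteLength("hello");   // 5

try
{
    // The ! operator silences the compiler, much like a caller
    // who has disabled the nullable checks.
    WriteLength(null!);
}
catch (ArgumentNullException e)
{
    Console.WriteLine(e.ParamName);   // s
}
```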

And then it becomes complicated

Now imagine a function that may return a value, or may want to indicate that it has no value to return. This is a typical use case for nullable.

public T? GetValue<T>() { ... }

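Alas, with C# 8, that would not compile. The problem here is that T? can mean two very different things:

If T is a value type, T? means Nullable<T>, a different type that wraps a T.

If T is a reference type, T? is still T at CLR level, only annotated for the compiler as being allowed to be null.

We end up with a function that can return Nullable<T> or T, which is an impossible situation. One has to constrain T for the code to compile: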

public T? GetValValue<T>() where T : struct { ... }
public T? GetRefValue<T>() where T : class { ... }

C# 9 has better support for nullable, and does compile the following function:

public T? GetValue<T>() { return default; }

So, can you guess the output of the following code?

Console.WriteLine(GetValue<string>());
Console.WriteLine(GetValue<int>());
Console.WriteLine(GetValue<int?>());

The first line outputs nothing (just an empty line), which makes sense: GetValue<string> returns a string? value, which at IL code / CLR level really is a string, and its default value is null.

The second line outputs... zero. How come? The reason is that even in version 9, C# cannot perform miracles and return both T and Nullable<T>. The method is being called with T being int, so it returns default(int), i.e. zero, as T. So, even though the code now compiles, it does not exactly produce what we would expect (null). In fact, when T is int, the method cannot return null at all.

The third line outputs nothing either. GetValue<int?> returns an... int?? value, I guess, which really is an int?, with the second ? being discarded because int? is a value type. But in this case, the method is being called with T being int?, so it returns default(int?), i.e. null, as T (again), which happens to be int?, so we get a null value type. Are you still with me?
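
If you want to check this for yourself, here is a sketch that makes the three results visible, next to the struct-constrained variant from earlier. It assumes C# 9 or later, and writes the methods as local functions only to keep the example short:

```csharp
#nullable enable
using System;

static T? GetValue<T>() => default;                    // unconstrained T?
static T? GetValValue<T>() where T : struct => null;   // here T? is Nullable<T>

Console.WriteLine(GetValue<string>() is null);    // True
Console.WriteLine(GetValue<int>());               // 0
Console.WriteLine(GetValue<int>().GetType());     // System.Int32
Console.WriteLine(GetValue<int?>().HasValue);     // False

// With the struct constraint, T being int does give a null value type.
Console.WriteLine(GetValValue<int>().HasValue);   // False
```

So when a null result is really wanted for value types, the struct constraint is still the way to express it.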

So what?

In most cases you will ignore all this and happily use nullable values here and there without issues. Just remember that nullable values and null references, though sharing a very similar syntax, are not exactly the same thing. And come back and reread this post the day you run into some very odd behavior that you do not quite understand!
