Sunday, 7 June 2026

SQL, NULL, Unknown

Lately I've been revisiting the rather particular behaviour of NULL in SQL, and it has led me into a better understanding of how different SQL is from General Programming Languages

- Binary Logic vs Ternary Logic

General Programming Languages (Python, JavaScript, ruby, Java...) use binary logic (Boolean logic in particular, and indeed that's the only logic I was aware of). Conditions are either True or False.
SQL uses a ternary logic (kleene logic), where we have TRUE, FALSE, and UNKNOWN

- The meaning of Missing Data

Both in General Programming Languages and in SQL we use null (None in Python) to represent missing data. There are 2 reasons for missing data, either it does not apply to that object, or we don't know it. Let's say we have an instance of a ShopItem class. Its expirationDate attribute can be null either because this object is a Book, and books do not expire, or because the printed date on this beans can is blurry (or we've had no time to read it yet) and then we don't know it, it's unknown.

In general Programming Languages null is a value (that represents that there's nothing here, there is no value here, for whatever the reason, either because it does not apply or because we don't know it), and with binary logic comparing a value to to another value is either true or false. So "a" == null is false, and null == null is true.

In SQL we have a sort of mismatch. On one hand we have ternary logic with that additional UNKOWN concept, but on the other hand we still have a single value, NULL, to represent both that it does not apply or that we don't know it. So how should NULL behave in comparisons? SQL designers decided to treat NULL as a marker that represents that the value is unknown (so we can not express that the value does not apply).

Once we have understood that, the apparent odd behaviour of NULL in comparisons suddenly makes sense. Any comparison using the standard operators (=, !=, <, >, <>) involving a NULL value will return UNKNOWN, even NULL = NULL or NULL != NULL return UNKNOWN. The negation of UNKNOWN (NOT UNKNWON) is also UNKNOWN.

What is odd is what I've just said, that SQL lacks a way to indicate that the value does not apply. It seems one of the main influences in the design of SQL ended realizing this was a serious problem, but too late:

Codd actually realized this flaw later in his life and proposed that SQL should have two different kinds of NULLs: A-Values (Absence) and I-Values (Information Unknown). Sadly, by then, SQL was already set in stone.

No comments:

Post a Comment