Sunday, 21 June 2026

Python Class Body and Lexical Scope

In my previous post about class creation in Python I mentioned how a code object is created for the code in the class body, and how that code object is then executed as a function that receives a namespace object (created by __prepare__) as its locals. The class body adds attributes to that namespace, and can use whatever is already present in it (if __prepare__ has put something there). This made me wonder whether, apart from that, the class body also has access to its enclosing scope. Just remember that in this post we saw that methods in a class have access to their enclosing scope (they close over variables defined outside the class).

So the answer is YES, and it's just the closure mechanism in action. Let's see an example:


import inspect

def create_class(id: str):
    class MyClass:
        # the class body (the class initialization code) has access to "external" variables in the enclosing scope, such as "id"
        # the class body is compiled into a code object that runs as a function that has trapped (closed over) the "id" free variable
        class_id = id
        
        print(f"Free variables: {inspect.currentframe().f_code.co_freevars}")
        # Free variables: ('id',)

        def __init__(self, value):
            self.value = value

        def display(self):
            print(f"MyClass value: {self.value}, Class ID: {self.class_id}")

    return MyClass

cl = create_class("123")
instance = cl("aa")
instance.display()

# Free variables: ('id',)
# MyClass value: aa, Class ID: 123


As you can see, the class body (my understanding is that Python executes the class body's code object by wrapping it in a function) has access to the id variable in the outer scope and can assign it to one of its attributes. We can see 'id' in the co_freevars of the class body's code object (which we get by accessing the current frame from the class body itself). However, I came across something that confused me. If I print locals() from the class body, I can't see 'id' there. That's strange: if I do the same from a normal function, locals() shows both "normal" variables and those that the function has trapped in its closure, but, as I've said, for the class body 'id' is missing from locals():


import inspect

def create_class(id: str):
    class MyClass:
        # the class body has access to "external" variables in the enclosing scope, such as "id"
        # its code object runs as a function that has trapped (closed over) the "id" free variable
        class_id = id
        
        print(f"Free variables: {inspect.currentframe().f_code.co_freevars}")
        # Free variables: ('id',)

        # notice that locals() does not show the "id" free variable
        # that's because we are running in "class scope", where locals() returns the class namespace itself (the object created by __prepare__), not the closure cells.
        # The closure holds references to the free variables separately from that namespace, and the class body can still read "id" through it.
        print(f"locals: {locals()}")
        # locals: {'__module__': '__main__', '__qualname__': 'create_class.<locals>.MyClass', '__firstlineno__': 4, 'class_id': '123'}

    return MyClass

create_class("123")

Whereas in a normal function, it is there:



def outer(id):
    def inner(value):
        # locals() shows "id" free variable because we are in a function scope (optimized scope)
        print(f"locals: {locals()}")
        # {'value': 'bb', 'id': '456'}
        print(f"Inner function value: {value}, Outer ID: {id}")
    return inner

inner_func = outer("456")
inner_func("bb")


So, why is that? When Python 3.13 was released I wrote a post about locals(), f_locals and the "local namespace" (this is related to PEP 667). Well, what I said in that post (based on other articles) about the "local namespace" being a sort of dictionary is actually not correct for normal functions. In "normal" functions we are working in an optimized scope. In an optimized scope, local variables are not placed in a dictionary and accessed by key, but stored in a "fast locals" array and accessed by index (you can see this by using dis to check the bytecode of a function: the LOAD_FAST/STORE_FAST family of instructions refers to slots by index). This fast locals array, part of the _PyInterpreterFrame of each running function, contains both the function's own local variables (including its arguments and its cellvars, i.e. the locals that inner functions capture in their closures) and the free variables that the function itself has trapped in its closure. A class body, instead, runs in an unoptimized scope, where locals() returns the namespace mapping itself, so the closure cells are not part of it:

In CPython's frame object, the `fastlocals` array is laid out as:

[regular locals] [cellvars] [freevars]

At the start of a function that has a __closure__, the `COPY_FREE_VARS` bytecode instruction copies the cell references from `__closure__` into the frame's fastlocals array for quick access.
After `COPY_FREE_VARS` executes, all variables (normal locals, cellvars and freevars) are accessed from the fastlocals array during function execution.
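The layout described above can be inspected from Python itself. This is a small sketch (the outer/inner functions and the counter variable are made up for illustration): co_varnames, co_cellvars and co_freevars list the names behind the [regular locals] [cellvars] [freevars] slots, and dis shows the instructions that use them. The exact opcode names vary between versions (for example, 3.14 uses LOAD_FAST_BORROW variants), and COPY_FREE_VARS only exists from 3.11 on.

```python
import dis

def outer(id):
    counter = 0  # read by inner(), so it becomes a cellvar of outer()

    def inner(value):
        doubled = value * 2  # an ordinary local of inner()
        return f"{id}-{counter}-{doubled}"

    return inner

inner_func = outer("456")
code = inner_func.__code__

# Names behind the [regular locals] [cellvars] [freevars] layout of inner()'s frame
print("inner co_varnames:", code.co_varnames)
print("inner co_cellvars:", code.co_cellvars)
print("inner co_freevars:", code.co_freevars)

# outer()'s cellvars: the variables it shares with inner() through cells
print("outer co_cellvars:", outer.__code__.co_cellvars)
print("closure cells held by inner:", len(inner_func.__closure__))

# On 3.11+ the listing starts with COPY_FREE_VARS; free variables are then read with LOAD_DEREF
dis.dis(inner_func)

print(inner_func(3))  # 456-0-6
```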

By the way, regarding the aforementioned _PyInterpreterFrame, I'll take the chance to copy here some GPT wisdom about frames in recent Python versions (3.11 and above):

The _PyInterpreterFrame is an internal C struct introduced in Python 3.11 that represents a stack frame for execution, aiming to improve performance by reducing the overhead of allocating full Python PyFrameObject objects.

Purpose: It holds the execution state for code objects, including local variables, globals, builtins, and the instruction pointer (exposed to Python code through frame.f_lasti).
Performance: Unlike older Python versions where every frame was a full heap-allocated PyFrameObject, _PyInterpreterFrame is designed to be lightweight and usually lives in a per-thread stack of frames managed by the interpreter (or inside the generator or coroutine object it belongs to), reducing allocation and garbage collection pressure.

The traditional PyFrameObject still exists, but it has been relegated to a "shadow" role. It is now treated purely as a compatibility API wrapper.

Python only creates a PyFrameObject on demand when a tool or a piece of code explicitly asks to inspect the call stack. This process is often referred to as materializing a frame.
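Going back to the class body: since its names live in the namespace mapping rather than in fast locals, a free variable read from a class body is looked up in the namespace first, and only then in the closure cell (LOAD_FROM_DICT_OR_DEREF since 3.12, LOAD_CLASSDEREF before). A minimal sketch with a hypothetical metaclass whose __prepare__ pre-populates an "id" entry:

```python
class Meta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        # hypothetical metaclass: the prepared namespace already contains "id"
        return {"id": "from __prepare__"}

def create_class(id: str, metaclass=type):
    class MyClass(metaclass=metaclass):
        # "id" is a free variable, but the class namespace is consulted before the closure cell
        class_id = id
    return MyClass

print(create_class("123").class_id)        # 123
print(create_class("123", Meta).class_id)  # from __prepare__
```

With type, __prepare__ returns an empty dict, so the lookup falls through to the closure; with Meta, the namespace entry shadows the closed-over variable.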

Thursday, 11 June 2026

Class Statement vs Dynamic Class Creation

We know that, along with the standard class statement, Python also allows us to create classes dynamically by calling type() (or another metaclass, if our class has a metaclass other than type).

I already dedicated a rather thick post to type. Basically, we use it to create a new class like this: type(classname, superclasses, namespace), where the namespace is just a dictionary with the attributes.
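A minimal sketch of that three-argument call (the Person class and its attributes are just illustrative): the functions placed in the namespace dictionary become methods, exactly as if they had been defined in a class statement.

```python
def __init__(self, name):
    self.name = name

def display(self):
    print(f"Person name: {self.name}")

# type(classname, superclasses, namespace)
Person = type("Person", (object,), {"species": "human", "__init__": __init__, "display": display})

p = Person("Alice")
p.display()  # Person name: Alice
print(Person.species, type(Person))
```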

So I was wondering whether the compiler translates a class statement into a call to type(), and yes, more or less we can say so, but there are some extras. I had a really insightful conversation with a GPT about this, and I also found an excellent article that explains it in full detail.

The steps that Python follows when it comes across a class statement (class Foo(Base, metaclass=Meta): x = 1) are these (I'm taking them from a GPT discussion; it's basically the same as what the linked article explains):

  • Step 1: Determine the metaclass. Python calls __build_class__ (a builtin), which looks at the explicit metaclass= keyword argument (if any) and at the metaclasses of the bases, and picks the most derived one (raising a TypeError if there is a metaclass conflict).
  • Step 2: Prepare the namespace. The metaclass's __prepare__ classmethod is called: namespace = Meta.__prepare__('Foo', (Base,), **kwargs). This returns the dict (or dict-like object) that will serve as the class namespace. For type, this is just a plain dict. For enum.EnumMeta, for example, it returns a special _EnumDict.
  • Step 3: Execute the class body. The compiled code object for the class body is executed as a function, with the namespace from step 2 as its locals(). This is the key insight: it's essentially exec(body_code, globals(), namespace). After this, namespace contains {'__module__': ..., '__qualname__': ..., 'x': 1}.
  • Step 4: Call the metaclass. Meta('Foo', (Base,), namespace) is called, which, for the default type, invokes type.__call__ → type.__new__ → type.__init__. This is where the actual class object is constructed.
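The four steps can be reproduced by hand with the standard library. This is only a sketch under some assumptions: the class body is given as source text run with exec (the real statement runs a compiled code object), Meta and Base are made-up names, and types.prepare_class is used for steps 1 and 2 (types.new_class wraps all four steps in a single call).

```python
import types

class Meta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        # Step 2 hook: return the mapping the class body will use as its locals
        print(f"Meta.__prepare__({name!r}, {bases})")
        return {}

class Base:
    pass

# Stand-in for the compiled body of: class Foo(Base, metaclass=Meta): x = 1
body_source = "x = 1"

# Steps 1 and 2: resolve the metaclass and call its __prepare__
meta, namespace, kwds = types.prepare_class("Foo", (Base,), {"metaclass": Meta})

# Step 3: run the body with the prepared namespace as its locals
# (the real class statement inserts __module__ and __qualname__ itself)
namespace["__module__"] = __name__
namespace["__qualname__"] = "Foo"
exec(body_source, globals(), namespace)

# Step 4: call the metaclass to build the class object
Foo = meta("Foo", (Base,), namespace, **kwds)
print(Foo, Foo.x, type(Foo).__name__)
```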

How do the above steps look at the bytecode level? When the Python compiler comes across a class statement (yes, there is a compiler: in Python, compilation is a sort of hidden step that happens the first time our code is run, or after it has changed, so it's sometimes confusing to tell compile time from execution time), it creates a code object for the code we've placed inside that statement (the body of the class statement), along with code objects for each function (method) defined in that code, and emits a sequence of bytecode instructions that at runtime will use that code object (and many more things) to create a class object (yes, remember that classes are objects).

That sequence of bytecode instructions can vary slightly between Python versions (what I show below corresponds to Python 3.14 and is slightly different from what the aforementioned article shows), but the intent is the same.


class Person:
    pass

# translates into:

  0           RESUME                   0

  1           LOAD_BUILD_CLASS
              PUSH_NULL
              LOAD_CONST               0 (<code object Person at 0x78d07576e730, file "class_creation.py", line 1>)
              MAKE_FUNCTION
              LOAD_CONST               1 ('Person')
              CALL                     2
              STORE_NAME               0 (Person)
              LOAD_CONST               2 (None)
              RETURN_VALUE
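A listing like this one can be obtained by compiling the class statement and passing the resulting code object to dis. A small sketch (the exact instructions depend on the Python version you run it with):

```python
import dis
import types

source = "class Person:\n    pass\n"
module_code = compile(source, "class_creation.py", "exec")

# Module-level bytecode: LOAD_BUILD_CLASS, the class body code object, MAKE_FUNCTION, CALL...
# (dis.dis also disassembles the nested class body code object after the module code)
dis.dis(module_code)

# The class body is a separate code object stored among the module's constants
body_code = next(c for c in module_code.co_consts if isinstance(c, types.CodeType))
print(body_code.co_name)  # Person
```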


Those seem like very few instructions for the complex 4 steps that I've just described! Well, that's because all the magic happens in the builtin function __build_class__, which is loaded by the LOAD_BUILD_CLASS bytecode instruction. The article does a great job explaining these opcodes.
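Since LOAD_BUILD_CLASS fetches __build_class__ from the builtins at runtime, we can temporarily wrap it to see what each class statement passes to it: the function wrapping the class body code object, the class name, and the bases. This is a debugging trick for illustration only (tracing_build_class is a made-up name), and the original builtin is restored right away.

```python
import builtins

calls = []
original_build_class = builtins.__build_class__

def tracing_build_class(func, name, *bases, **kwargs):
    # func is the function created by MAKE_FUNCTION from the class body code object
    calls.append((func.__code__.co_name, name, bases, kwargs))
    print(f"__build_class__(func={func.__code__.co_name!r}, name={name!r}, bases={bases}, kwargs={kwargs})")
    return original_build_class(func, name, *bases, **kwargs)

builtins.__build_class__ = tracing_build_class
try:
    class Base:
        pass

    class Person(Base):
        pass
finally:
    # always put the real builtin back
    builtins.__build_class__ = original_build_class
```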

When we create a class dynamically using type() (or any other metaclass), we go directly to step 4, skipping the first 3 steps. Obviously, it's us who choose the metaclass to use, and there's no class body to execute. And since we create the namespace object ourselves and pass it to the metaclass, the __prepare__ method that would prepare that namespace is not executed. That's maybe the main difference, then: in dynamic class creation the metaclass's __prepare__ method does not intervene. That's interesting, because I was not familiar with that __prepare__ method (also referred to as a hook).
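A quick way to check that difference: a hypothetical metaclass whose __prepare__ counts its calls and pre-populates the namespace. The class statement goes through __prepare__ (and its body can read the pre-populated name), while calling the metaclass directly does not.

```python
class Meta(type):
    prepare_calls = 0

    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        mcls.prepare_calls += 1
        # pre-populate the namespace: the class body can read this name
        return {"greeting": "hello from __prepare__"}

class WithStatement(metaclass=Meta):
    message = greeting.upper()  # "greeting" comes from the prepared namespace

print(Meta.prepare_calls)  # 1

# Dynamic creation: we build the namespace ourselves, so __prepare__ is skipped
Dynamic = Meta("Dynamic", (), {"message": "built by hand"})
print(Meta.prepare_calls)  # 1
```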

When talking about metaclasses I always think about __new__ and __init__ (and __call__, which intervenes when instances of a class created by the metaclass are created). I've talked about them in different posts, one of the most interesting being this one, but I was unfamiliar with __prepare__. We've seen that it allows us to prepare the namespace, OK, but when would we need that? Well, very rarely (this is particularly dark metaclass stuff). That can be material for another post; for now I'll just say that enum.EnumMeta makes use of it.
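A small sketch of what enum gets from it: EnumMeta's __prepare__ returns a dict subclass instead of a plain dict, and that special namespace is what lets enum reject a member name being reused in the class body (with a plain dict the second assignment would silently overwrite the first). The exact class name and error message depend on the Python version, so they are only printed here.

```python
import enum

# EnumMeta (an alias of enum.EnumType in recent versions) overrides __prepare__
ns = enum.EnumMeta.__prepare__("Color", (enum.Enum,))
print(type(ns).__name__, isinstance(ns, dict))

# The special namespace notices the reused member name while the body runs
try:
    class Color(enum.Enum):
        RED = 1
        RED = 2
except TypeError as exc:
    duplicate_rejected = True
    print("TypeError:", exc)
else:
    duplicate_rejected = False
```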

Sunday, 7 June 2026

SQL, NULL, Unknown

Lately I've been revisiting the rather particular behaviour of NULL in SQL, and it has led me to a better understanding of how different SQL is from General Programming Languages.

- Binary Logic vs Ternary Logic

General Programming Languages (Python, JavaScript, Ruby, Java...) use binary logic (Boolean logic in particular; indeed, that's the only logic I was aware of). Conditions are either True or False.
SQL uses a ternary (three-valued) logic, Kleene logic, where we have TRUE, FALSE, and UNKNOWN.
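To make Kleene's three-valued logic concrete, here is a small Python model of SQL's AND, OR and NOT, assuming UNKNOWN is represented by None (a choice made for this sketch only). FALSE still dominates AND and TRUE still dominates OR, even when the other operand is UNKNOWN; every other combination involving UNKNOWN stays UNKNOWN.

```python
from itertools import product
from typing import Optional

Tri = Optional[bool]  # True, False, or None standing for UNKNOWN

def k_not(a: Tri) -> Tri:
    return None if a is None else not a

def k_and(a: Tri, b: Tri) -> Tri:
    if a is False or b is False:
        return False  # FALSE dominates AND, even against UNKNOWN
    if a is None or b is None:
        return None
    return True

def k_or(a: Tri, b: Tri) -> Tri:
    if a is True or b is True:
        return True  # TRUE dominates OR, even against UNKNOWN
    if a is None or b is None:
        return None
    return False

def show(v: Tri) -> str:
    return "UNKNOWN" if v is None else str(v).upper()

for a, b in product((True, False, None), repeat=2):
    print(f"{show(a):7} AND {show(b):7} = {show(k_and(a, b)):7} | OR = {show(k_or(a, b))}")
```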

- The meaning of Missing Data

Both in General Programming Languages and in SQL we use null (None in Python) to represent missing data. There are 2 reasons for missing data: either it does not apply to that object, or we don't know it. Let's say we have an instance of a ShopItem class. Its expirationDate attribute can be null either because this object is a Book, and books do not expire, or because the object is a can of beans whose printed date is blurry (or we haven't had time to read it yet), so we don't know it: it's unknown.

In General Programming Languages null is a value (one that represents that there's nothing here, for whatever reason, either because it does not apply or because we don't know it), and with binary logic comparing a value to another value is either true or false. So "a" == null is false, and null == null is true.

In SQL we have a sort of mismatch. On the one hand we have ternary logic with that additional UNKNOWN concept, but on the other hand we still have a single value, NULL, to represent both that a value does not apply and that we don't know it. So how should NULL behave in comparisons? SQL designers decided to treat NULL as a marker meaning that the value is unknown (so we cannot express that the value does not apply).

Once we have understood that, the apparently odd behaviour of NULL in comparisons suddenly makes sense. Any comparison involving a NULL value using the standard operators (=, !=, <>, <, >) returns UNKNOWN; even NULL = NULL and NULL != NULL return UNKNOWN. The negation of UNKNOWN (NOT UNKNOWN) is also UNKNOWN.
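The contrast is easy to see from Python using the standard library's sqlite3 module with an in-memory database (SQLite is just a convenient engine for this sketch; the items table is made up). The driver returns UNKNOWN as None, a WHERE clause only keeps rows whose condition is TRUE, and IS NULL is the operator meant for testing for NULL.

```python
import sqlite3

# Python: None is an ordinary value, so comparisons are plain True/False
print("a" == None, None == None)  # False True

conn = sqlite3.connect(":memory:")

# SQL: every comparison involving NULL yields UNKNOWN, returned by the driver as None
row = conn.execute(
    "SELECT 'a' = NULL, NULL = NULL, NULL != NULL, NULL <> 1, NOT (NULL = 1), NULL IS NULL"
).fetchone()
print(row)  # (None, None, None, None, None, 1)

# WHERE keeps only rows whose condition is TRUE, so "= NULL" matches nothing
conn.execute("CREATE TABLE items (name TEXT, expiration_date TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [("book", None), ("beans", None), ("milk", "2026-07-01")],
)
eq_null = conn.execute("SELECT count(*) FROM items WHERE expiration_date = NULL").fetchone()
is_null = conn.execute("SELECT count(*) FROM items WHERE expiration_date IS NULL").fetchone()
print(eq_null, is_null)  # (0,) (2,)
conn.close()
```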

What is odd is what I've just said: SQL lacks a way to indicate that the value does not apply. It seems that Codd, whose relational model was one of the main influences on the design of SQL, eventually realized this was a serious problem, but too late:

Codd actually realized this flaw later in his life and proposed two different kinds of NULLs: A-Values (missing but applicable, i.e. unknown) and I-Values (missing and inapplicable). Sadly, by then, SQL was already set in stone.
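To picture what two kinds of markers would buy us, here is a toy Python model of the ShopItem example from above using two distinct sentinels instead of a single None. Missing, A_MARK, I_MARK and this ShopItem dataclass are all hypothetical names for illustration; this is not how any SQL engine works.

```python
import enum
from dataclasses import dataclass
from datetime import date
from typing import Union

class Missing(enum.Enum):
    # toy stand-ins for Codd's two markers
    A_MARK = "missing but applicable (unknown)"
    I_MARK = "missing and inapplicable"

@dataclass
class ShopItem:  # hypothetical class following the example earlier in this post
    name: str
    expiration_date: Union[date, Missing]

items = [
    ShopItem("book", Missing.I_MARK),       # books don't expire
    ShopItem("beans can", Missing.A_MARK),  # the printed date is blurry
    ShopItem("milk", date(2026, 7, 1)),
]

for item in items:
    if item.expiration_date is Missing.I_MARK:
        print(f"{item.name}: expiration date does not apply")
    elif item.expiration_date is Missing.A_MARK:
        print(f"{item.name}: expiration date unknown")
    else:
        print(f"{item.name}: expires on {item.expiration_date}")
```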