Sunday, 3 May 2026

Glibc

Compiling to native code, and for a Linux system at that... wow, that sounds scary, and very, very far away from what I've been doing for the last decade(s). Well, the thing is that some months ago my employer decided that we had to compile some of our Python applications to native code. It's not performance related; it's about preventing access to the source code of these applications. We looked into Cython, but we settled on Nuitka, an amazing piece of software that has been serving us very well.

Almost every native application compiled for a Linux system is dynamically linked against glibc. OK, and what's glibc?

The GNU C Library, commonly known as glibc, is the GNU Project implementation of the C standard library. It provides a wrapper around the system calls of the Linux kernel and other kernels for application use. Despite its name, it now also directly supports C++ (and, indirectly, other programming languages).

So when a native Linux application (using glibc) starts, the dynamic linker (ld.so, which on x86-64 is /lib64/ld-linux-x86-64.so.2) loads the shared objects (SOs, .so files, the equivalent of Windows DLLs) needed by the application (like libc.so.6) and links the call sites to the functions imported from those libraries.

Obviously glibc evolves over time, so what about versions? First, what glibc version is installed on my system? You can check the SOs loaded by a running process with: lsof -p PID | grep '\.so'. Normally you'll see that it's using libc.so.6 (in Ubuntu it is located at /usr/lib/x86_64-linux-gnu/libc.so.6). That 6 is not the version number (libc.so.6 has been the name of the library since 1997!); the version number is something like 2.XX (2.39 in my Ubuntu 24.04). You can find it with: ldd --version
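
Since our applications are Python anyway, the same check can be done from Python. What follows is just a minimal sketch of mine: it assumes a glibc-based system where os.confstr knows the CS_GNU_LIBC_VERSION name (on other systems it simply returns None), and the helper names parse_glibc_version and local_glibc_version are made up for this post.

```python
import os
import re
from typing import Optional, Tuple


def parse_glibc_version(text: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from a string that ends with a version,
    such as 'glibc 2.39' or the first line printed by `ldd --version`."""
    match = re.search(r"(\d+)\.(\d+)(?:\.\d+)*\s*$", text.strip())
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def local_glibc_version() -> Optional[Tuple[int, int]]:
    """Ask the running C library for its version; None if it is not glibc."""
    try:
        raw = os.confstr("CS_GNU_LIBC_VERSION")  # typically 'glibc 2.XX'
    except (AttributeError, ValueError, OSError):
        # os.confstr missing (non-POSIX) or name unknown (non-glibc libc).
        return None
    return parse_glibc_version(raw) if raw else None


print(local_glibc_version())
```

The parsing is kept separate from the lookup so that it can also be fed the first line of ldd --version captured on another machine.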

So, what happens if I compile my application on a system with one version of glibc and try to run it on a system with a different version? Well, the situation is more fine-grained than I thought. Version numbers are not checked at the glibc level, but at the function level. This is because glibc uses symbol versioning.

    The primary goal of symbol versioning is backward compatibility. It allows a single library file to provide multiple versions of the same function so that:

    Old binaries compiled against v2.10 continue to use the v2.10 implementation.

    New binaries compiled against v2.11 use the new v2.11 implementation.

So multiple versions of the same function live inside glibc, and your binary will dynamically link against the one it was compiled for. And when does the version of a function change? Normally it only changes if the function's interface (the contract, the ABI) changes, not if only its internal implementation changes. So if we compare symbol versioning to semantic versioning (SemVer, a more familiar versioning scheme), we could say that a new symbol version corresponds to a Major version change in semantic versioning.

A new symbol version is functionally equivalent to a Major version bump for that specific function. It signals to the linker that the "contract" for that specific symbol has changed, and that old programs should look for the previous contract elsewhere in the same file.

Notice how symbol versioning is applied to functions inside a library, while semantic versioning (when used) is normally applied to whole libraries.

The glibc version (the one obtained with ldd --version) has no importance in terms of loading the library into memory (the dynamic linker will load libc.so.6 regardless of its "internal" version); the important part is the specific version of each function that we try to link.

I guess that when you program in C you are aware of the version of each function you are using, as you have to adapt your code to the function's ABI if it has changed, but when that happens behind the scenes things are quite different. In our case we just write Python code, and the beautiful Nuitka takes care of transforming it to C and then compiling it to native code. So it's Nuitka (together with the C toolchain it drives) that binds the generated code to the function versions present in the glibc of the build system.

If you then run that binary on a system with an older glibc, it can happen that your binary is "pointing" to a function with a symbol version (let's say openEncryptedFile@GLIBC_2.12) higher than the one present in the older glibc (let's say openEncryptedFile@GLIBC_2.10), and your application will fail to start, with the dynamic linker complaining that version GLIBC_2.12 was not found. Basically, this means that you have to compile your Python application on a system with a glibc version <= the glibc version of the target system.

It feels odd at first, as the starting point is the very same Python code: if on one system it can just use openEncryptedFile@GLIBC_2.10, why isn't it always linked against that 2.10 version, even when a higher one (openEncryptedFile@GLIBC_2.12) is present? Well, that's how things work by default: when compiling, code is linked against the highest version of each function present in the glibc of the build machine.
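
The "build on glibc <= target" rule can be turned into a quick check. The sketch below is my own illustration, not part of Nuitka or glibc: it assumes you have captured the text printed by objdump -T for your binary, collects every GLIBC_x.y tag in it, and compares the highest one with the glibc version of the target machine. The function names and the SAMPLE text are made up for the example.

```python
import re
from typing import Optional, Tuple

Version = Tuple[int, ...]


def parse_version(tag: str) -> Version:
    """'GLIBC_2.2.5' -> (2, 2, 5)."""
    return tuple(int(part) for part in tag.split("_", 1)[1].split("."))


def required_glibc(objdump_output: str) -> Optional[Version]:
    """Highest GLIBC_x.y tag found in `objdump -T <binary>` output."""
    tags = re.findall(r"GLIBC_\d+(?:\.\d+)+", objdump_output)
    return max((parse_version(tag) for tag in tags), default=None)


def can_run_on(objdump_output: str, target_glibc: Version) -> bool:
    """True if no referenced symbol version is newer than the target glibc."""
    needed = required_glibc(objdump_output)
    return needed is None or needed <= target_glibc


# Hand-written sample in the shape of `objdump -T` output (illustrative only).
SAMPLE = """\
0000000000000000      DF *UND*  0000000000000000 (GLIBC_2.2.5) printf
0000000000000000      DF *UND*  0000000000000000 (GLIBC_2.34)  __libc_start_main
0000000000000000      DF *UND*  0000000000000000 (GLIBC_2.38)  __isoc23_strtol
"""

print(required_glibc(SAMPLE))        # (2, 38)
print(can_run_on(SAMPLE, (2, 35)))   # False
print(can_run_on(SAMPLE, (2, 39)))   # True
```

Versions are compared as tuples of integers rather than as strings, so that 2.9 sorts below 2.10.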

If you wonder whether other .so libraries (ELF shared objects) also use symbol versioning, it depends. Smaller, simpler libraries usually follow the SONAME approach: the library (.so file) name changes with each major version (this is a coarse-grained approach).

Symbolic versioning is the technically superior approach, but it is not the universal standard for all ELF libraries. It depends entirely on the library maintainers and their commitment to long-term ABI stability.

In the Linux ecosystem, there are two primary ways to manage library changes:  
1. The "SONAME" Approach (Common)

Most smaller or simpler libraries use the SONAME mechanism. 
You’ve likely seen files like libfoo.so.1 and libfoo.so.2.

    The Logic: If the developers change the interface, they increment the "Major" version number in the filename itself.  

    The Result: Programs linked against libfoo.so.1 will refuse to start 
    if only libfoo.so.2 is present. This is a "heavy-handed" fix because 
    it requires recompiling every program that uses the library even if 
    the specific function they use didn't actually change.

2. The "Symbol Versioning" Approach (Advanced)

This is what glibc uses. It is also used by heavy-hitters like OpenSSL, Qt, and libgcc.

    The Logic: The filename stays the same (e.g., libc.so.6 has been the name since 1997), 
    but individual functions inside the file are tagged with versions.

    The Result: Multiple versions of the same function can coexist in one file. 
    This allows for extreme backward compatibility without breaking the system every time a single function is updated.
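
To make the difference in granularity concrete, here is a toy model of the two checks. It is only a sketch under my own simplifications (a real loader does much more), and libfoo, foo_open, foo_close and the LIBFOO_x.y tags are hypothetical names.

```python
from typing import Iterable, Set, Tuple

VersionedSymbol = Tuple[str, str]  # (function name, version tag)


def soname_satisfied(needed: str, installed: Iterable[str]) -> bool:
    """SONAME approach: the needed library name must be present as-is."""
    return needed in set(installed)


def symbols_satisfied(needed: Set[VersionedSymbol],
                      exported: Set[VersionedSymbol]) -> bool:
    """Symbol versioning: every (function, version) pair must be exported."""
    return needed.issubset(exported)


# Coarse-grained: a program built against libfoo.so.1 cannot use libfoo.so.2.
print(soname_satisfied("libfoo.so.1", ["libfoo.so.2"]))

# Fine-grained: one file keeps exporting the old version next to the new one.
exported = {
    ("foo_open", "LIBFOO_1.0"),
    ("foo_open", "LIBFOO_2.0"),
    ("foo_close", "LIBFOO_1.0"),
}
print(symbols_satisfied({("foo_open", "LIBFOO_1.0")}, exported))
```

The first check fails for the whole library at once, while the second one only fails if a specific function version that the program imports is missing.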

To complete this post, I'll add some useful, related commands:

  • To check the SO's used by a given program.
    For a binary on disk: ldd /usr/bin/program_name
    For a running process: lsof -p [PID] | grep '\.so'
  • To view the symbols used by a program (the specific functions imported from SOs)
    All imported symbols: nm -Du /usr/bin/program_name
    Symbols + Versions: objdump -T /usr/bin/program_name | grep -F '*UND*'
    Only glibc symbols: objdump -T /usr/bin/program_name | grep 'GLIBC_'
    Library Version Map: readelf -V /usr/bin/program_name
  • To view the symbols/functions exported by glibc in your system: objdump -T /usr/lib/x86_64-linux-gnu/libc.so.6

Some additional findings related to the last command. For example, to see the versions of pthread_spin_init present in my glibc: objdump -T /usr/lib/x86_64-linux-gnu/libc.so.6 | grep pthread_spin_init

That gives me:

00000000000a4130 g DF .text 000000000000000d GLIBC_2.34 pthread_spin_init
00000000000a4130 g DF .text 000000000000000d (GLIBC_2.2.5) pthread_spin_init
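
Lines like these are easy to pick apart with a few lines of Python. This is a rough sketch of mine, not an official parser: it assumes the version tag is the next-to-last field and the symbol name the last one, and it treats the parentheses as the mark of a non-default version (that is my understanding of objdump's output, so take it with a grain of salt). DynSym and parse_dynsym_line are made-up names.

```python
from typing import NamedTuple, Optional


class DynSym(NamedTuple):
    address: int
    version: str
    is_default: bool
    name: str


def parse_dynsym_line(line: str) -> Optional[DynSym]:
    """Parse one symbol line of `objdump -T`; None for headers and blanks."""
    fields = line.split()
    if len(fields) < 4:
        return None
    try:
        address = int(fields[0], 16)
    except ValueError:
        return None  # e.g. the 'file format' header line
    version_field, name = fields[-2], fields[-1]
    # Assumption: a version printed in parentheses is a non-default one.
    in_parens = version_field.startswith("(") and version_field.endswith(")")
    return DynSym(address, version_field.strip("()"), not in_parens, name)


lines = [
    "00000000000a4130 g DF .text 000000000000000d GLIBC_2.34 pthread_spin_init",
    "00000000000a4130 g DF .text 000000000000000d (GLIBC_2.2.5) pthread_spin_init",
]
for line in lines:
    print(parse_dynsym_line(line))
```

Both entries parse to the same address, with GLIBC_2.34 as the version that newly compiled programs would pick up.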

This output is very interesting, as it shows us that a symbol version is not a sequential counter for that specific function. Instead, it is a marker of the glibc release that defined that specific version of the function's ABI. From a GPT:

How glibc handles ABI changes with symbol versioning

Original version: Suppose foo() was introduced in GLIBC_2.2.5. That version is tagged as foo@GLIBC_2.2.5.

ABI change in glibc 2.32: If glibc developers change the ABI of foo() in version 2.32 (e.g., change its behavior, arguments, or return type in a way that breaks compatibility), they will:

Keep the old implementation as foo@GLIBC_2.2.5.
Add a new implementation as foo@GLIBC_2.32.

At runtime:

A binary linked against glibc 2.2.5 will request foo@GLIBC_2.2.5, and the dynamic linker will resolve it to the old implementation.
A binary linked against glibc 2.32 will request foo@GLIBC_2.32, and get the new implementation.

This mechanism ensures backward compatibility while allowing glibc to evolve.
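
The foo() story above can be mimicked with a toy resolver. It is only a mental model written for this post (the real work is done by the dynamic linker on ELF structures, not on Python dicts), and foo_old, foo_new, the two library dicts and the resolve helper are all hypothetical.

```python
from typing import Callable, Dict, Tuple

SymbolKey = Tuple[str, str]  # (function name, version tag)
Library = Dict[SymbolKey, Callable[[], str]]


def foo_old() -> str:
    return "foo with the GLIBC_2.2.5 contract"


def foo_new() -> str:
    return "foo with the GLIBC_2.32 contract"


# A recent libc.so.6 keeps both implementations; an old one has only the first.
NEW_LIBC: Library = {
    ("foo", "GLIBC_2.2.5"): foo_old,
    ("foo", "GLIBC_2.32"): foo_new,
}
OLD_LIBC: Library = {
    ("foo", "GLIBC_2.2.5"): foo_old,
}


def resolve(library: Library, name: str, version: str) -> Callable[[], str]:
    """Return the implementation matching exactly the requested version."""
    try:
        return library[(name, version)]
    except KeyError:
        raise LookupError(f"{name}: version `{version}' not found") from None


# Old binary on a new system: still gets the old implementation.
print(resolve(NEW_LIBC, "foo", "GLIBC_2.2.5")())

# New binary on an old system: the requested version does not exist there.
try:
    resolve(OLD_LIBC, "foo", "GLIBC_2.32")
except LookupError as error:
    print(error)
```

The second case is the one that bites when you compile on a newer system than the one you deploy to.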