Sunday, 23 March 2025

FileSystems and Inodes

While writing this recent post about file-locking I came across an interesting Unix/Linux feature: you can remove (or move) an open file. The file's entry in the filesystem is removed (so you cannot open it again by name), but if any process already has the file open, the data blocks that make up the file remain until every process that has the file open has closed it. What they mention here is that the inode for that file remains (until all processes with a handle to the file have closed it). This applies not just to files opened by a process, but to the process's own executable. I mean, a process corresponds to an executable file; I can remove that executable file and the process will keep running normally until it decides to finish. I've mentioned inodes and, uff, I don't think I had thought about what an inode is and how filesystems work in almost 2 decades! So I think it's time to refresh my memory and write down a summary here. From Wikipedia:
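The behaviour described above can be checked with a small Python sketch. It assumes a Unix-like system (on Windows the unlink of an open file would normally fail), and the function name read_after_unlink is made up for this illustration.

```python
import os
import tempfile


def read_after_unlink(payload: bytes = b"still here") -> tuple[bool, bytes]:
    """Delete a file while it is open, then read it through the open descriptor."""
    fd, path = tempfile.mkstemp()          # create and open a scratch file
    try:
        os.write(fd, payload)
        os.unlink(path)                    # remove the only directory entry
        name_gone = not os.path.exists(path)
        os.lseek(fd, 0, os.SEEK_SET)       # rewind the still-open descriptor
        data = os.read(fd, len(payload))   # the data blocks are still reachable
    finally:
        os.close(fd)                       # last reference dropped, inode can be freed
    return name_gone, data


if __name__ == "__main__":
    print(read_after_unlink())
```

The name disappears immediately, but the descriptor keeps working because it refers to the inode rather than to the path.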

The inode (index node) is a data structure in a Unix-style file system that describes a file-system object such as a file or a directory. Each inode stores the attributes and disk block locations of the object's data.[1] File-system object attributes may include metadata (times of last change,[2] access, modification), as well as owner and permission data.[3]

A directory is a list of inodes with their assigned names. The list includes an entry for itself, its parent, and each of its children.

So an inode contains data about a file (attributes holding metadata) and gives access to the file data (via pointers to the file's data blocks). How the filesystem converts a path like "/usr/xose/myFile.txt" into an inode, whose data and metadata it can then use, goes like this: a directory is a file, a file whose data (what is stored in the data blocks pointed to by that directory's inode) is a list of fileName -> inodeNumber pairs. So in my example, for the "xose" directory we have a file (an inode) that contains an entry like this: "myFile.txt, 11111" (inode number). Walking back, "usr" is a file with an entry "xose, xose-inode", and the "/" root directory has an entry "usr, usr-inode". OK, and where is the entry that tells us the inode number for the "/" root directory? Well, that's a fixed, well-known number: in the ext2/ext3/ext4 family it is inode 2, while other filesystems may reserve a different number. This discussion makes a good read:
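To make the path walk concrete, here is a toy in-memory model in Python. It is only a sketch: the INODES table, the resolve function and the inode numbers 100 and 200 are invented for the example (11111 and the root inode 2 come from the text), and real directories also hold "." and ".." entries that are left out here.

```python
ROOT_INODE = 2  # the fixed, well-known root inode of this toy filesystem

# Toy inode table: inode number -> object. A directory's "data" is a
# name -> inode number mapping; a regular file's data is just bytes.
INODES = {
    2: {"type": "dir", "entries": {"usr": 100}},
    100: {"type": "dir", "entries": {"xose": 200}},
    200: {"type": "dir", "entries": {"myFile.txt": 11111}},
    11111: {"type": "file", "data": b"hello"},
}


def resolve(path: str) -> int:
    """Walk an absolute path one component at a time and return its inode number."""
    current = ROOT_INODE
    for name in path.strip("/").split("/"):
        if not name:
            continue  # the path was "/" itself
        node = INODES[current]
        if node["type"] != "dir":
            raise NotADirectoryError(path)
        if name not in node["entries"]:
            raise FileNotFoundError(path)
        current = node["entries"][name]  # follow the name -> inode pair
    return current


if __name__ == "__main__":
    print(resolve("/usr/xose/myFile.txt"))
```

Each loop iteration is one directory lookup: read the current directory's entries, find the name, jump to the inode it points to.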

Directories are just special files that map an inode number to a string filename. Each inode is numbered and usually represents an offset in some array-like structure in the filesystem. This mapping between inode to filename is a hard link. A file must have 1 or more hard links to be accessible. If you create another hard link, you’re just pointing another filename to the same inode. All of them are equally “the file”, and there’s no way to detect which hard link came first. As part of the inode contents, there’s a counter of how many hard links each inode has. It’s eligible for cleanup and reuse when this count is zero.

The root directory is usually some specially reserved inode number.

They mention hard links. I have to shamefully admit that I'd always been a bit confused about the hard link vs symbolic link difference (I come from a Windows background...) when it's a damn simple thing. A hard link corresponds to the "name" part of the "name, inode number" pairs that we have in a directory (remember, a directory is a file containing "name to inode" pairs). You can have multiple hard links pointing to the same inode, and indeed you cannot tell which one was created first. That's why you can read in the Wikipedia article: "Inodes do not contain their hard link names, only other file metadata." Sure, because as multiple names can reference that inode, it would be a mess to keep track of them in the inode itself. Symbolic links (aka soft links) are quite a different thing. They are files that contain just a path to another file. As Wikipedia explains:
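A short Python sketch of the hard link idea, assuming a Unix-like system and a temporary directory on a filesystem that supports hard links. The file names and the hard_link_demo function are hypothetical.

```python
import os
import tempfile


def hard_link_demo() -> dict:
    """Give one inode two names, then remove the 'original' name."""
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "myFile.txt")
        second = os.path.join(tmp, "otherName.txt")
        with open(first, "w") as f:
            f.write("hello")
        os.link(first, second)             # add a second name for the same inode
        st1, st2 = os.stat(first), os.stat(second)
        result = {
            "same_inode": st1.st_ino == st2.st_ino,
            "links_with_two_names": st1.st_nlink,
        }
        os.unlink(first)                   # drop the name that was created first
        with open(second) as f:            # the data is still reachable
            result["data_via_second_name"] = f.read()
        result["links_after_unlink"] = os.stat(second).st_nlink
    return result


if __name__ == "__main__":
    print(hard_link_demo())
```

Removing the first name only decrements the link counter; neither name is more "the file" than the other.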

A symbolic link contains a text string that is automatically interpreted and followed by the operating system as a path to another file or directory. This other file or directory is called the "target". The symbolic link is a second file that exists independently of its target.

So, as mentioned above, given a hard link:
"/usr/xose/myFile.txt", inside the "xose" directory-file we have an entry: [myFile.txt, 11111 (inode number)]
while for a soft link "/apps/important/file1.txt -> /usr/xose/myFile.txt" we have:
- an entry inside the "important" directory-file: [file1.txt, 22222 (inode number)]
- the data contained inside the 22222 inode, which is just "/usr/xose/myFile.txt"
- the OS (which knows how to treat symbolic links) will handle that path as if it had been given to it in the first place.
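The three points above can be observed from Python. This is a sketch assuming a Unix-like system where creating symlinks is allowed; the symlink_demo function is made up, and the file names merely echo the example.

```python
import os
import tempfile


def symlink_demo() -> dict:
    """Create a symlink, inspect it, then delete its target."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "myFile.txt")
        link = os.path.join(tmp, "file1.txt")
        with open(target, "w") as f:
            f.write("hello")
        os.symlink(target, link)
        result = {
            # the symlink's content is just the path string
            "stored_path_is_target": os.readlink(link) == target,
            # lstat looks at the link itself, which has its own inode
            "own_inode": os.lstat(link).st_ino != os.stat(target).st_ino,
            # stat follows the link and lands on the target's inode
            "follows_to_target": os.stat(link).st_ino == os.stat(target).st_ino,
            # a symlink does not bump the target's hard link counter
            "target_link_count": os.stat(target).st_nlink,
        }
        os.unlink(target)
        # the link still exists, but now points at nothing
        result["dangling"] = os.path.islink(link) and not os.path.exists(link)
    return result


if __name__ == "__main__":
    print(symlink_demo())
```

The difference between os.lstat and os.stat is the difference between looking at the link's own inode and letting the OS follow the stored path.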

From the previously linked discussion:

You asked about symbolic links. As I mentioned above, they’re a special kind of file. The filesystem knows to interpret its contents differently. The content of a directory is the mapping for filenames, but the content for symbolic links (soft links) is a file path string. Symlinks consume new inodes, and they do not increment the destination file’s hard link count. Deleting the destination file does not update any symlinks pointing to them.

So notice that an inode contains a counter of how many hard links point to it. There's another great discussion here:

The term hardlink is actually somewhat misleading. While for symlinks source and destination are clearly distinguishable (the symlink has its own entry in the inode table), this is not true for hardlinks. If you create a hardlink for a file, the original entry and the hardlink are indistinguishable in terms of what was there first. (Since they refer to the same inode, they share their file attributes such as owner, permissions, timestamps etc.) This leads to the statement that every directory entry is actually a hardlink, and that hardlinking a file just means to create a second (or third, or fourth...) hardlink. In fact, each inode stores a counter for the number of hardlinks to that inode.

The directory entries of "original file" and "hard link" are totally indistinguishable in quality: both establish a reference between a file name and the inode of a file.

One of the main visible differences between hardlinks and symlinks (a.k.a. softlinks) is that symlinks work across filesystems while hardlinks are confined to one filesystem. That is, a file on partition A can be symlinked to from partition B, but it cannot be hardlinked from there. This is clear from the fact that a hardlink is actually an entry in a directory, which consists of a file name and an inode number, and that inode numbers are unique only per file system.

Notice that a process references the files that it has open by inode, not by path. Well, this all comes down to file descriptors and will be the subject of a future post.

Normally users don't have to deal with inodes in their daily life, but there are at least a couple of situations where some basic knowledge about them comes in handy. In many inode-based filesystems (ext4, for example) the number of inodes is fixed when the filesystem is created. This is because inodes are preallocated and stored in an inode table. If we have many small files in our filesystem, we can use up all the available inodes without having used all the disk space, so we won't be able to create new files even though our friend df -h tells us that there's disk space available. Using df -i we can see the inode usage.
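The same numbers that df -i reports can be read from Python with os.statvfs. A minimal sketch, assuming a Unix-like system; the inode_usage function and its dictionary keys are made up, and some filesystems report 0 total inodes because they allocate them dynamically.

```python
import os


def inode_usage(path: str = "/") -> dict:
    """Return inode counts for the filesystem that contains `path`."""
    st = os.statvfs(path)
    total = st.f_files            # total number of inodes
    free = st.f_ffree             # free inodes
    used = total - free
    # Guard against filesystems that report 0 total inodes.
    percent = round(100 * used / total, 1) if total else None
    return {"total": total, "free": free, "used": used, "percent_used": percent}


if __name__ == "__main__":
    print(inode_usage("/"))
```

If percent_used is close to 100 while df -h still shows free space, the filesystem has run out of inodes rather than blocks.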

Chances are that we have heard about another inode-related concept: orphan inodes. An orphan inode is an inode that is still allocated but no longer has any directory entry pointing to it. This can be perfectly normal, as when we have deleted a file that is still open by a process (when the process finishes, the inode will be released), or problematic, as after a system crash during a write operation, or a crash in the previous situation: a process keeps open a file that has been deleted and the OS crashes, so there is no time to delete the inode. Fortunately, journaling filesystems such as ext4 keep a list of orphan inodes and will normally be able to clean them up on the next boot.
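The "normal" kind of orphan inode is easy to spot on Linux, because /proc marks open files whose name is gone. This sketch assumes a Linux-style /proc filesystem and returns None when it is not available; the deleted_marker function is hypothetical.

```python
import os
import tempfile


def deleted_marker():
    """Return what /proc reports for an open-but-deleted file, or None without /proc."""
    fd, path = tempfile.mkstemp()
    try:
        os.unlink(path)                      # the inode now has no directory entry
        proc_entry = f"/proc/self/fd/{fd}"
        if not os.path.islink(proc_entry):   # no Linux-style /proc on this system
            return None
        return os.readlink(proc_entry)       # the old path plus a deletion marker
    finally:
        os.close(fd)                         # the inode can now be released


if __name__ == "__main__":
    print(deleted_marker())
```

Tools like lsof rely on the same information to list deleted files that are still held open and still consuming disk space.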
