Demystifying Unix filesystems: file descriptors, inodes and blocks, in that order

Published on 2024-12-29

Could be Interesting!

A peek into how file storage works on computing devices running a Unix system.

Vocabulary
  1. Kernel = The kernel (think: mediator between hardware and software) runs its operations in kernel space, a reserved area of memory with full control over the hardware and system resources. The kernel is stored as a file on the hard drive or other persistent storage (like an SSD). When the computer boots, this file is loaded into RAM (Random Access Memory), where the kernel keeps running to manage hardware and software interactions. The commands below show which kernel is running on a macOS system.
    uname -v
    uname -a 
  2. Process table = Exists in kernel space. It is a list in which the kernel keeps track of information about each process, for example its process ID (PID) and process state. The command below lists the running processes:
    ps aux
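  3. File Descriptor = A non-negative integer (starting from 0) that a process uses to refer to an open file. The integer is an index into that process's file descriptor table, which the kernel maintains in kernel space; each entry of that table points to an entry in the kernel's Open File Table. Common file descriptors are: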
    -- 0 used for standard input (stdin),
    -- 1 for standard output (stdout) and
    -- 2 for standard error (stderr).
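    Each process has its own file descriptor table maintained by the kernel. Processes can still share open files: after a fork, the child gets a copy of the parent's descriptor table, and the copied entries point to the same Open File Table entries, so parent and child share one file offset.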
  4. Open File table = Exists in kernel space. It is a list of all files currently opened by processes. Each entry holds the current position (offset) in the file, the mode the file was opened with, a count of the descriptors referring to it, and a reference to the file's vnode or inode structure. The file descriptors themselves live in the per-process tables and point into this table.
  5. Inodes = An inode (index node) stores a file's metadata and the numbers of the blocks (in order) that hold the file's data. In the classic Unix V6 layout, inodes start at block 2, since block 0 is the "boot block" and block 1 is the "superblock", and the inodes usually occupy about 10% of the blocks.
    #include <stdint.h> // uint8_t, uint16_t
    struct inode {
      uint16_t i_mode; // bit vector of file type and permissions
      uint8_t i_nlink; // number of references to file
      uint8_t i_uid; // owner
      uint8_t i_gid; // group of owner
      uint8_t i_size0; // most significant byte of size
      uint16_t i_size1; // lower two bytes of size (size is encoded in a three-byte number)
      uint16_t i_addr[8]; // device addresses constituting file
      uint16_t i_atime[2]; // access time
      uint16_t i_mtime[2]; // modify time
    };
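    Note1: A file has two kinds of data: (1) payload data (the text in a document, the image in a PNG, etc.) and (2) metadata (size, owner, permissions, timestamps, etc.). That means some blocks must store payload data and others, metadata. The file's name is not stored in the inode: it lives in a directory entry that maps the name to an inode number.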
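    Note2: i_addr stores the numbers of the blocks that contain the file's payload data. Since there is only space for 8 block numbers per inode, the largest file would be 8 * 512 = 4,096 bytes with Unix V6's 512-byte blocks, and only 8 * 4096 = 32,768 bytes (~32KB) even with 4096-byte blocks. That is not realistic! What if a file needs more than 8 blocks? The workaround is to store block numbers inside a block and keep that block's number in the inode. This is called an indirect block, and the idea can be repeated: a doubly indirect block holds the numbers of indirect blocks. In V6, a "large" file uses the first 7 entries of i_addr for singly indirect blocks and the 8th for a doubly indirect block.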
    Sidebar: Apple File System (APFS) does not keep a fixed table of inodes; instead it stores inode records and other file metadata as entries in a B-tree. (more on this later)
  6. Block = A unit of storage used by the file system, typically 4KB (4096 bytes) on modern systems, although it depends on the configuration (Unix V6 used 512-byte blocks).
    boot block = It contains initial bootstrap code that loads the OS Kernel and information about the partition layout and boot loader.
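    superblock = Information about the filesystem itself, such as its total number of blocks, the size of the inode region and free-space bookkeeping. It is the metadata of the metadata, if that makes sense.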
    Another interesting thing is that a file's blocks can be scattered all over the disk. For example, a file could have i_addr = [12, 200, 56, …].
    To find the block size on macOS, run the commands below in the shell/terminal:
    diskutil list 
    diskutil info /
    For example, on my machine the output includes:
    Device Block Size: 4096 Bytes
    In a nutshell:
    [Figure: file descriptors example]

Note: file descriptors are assigned in ascending order: a newly opened file gets the lowest-numbered descriptor that is not currently in use by the process.

References: