File Organization: The Inode

In this lesson, we discuss the layout of an inode while explaining the file organization in a vsfs.

We'll cover the following

One of the most important on-disk structures of a file system is the inode; virtually all file systems have a structure similar to this. The name inode is short for index node, the historical name given to it in UNIX“The UNIX Time-Sharing System” by M. Ritchie, K. Thompson. CACM Volume 17:7, 1974. The original paper about UNIX. Read it to see the underpinnings of much of modern operating systems., and possibly earlier systems, used because these nodes were originally arranged in an array, and the array indexed into when accessing a particular inode.

ASIDE: DATA STRUCTURE — THE INODE

The inode is the generic name that is used in many file systems to describe the structure that holds the metadata for a given file, such as its length, permissions, and the location of its constituent blocks. The name goes back at least as far as UNIX and probably further back to Multics if not earlier systems. It is short for index node, as the inode number is used to index into an array of on-disk inodes in order to find the inode of that number. As we’ll see, the design of the inode is one key part of file system design. Most modern systems have some kind of structure like this for every file they track, but perhaps call them different things (such as dnodes, fnodes, etc.).

Each inode is implicitly referred to by a number (called the i-number), which we’ve earlier called the low-level name of the file. In vsfs (and other simple file systems), given an i-number, you should directly be able to calculate where on the disk the corresponding inode is located. For example, take the inode table of vsfs as shown in the previous lesson: 20-KB in size (5 4-KB blocks) and thus consisting of 80 inodes (assuming each inode is 256 bytes). Further assume that the inode region starts at 12KB (i.e, the superblock starts at 0KB, the inode bitmap is at address 4KB, the data bitmap at 8KB, and thus the inode table comes right after). In vsfs, we thus have the following layout for the beginning of the file system partition (in closeup view):

To read inode number 32, the file system would first calculate the offset into the inode region (32.sizeof(inode)32\: .\: sizeof(inode) or 8192), add it to the start address of the inode table on disk (inodeStartAddr = 12KB12KB), and thus arrive upon the correct byte address of the desired block of inodes: 20KB20KB. Recall that disks are not byte addressable, but rather consist of a large number of addressable sectors, usually 512 bytes. Thus, to fetch the block of inodes that contains inode 32, the file system would issue a read to a sector 20×1024512\frac{20 \times 1024}{512}, or 40, to fetch the desired inode block. More generally, the sector address sector of the inode block can be calculated as follows:

Press + to interact
blk = (inumber * sizeof(inode_t)) / blockSize;
sector = ((blk * blockSize) + inodeStartAddr) / sectorSize;

Inside each inode is virtually all of the information you need about a file:

  • its type (e.g., regular file, directory, etc.),
  • its size,
  • the number of blocks allocated to it,
  • protection information (such as who owns the file, as well as who can access it),
  • some time information, including when the file was created, modified, or last accessed,
  • information about where its data blocks reside on disk (e.g., pointers of some kind).

We refer to all such information about a file as metadata; in fact, any information inside the file system that isn’t pure user data is often referred to as such. An 1 example inode from ext2“The Second Extended File System: Internal Layout” by Dave Poirier. 2009. Available: http://www.nongnu.org/ext2-doc/ext2.html. Some details on ext2, a very simple Linux file system based on FFS, the Berkeley Fast File System. We’ll be reading about it in the next chapter. is shown in the figureType info is kept in the directory entry, and thus is not found in the inode itself. below.

One of the most important decisions in the design of the inode is how it refers to where data blocks are. One simple approach would be to have one or more direct pointers (disk addresses) inside the inode; each pointer refers to one disk block that belongs to the file. Such an approach is limited: for example, if you want to have a file that is really big (e.g., bigger than the block size multiplied by the number of direct pointers in the inode), you are out of luck.

The multi-level index

To support bigger files, file system designers have had to introduce different structures within inodes. One common idea is to have a special pointer known as an indirect pointer. Instead of pointing to a block that contains user data, it points to a block that contains more pointers, each of which point to user data. Thus, an inode may have some fixed number of direct pointers (e.g., 12), and a single indirect pointer. If a file grows large enough, an indirect block is allocated (from the data-block region of the disk), and the inode’s slot for an indirect pointer is set to point to it. Assuming 4-KB blocks and 4-byte disk addresses, that adds another 1024 pointers; the file can grow to be (12+1024).4K(12+1024)\:.\:4K or 4144KB.

Not surprisingly, in such an approach, you might want to support even larger files. To do so, just add another pointer to the inode: the double indirect pointer. This pointer refers to a block that contains pointers to indirect blocks, each of which contains pointers to data blocks. A double indirect block thus adds the possibility to grow files with an additional 1024.10241024\:.\:1024 or 1-million 4KB blocks, in other words supporting files that are over 4GB in size. You may want even more, though, and we bet you know where this is headed: the triple indirect pointer.

Overall, this imbalanced tree is referred to as the multi-level index approach to pointing to file blocks. Let’s examine an example with twelve direct pointers, as well as both a single and a double indirect block. Assuming a block size of 4 KB, and 4-byte pointers, this structure can accommodate a file of just over 4 GB in size (i.e., (12+1024+10242)×4KB(12 + 1024 + 1024^2) \times 4KB). Can you figure out how big of a file can be handled with the addition of a triple-indirect block? (hint: pretty big)

TIP: CONSIDER EXTENT-BASED APPROACHES

A different approach is to use extents instead of pointers. An extent is simply a disk pointer plus a length (in blocks); thus, instead of requiring a pointer for every block of a file, all one needs is a pointer and a length to specify the on-disk location of a file. Just a single extent is limiting, as one may have trouble finding a contiguous chunk of on-disk free space when allocating a file. Thus, extent-based file systems often allow for more than one extent, thus giving more freedom to the file system during file allocation.

In comparing the two approaches, pointer-based approaches are the most flexible but use a large amount of metadata per file (particularly for large files). Extent-based approaches are less flexible but more compact. In particular, they work well when there is enough free space on the disk and files can be laid out contiguously (which is the goal for virtually any file allocation policy anyhow).

Many file systems use a multi-level index, including commonly-used file systems such as Linux ext2“The Second Extended File System: Internal Layout” by Dave Poirier. 2009. Available: http://www.nongnu.org/ext2-doc/ext2.html. Some details on ext2, a very simple Linux file system based on FFS, the Berkeley Fast File System. We’ll be reading about it in the next chapter. and ext3, NetApp’s WAFL, as well as the original UNIX file system. Other file systems, including SGI XFS and Linux ext4, use extents instead of simple pointers; (they are akin to segments in the discussion of virtual memory).

You might be wondering: why use an imbalanced tree like this? Why not a different approach? Well, as it turns out, many researchers have studied file systems and how they are used, and virtually every time they find certain “truths” that hold across the decades. One such finding is that most files are small. This imbalanced design reflects such a reality; if most files are indeed small, it makes sense to optimize for this case. Thus, with a small number of direct pointers (12 is a typical number), an inode can directly point to 48 KB of data, needing one (or more) indirect blocks for larger files. See Agrawal et. al“A Five-Year Study of File-System Metadata” by Nitin Agrawal, William J. Bolosky, John R. Douceur, Jacob R. Lorch. FAST ’07, San Jose, California, February 2007. An excellent recent analysis of how file systems are actually used. Use the bibliography within to follow the trail of file-system analysis papers back to the early 1980s. for a recent-ish study. The table below summarizes those results.

Of course, in the space of inode design, many other possibilities exist. After all, the inode is just a data structure, and any data structure that stores the relevant information, and can query it effectively, is sufficient. As file system software is readily changed, you should be willing to explore different designs should workloads or technologies change.

Get hands-on with 1400+ tech skills courses.