Traverse EXT2 File System Tree in Unix/Linux

Given an EXT2 file system and the pathname of a file, e.g. /a/b/c, the problem is how to find the file. To find a file amounts to finding its inode. The algorithm is as follows.

1. Traversal Algorithm

  • Read in the superblock. Check the magic number s_magic (0xEF53) to verify it’s indeed an EXT2 FS.
  • Read in the group descriptor block (1 + s_first_data_block) to access the group 0 descriptor. From the group descriptor’s bg_inode_table entry, find the inodes begin block number, call it the InodesBeginBlock.
  • Read in InodeBeginBlock to get the inode of /, which is INODE #2.
  • Tokenize the pathname into component strings and let the number of components be n. For example, if pathname=/a/b/c, the component strings are “a”, “b”, “c”, with n = 3. Denote the components by name[0], name[1], ..,name[n-1].
  • Start from the root INODE in (3), search for name[0] in its data block(s). For simplicity, we may assume that the number of entries in a DIR is small, so that a DIR inode only has 12 direct data blocks. With this assumption, it suffices to search 12 (non-zero) direct blocks for name[0]. Each data block of a DIR INODE contains dir_entry structures of the form

[ino rec_len name_len NAME] [ino rec_len name_len NAME] …….

where NAME is a sequence of nlen chars without a terminating NULL. For each data block, read the block into memory and use a dir_entry *dp to point at the loaded data block. Then use name_len to extract NAME as a string and compare it with name[0]. If they do not match, step to the next dir_entry by

dp = (dir_entry *)((char *)dp + dp->rec_len);

and continue the search. If name[0] exists, we can find its dir_entry and hence its inode number.

  • Use the inode number, ino, to locate the corresponding INODE. Recall that ino counts from 1. Use Mailman’s algorithm to compute the disk block containing the INODE and its offset in that block.

blk = (ino – 1)       / INODES_PER_BLOCK + InodesBeginBlock;

offset = (ino – 1) % INODES_PER_BLOCK;

Then read in the INODE of /a, from which we can determine whether it’ s a DIR. If /a is not a DIR, there can’t be /a/b, so the search fails. If it’s a DIR and there are more components to search, continue for the next component name[1]. The problem now becomes: search for name[1] in the INODE of /a, which is exactly the same as that of Step (5).

  • Since Steps 5-6 will be repeated n times, it’s better to write a search function

u32 search(INODE *inodePtr, char *name)

{

// search for name in the data blocks of current DIR inode

// if found, return its ino; else return 0

}

Then all we have to do is to call search() n times, as sketched below.

Assume: n, name[0], …., name[n-1] are globals

INODE *ip points at INODE of /

for (i=0; i<n; i++){

ino = search(ip, name[i])

if (!ino){ // can’t find name[i], exit;}

use ino to read in INODE and let ip point to INODE

}

If the search loop ends successfully, ip must point at the INODE of pathname. Traversing large EXT2/3 file systems with many groups is similar. The reader may consult Chap. 3 of [Wang, 2015] for details.

2. Convert Pathname to INODE

Given a device containing an EXT2 file system and a pathname, .e.g. /a/b/c/d, write a C function.

INODE *path2inode(int fd, char *pathname) // assume fd=file descriptor

which returns an INODE pointer to the file’s inode or 0 if the file in inaccessible (see Problem 7). As will be seen shortly, the path2inode() function is the most important function in a file system.

3. Display INODE Disk Blocks

Write a C program, showblock, which prints all disk block (numbers) of a file (see Problem 8).

Source: Wang K.C. (2018), Systems Programming in Unix/Linux, Springer; 1st ed. 2018 edition.

Leave a Reply

Your email address will not be published. Required fields are marked *