EXT2 File System in Unix/Linux

1. EXT2 File System

For many years, Linux used EXT2 (Card et al. 1995) as the default file system. EXT3 (EXT3, 2014) is an extension of EXT2. The main addition in EXT3 is a journal file, which records changes made to the file system in a journal log. The log allows quicker recovery after a file system crash. An EXT3 file system with no errors is identical to an EXT2 file system. The newest extension of EXT3 is EXT4 (Cao et al. 2007). The major change in EXT4 is in the allocation of disk blocks. In EXT4, block numbers are 48 bits. Instead of discrete disk blocks, EXT4 allocates contiguous ranges of disk blocks, called extents. Other than these changes, the file system structure and file operations remain the same.

The purpose of this book is to teach the principles of file systems. Large file storage capacity is not the primary goal. Principles of file system design and implementation, with an emphasis on simplicity and compatibility with Linux, are the major focal points. For these reasons, we shall use EXT2 as the file system. The goal of this chapter is to lead the reader to implement a complete EXT2 file system that is totally Linux compatible. The premise is that if the reader understands one file system well, it should be relatively easy to adapt to any other file system.

2. EXT2 File System Data Structures

2.1. Create Virtual Disk by mkfs

Under Linux, the command

mke2fs [-b blksize -N ninodes] device nblocks

creates an EXT2 file system on a device with nblocks blocks of blksize bytes per block and ninodes inodes. The device can be either a real device or a virtual disk file. If blksize is not specified, the default block size is 1 KB. If ninodes is not specified, mke2fs will compute a default ninodes number based on nblocks. The resulting EXT2 file system is ready for use in Linux. As a specific example, the following commands

dd if=/dev/zero of=vdisk bs=1024 count=1440

mke2fs vdisk 1440

create an EXT2 file system on a virtual disk file named vdisk with 1440 blocks of 1 KB block size.

2.2. Virtual Disk Layout

The layout of such an EXT2 file system is shown in Fig. 11.1.

To begin with, we shall assume this basic file system layout. Whenever appropriate, we point out the variations, including those in large EXT2/3 file systems on hard disks. The following briefly explains the contents of the disk blocks.

Block#0: Boot Block: B0 is the boot block, which is not used by the file system. It is used to contain a booter program for booting up an operating system from the disk.

2.3. Superblock

Block#1: Superblock: (at byte offset 1024 in hard disk partitions): B1 is the superblock, which contains information about the entire file system. Some of the important fields of the superblock structure are shown below.
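As a sketch of the fields discussed in this section, the leading part of the superblock can be declared in C as follows. The field names and order follow the Linux EXT2 on-disk format; the struct name ext2_super_head and the u16/u32 typedefs are chosen here for illustration, and all fields after s_state are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t u16;
typedef uint32_t u32;

/* Leading fields of the EXT2 superblock (partial; later fields omitted). */
struct ext2_super_head {
    u32 s_inodes_count;       /* total number of inodes */
    u32 s_blocks_count;       /* total number of blocks */
    u32 s_r_blocks_count;     /* blocks reserved for the superuser */
    u32 s_free_blocks_count;  /* current number of free blocks */
    u32 s_free_inodes_count;  /* current number of free inodes */
    u32 s_first_data_block;   /* 1 for 1KB blocks, 0 for 4KB blocks */
    u32 s_log_block_size;     /* block size = 1KB * 2**s_log_block_size */
    u32 s_log_frag_size;      /* fragment size, not used */
    u32 s_blocks_per_group;   /* number of blocks in each group */
    u32 s_frags_per_group;    /* not used */
    u32 s_inodes_per_group;   /* number of inodes in each group */
    u32 s_mtime;              /* last mount time */
    u32 s_wtime;              /* last write time */
    u16 s_mnt_count;          /* number of times mounted */
    u16 s_max_mnt_count;      /* mount count limit */
    u16 s_magic;              /* 0xEF53 */
    u16 s_state;              /* file system state */
};

/* Compile-time check of the layout assumed by this sketch. */
static_assert(offsetof(struct ext2_super_head, s_magic) == 56,
              "s_magic is expected at byte offset 56");
```

Because every field is a fixed-width integer with natural alignment, the struct has no padding, so it can be overlaid on the first 60 bytes of the superblock on a little-endian machine.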

The meanings of most superblock fields are obvious. Only a few fields deserve more explanation.

s_first_data_block = 0 for 4KB block size and 1 for 1KB block size. It is used to determine the start block of group descriptors, which is s_first_data_block + 1.

s_log_block_size determines the file block size, which is 1KB*(2**s_log_block_size), e.g. 0 for 1KB block size, 1 for 2KB block size and 2 for 4KB block size, etc. The most often used block size is 1KB for small file systems and 4KB for large file systems.

s_mnt_count = number of times the file system has been mounted. When the mount count reaches s_max_mnt_count, an fsck session is forced to check the file system for consistency.

s_magic is the magic number which identifies the file system type. For EXT2/3/4 file systems, the magic number is 0xEF53.
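The field rules above can be sketched in C by decoding a raw superblock buffer. This is only an illustration: the helper names (le16, le32, sb_block_size, sb_gd_block, sb_is_ext2) are hypothetical, and the byte offsets 20, 24 and 56 assume the standard EXT2 superblock layout, in which the fields are stored in little-endian order.

```c
#include <assert.h>
#include <stdint.h>

#define EXT2_MAGIC 0xEF53

/* Read little-endian integers from a raw byte buffer. */
uint16_t le16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }

uint32_t le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Block size in bytes: 1KB * 2**s_log_block_size (field at offset 24). */
uint32_t sb_block_size(const uint8_t *sb) {
    uint32_t log = le32(sb + 24);
    assert(log <= 6);                 /* guard against an absurd shift count */
    return 1024u << log;
}

/* First group descriptor block: s_first_data_block + 1 (field at offset 20). */
uint32_t sb_gd_block(const uint8_t *sb) { return le32(sb + 20) + 1; }

/* Nonzero if the buffer carries the EXT2/3/4 magic number (field at offset 56). */
int sb_is_ext2(const uint8_t *sb) { return le16(sb + 56) == EXT2_MAGIC; }
```

To try it on the virtual disk created earlier, read 1024 bytes at byte offset 1024 of vdisk into a buffer and pass the buffer to these functions.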

2.4. Group Descriptors

Block#2: Group Descriptor Block (s_first_data_block+1 on hard disk): EXT2 divides disk blocks into groups. Each group contains 8192 blocks (32 K on hard disks with 4KB blocks), which is the number of bits in one bitmap block. Each group is described by a group descriptor structure.

Since a virtual floppy disk (FD) has only 1440 blocks, B2 contains only 1 group descriptor. The rest are 0’s. On hard disks with a large number of groups, group descriptors may span many blocks. The most important fields in a group descriptor are bg_block_bitmap, bg_inode_bitmap and bg_inode_table, which point to the group’s blocks bitmap, inodes bitmap and inodes start block, respectively. For the Linux formatted EXT2 file system, blocks 3 to 7 are reserved. So bmap=8, imap=9 and inode_table = 10.
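A minimal C sketch of the group descriptor and of the group arithmetic described above follows. The field names follow the Linux EXT2 on-disk format; the helper functions group_count and gd_blocks are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t u16;
typedef uint32_t u32;

struct ext2_group_desc {
    u32 bg_block_bitmap;       /* block number of the blocks bitmap */
    u32 bg_inode_bitmap;       /* block number of the inodes bitmap */
    u32 bg_inode_table;        /* first block of the inodes */
    u16 bg_free_blocks_count;  /* free blocks in this group */
    u16 bg_free_inodes_count;  /* free inodes in this group */
    u16 bg_used_dirs_count;    /* directories in this group */
    u16 bg_pad;
    u32 bg_reserved[3];
};

static_assert(sizeof(struct ext2_group_desc) == 32,
              "a group descriptor is expected to be 32 bytes");

/* Number of block groups; a partial last group still counts as one. */
u32 group_count(u32 blocks_count, u32 first_data_block, u32 blocks_per_group) {
    return (blocks_count - first_data_block + blocks_per_group - 1)
           / blocks_per_group;
}

/* Number of disk blocks needed to hold ngroups descriptors. */
u32 gd_blocks(u32 ngroups, u32 block_size) {
    u32 per_block = block_size / (u32)sizeof(struct ext2_group_desc);
    return (ngroups + per_block - 1) / per_block;
}
```

For the 1440-block virtual disk, group_count(1440, 1, 8192) is 1, so the single descriptor occupies the first 32 bytes of B2.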

2.5. Block and Inode Bitmaps

Block#8: Block Bitmap (Bmap): (bg_block_bitmap): A bitmap is a sequence of bits used to represent some kind of items, e.g. disk blocks or inodes. A bitmap is used to allocate and deallocate items. In a bitmap, a 0 bit means the corresponding item is FREE, and a 1 bit means the corresponding item is IN_USE. An FD has 1440 blocks but block#0 is not used by the file system. So the Bmap has only 1439 valid bits. Invalid bits are treated as IN_USE and set to 1's.

Block#9: Inode Bitmap (Imap): (bg_inode_bitmap): An inode is a data structure used to represent a file. An EXT2 file system is created with a finite number of inodes. The status of each inode is represented by a bit in the Imap in B9. In an EXT2 FS, the first 10 inodes are reserved. So the Imap of an empty EXT2 FS starts with ten 1’s, followed by 0’s. Invalid bits are again set to 1’s.
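Bitmap handling can be sketched with a few small C functions. This is a minimal illustration, assuming the bitmap block has already been read into a byte buffer and that bit i is stored in byte i/8 at bit position i%8, least significant bit first. The function names are chosen here for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 if the item is IN_USE, 0 if it is FREE. */
int tst_bit(const uint8_t *buf, int bit) {
    return (buf[bit / 8] >> (bit % 8)) & 1;
}

/* Mark the item IN_USE. */
void set_bit(uint8_t *buf, int bit) {
    buf[bit / 8] |= (uint8_t)(1 << (bit % 8));
}

/* Mark the item FREE. */
void clr_bit(uint8_t *buf, int bit) {
    buf[bit / 8] &= (uint8_t)~(1 << (bit % 8));
}

/* Allocate the first FREE item among the first nbits bits.
 * Returns its bit index, or -1 if every item is IN_USE. */
int alloc_bit(uint8_t *buf, int nbits) {
    assert(nbits >= 0);
    for (int i = 0; i < nbits; i++) {
        if (!tst_bit(buf, i)) {
            set_bit(buf, i);
            return i;
        }
    }
    return -1;
}
```

With an Imap that starts with ten 1's, alloc_bit returns bit index 10. Since inode numbers count from 1, that bit corresponds to inode number 11.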

2.6. Inodes

Block#10: Inodes (begin) Block: (bg_inode_table): Every file is represented by a unique inode structure of 128 (256 in EXT4) bytes. The essential inode fields are listed below.
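A C sketch of the 128-byte inode is given here for reference. The field names and order follow the Linux EXT2 on-disk format up to i_block; the remaining 28 bytes are lumped together as i_pad in this sketch, which is a simplification rather than the full kernel declaration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t u16;
typedef uint32_t u32;

struct ext2_inode {
    u16 i_mode;          /* file type and permission bits */
    u16 i_uid;           /* owner uid */
    u32 i_size;          /* file size in bytes */
    u32 i_atime;         /* time fields, in seconds */
    u32 i_ctime;         /* since 00:00:00, January 1, 1970 */
    u32 i_mtime;
    u32 i_dtime;
    u16 i_gid;           /* group id */
    u16 i_links_count;   /* hard-link count */
    u32 i_blocks;        /* number of 512-byte sectors */
    u32 i_flags;         /* ignored here */
    u32 i_reserved1;     /* ignored here */
    u32 i_block[15];     /* block pointers */
    u32 i_pad[7];        /* remaining fields, treated as padding */
};

static_assert(sizeof(struct ext2_inode) == 128,
              "an EXT2 inode is expected to be 128 bytes");
```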

In the inode structure, i_mode is a u16 or 2-byte unsigned integer.

         | 4  | 3 |    9    |
i_mode = |tttt|ugs|rwxrwxrwx|

In the i_mode field, the leading 4 bits specify the file type, e.g. tttt= 1000 for REG file, 0100 for DIR, etc. The next 3 bits ugs indicate the file’s special usage. The last 9 bits are the rwx permission bits for file protection.
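The bit layout of i_mode can be decoded as in the following C sketch, which renders a mode in the style of ls -l. The function name mode_string is hypothetical. For brevity the sketch ignores the ugs bits and decodes only three file types; 1010 for symbolic links is the standard Unix value and is not taken from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Render i_mode as a string such as "drwxr-xr-x".
 * out must have room for 11 bytes, including the terminating NUL. */
void mode_string(uint16_t mode, char out[11]) {
    static const char rwx[] = "rwxrwxrwx";
    assert(out != NULL);
    switch (mode >> 12) {               /* leading 4 bits: file type */
    case 0x8: out[0] = '-'; break;      /* 1000: REG file */
    case 0x4: out[0] = 'd'; break;      /* 0100: DIR */
    case 0xA: out[0] = 'l'; break;      /* 1010: symbolic link */
    default:  out[0] = '?'; break;      /* other types not decoded here */
    }
    for (int i = 0; i < 9; i++)         /* bit 8 is owner r, bit 0 is other x */
        out[1 + i] = (mode & (1 << (8 - i))) ? rwx[i] : '-';
    out[10] = '\0';
}
```

For example, a directory with mode 0x41ED (octal 040755) is rendered as drwxr-xr-x, and a regular file with mode 0x81A4 (octal 0100644) as -rw-r--r--.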

The i_size field is the file size in bytes. The various time fields are the number of seconds elapsed since 00:00:00 of January 1, 1970. So each time field is a very large unsigned integer. They can be converted to calendar form by the library function

char *ctime(&time_field)

which takes a pointer to a time field and returns a string in calendar form. For example,

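printf("%s", ctime(&inode.i_atime)); // note: pass & of time field

prints i_atime in calendar form. On systems where time_t is wider than 32 bits, copy the field into a time_t variable first and pass the address of that variable.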
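The conversion can also be wrapped in a small helper, sketched below. It copies the 32-bit field into a time_t before converting, which stays correct when time_t is 64 bits wide. The helper name ext2_time_utc is hypothetical, and it uses gmtime() with asctime() instead of ctime() so that the result is in UTC and does not depend on the local time zone.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Convert a 32-bit EXT2 time field to calendar form in UTC.
 * Returns a pointer to a static string ending in a newline, as asctime()
 * does; the string is overwritten by the next call. */
const char *ext2_time_utc(uint32_t field) {
    time_t t = (time_t)field;      /* widen before taking the address */
    struct tm *tm = gmtime(&t);
    assert(tm != NULL);
    return asctime(tm);
}
```

A call such as printf("%s", ext2_time_utc(inode.i_atime)) then prints the access time. Replacing the gmtime() and asctime() pair with ctime(&t) gives the same form in local time.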

The i_block[15] array contains pointers to disk blocks of a file, which are

Direct blocks: i_block[0] to i_block[11], which point to direct disk blocks.

Indirect blocks: i_block[12] points to a disk block, which contains 256 (for 1KB BLKSIZE, i.e. 1024/4) block numbers, each of which points to a disk block.

Double Indirect blocks: i_block[13] points to a block, which points to 256 blocks, each of which points to 256 disk blocks.

Triple Indirect blocks: i_block[14] is the triple-indirect block. We may ignore this for "small" EXT2 file systems.
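The mapping from a file's logical block number to these pointers can be sketched in C as follows, assuming 1KB blocks so that each indirect block holds 256 block numbers. The function name map_lbk and its return convention are hypothetical, and triple indirect blocks are recognized but not decoded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NDIRECT   12u    /* i_block[0] to i_block[11] */
#define PER_BLOCK 256u   /* block numbers per 1KB block (1024 / 4) */

/* Classify logical block lbk of a file.
 * Returns 0 for direct, 1 for indirect, 2 for double indirect and
 * 3 for triple indirect. On return:
 *   level 0: idx[0] is the index into i_block[]
 *   level 1: idx[0] is the entry in the block at i_block[12]
 *   level 2: idx[0] is the entry in the block at i_block[13],
 *            idx[1] is the entry in the second-level block */
int map_lbk(uint32_t lbk, uint32_t idx[2]) {
    assert(idx != NULL);
    idx[0] = idx[1] = 0;
    if (lbk < NDIRECT) {
        idx[0] = lbk;
        return 0;
    }
    lbk -= NDIRECT;
    if (lbk < PER_BLOCK) {
        idx[0] = lbk;
        return 1;
    }
    lbk -= PER_BLOCK;
    if (lbk < PER_BLOCK * PER_BLOCK) {
        idx[0] = lbk / PER_BLOCK;
        idx[1] = lbk % PER_BLOCK;
        return 2;
    }
    return 3;   /* triple indirect: not decoded in this sketch */
}
```

With these limits, the direct, indirect and double indirect pointers together cover 12 + 256 + 256*256 = 65804 blocks of 1 KB each.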

The inode size (128 or 256) is designed to divide the block size (1 KB or 4 KB) evenly, so that every inode block contains an integral number of inodes. In the simple EXT2 file system, the number of inodes is (a Linux default) 184. With 128-byte inodes and 1 KB blocks, each block holds 8 inodes, so the number of inode blocks is 184/8 = 23, and the inode blocks are B10 to B32. Each inode has a unique inode number, which is the inode's position in the inode blocks plus 1. Note that inode positions count from 0, but inode numbers count from 1. A 0 inode number means no inode. The root directory's inode number is 2. Similarly, disk block numbers also count from 1 since block 0 is never used by a file system. A zero block number means no disk block.
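The arithmetic for locating an inode on disk can be sketched in C as below, assuming 1KB blocks and 128-byte inodes as in the simple file system described here. The function name inode_location and the macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define BLKSIZE          1024u
#define INODE_SIZE       128u
#define INODES_PER_BLOCK (BLKSIZE / INODE_SIZE)   /* 8 */

/* Locate inode number ino (counting from 1).
 * inode_table is bg_inode_table, the first inode block.
 * On return, *blk is the disk block holding the inode and
 * *offset is the inode's byte offset inside that block. */
void inode_location(uint32_t ino, uint32_t inode_table,
                    uint32_t *blk, uint32_t *offset) {
    assert(ino >= 1);                  /* inode number 0 means no inode */
    uint32_t pos = ino - 1;            /* positions count from 0 */
    *blk    = inode_table + pos / INODES_PER_BLOCK;
    *offset = (pos % INODES_PER_BLOCK) * INODE_SIZE;
}
```

With inode_table = 10, the root inode (number 2) is found in B10 at byte offset 128, and the last inode (number 184) in B32 at byte offset 896.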

2.7. Data Blocks

Immediately after the inode blocks are data blocks for file storage. Assuming 184 inodes, the first real data block is B33, which is i_block[0] of the root directory /.

2.8. Directory Entries

A directory contains dir_entry structures, each of which has the form

struct ext2_dir_entry_2 {
    u32 inode;                 // inode number; count from 1, NOT 0
    u16 rec_len;               // this entry's length in bytes
    u8  name_len;              // name length in bytes
    u8  file_type;             // not used
    char name[EXT2_NAME_LEN];  // name: 1-255 chars, no ending NULL
};

The dir_entry is an open-ended structure. The name field contains 1 to 255 chars without a terminating NULL. So the dir_entry’s rec_len also varies.
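Because entries vary in length, a directory block is traversed by stepping from one entry to the next by rec_len. The following C sketch illustrates this under a few assumptions: the block is already in memory, the host is little-endian like the on-disk format, and an inode number of 0 marks an unused entry. The names dir_header and list_dir_block are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Fixed 8-byte header that precedes the name in every entry. */
struct dir_header {
    uint32_t inode;
    uint16_t rec_len;
    uint8_t  name_len;
    uint8_t  file_type;
};

static_assert(sizeof(struct dir_header) == 8, "entry header is 8 bytes");

/* Print every entry in one directory block as "inode rec_len name_len name".
 * Returns the number of entries printed. */
int list_dir_block(const uint8_t *blk, size_t blksize) {
    size_t pos = 0;
    int count = 0;
    while (pos + sizeof(struct dir_header) <= blksize) {
        struct dir_header h;
        memcpy(&h, blk + pos, sizeof h);       /* avoid unaligned access */
        if (h.rec_len < sizeof h + h.name_len || pos + h.rec_len > blksize)
            break;                             /* malformed entry: stop */
        if (h.inode != 0) {
            char name[256];
            memcpy(name, blk + pos + sizeof h, h.name_len);
            name[h.name_len] = '\0';           /* no NUL on disk, add one */
            printf("%u %u %u %s\n", (unsigned)h.inode, (unsigned)h.rec_len,
                   (unsigned)h.name_len, name);
            count++;
        }
        pos += h.rec_len;                      /* step to the next entry */
    }
    return count;
}
```

The loop ends when the entries have covered the whole block, which works because the rec_len of the last entry in a block extends to the end of the block.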

Source: Wang K.C. (2018), Systems Programming in Unix/Linux, Springer; 1st ed. 2018 edition.
