1. h file
This file contains the data structure types of the EXT2 file system, such as superblock, group descriptor, inode and directory entry structures. In addition, it also contains the open file table, mount table, PROC structures and constants of the file system.
/***** type.h file for EXT2 FS *****/
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <linux/ext2_fs.h>
#include <libgen.h>
#include <string.h>
#include <sys/stat.h>
// define shorter TYPES for convenience
typedef struct ext2_group_desc GD;
typedef struct ext2_super_block SUPER;
typedef struct ext2_inode INODE;
typedef struct ext2_dir_entry_2 DIR;
#define BLKSIZE 1024
// Block number of EXT2 FS on FD
#define SUPERBLOCK 1
#define GDBLOCK 2
#define ROOT_INODE 2
// Default dir and regular file modes
#define DIR_MODE 0x41ED
#define FILE_MODE 0x81AE
#define SUPER_MAGIC 0xEF53
#define SUPER_USER 0
// Proc status
#define FREE 0
#define BUSY 1
// file system table sizes
#define NMINODE 100
#define NMTABLE 10
#define NPROC 2
#define NFD 10
#define NOFT 40
// Open File Table
typedef struct oft{
int mode;
int refCount;
struct minode *minodePtr;
int offset;
}OFT;
// PROC structure
typedef struct proc{
struct Proc *next;
int pid;
int uid;
int gid;
int ppid;
int status;
struct minode *cwd;
OFT *fd[NFD];
}PROC;
(2). global.c file: This file contains global variables of the file system. Examples of global variables are
When the file system starts, we initializes all the global data structures and let running point at PROC[0], which is process P0 of the superuser (uid = 0). As in a real file system, every operation is due to the current running process. We begin with a superuser process because it does not need any file protection. File protection by permission checking will be enforced later in Level 3 of the FS implementation.
int fs_init()
{
int i,j;
for (i=0; i<NMINODE; i++) // initialize all minodes as FREE
minode[i].refCount = 0;
for (i=0; i<NMTABLE; i++) // initialize mtables as FREE
mtable[i].dev = 0;
for (i=0; i<NOFT; i++) // initialize ofts as FREE
oft[i].refCount = 0;
for (i=0; i<NPROC; i++){ // initialize PROCs
proc[i].status = READY; // ready to run
proc[i].pid = i; // pid = 0 to NPROC-1
proc[i].uid = i; // P0 is a superuser process
for (j=0; j<NFD; j++) // all file descriptors are NULL
proc[i].fd[j] = 0; // link list
}
proc[NPROC-1].next = &proc[0]; // circular list
running = &proc[0]; // P0 runs first
}
During file system operation, global data structures are regarded as system resources, which are used and released dynamically. Each set of resources is managed by a pair of allocate and deallocate functions. For example, mialloc() allocates a FREE minode for use, and midalloc() releases a used minode. Other resource management functions are similar, which will be shown later when they are actually needed.
MINODE *mialloc() // allocate a FREE minode for use
{
int i;
for (i=0; i<NMINODE; i++){
MIONDE *mp = &minode[i];
if (mp->refCount == 0){
mp->refCount = 1;
return mp;
}
}
printf(“FS panic: out of minodes\n”);
return 0;
}
int midalloc(MINODE *mip) // release a used minode
{
mip->refCount = 0;
}
2. Utility Functions
(3). util.c file: This file contains commonly used utility functions of the file system. The most important utility functions are read/write disk block functions, iget(), iput() and getino(), which are explained in more detail.
(3).1. get_block/put_block functions: We assume that a block device, e.g. a real or virtual disk, can only be read or written in unit of block size. For real disks, this is due to hardware constraints. For virtual disks, we assume that read/write is also by block size, so that the code can be ported to real disks if desired. For a virtual disk, we first open it for R|W mode and use the file descriptor as the device number. The following functions read/write a virtual disk block into/from a buffer area in memory.
int get_block(int dev, int blk, char *buf)
{
lseek(dev, blk*BLKSIZE, SEEK_SET);
int n = read(dev, buf, BLKSIZE);
if (n<0) printf(“get_block [%d %d] error\n”, dev, blk);
}
int put_block(int dev, int blk, char *buf)
{
lseek(dev, blk*BLKSIZE, SEEK_SET);
int n = write(dev, buf, BLKSIZE);
if (n != BLKSIZE)
printf(“put_block [%d %d] error\n”, dev, blk);
}
(3).2. iget(dev, ino) function: This function returns a pointer to the in-memory minode containing the INODE of (dev, ino). The returned minode is unique, i.e. only one copy of the INODE exists in memory. In a real file system, the returned minode is locked for exclusive use until it is either released or unlocked. For simplicity, we shall assume that minode locking is unnecessary, which will be explained later.
MINODE *iget(int dev, int ino)
{
MINODE *mip;
MTABLE *mp;
INODE *ip;
int i, block, offset;
char buf[BLKSIZE];
// serach in-memory minodes first
for (i=0; i<NMINODES; i++){
MINODE *mip = &MINODE[i];
if (mip->refCount && (mip->dev==dev) && (mip->ino==ino)){
mip->refCount++;
return mip;
}
}
// needed INODE=(dev,ino) not in memory
mip = mialloc(); // allocate a FREE minode
mip->dev = dev; mip->ino = ino; // assign to (dev, ino)
block = (ino-1)/8 + iblock; // disk block containing this inode
offset= (ino-1)%8; // which inode in this block
get_block(dev, block, buf);
ip = (INODE *)buf + offset;
mip->INODE = *ip; // copy inode to minode.INODE
// initialize minode mip->refCount = 1;
mip->mounted = 0;
mip->dirty = 0;
mip->mountptr = 0;
return mip;
}
(3).3. The iput(INODE *mip) function: This function releases a used minode pointed by mip. Each minode has a refCount, which represents the number of users that are using the minode. iput() decrements the refCount by 1. If the refCount is non-zero, meaning that the minode still has other users, the caller simply returns. If the caller is the last user of the minode (refCount = 0), the INODE is written back to disk if it is modified (dirty).
int iput(MINODE *mip)
{
INODE *ip;
int i, block, offset;
char buf[BLKSIZE];
if (mip==0) return;
mip->refCount–;
if (mip->refCount > 0) return;
if (mip->dirty == 0) return;
// write INODE back to disk
block = (mip->ino – 1) / 8 + iblock;
offset = (mip->ino – 1) % 8;
// get block containing this inode get_block(mip->dev, block, buf);
ip = (INODE *)buf + offset; // ip points at INODE
*ip = mip->INODE; // copy INODE to inode in block
put_block(mip->dev, block, buf); // write back to disk
midalloc(mip); // mip->refCount = 0;
}
(3).4. getino() function: The getino() function implements the file system tree traversal algorithm. It returns the INODE number (ino) of a specified pathname. To begin with, we assume that in the level-1 file system implementation, the file system resides on a single root device, so that there are no mounted devices and mounting point crossings. Mounted file systems and mounting point crossing will be considered later in level-3 of the file system implementation. Thus, the getino() function essentially returns the (dev, ino) of a pathname. The function first uses the tokenize() function to break up pathname into component strings. We assume that the tokenized strings are in a global data area, each pointed by a name[i] pointer and the number of token strings is nname. Then it calls the search() function to search for the token strings in successive directories. The following shows the tokenize() and search() functions.
char *name[64]; // token string pointers
char gline[256]; // holds token strings, each pointed by a name[i]
int nname; // number of token strings
int tokenize(char *pathname)
{
char *s;
strcpy(gline, pathname); nname = 0;
s = strtok(gline, “/”); while(s){
name[nname++] = s;
s = strtok(0, “/”);
}
}
int search(MINODE *mip, char *name)
{
int i;
char *cp, temp[256], sbuf[BLKSIZE];
DIR *dp;
for (i=0; i<12; i++){ // search DIR direct blocks only
if (mip->INODE.i_block[i] == 0)
return 0;
get_block(mip->dev, mip->INODE.i_block[i], sbuf);
dp = (DIR *)sbuf;
cp = sbuf;
while (cp < sbuf + BLKSIZE){
strncpy(temp, dp->name, dp->name_len);
temp[dp->name_len] = 0;
printf(“%8d%8d%8u %s\n”,
dp->inode, dp->rec_len, dp->name_len, temp);
if (strcmp(name, temp)==0){
printf(“found %s : inumber = %d\n”, name, dp->inode);
return dp->inode;
}
cp += dp->rec_len;
dp = (DIR *)cp;
}
}
return 0;
}
int getino(char *pathname)
{
MINODE *mip; int i, ino;
if (strcmp(pathname, “/”)==0){
return 2; // return root ino=2
}
if (pathname[0] == ‘/’)
mip = root; // if absolute pathname: start from root
else
mip = running->cwd; // if relative pathname: start from CWD
mip->refCount++; // in order to iput(mip) later
tokenize(pathname); // assume: name[ ], nname are globals
for (i=0; i<nname; i++){ // search for each component string
if (!S_ISDIR(mip->INODE.i_mode)){ // check DIR type
printf(“%s is not a directory\n”, name[i]);
iput(mip);
return 0;
}
ino = search(mip, name[i]);
if (!ino){
printf(“no such component name %s\n”, name[i]);
iput(mip);
return 0;
}
iput(mip); // release current minode
mip = iget(dev, ino); // switch to new minode
}
iput(mip);
return ino;
}
(3).5. Use of getino()/iget()/iput(): In a file system, almost every operation begins with a pathname, e.g. mkdir pathname, cat pathname, etc. Whenever a pathname is specified, its inode must be loaded into memory for reference. The general pattern of using an inode is
. ino = getino(pathname);
. mip = iget(dev, ino);
. use mip->INODE, which may modify the INODE;
. iput(mip);
There are only a few exceptions to this usage pattern. For instance,
chdir : iget the new DIR minode but iput the old DIR minode.
open : iget the minode of a file, which is released when the file is closed.
mount: iget the minode of mountPoint, which is released later by umount.
In general, iget and iput should appear in pairs, like a pair of matched parentheses. We may rely on this usage pattern in the implementation code to ensure that every INODE is loaded and then released properly.
(4). Minodes Locking: In a real file system, each minode has a lock field, which ensures that the minode can only be accessed by one process at a time, e.g. when modifying the INODE. Unix kernel uses a busy flag and sleep/wakeup to synchronize processes accessing the same minode. In other systems, each minode may have a mutex lock or a semaphore lock. A process is allowed to access a minode only if it holds the minode lock. The reason for minodes locking is as follows.
Assume that a process Pi needs the inode of (dev, ino), which is not in memory. Pi must load the inode into a minode entry. The minode must be marked as (dev, ino) to prevent other processes from loading the same inode again. While loading the inode from disk Pi may wait for I/O completion, which switches to another process Pj. If Pj needs exactly the same inode, it would find the needed minode already exists. Without the lock, Pj would proceed to use the minode before it is even loaded in yet. With the lock, Pj must wait until the minode is loaded, used and then released by Pi. In addition, when a process read/write an opened file, it must lock the file’s minode to ensure that each read/write operation is atomic. For simplicity, we shall assume that only one process runs at a time, so that the lock is unnecessary. However, the reader should beware that minodes locking is necessary in a real file system.
Exercise 11.7 Design and implement a scheme, which uses the minode[ ] area as a cache memory for in-memory INODES. Once an INODE is loaded into a minode slot, try to keep it in memory for as long as possible even if no process is actively using it.
3. Mount-Root
(5). mount_root.c file: This file contains the mount_root() function, which is called during system initialization to mount the root file system. It reads the superblock of the root device to verify the device is a valid EXT2 file system.Then it loads the root INODE (ino = 2) of the root device into a minode and sets the root pointer to the root minode. It also sets the CWD of all PROCs to the root minode. A mount table entry is allocated to record the mounted root file system. Some key information of the root device, such as the number of inodes and blocks, the starting blocks of the bitmaps and inodes table, are also recorded in the mount table for quick access.
/************ FSl.l.c file ************/
#include “type.h”
#include “util.c”
// global variables
MINODE minode[NMINODES], *root;
MTABLE mtable[NMTABLE];
PROC proc[NPROC], *running;
int ninode, nblocks, bmap, imap, iblock;
int dev;
char gline[25], *name[16]
int nname;
char *rootdev = “mydisk”
int fs_init()
{
}
int mount_root(char *rootdev) // mount root file system
{
int i;
MTABLE *mp;
SUPER *sp;
GD *gp;
char buf[BLKSIZE];
dev = open(rootdev, O_RDWR);
if (dev < 0){
printf(“panic : can’t open root device\n”)
exit(1);
}
/* get super block of rootdev */
get_block(dev, 1, buf);
sp = (SUPER *)buf;
/* check magic number */
if (sp->s_magic != SUPER_MAGIC){
printf(“super magic=%x : %s is not an EXT2 filesys\n”,
sp->s_magic, rootdev);
exit(0);
}
// fill mount table mtable[0] with rootdev information
mp = &mtable[0]; // use mtable[0] mp->dev = dev;
// copy super block info into mtable[0]
ninodes = mp->ninodes = sp->s_inodes_count;
nblocks = mp->nblocks = sp->s_blocks_count;
strcpy(mp->devName, rootdev);
strcpy(mp->mntName, “/”);
get_block(dev, 2, buf);
gp = (GD *)buf;
bmap = mp->bmap = gp->bg_blocks_bitmap;
imap = mp->imap = gp->bg_inodes_bitmap;
iblock = mp->iblock = gp->bg_inode_table;
printf(“bmap=%d imap=%d iblock=%d\n”, bmap, imap iblock);
// call iget(), which inc minode’s refCount
root = iget(dev, 2); // get root inode
mp->mntDirPtr = root; // double link
root->mntPtr = mp;
// set proc CWDs
for (i=0; i<NPROC; i++) // set proc’s CWD
proc[i].cwd = iget(dev, 2); // each inc refCount by 1
printf(“mount : %s mounted on / \n”, rootdev);
return 0;
}
int main(int argc, char *argv[ ])
{
char line[128], cmd[16], pathname[64];
if (argc > 1)
rootdev = argv[1];
fs_init();
mount_root(rootdev);
while(1){
printf(“P%d running: “, running->pid);
printf(“input command : “);
fgets(line, 128, stdin);
line[strlen(line)-1] = 0;
if (line[0]==0)
continue;
sscanf(line, “%s %s”, cmd, pathname);
if (!strcmp(cmd, “ls”)) ls(pathname);
if (!strcmp(cmd, “cd”)) chdir(pathname);
if (!strcmp(cmd, “pwd”)) pwd(running->cwd);
if (!strcmp(cmd, “quit”)) quit();
}
}
int quit() // write all modified minodes to disk
{
int i;
for (i=0; i<NMINODES; i++){
MINODE *mip = &minode[i];
if (mip->refCount && mip->dirty){
mip->refCount = 1;
iput(mip);
}
}
exit(0);
}
How to Is: Is [pathname] lists the information of either a directory or a file. In Chap. 5 (Sect. 5.4), we showed an ls program, which works as follows.
- ls_dir(dirname): use opendir() and readdir() to get filenames in the directory. For each filename, call ls_file(filename).
- ls_file(filename): stat the filename to get file information in a STAT structure. Then list the STAT information.
Since the stat system call essentially returns the same information of a minode, we can modify the original ls algorithm by using minodes directly. The following shows the modified ls algorithm
/************************ Algorithm of ls *******************************/
-
- From the minode of a directory, step through the dir_entries in the data blocks of the minode.INODE. Each dir_entry contains the inode number, ino, and name of a file. For each dir_entry, use iget() to get its minode, as in
MINODE *mip = iget(dev, ino);
Then, call ls_file(mip, name).
-
- ls_file(MINODE *mip, char *name): use mip->INODE and name to list the file information.
How to chdir [pathname]: The algorithm of chdir is as follows.
/*************** Algorithm of chdir***************/
-
-
- int ino = getino(pathname);
- MINODE *mip = iget(dev, ino);
- Verify mip->INODE is a DIR
- iput(running->cwd);
- running->cwd = mip;
-
HOW TO pwd: The following shows the algorithm of pwd, which uses recursion on the directory minodes.
/*************** Algorithm of pwd ****************/
rpwd(MINODE *wd){
-
-
- if (wd==root) return;
- from wd->INODE.i_block[0], get my_ino and parent_ino
- pip = iget(dev, parent_ino);
- from pip->INODE.i_block[ ]: get my_name string by my_ino as LOCAL
- rpwd(pip); // recursive call rpwd(pip) with parent minode
- print “/%s”, my_name;
-
}
pwd(MINODE *wd){
if (wd == root) print “/”
else rpwd(wd);
}
// pwd start:
pwd(running->cwd);
Exercise 11.8 Implement step (2) of the pwd algorithm as a utility function.
int get_myino(MINODE *mip, int *parent_ino)
which returns the inode number of . and that of .. in parent_ino.
Exercise 11.9 Implement step (4) of the pwd algorithm as a utility function.
int get_myname(MINODE *parent_minode, int my_ino, char *my_name)
which returns the name string of a dir_entry identified by my_ino in the parent directory.
4. Implementation of Base File System
The programming task here is to complete the above mount_root.c program as the base file system. Run it with a virtual disk containing an EXT2 file system. The outputs of the base file system should look like Fig. 11.8. The figure only shows the results of the ls command. It should also support the cd and pwd commands.
Fig. 11.8 Sample outputs of mount_root
Source: Wang K.C. (2018), Systems Programming in Unix/Linux, Springer; 1st ed. 2018 edition.