File System Level-1 Functions in Unix/Linux

1. mkdir: The command

mkdir pathname

makes a new directory with pathname. The permission bits of the new directory are set to the default value 0755 (owner can access and rw, others can access but can only READ).

1.  Algorithm of mkdir

mkdir creates an empty directory with a data block containing the default. and .. entries. The algorithm of mkdir is

/********* Algorithm of mkdir pathname*********/

  •  divide pathname into dirname and basename, e.g. pathname=/a/b/c, then

dirname=/a/b; basename=c;

  •  // dirname must exist and is a DIR:

pino = getino(dirname);

pmip = iget(dev, pino);

check pmip->INODE is a DIR

  •  // basename must not exist in parent DIR:

search(pmip, basename) must return 0;

  •  call kmkdir(pmip, basename) to create a DIR;

kmkdir() consists of 4 major steps:

    •  Allocate an INODE and a disk block:

ino = ialloc(dev);

blk = balloc(dev);

    •  mip = iget(dev, ino) // load INODE into a minode

initialize mip->INODE as a DIR INODE;

mip->INODE.i_block[0] = blk;

other i_block[ ] = 0;

mark minode modified (dirty);

iput(mip); // write INODE back to disk

    •  make data block 0 of INODE to contain . and .. entries;

write to disk block blk.

    •  enter_child(pmip, ino, basename); which enters

(ino, basename) as a direntry to the parent INODE;

  •  increment parent INODE’s links_count by 1 and mark pmip dirty;

iput(pmip);

Most steps of the mkdir algorithm are self-explanatory. Only step (4) needs more explanation. We illustrate step (4) in more detail by example code. In order to make a directory, we need to allocate an inode from the inodes bitmap, and a disk block from the blocks bitmap, which rely on test and set bits in the bitmaps. In order to maintain file system consistency, allocating an inode must decrement the free inodes count in both superblock and group descriptor by 1. Similarly, allocating a disk block must decrement the free blocks count in both superblock and group descriptor by 1. It is also worth noting that bits in the bitmaps count from 0 but inode and block numbers count from 1. The following code segments show the detailed steps of (4).

(4).1 Allocate Inode and Disk Block:

// tst_bit, set_bit functions

int tst_bit(char *buf, int bit){

return buf[bit/8] & (1 << (bit % 8));

}

int set_bit(char *buf, int bit){

buf[bit/8] |= (1 << (bit % 8));

}

int decFreeInodes(int dev)

{

// dec free inodes count in SUPER and GD

get_block(dev, 1, buf);

sp = (SUPER *)buf;

sp->s_free_inodes_count–;

put_block(dev, 1, buf);

get_block(dev, 2, buf);

gp = (GD *)buf;

gp->bg_free_inodes_count–;

put_block(dev, 2, buf);

}

int ialloc(int dev)

{

int i;

char buf[BLKSIZE];

// use imap, ninodes in mount table of dev

MTABLE *mp = (MTABLE *)get_mtable(dev);

get_block(dev, mp->imap, buf);

for (i=0; i<mp->ninodes; i++){

if (tst_bit(buf, i)==0){

set_bit(buf, i);

put_block(dev, mp->imap, buf);

// update free inode count in SUPER and GD

decFreeInodes(dev);

return (i+1);

}

}

return 0; // out of FREE inodes

}

Allocation of disk blocks is similar, except it uses the blocks bitmap and it decrements the free blocks count in both superblock and group descriptor. This is left as an exercise.

Exercise 11.10 Implement the function

int balloc(int dev)

which allocates a free disk block (number) from a device.

(4).2. Create INODE: The following code segment creates an INODE=(dev, ino) in a minode, and writes the INODE to disk.

MINODE *mip = iget(dev, ino);

INODE *ip = &mip->INODE;

ip->i_mode = 0x41ED;        // 040755: DIR type and permissions

ip->i_uid = running->uid;   // owner uid

ip->i_gid = running->gid;   // group Id

ip->i_size = BLKSIZE;       // size in bytes

ip->i_links_count = 2;      // links count=2 because of . and ..

ip->i_atime = ip->i_ctime = ip->i_mtime = time(0L);

ip->i_blocks =2;            // LINUX: Blocks count in 512-byte chunks

ip->i_block[0] = bno;       // new DIR has one data block

ip->i_block[1] to ip->i_block[14] = 0;

mip->dirty = 1;            // mark minode dirty

iput(mip);                 // write INODE to disk

(4).3. Create data block for new DIR containing . and .. entries

char buf[BLKSIZE];

bzero(buf, BLKSIZE); // optional: clear buf[ ] to 0

DIR *dp = (DIR *)buf;

// make . entry dp->inode = ino;

dp->rec_len = 12;

dp->name_len = 1; dp->name[0] = ‘.’;

// make .. entry: pino=parent DIR ino, blk=allocated block

dp = (char *)dp + 12;

dp->inode = pino;

dp->rec_len = BLKSIZE-12; // rec_len spans block

dp->name_len = 2;

dp->name[0] = d->name[1] = ‘.’;

put_block(dev, blk, buf); // write to blk on diks

(4).4. Enter new dir_entry into parent directory: The function

int enter name(MINODE *pip, int ino, char *name)

enters a [ino, name] as a new dir_entry into a parent directory. The enter_name algorithm consists of several steps.

/****************** Algorithm of enter name *******************/

for each data block of parent DIR do // assume: only 12 direct blocks

{

if (i_block[i]==0) BREAK;      // to step (5) below

    •  Get parent’s data block into a buf[ ];
    •  In a data block of the parent directory, each dir_entry has an ideal length

ideal_length = 4*[ (8 + namelen + 3)/4 ]        //a multiple of 4

All dir_entries rec_len = ideal_length, except the last entry. The rec_len of the LAST entry is to the end of the block, which may be larger than its ideal_length.

    •  In order to enter a new entry of name with n_len, the needed length is

need_length = 4*[ (8 + n_len + 3)/4 ]           //a multiple of 4

    •  Step to the last entry in the data block:

get_block(parent->dev, parent->INODE.i_block[i], buf);

dp = (DIR *)buf;

cp = buf;

while (cp + dp->rec_len < buf + BLKSIZE){

cp += dp->rec_len;

dp = (DIR *)cp;

}

// dp NOW points at last entry in block

remain = LAST entry’s rec_len – its ideal_length;

if (remain >= need_length){

enter the new entry as the LAST entry and

trim the previous entry rec_len to its ideal_length;

}

goto step (6);

}

The following diagrams show the block contents before and after entering [ino, name] as a new entry.

(5). If no space in existing data block(s):

Allocate a new data block; increment parent size by BLKSIZE;

Enter new entry as the first entry in the new data block with rec_len=BLKSIZE.

The following diagrams shows the new data block containing only one entry.

  1. creat: Creat creates an empty regular file. The algorithm of creat is

2. Algorithm of creat

/************** Algorithm of creat *************/

creat(char * pathname)

{

This is similar to mkdir() except

    1.  the INODE.i_mode field is set to REG file type, permission bits set to 0644 = rw-r–r–, and
    2.  no data block is allocated for it, so the file size is 0.
    3.  links_count =1; Do not increment parent INODE’s links_count

}

It is noted that the above creat algorithm differs from Unix/Linux in that it does not open the file for WRITE mode and return a file descriptor. In practice, creat is rarely used as a stand-alone function. It is used internally by the open() function, which may create a file, open it for WRITE and return a file descriptor. The open operation will be implemented later in level-2 of the file system.

3. Implementation of mkdir-creat

The programming task here is to implement mkdir-creat functions and add the corresponding mkdir and creat commands to the base file system. Demonstrate the system by testing the mkdir and creat functions. Figure 11.9 shows the sample outputs of the resulting file system. It shows the detailed steps of executing the command mkdir dirl, as described in the mkdir Algorithm.

  1. rmdir: The command

rmdir dirname

removes a directory. As in Unix/Linux, in order to remove a DIR, the directory must be empty, for the following reasons. First, removing a non-empty directory implies the removal of all the files and subdirectories in the directory. Although it is possible to implement a recursive rmdir operation, which removes an entire directory tree, the basic operation is still remove one directory at a time. Second, a non-empty directory may contain files that are actively in use, e.g. files opened for read/write, etc. Removing such a directory is clearly unacceptable. Although it is possible to check whether there are any active files in a directory, doing this would incur too much overhead. The simplest way out is to require that a directory must be empty in order to be removed. The algorithm of rmdir is.

4. Algorithm of rmdir

/************ Algorithm of rmdir *************/

    •  get in-memory INODE of pathname:

ino = getino(pathanme);

mip = iget(dev, ino);

    •  verify INODE is a DIR (by INODE.i_mode field);

minode is not BUSY (refCount = 1);

verify DIR is empty (traverse data blocks for number of entries = 2);

    •  /* get parent’s ino and inode */

pino = findino(); //get pino from .. entry in INODE.i_block[0]

pmip = iget(mip->dev, pino);

    •  /* get name from parent DIR’s data block

findname(pmip, ino, name); //find name from parent DIR

    •  remove name from parent directory */

rmchild(pmip, name);

    •  dec parent links_count by 1; mark parent pimp dirty;

iput(pmip);

    •  /* deallocate its data blocks and inode */

bdalloc(mip->dev, mip->INODE.i_blok[0];

idalloc(mip->dev, mip->ino);

iput(mip);

In the rmdir algorithm, most steps are simple and self-explanatory. We only need to explain the following steps in more detail.

  1. How to test DIR empty: Every DIR’s links_count starts with 2 (for the. and .. entries). Each subdirectory increments its links_count by 1 but regular files do not increment the links_count of the parent directory. So, if a DIR’s links_count is greater than 2, it is definitely not empty. However, if links_count = 2, the DIR may still contain regular files. In that case, we must traverse the DIR’s data blocks to count the number of dir_entries, which must be greater than 2. This can be done by the same technique of stepping through dir_entris in the data blocks of a DIR INODE.
  2. How to deallocate inode and data block: When removing a DIR, we must deallocate its inode and data block (numbers). The function

idalloc(dev, ino)

deallocates an inode (number). It clears the ino’s bit in the device’s inodes bitmap to 0. Then it increments the free inodes count in both superblock and group descriptor by 1.

int clr_bit(char *buf, int bit) // clear bit in char buf[BLKSIZE]

{ buf[bit/8] &= ~(1 << (bit%8)); }

int incFreeInodes(int dev)

{

char buf[BLKSIZE];

// inc free inodes count in SUPER and GD

get_block(dev, 1, buf);

sp = (SUPER *)buf;

sp->s_free_inodes_count++;

put_block(dev, 1, buf);

get_block(dev, 2, buf);

gp = (GD *)buf;

gp->bg_free_inodes_count++;

put_block(dev, 2, buf);

}

int idalloc(int dev, int ino)

{

int i;

char buf[BLKSIZE];

MTABLE *mp = (MTABLE *)get_mtable(dev);

if (ino > mp->ninodes){ // niodes global

printf(“inumber %d out of range\n”, ino);

return;

}

// get inode bitmap block

get_block(dev, mp->imap, buf);

clr_bit(buf, ino-1);

// write buf back

put_block(dev, mp->imap, buf);

// update free inode count in SUPER and GD

incFreelnodes(dev);

}

Deallocating a disk block number is similar, except it uses the device’s blocks bitmap and it increaments the free blocks count in both superblock and groupdescriptor. This is left as an exercise.

Exercise 11.11 Implement a bdalloc(int dev, int bno) function which deallocates a disk block (number) bno.

  1. Remove dir_entry form parent DIR: The function

rm_child(MINODE *pmip, char *name)

removes the dir_entry of name from a parent directory minode pointed by pmip. The algorithm of rm_child is as follows.

/*************** Algorithm of rm child ****************/

  • Search parent INODE’s data block(s) for the entry of name
  • Delete name entry from parent directory by
    • if (first and only entry in a data block){

In this case, the data block looks like the following diagram.

deallocate the data block; reduce parent’s file size by BLKSIZE;

compact parent’s i_block[ ] array to eliminate the deleted entry if it’s between nonzero entries.

}

    •  else if LAST entry in block{

Absorb its rec_len to the predecessor entry, as shown in the following diagrams.

xxxxxjlNO rlen nlen NAME |yyy (add rec_len to yyy) |

—————————————————-

}

    • else: entry is first but not the only entry or in the middle of a block:

{

move all trailing entries LEFT to overlay the deleted entry;

add deleted rec_len to the LAST entry; do not change parent’s file

size;

The following diagrams illustrate the block contents before and after deleting such an entry.

How to move trailing entries LEFT? Hint: memcpy(dp, cp, size);

}

5. Implementation of rmdir

The programming task here is to implement the rmdir function and add the rmdir command to the file system. Compile and run the resulting program to demonstrate rmdir operation. Figure 11.10 shows the sample outputs of the resulting file system. The figure shows the detailed steps of removing a directory.

  1. link: The command

link old_file new_file

creates a hard link from new_file to old_file. Hard links can only be applied to regular files, not to DIRs, because linking to DIRs may create loops in the file system name space. Hard link files share the same inode. Therefore, they must be on the same device. The algorithm of link is as follows.

6. Algorithm of link

/******************** Algorithm of link *******************/

  • // verify old_file exists and is not a DIR;

oino = getino(old_file);

omip = iget(dev, oino);

check omip->INODE file type (must not be DIR).

  •  // new_file must not exist yet:

getino(new_file) must return 0;

  •  creat new_file with the same inode number of old_file:

parent = dirname(new_file);

child = basename(new_file);

pino = getino(parent);

pmip = iget(dev, pino);

// creat entry in new parent DIR with same inode number of old_file

enter_name(pmip, oino, child);

  •  omip->INODE.i_links_count++; // inc INODE’s links_count by 1

omip->dirty =1;               // for write back by iput(omip)

iput(omip);

iput(pmip);

We illustrate the link operation by an example. Consider.

link /a/b/c /x/y/z

The following diagrams show the outcome of the link operation. It adds an entry z to the data block of /x/y. The inode number of z is the same ino of a/b/c, so they share the same INODE, whose links_count is incremented by 1.

  1. unlink: The command

unlink filename

unlinks a file. It decrements the file’s links_count by 1 and deletes the file name from its parent DIR. When a file’s links_count reaches 0, the file is truly removed by deallocating its data blocks and inode. The algorithm of unlink is

7. Algorithm of unlink

/************** Algorithm of unlink *************/

(1). get filenmae’s minode:

ino = getino(filename);

mip = iget(dev, ino);

check it’s a REG or symbolic LNK file; can not be a DIR

(2). // remove name entry from parent DIR’s data block:

parent = dirname(filename); child = basename(filename);

pino = getino(parent);

pimp = iget(dev, pino);

rm_child(pmip, ino, child);

pmip->dirty = 1;

iput(pmip);

(3). // decrement INODE’s link_count by 1

mip->INODE.i_links_count–;

(4).       if (mip->INODE.i_links_count > 0)

mip->dirty = 1;    // for write INODE back to disk

(5).       else{ // if links_count = 0: remove filename

deallocate all data blocks in INODE;

deallocate INODE;

}

iput(mip);  // release mip

  1. symlink: The command

symlink old_file new_file

creates a symbolic link from new_file to old_file. Unlike hard links, symlink can link to anything, including DIRs, even files not on the same device. The algorithm of symlink is

8. Algorithm of symlink

/********* Algorithm of symlink old_file new_file *********/

(1). check: old_file must exist and new_file not yet exist;

(2). creat new_file; change new_file to LNK type;

(3). // assume length of old_file name <= 60 chars

store old_file name in newfile’s INODE.i_block[ ] area.

set file size to length of old_file name

mark new_file’s minode dirty;

iput(new_file’s minode);

(4). mark new_file parent minode dirty;

iput(new_file’s parent minode);

  1. readlink: The function

int readlink(file, buffer)

reads the target file name of a symbolic file and returns the length of the target file name. The algorithm of readlink() is.

9. Algorithm of readlink

/************* Algorithm of readlink (file, buffer) *************/

(1). get file’s INODE in memory; verify it’s a LNK file

(2). copy target filename from INODE.i_block[ ] into buffer;

(3). return file size;

10. Other Level-1 Functions

Other level-1 functions include access, chmod, chown, change file’s time fields, etc. The operations of all such functions are of the same pattern:

(1). get the in-memory INODE of a file by

ino = getino(pathname);

mip = iget(dev,ino);

(2). get information from INODE or modify the INODE;

(3). if INODE is modified, set mip->dirty to zonzero for write back;

(4). iput(mip);

Examples of Other Level-1 Operations

  1. chmod oct filename: Change filename’s permission bits to octal value
  2. utime filename: change file’s access time to current time:

11. Programming Project #1: Implementation of File System Level-1

The programming project #1 is to complete the level-1 implementation of the file system and demonstrate the file system. Figure 11.11 shows the sample outputs of running the Level 1 implemen­tation of the file system. The ls command shows that hi is a symbolic link to the directory dir1.

Source: Wang K.C. (2018), Systems Programming in Unix/Linux, Springer; 1st ed. 2018 edition.

Leave a Reply

Your email address will not be published. Required fields are marked *