File System Level-2 Functions in Unix/Linux

Level-2 of the file system implements read and write operations of file contents. It consists of the following functions: open, close, lseek, read, write, opendir and readdir.

1. Open Operation

In Unix/Linux, the system call

int open(char *filename, int flags);

opens a file for read or write, where flags is one of O_RDONLY, O_WRONLY, O_RDWR, which may be bitwise or-ed with O_CREAT, O_APPEND, O_TRUNC, etc. These symbolic constants are defined in the fcntl.h file. On success, open() returns a file descriptor (number), which is used in subsequent system calls, such as read(), write(), lseek() and close(), etc. For simplicity, we shall assume the parameter flags = 0|1i2i3 or RDWRRWAP for READ|WRITE| RDWRAPEND, respectively. The algorithm of open() is.

1. Algorithm of open

/******************* Algorithm of open *********************/

(1). get file’s minode:

ino = getino(filename);

if (ino==0){   // if file does not exist

creat(filename);           // creat it first, then

ino = getino(filename); // get its ino

}

mip = iget(dev, ino);

(2). allocate an openTable entry OFT; initialize OFT entries:

mode = 0(RD) or 1(WR) or 2(RW) or 3(APPEND)

minodePtr = mip;              // point to file’s minode

refCount = 1;

set offset = 0 for RD|WR|RW; set to file size for APPEND mode;

(3). Search for the first FREE fd[index] entry with the lowest index in PROC;

fd[index] = &OFT;          // fd entry points to the openTable entry;

(4). return index as file descriptor;

Figure 11.12 shows the data structure created by open. In the figure, (1) is the PROC structure of the process that calls open(). The returned file descriptor, fd, is the index of the fd[ ] array in the PROC structure. The contents of fd[fd] points to an OFT, which points to the minode of the file. The OFT’s refCount represents the number of processes that share the same instance of an opened file. When a process opens a file, the refCount in the OFT is set to 1. When a process forks a child process, the child process inherits all the opened file descriptors of the parent, which increments the refCount of every shared OFT by 1. When a process closes a file descriptor, it decrements OFT.refCount by 1. When OFT.refCount reaches 0, the file’s minode is released and the OFT is deallocated. The OFT’s offset is a conceptual pointer to the current byte position in the file for read/write. It is initialized to 0 for RD|WR| RW mode or to file size for APPEND mode.

2. Iseek

In Linux, the system call.

lseek(fd, position, whence); // whence=SEEK_SET or SEEK_CUR

sets the offset in the OFT of an opened file descriptor to the byte position either from the file beginning (SEEK_SET) or relative to the current position (SEEK_CUR). For simplicity, we shall assume that the new position is always from the file beginning. Once set, the next read/write begins from the current offset position. The algorithm of lseek is trivial. It only needs to check the requested position value is within the bounds of [0, fileSize-1]. We leave the implementation of lseek as an exercise.

Exercise 11.12 Write C code for the lseek function int lseek(int fd, int position).

  1. Close Operation

4. Read Regular Files

In Unix/Linux, the system call

int read(int fd, char *buf, int nbytes);

reads nbytes from an opened file descriptor into a buffer area in user space. The read system call is routed to the read function in the OS kernel. For regular files, the algorithm of read is.

/********* Algorithm of read(int fd, char *buf, int nbytes) *********/

(1). count =0; // number of bytes read

offset = OFT.offset; // byte offset in file to READ

compute bytes available in file: avil = fileSize – offset;

(2). while (nbytes && avil){

compute logical block: lbk = offset / BLKSIZE;

start byte in block:     start = offset % BLKSIZE;

(3).  convert logical block number, lbk, to physical block number, blk,

through INODE.i_block[ ] array;

(4).    get_block(dev, blk, kbuf); // read blk into char kbuf[BLKSIZE];

char *cp = kbuf + start;

remain = BLKSIZE – start;

(5).   while (remain){  // copy bytes from kbuf[ ] to buf[ ]

*buf++ = *cp++;

offset++; count++;           // inc offset, count;

remain–; avil–; nbytes–; // dec remain, avail, nbytes;

if (nbytes==0 || avil==0) break;

} // end of while(remain)

}     // end of while(nbytes && avil)

(6). return count;

The algorithm of read() can be best explained in terms of Fig. 11.13. Assume that fd is opened for READ. The offset in the OFT points to the current byte position in the file from where we wish to read nbytes. To the file system in kernel, a file is just a sequence of contiguous bytes, numbered from 0 to fileSize-1. As Fig. 11.13 shows, the current byte position, offset, falls in a logical block.

lbk = offset / BLKSIZE,

the byte to start read is.

start = offset % BLKSIZE

and the number of bytes remaining in the logical block is.

remain = BLKSIZE – start.

At this moment, the file has.

avil = fileSize – offset

bytes still available for read. These numbers are used in the read algorithm.

For small EXT2 files systems, the block size is 1 KB and files have at most double indirect blocks. For read, the algorithm of converting logical blocks to physical blocks is.

/* Algorithm of Converting Logical Blocks to Physical Blocks */

int map(INODE, lbk){                   // convert lbk to blk via INODE

if (lbk < 12)                    // direct blocks

blk = INODE.i_block[lbk];

else if (12 <= lbk < 12+256){     // indirect blocks

read INODE.i_block[12] into int ibuf[256];

blk = ibuf[lbk-12];

}

else{ // doube indirect blocks; see Exercise 11.13 below.

}

return blk;

}

Exercise 11.13 Complete the algorithm for converting double indirect logical blocks to physical blocks. Hint: Mailman’s algorithm.

Exercise 11.14 For simplicity and clarity, step (5) of the read algorithm transfers data one byte at a time and updates the control variable on each byte transfer, which are not very efficient. Optimize the code by transferring a maximal sized chunk of data at a time.

5. Write Regular Files

In Unix/Linux, the system call

int write(int fd, char buf[ ], int nbytes);

writes nbytes from buf in user space to an opened file descriptor and returns the actual number of bytes written. The write system call is routed to the write function in the OS kernel. For regular files, the algorithm of write is

/******* Algorithm of write(int fd, char *buf, int nbytes) *******/

(1). count =0;          // number of bytes written

(2). while (nbytes){

compute logical block: lbk = oftp->offset / BLOCK_SIZE;

compute start byte:  start = oftp->offset % BLOCK_SIZE;

(3). convert lbk to physical block number, blk, through the i_block[ ] array;

(4). read_block(dev, blk, kbuf); // read blk into kbuf[BLKSIZE];

char *cp = kbuf + start; remain = BLKSIZE – start;

(5). while (remain){                // copy bytes from buf[ ] to kbuf[ ]

*cp++ = *buf++;

offset++; count++;        // inc offset, count;

remain —; nbytes–;      // dec remain, nbytes;

if (offset > fileSize) fileSize++; // inc file size

if (nbytes <= 0) break;

} // end while(remain)

(6).     write_block(dev, blk, kbuf);

} // end while(nbytes)

(7). set minode dirty =1; // mark minode dirty for iput() when fd is closed

return count;

The algorithm of write can be best explained in terms of Fig. 11.14. In the figure, the offset in the OFT is the current byte position in the file to write, which falls in a logical block. It uses offset to compute the logical block number, lbk, the start byte position and the number of bytes remaining in the logical block. It then converts the logical block to physical block through the file’s INODE.i_block array.Then it reads the physical block into a kbuf and transfer data from buf into kbuf, which may increase the file size if offset exceeds the current file size. After transferring data to kbuf, it writes the buffer back to disk. The write algorithm stops when all nbytes are written. The return value is nbtyes unless write fails.

The algorithm of converting logical block to physical block for write is similar to that of read, except for the following difference. During write, the intended data block may not exist. If a direct block does not exist, it must be allocated and recorded in the INODE. If the indirect block i_block [12] does not exist, it must be allocated and initialized to 0. If an indirect data block does not exist, it must be allocated and recorded in the indirect block. Simliarly, if the double indirect block i_block [13] dose not exist, it must be allocated and initialized to 0. If an entry in the double indirect block does not exist, it must be allocated and initialized to zero. If a double indirect block does not exist, it must be allocated and recorded in a double indirect block entry, etc.

Exercise 11.15 In the write algorithm, a file opened for WRITE mode may start with no data blocks at all. Complete the algorithm for coverting logical block numbers to physical blocks, including direct, indirect and double indirect blocks.

Exercise 11.16 For simplicity and clarity, step (5) of the write algorithm transfers data one byte at a time and updates the control variable on each byte transfer. Optimize the code by transferring a maximal sized chunk of data at a time.

Exercise 11.17 With open() and read(), implement the command cat filename, which displays the contents of a file.

Exercise 11.18 With open(), read() and write(), implement the command cp src dest, which copies a src file src to a dest file.

Exercise 11.19 Implement the command function mv file1 file2.

6. Opendir-Readdir

Unix considers everything as a file. Therefore, we should be able to open a DIR for read just like a regular file. From a technical point of view, there is no need for a separate set of opendir() and readdir() functions. However, different Unix systems may have different file systems. It may be difficult for users to interpret the contents of a DIR file. For this reason, POSIX specifies opendir and readdir operations, which are independent of file systems. Support for opendir is trivial; it’s the same open system call, but readdir() has the form.

struct dirent *ep = readdir(DIR *dp);

which returns a pointer to a dirent structure on each call. This can be implemented in user space as a library I/O function. Instead, we shall implement opendir() and readir() directly.

int opendir(pathaname)

{ return open(pathname, RD|O_DIR); }

where O_DIR is a bit pattern for opening the file as a DIR. In the open file table, the mode field contains the O_DIR bit.

int readdir(int fd, struct udir *udirp) // struct udir{DIR udir;};

{

// same as read() except:

use the current byte offset in OFT to read the next dir_entry;

copy the dir_entry into *udirp;

advance offset by dir_entry’s rec_len for reading next dir_entry;

}

7. Programming Project #2: Implementation of File System Level-2

The programming project #2 is to complete the implementation of File System Level-2 and demon­strate read, write, cat, cp and mv operations in the file system. The reader may consult earlier Chapters for cat and cp operations. For the mv operation, see Problem 12. Figure 11.15 show the sample outputs of running the complete Level 2 file system.

Source: Wang K.C. (2018), Systems Programming in Unix/Linux, Springer; 1st ed. 2018 edition.

Leave a Reply

Your email address will not be published. Required fields are marked *