File Operations in Unix/Linux

1. File Operation Levels

File operations consist of six levels, from low to high, as shown in the following hierarchy.

  •  Hardware Level: File operations at hardware level include

fdisk : divide a hard disk, USB or SDC drive into partitions.

mkfs : format disk partitions to make them ready for file systems.

fsck : check and repair file system.

defragmentation: compact files in a file system.

Most of these are system-oriented utility programs. An average user may never need them, but they are indispensable tools for creating and maintaining file systems.

  •  File System Functions in OS Kernel: Every operating system kernel provides support for basic file operations. The following lists some of these functions in a Unix-like system kernel, where the prefix k denotes kernel functions.

  • System Calls: User mode programs use system calls to access kernel functions. As an example, the following program reads the second 1024 bytes of a file.

#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[])      // run as a.out filename
{
    int fd, n;
    char buf[1024];

    if (argc < 2 || (fd = open(argv[1], O_RDONLY)) < 0)  // no filename, or open() fails
        exit(1);
    lseek(fd, 1024, SEEK_SET);        // lseek to byte 1024
    n = read(fd, buf, 1024);          // try to read 1024 bytes
    close(fd);
    return n < 0;                     // nonzero exit status if read() failed
}

The functions open(), read(), lseek() and close() are C library functions. Each of these library functions issues a system call, which causes the process to enter kernel mode to execute a corresponding kernel function, e.g. open() goes to kopen(), read() goes to kread(), etc. When the process finishes executing the kernel function, it returns to user mode with the desired results. Switching between user mode and kernel mode requires many actions (and much time), so data transfer between kernel and user space is quite expensive. Although it is permissible to issue a read(fd, buf, 1) system call to read only one byte of data, it is not wise to do so, since that one byte would come at a very high cost. Every time we have to enter the kernel, we should do as much as we can to make the journey worthwhile. When reading or writing files, the best way is to match what the kernel does. The kernel reads/writes files in units of the block size, which ranges from 1KB to 8KB. For instance, in Linux, the default block size is 4KB for hard disks and 1KB for floppy disks. So each read/write system call should also try to transfer one block of data at a time.
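To make the cost argument concrete, here is a minimal sketch that counts the bytes in a file and reports how many read() calls that took. The function name count_bytes and the 4096-byte BLKSIZE are assumptions made for this illustration; the actual block size depends on the file system.

```c
#include <fcntl.h>
#include <unistd.h>

#define BLKSIZE 4096    /* assumed block size; the real value depends on the file system */

/* Count the bytes in a file, asking for `chunk` bytes per read() call.
 * Stores the number of read() calls that returned data in *ncalls.
 * Returns the byte count, or -1 on error. (Hypothetical helper.) */
long count_bytes(const char *path, size_t chunk, long *ncalls)
{
    char buf[BLKSIZE];
    long total = 0;
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (chunk == 0 || chunk > sizeof(buf))
        chunk = sizeof(buf);              /* never ask for more than buf can hold */
    *ncalls = 0;
    while ((n = read(fd, buf, chunk)) > 0) {
        total += n;                       /* each iteration is one trip into the kernel */
        (*ncalls)++;
    }
    close(fd);
    return n < 0 ? -1 : total;
}
```

On a regular file of 8192 bytes, calling count_bytes(path, 1, &calls) needs 8192 data-returning system calls, while count_bytes(path, BLKSIZE, &calls) should need only two for the same result.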

  • Library I/O Functions: System calls allow the user to read/write chunks of data, which are just sequences of bytes. They do not know, nor care, about the meaning of the data. A user often needs to read/write individual chars, lines or data structure records, etc. With only system calls, a user mode program must do these operations from/to a buffer area by itself. Most users would consider this too inconvenient. The C library provides a set of standard I/O functions for convenience, as well as for run-time efficiency. Library I/O functions include:

FILE mode I/O: fopen(), fread(); fwrite(), fseek(), fclose(), fflush()

char mode I/O: getc(), getchar(), ungetc(); putc(), putchar()

line mode I/O: gets(), fgets(); puts(), fputs()

formatted I/O: scanf(), fscanf(), sscanf(); printf(), fprintf(), sprintf()

With the exception of sscanf()/sprintf(), which read/write memory locations, all other library I/O functions are built on top of system calls, i.e. they ultimately issue system calls for actual data transfer through the system kernel.
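As an illustration of the difference, the following sketch combines line mode I/O with formatted input from memory: fgets() pulls lines from the file stream, and sscanf() parses each line without touching the kernel. The function name sum_lines is hypothetical, and the sketch assumes every line fits in 255 characters.

```c
#include <stdio.h>

/* Add up the leading integer of every line in a text file.
 * Returns the number of lines read, or -1 if the file cannot be opened.
 * Assumes lines are shorter than the 256-byte line buffer. (Hypothetical helper.) */
int sum_lines(const char *path, long *sum)
{
    char line[256];
    int nlines = 0, value;
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;
    *sum = 0;
    while (fgets(line, sizeof(line), fp) != NULL) {   /* line mode I/O on the stream */
        nlines++;
        if (sscanf(line, "%d", &value) == 1)          /* formatted input from memory */
            *sum += value;
    }
    fclose(fp);
    return nlines;
}
```

Lines that do not start with an integer are counted but contribute nothing to the sum.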

  •  User Commands: Instead of writing programs, users may use Unix/Linux commands to do file operations. Examples of user commands are

mkdir, rmdir, cd, pwd, ls, link, unlink, rm, cat, cp, mv, chmod, etc.

Each user command is in fact an executable program (except cd, which is built into the shell), which typically calls library I/O functions, which in turn issue system calls to invoke the corresponding kernel functions. The processing sequence of a user command is either

Command => Library I/O function => System call => Kernel function

or

Command => System call => Kernel function
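The second path, in which a command bypasses the library I/O layer, can be sketched as the core loop of a cat-like command that uses only read() and write(). The name copy_fd and the 4096-byte buffer are assumptions for this example, not how any particular command is actually implemented.

```c
#include <unistd.h>

#define BLKSIZE 4096    /* assumed transfer size */

/* Copy everything readable from descriptor `in` to descriptor `out`
 * using only system calls. Returns bytes copied, or -1 on error.
 * (Hypothetical helper.) */
long copy_fd(int in, int out)
{
    char buf[BLKSIZE];
    long total = 0;
    ssize_t n;

    while ((n = read(in, buf, sizeof(buf))) > 0) {
        ssize_t done = 0;
        while (done < n) {                 /* write() may accept fewer bytes than requested */
            ssize_t w = write(out, buf + done, (size_t)(n - done));
            if (w < 0)
                return -1;
            done += w;
        }
        total += n;
    }
    return n < 0 ? -1 : total;
}
```

A command built this way could call copy_fd(fd, 1) after opening its argument, sending the file's contents to standard output.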

  • Sh Scripts: Although much more convenient than system calls, commands must be entered manually or, when using a GUI, by dragging file icons and clicking a pointing device, which is tedious and time-consuming. Sh scripts are programs written in the sh programming language, which can be executed by the command interpreter sh. The sh language includes all valid Unix/Linux commands. It also supports variables and control statements, such as if, do, for, while, case, etc. In practice, sh scripts are used extensively in Unix/Linux systems programming. In addition to sh, many other scripting languages, such as Perl and Tcl, are also in wide use.

2. File I/O Operations

Figure 7.1 shows the diagram of file I/O operations.

In Fig. 7.1, the upper part above the double line represents kernel space and the lower part represents user space of a process. The diagram shows the sequence of actions when a process reads/writes a file stream. Control flows are identified by the labels (1) to (10), which are explained below.

========================== User Mode Operations =========================

  •  A process in User mode executes

FILE *fp = fopen("file", "r"); or FILE *fp = fopen("file", "w");

which opens a file stream for READ or WRITE.

  •  fopen() creates a FILE structure in user (heap) space containing a file descriptor, fd, a buffer fbuf[BLKSIZE] and some control variables. It issues a fd = open("file", flags=READ or WRITE) syscall to kopen() in the kernel, which constructs an OpenTable to represent an instance of the opened file. The OpenTable’s mptr points to the file’s INODE in memory. For non-special files, the INODE’s i_block array points to data blocks on the storage device. On success, fp points to the FILE structure, in which fd is the file descriptor returned by the open() syscall.
  • fread(ubuf, size, nitem, fp): READ nitem items of size bytes each into ubuf by

. copying data from the FILE structure’s fbuf to ubuf; if fbuf holds enough data, return;

. if fbuf has no more data, executing (4a).

    •  (4a) Issue a read(fd, fbuf, BLKSIZE) system call to read a file block from the kernel into fbuf, then copy data to ubuf until enough has been copied or the file has no more data.
    • fwrite(ubuf, size, nitem, fp): copy data from ubuf to fbuf;

. if fbuf has room: copy data to fbuf, return;

. if fbuf is full: issue a write(fd, fbuf, BLKSIZE) system call to write a block to the kernel, then write to fbuf again.

Thus, fread()/fwrite() issue read()/write() syscalls to kernel, but they do so only if necessary and they transfer data in chunks of block size for better efficiency. Similarly, other Library I/O Functions, such as fgetc/fputc, fgets/fputs, fscanf/fprintf, etc. also operate on fbuf in the FILE structure, which is in user space.
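The read side of this buffering scheme can be sketched in a few lines. The MYFILE type and the my_open/my_read/my_close functions below are hypothetical stand-ins for FILE and fread(), written only to show when a system call is issued; the real C library is more elaborate, and the 4096-byte BLKSIZE is an assumption.

```c
#include <fcntl.h>
#include <unistd.h>

#define BLKSIZE 4096              /* assumed buffer size */

typedef struct {                  /* hypothetical stand-in for the C library's FILE */
    int  fd;                      /* descriptor returned by open() */
    char fbuf[BLKSIZE];           /* user-space buffer */
    int  count;                   /* number of valid bytes in fbuf */
    int  pos;                     /* index of the next unread byte in fbuf */
    long nsyscalls;               /* read() calls issued so far, kept for illustration */
} MYFILE;

int my_open(MYFILE *fp, const char *path)
{
    fp->fd = open(path, O_RDONLY);
    fp->count = fp->pos = 0;
    fp->nsyscalls = 0;
    return fp->fd < 0 ? -1 : 0;
}

/* Copy up to n bytes into ubuf, entering the kernel only when fbuf is empty.
 * Returns the number of bytes copied. */
long my_read(MYFILE *fp, char *ubuf, long n)
{
    long copied = 0;

    while (copied < n) {
        if (fp->pos == fp->count) {               /* fbuf has no more data */
            ssize_t r = read(fp->fd, fp->fbuf, BLKSIZE);
            fp->nsyscalls++;
            if (r <= 0)
                break;                            /* end of file or error */
            fp->count = (int)r;
            fp->pos = 0;
        }
        ubuf[copied++] = fp->fbuf[fp->pos++];     /* plain user-space copy */
    }
    return copied;
}

void my_close(MYFILE *fp)
{
    close(fp->fd);
}
```

Reading a 10000-byte file through my_read() in 100-byte pieces takes 100 calls to my_read() but only three read() system calls, since each refill fetches up to BLKSIZE bytes.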

========================= Kernel Mode Operations ========================

  •  File system functions in kernel:

Assume a read(fd, fbuf, BLKSIZE) system call on a non-special file.

  •  In a read() system call, fd is an open file descriptor: an index into the running PROC’s fd array, whose entry points to an OpenTable representing the opened file.
  •  The OpenTable contains the file’s open mode, a pointer to the file’s INODE in memory and the current byte offset into the file for read/write. From the OpenTable’s offset,

. Compute logical block number, lbk;

. Convert logical block number to physical block number, blk, via INODE.i_block [ ] array.
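The two steps amount to one division and one table lookup. The sketch below assumes a 1KB block size and an ext2-style INODE whose first 12 i_block entries are direct block pointers; both are assumptions, and the simplified INODE type and the map_offset function are hypothetical. Indirect blocks are deliberately not handled.

```c
#define BLKSIZE 1024          /* assumed block size */
#define NDIRECT 12            /* assumed number of direct block pointers, as in ext2 */

typedef struct {              /* simplified, hypothetical in-memory INODE */
    unsigned int i_size;      /* file size in bytes */
    unsigned int i_block[15]; /* block pointers; only the direct ones are used here */
} INODE;

/* Map a byte offset to the physical block that holds it.
 * Stores the byte position inside that block in *start.
 * Returns the physical block number, or 0 if the offset is past the end
 * of the file or would need indirect blocks (not handled in this sketch). */
unsigned int map_offset(const INODE *ip, unsigned long offset, unsigned int *start)
{
    unsigned long lbk = offset / BLKSIZE;          /* logical block number */

    *start = (unsigned int)(offset % BLKSIZE);     /* start byte within the block */
    if (offset >= ip->i_size || lbk >= NDIRECT)
        return 0;
    return ip->i_block[lbk];                       /* physical block number, blk */
}
```

For example, with a 5000-byte file whose i_block[] begins {100, 205, 37, ...}, offset 2500 gives lbk = 2, a start byte of 452 and physical block 37.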

  •  Minode contains the in-memory INODE of the file. The INODE.i_block[ ] array contains pointers to physical disk blocks. A file system could use the physical block numbers to read/write data from/to the disk blocks directly, but this would incur too much physical disk I/O.
  •  In order to improve disk I/O efficiency, the OS kernel usually uses a set of I/O buffers as a cache memory to reduce the number of physical I/O operations. Disk I/O buffer management will be covered in Chap. 12.
    • For a read(fd, buf, BLKSIZE) system call, determine the needed (dev, blk) number, then consult the I/O buffer cache to

.get a buffer = (dev, blk);

.if (buffer’s data are invalid){

start_io on buffer;

wait for I/O completion;

}

.copy data from buffer to fbuf;

.release buffer to buffer cache;

    • For a write(fd, fbuf, BLKSIZE) system call, determine the needed (dev, blk) number, then consult the I/O buffer cache to

.get a buffer = (dev, blk);

.write data to the I/O buffer;

.mark buffer as dataValid and DIRTY (for delay-write to disk);

.release the buffer to buffer cache;
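The read and write sequences above can be tried out in user space with a toy buffer cache over an in-memory "disk". Everything here is an assumption made for illustration: the names (BUFFER, getblk, cache_read, cache_write), the round-robin replacement, the single simulated device, and the memcpy() that stands in for start_io plus waiting. A real kernel also locks buffers and blocks processes, which this sketch omits.

```c
#include <string.h>

#define BLKSIZE 1024
#define NBUF    4                       /* deliberately tiny cache */
#define NBLOCKS 64                      /* size of the simulated disk; blk must be below this */

char disk[NBLOCKS][BLKSIZE];            /* simulated device (dev is not used for addressing) */
long disk_reads, disk_writes;           /* physical I/O counters */

typedef struct {
    int  dev, blk;                      /* identity of the cached block */
    int  used, valid, dirty;
    char data[BLKSIZE];
} BUFFER;

static BUFFER cache[NBUF];
static int next_victim;                 /* round-robin replacement, an assumption */

/* Get a buffer for (dev, blk): reuse a cached one or recycle a victim. */
static BUFFER *getblk(int dev, int blk)
{
    BUFFER *bp;
    int i;

    for (i = 0; i < NBUF; i++)
        if (cache[i].used && cache[i].dev == dev && cache[i].blk == blk)
            return &cache[i];
    bp = &cache[next_victim];
    next_victim = (next_victim + 1) % NBUF;
    if (bp->used && bp->dirty) {        /* the delayed write happens only now */
        memcpy(disk[bp->blk], bp->data, BLKSIZE);
        disk_writes++;
    }
    bp->used = 1; bp->dev = dev; bp->blk = blk;
    bp->valid = 0; bp->dirty = 0;
    return bp;
}

void cache_read(int dev, int blk, char *fbuf)
{
    BUFFER *bp = getblk(dev, blk);

    if (!bp->valid) {                   /* buffer's data are invalid */
        memcpy(bp->data, disk[blk], BLKSIZE);   /* stands in for start_io + wait */
        disk_reads++;
        bp->valid = 1;
    }
    memcpy(fbuf, bp->data, BLKSIZE);    /* copy data from buffer to fbuf */
}

void cache_write(int dev, int blk, const char *fbuf)
{
    BUFFER *bp = getblk(dev, blk);

    memcpy(bp->data, fbuf, BLKSIZE);    /* write data to the I/O buffer */
    bp->valid = 1;
    bp->dirty = 1;                      /* marked for delayed write to disk */
}
```

Writing a block and reading it back costs no physical I/O at all in this model; the block reaches the simulated disk only when its buffer is recycled for another block.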

  •  Device I/O: Physical I/O on the I/O buffers ultimately goes through the device driver, which consists of start_io() in the upper half and the disk interrupt handler in the lower half of the driver.

------------------ Upper-half of disk driver ------------------

start_io(bp):   // bp = a locked buffer in dev_list, opcode = R|W (ASYNC)
{
    enter bp into dev’s I/O_queue;
    if (bp is FIRST in I/O_queue)
        issue I/O command to device;
}

------------------ Lower-half of disk driver ------------------

Device_Interrupt_Handler:
{
    bp = dequeue(first buffer from dev.I/O_queue);
    if (bp was READ){
        mark bp data VALID;
        wakeup/unblock waiting process on bp;
    }
    else      // bp was for delay write
        release bp into buffer cache;
    if (dev.I/O_queue NOT empty)
        issue I/O command for first buffer in dev.I/O_queue;
}
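The interplay between the two halves can be simulated without real hardware. In the sketch below the types BUF and DEVICE, the helper issue_command and the counters are all hypothetical; an "interrupt" is simply a call to interrupt_handler(), and waking a blocked process is reduced to setting a flag.

```c
#include <stddef.h>

enum { OP_READ, OP_WRITE };

typedef struct buf {
    int blk, opcode;
    int valid;                 /* set when a READ completes */
    int released;              /* set when a delayed WRITE completes */
    struct buf *next;          /* link in the device's I/O queue */
} BUF;

typedef struct {
    BUF *head, *tail;          /* I/O queue, first in first out */
    int commands_issued;       /* commands sent to the simulated device */
    int busy_blk;              /* block the device is currently working on */
} DEVICE;

static void issue_command(DEVICE *dev, BUF *bp)
{
    dev->commands_issued++;
    dev->busy_blk = bp->blk;
}

/* Upper half: queue the request; start the device only if it was idle. */
void start_io(DEVICE *dev, BUF *bp)
{
    bp->next = NULL;
    if (dev->tail)
        dev->tail->next = bp;
    else
        dev->head = bp;
    dev->tail = bp;
    if (dev->head == bp)                 /* bp is FIRST in the I/O queue */
        issue_command(dev, bp);
}

/* Lower half: runs once per completed request. */
void interrupt_handler(DEVICE *dev)
{
    BUF *bp = dev->head;

    if (bp == NULL)
        return;                          /* spurious interrupt */
    dev->head = bp->next;
    if (dev->head == NULL)
        dev->tail = NULL;
    if (bp->opcode == OP_READ)
        bp->valid = 1;                   /* a kernel would also wake the waiting process */
    else
        bp->released = 1;                /* delayed write done: buffer back to the cache */
    if (dev->head)
        issue_command(dev, dev->head);   /* keep the device busy */
}
```

The point of the design is that start_io() never waits: a request that arrives while the device is busy is only queued, and it is the interrupt handler that starts it once the preceding request completes.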

Source: Wang K.C. (2018), Systems Programming in Unix/Linux, Springer; 1st ed. 2018 edition.
