Library I/O Functions vs. System Calls in Unix/Linux

1. Library I/O Functions

System calls are the basis of file operations, but they only support reading and writing chunks of data. In practice, a user program may want to read/write files in the logical units best suited to the application, e.g. lines, chars or structured records, which system calls do not support directly. Library I/O functions are a set of file operation functions that provide both user convenience and overall efficiency (GNU I/O on streams 2017; GNU libc 2017; GNU Library Reference Manual 2017).

2. Library I/O Functions vs. System Calls

Almost every operating system that supports C programming also provides library functions for file I/O. In Unix/Linux, library I/O functions are built on top of system calls. In order to illustrate their intimate relationship, we first list a few of them for comparison.

System Call Functions: open(), read(), write(), lseek(), close();

Library I/O Functions: fopen(), fread(), fwrite(), fseek(), fclose();

From their strong similarities, the reader can almost guess that every library I/O function has its root in a corresponding system call function. This is indeed the case as fopen() relies on open(), fread() relies on read(), etc. The following C programs illustrate their similarities and differences.

The left-hand side shows a program that uses system calls. The right-hand side shows a similar program that uses library I/O functions. Both programs print the contents of a file to the display screen. The two programs look similar but there are fundamental differences between them.
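As a rough illustration of this comparison, the following is a minimal sketch of two such routines written as functions. It is not the book's original listing: the names cat_syscall() and cat_stdio(), the 4096-byte buffer and the convention of returning the number of bytes printed (or -1 if the file cannot be opened) are assumptions made for this sketch. The comments (1) to (3) mark the points discussed in the line notes.

```c
#include <fcntl.h>    /* open() */
#include <stdio.h>    /* fopen(), fgetc(), putchar(), fclose() */
#include <unistd.h>   /* read(), write(), close() */

/* System call version: print a file to stdout, one write() per byte. */
long cat_syscall(const char *path)
{
    char buf[4096];
    long total = 0;
    ssize_t n;
    int fd;                                    /* (1) integer file descriptor */

    fd = open(path, O_RDONLY);                 /* (2) returns -1 on failure */
    if (fd < 0)
        return -1;
    while ((n = read(fd, buf, sizeof buf)) > 0) {   /* (3) read up to 4KB */
        for (ssize_t i = 0; i < n; i++) {
            /* one system call per byte: correct but very slow */
            if (write(1, &buf[i], 1) != 1) {
                close(fd);
                return -1;
            }
        }
        total += n;
    }
    close(fd);
    return total;
}

/* Library I/O version: print a file to stdout through FILE streams. */
long cat_stdio(const char *path)
{
    long total = 0;
    int c;
    FILE *fp;                                  /* (1) FILE stream pointer */

    fp = fopen(path, "r");                     /* (2) returns NULL on failure */
    if (fp == NULL)
        return -1;
    while ((c = fgetc(fp)) != EOF) {           /* (3) get chars until EOF */
        putchar(c);
        total++;
    }
    fclose(fp);
    return total;
}
```

Both functions can be called with the same file name, for example cat_syscall("notes.txt") followed by cat_stdio("notes.txt"), where notes.txt is a placeholder name. They print the same bytes but issue very different numbers of system calls.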

Line 1: In the system call program, the file descriptor fd is an integer. In the library I/O program, fp is a FILE stream pointer.

Line 2: The system call open() opens a file for read and returns an integer file descriptor fd, or -1 if open() fails. The library I/O function fopen() returns a FILE structure pointer, or NULL if fopen() fails.

Line 3: The system call program uses a while loop to read/write the file contents. In each iteration, it issues a read() system call to read up to 4KB of chars into buf[ ]. Then it writes each char from buf[ ] to file descriptor 1, which is the standard output of the process. As pointed out before, using system calls to write one byte at a time is grossly inefficient. In contrast, the library I/O program simply uses fgetc(fp) to get chars from the FILE stream and outputs them with putchar() until EOF.

Besides the slight differences in syntax and functions used, there are some fundamental differences between the two programs, which are explained in more detail below.

Line 2: fopen() issues an open() system call to get a file descriptor fd. If the open() call fails, it returns a NULL pointer. Otherwise, it allocates a FILE structure in the program’s heap area. The FILE structure contains an internal buffer char fbuf[BLKSIZE] and an integer fd field. It records the file descriptor returned by open() in the FILE structure, initializes fbuf[ ] as empty, and returns the address of the FILE structure as fp.

Line 3: fgetc(fp) tries to get a char from the file stream fp. If the fbuf[ ] in the FILE structure is empty, it issues a read(fd, fbuf, BLKSIZE) system call to read up to BLKSIZE bytes from the file, where BLKSIZE matches the file system block size. Then it returns a char from fbuf[ ]. Subsequently, fgetc() returns a char from fbuf[ ] as long as it still has data. Thus, the library I/O read functions issue read() system calls only to refill fbuf[ ], and they always transfer data from the OS kernel to user space in units of BLKSIZE. Similar remarks also apply to library I/O write functions.
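To make the buffering mechanism concrete, the following is a toy model of it, not the actual FILE implementation of any C library. Everything named here is hypothetical and exists only for this sketch: the MYFILE structure, my_fopen(), my_fgetc(), my_fclose(), MY_EOF, the nreads counter and the assumed BLKSIZE of 4096.

```c
#include <fcntl.h>    /* open() */
#include <stdlib.h>   /* malloc(), free() */
#include <unistd.h>   /* read(), close() */

#define BLKSIZE 4096  /* assumed block size for this sketch */
#define MY_EOF  (-1)  /* hypothetical end-of-file marker */

typedef struct {
    int   fd;              /* file descriptor returned by open() */
    char  fbuf[BLKSIZE];   /* internal buffer */
    int   count;           /* bytes in fbuf[] not yet handed out */
    char *next;            /* next byte to hand out */
    long  nreads;          /* read() calls issued so far (illustration only) */
} MYFILE;

/* Open a file for reading and allocate the stream structure on the heap. */
MYFILE *my_fopen(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    MYFILE *fp = malloc(sizeof *fp);
    if (fp == NULL) {
        close(fd);
        return NULL;
    }
    fp->fd = fd;
    fp->count = 0;         /* buffer starts out empty */
    fp->next = fp->fbuf;
    fp->nreads = 0;
    return fp;
}

/* Return the next char, refilling fbuf[] by read() only when it is empty. */
int my_fgetc(MYFILE *fp)
{
    if (fp->count == 0) {
        ssize_t n = read(fp->fd, fp->fbuf, BLKSIZE);
        fp->nreads++;
        if (n <= 0)
            return MY_EOF; /* end of file or read error */
        fp->count = (int)n;
        fp->next = fp->fbuf;
    }
    fp->count--;
    return (unsigned char)*fp->next++;
}

/* Close the file descriptor and release the stream structure. */
int my_fclose(MYFILE *fp)
{
    int r = close(fp->fd);
    free(fp);
    return r;
}
```

With this model, reading a 10000-byte regular file one char at a time through my_fgetc() issues only four read() calls (three that return data and one that detects end of file), instead of one system call per char.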

Exercise 9.1: In the system call program of Example 9.1, writing each char by a system call is very inefficient. Replace the for loop by a single write() system call.

Example 9.2: Copy files. Again, we list two versions of the program side by side to illustrate their similarities and differences.

Both programs copy a src file to a dest file. Since the system call program was already explained in Chapter 6, we shall only discuss the program that uses library I/O functions.

  • fopen() uses a string for Mode, where “r” is for READ and “w” is for WRITE. It returns a pointer to a FILE structure. fopen() first issues an open() system call to open the file and get a file descriptor number fd. If the open() system call fails, fopen() returns a NULL pointer. Otherwise, it allocates a FILE structure in the program’s heap area. Each FILE structure contains an internal buffer, fbuf[BLKSIZE], whose size usually matches the BLKSIZE of the file system. In addition, it has pointers, counters and status variables for manipulating the fbuf[ ] contents, and a fd field, which stores the file descriptor from open(). fopen() initializes the FILE structure and returns fp, which points at the FILE structure. It is important to note that the FILE structure is in the process’ User mode Image. This means that calls to library I/O functions are ordinary function calls, not system calls.
  • The program terminates if any of the fopen() calls has failed. As mentioned above, fopen() returns a NULL pointer on failure, e.g. if the file cannot be opened in the indicated mode.
  • Then it uses a while loop to copy the file contents. Each iteration of the while loop tries to read BLKSIZE bytes from the source file, and write n bytes to the target file, where n is the returned value from fread(). The general forms of fread() and fwrite() are

size_t n = fread(buffer, size, nitems, FILEptr);

size_t n = fwrite(buffer, size, nitems, FILEptr);

where size is the record size in bytes, nitems is the number of records to be read or written, and n is the actual number of records read or written. These functions are intended for reading and writing structured data objects. For example, assume that the buffer area contains data objects of structured records

struct record{…. }

We may use

n = fwrite(buffer, sizeof(struct record), nitem, FILEptr);

to write nitem records to a file. Similarly,

n = fread(buffer, sizeof(struct record), nitem, FILEptr);

reads nitem records from a file.
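As a side illustration of record-oriented I/O, separate from the copy program of Example 9.2, the following minimal sketch writes an array of records to a file and reads them back. The record layout (an id and a 16-byte name) and the helper names save_records() and load_records() are hypothetical and are not taken from the book.

```c
#include <stdio.h>    /* fopen(), fread(), fwrite(), fclose() */

struct record {       /* hypothetical record layout for this sketch */
    int  id;
    char name[16];
};

/* Write nitem records to path; return the number of records written. */
size_t save_records(const char *path, const struct record *recs, size_t nitem)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return 0;
    size_t n = fwrite(recs, sizeof(struct record), nitem, fp);
    if (fclose(fp) != 0)      /* fclose() flushes the buffered data */
        return 0;
    return n;
}

/* Read up to nitem records from path; return the number of records read. */
size_t load_records(const char *path, struct record *recs, size_t nitem)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return 0;
    size_t n = fread(recs, sizeof(struct record), nitem, fp);
    fclose(fp);
    return n;
}
```

Note that the return values count whole records, not bytes. A file written this way holds the in-memory layout of the structure, so it is generally only readable by a program built with the same structure definition on the same kind of machine.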

The copy program of Example 9.2 tries to read/write BLKSIZE bytes at a time, so it uses size = 1 and nitems = BLKSIZE. As a matter of fact, any combination of size and nitems such that size*nitems = BLKSIZE would also work. However, using a size > 1 may cause a problem on the last fread(), because the file may have fewer than size bytes left. In that case, the returned n is zero even if some data has been read, since n counts only complete records. To deal with the “tail” part of the source file, we may add the following lines of code after the while loop:

fseek(fp, total, SEEK_SET); // fseek to byte total

n = fread(buf, 1, size, fp); // read remaining bytes

fwrite(buf, 1, n, gp); // write to dest file

total += n;

fseek() plays the same role for FILE streams as lseek() does for file descriptors. It positions the file’s R|W pointer at the byte location total. From there, we read the file as 1-byte objects. This will read all the remaining bytes, if any, and write them to the target file.

  • After the copying is complete, both files are closed by fclose(FILE *p).
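The steps in the list above can be sketched as a single function. This is a minimal sketch under stated assumptions, not the book's listing: the function name copy_file(), its size parameter and the BLKSIZE value of 4096 are hypothetical, while fp, gp, buf, n and total follow the names used in the tail-handling fragment.

```c
#include <stdio.h>    /* fopen(), fread(), fwrite(), fseek(), fclose() */

#define BLKSIZE 4096  /* assumed block size for this sketch */

/* Copy src to dest, reading records of `size` bytes.
 * Returns the number of bytes copied, or -1 on error. */
long copy_file(const char *src, const char *dest, size_t size)
{
    char buf[BLKSIZE];
    size_t n, nitems;
    long total = 0;
    FILE *fp, *gp;

    if (size == 0 || size > BLKSIZE)
        return -1;
    fp = fopen(src, "r");                 /* NULL if src cannot be opened */
    if (fp == NULL)
        return -1;
    gp = fopen(dest, "w");                /* NULL if dest cannot be opened */
    if (gp == NULL) {
        fclose(fp);
        return -1;
    }

    nitems = BLKSIZE / size;              /* records that fit in buf[] */
    while ((n = fread(buf, size, nitems, fp)) > 0) {
        if (fwrite(buf, size, n, gp) != n)
            goto fail;
        total += (long)(n * size);        /* n counts complete records only */
    }

    /* Tail: fewer than size bytes may remain when size > 1. */
    if (fseek(fp, total, SEEK_SET) != 0)  /* fseek to byte total */
        goto fail;
    n = fread(buf, 1, size, fp);          /* read remaining bytes */
    if (fwrite(buf, 1, n, gp) != n)       /* write to dest file */
        goto fail;
    total += (long)n;

    fclose(fp);
    return fclose(gp) == 0 ? total : -1;  /* fclose() flushes dest */

fail:
    fclose(fp);
    fclose(gp);
    return -1;
}
```

With size = 1 the tail step reads nothing, because the while loop has already consumed every byte. With size = 4 and a 10-byte source file, the loop copies two complete records (8 bytes) and the tail step copies the remaining 2 bytes.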

Source: Wang K.C. (2018), Systems Programming in Unix/Linux, Springer; 1st ed. 2018 edition.
