Programming Project: Comparison of I/O Buffer Management Algorithms in Unix/Linux

The programming project is to implement a simulation system to compare the performances of the Unix I/O buffer management algorithms and the PV algorithm using semaphores. Although the immediate goal is I/O buffer performance, the real goal is for the reader to understand I/O operations of file systems. The following describes the project.

1. System Organization

Figure 12.1 shows the organization of the simulation system.

(1). Box#1: User Interface: This is the user interface part of the simulation system. It prompts for input commands, shows command executions, displays system status and execution results, etc. During development, the reader may input commands manually for tasks to execute. During final testing, tasks should have their own sequences of input commands. For example, each task may read an input file containing commands. After a task switch, task inputs will be read from a different file.

Fig. 12.1 System
Organization Diagram

2. Multitasking System

(2). Box#2: This is the CPU side of a multitasking system, which simulates the kernel mode of a uniprocessor (single CPU) file system. It is essentially the same multitasking system described in Chap. 4 for user-level threads, except for the following modifications. When the system starts, it creates and runs a main task with the lowest priority. The main task creates ntask working tasks, all with priority 1, and enters them into the readyQueue. Then the main task executes the following code, which switches tasks to run working tasks from the readyQueue.

/******* Main Task Code *******/
// after system initialization
while(1){
    while(task && readyQ == 0);  // loop if no task is runnable
    if (readyQ)                  // if readyQueue nonempty
        kswitch();               // switch to run a working task
    else
        end_task();              // all working tasks have ended
}

Since the main task has the lowest priority, it will run again if there are no runnable tasks or if all tasks have ended. In the latter case, the main task executes end_task(), in which it collects and displays the simulation results and terminates, thus ending the simulation run.

All working tasks execute the same body() function, in which each task reads commands from an input file to execute read or write disk block operations until end of the command file.

#define CMDLEN 10
int cmdfile[NTASK];   // opened command file descriptors
int task = ntask;     // number of active tasks

int body()
{
    int dev, blk;
    char opcode, cmd[CMDLEN];
    while(1){
        if (read(cmdfile[running->pid], cmd, CMDLEN)==0){
            running->status = DEAD;  // task ends
            task--;                  // dec active task count by 1
            tswitch();
        }
        sscanf(cmd, "%c%4d%5d", &opcode, &dev, &blk);
        if (opcode=='r')             // read (dev, blk)
            readBlk(dev, blk);
        if (opcode=='w')             // write (dev, blk)
            writeBlk(dev, blk);
    }
}

The commands of each task are generated randomly by the rand() function modulo a limit value. Each command is a triple of the form

[r xxx yyyy] or [w xxx yyyy], where xxx=dev, yyyy=blkno;

For a [r dev blk] command, the task calls

BUFFER *bread(dev, blk)

to get a buffer (pointer) containing valid data. After reading data from the buffer, it releases the buffer back to the buffer cache for reuse. In bread(), the task may have to wait for a buffer in getblk() or wait until the buffer data become valid. If so, it sleeps (in the Unix algorithm) or becomes blocked (in the PV algorithm), which switches task to run the next task from the readyQueue.

For a [w dev blk] command, the task tries to get a buffer for (dev, blk). If it has to wait for a buffer in getblk(), it will sleep or become blocked and switch task to run the next task. After writing data to a buffer, it marks the buffer as DIRTY for delayed write and releases the buffer to the buffer cache. Then it tries to execute the next command, etc.

3. Buffer Manager

The Buffer Manager implements the buffer management functions, which consist of

BUFFER *bread(dev, blk)
dwrite(BUFFER *bp)
awrite(BUFFER *bp)
BUFFER *getblk(dev, blk)
brelse(BUFFER *bp)

These functions may invoke the Disk Driver to issue physical disk I/O.

4. Disk Driver

The Disk Driver consists of two parts:

(1). start_io(): maintains the device I/O queues and issues I/O operations for the buffers in the I/O queues.

(2). Interrupt Handler: At the end of each I/O operation the Disk Controller interrupts the CPU.

Upon receiving an interrupt, the Interrupt Handler first reads the Interrupt Status from IntStatus, which contains

[dev | R|W | StatusCode]

|<---- e.g. 10 chars ---->|

where dev identifies the device. The Interrupt Handler removes the buffer from the head of the device I/O queue and processes the interrupt for the buffer. For a disk READ interrupt, the Interrupt Handler first moves the block of data from DataIn into the buffer's data area. Then it marks the buffer data valid and wakes up or unblocks the task that is waiting on the buffer for valid data. The awakened task will read data from the buffer and release the buffer back to the buffer cache. For a disk WRITE interrupt, there is no task waiting on the buffer for I/O completion. Such an ASYNC write buffer is handled by the Interrupt Handler: it turns off the buffer's ASYNC write flag and releases the buffer to the buffer cache. Then it examines the device I/O queue. If the I/O queue is nonempty, it issues the I/O command for the first buffer in the I/O queue. Before issuing a disk WRITE operation, it copies the buffer's data to DataOut for the Disk Controller to get. Finally, it writes an ACK to IntAck to acknowledge the current interrupt, allowing the Disk Controller to continue. When the Interrupt Handler finishes processing the current interrupt, normal task execution resumes from the last interrupted point.

5. Disk Controller

(3). Box#3: This is the Disk Controller, which is a child process of the main process. As such, it operates independently of the CPU side except for the communication channels between them, which are shown as the Interface between the CPU and the Disk Controller. The communication channels are implemented by pipes between the main process and the child process.

Commands: I/O commands from CPU to Disk Controller
DataOut: data out from CPU to Disk Controller in write operations
DataIn: data in from Disk Controller to CPU in read operations
IntStatus: interrupt status from Disk Controller to CPU
IntAck: interrupt ACK from CPU to Disk Controller

6. Disk Interrupts

Interrupts from the Disk Controller to the CPU are implemented by the SIGUSR1 (#10) signal. At the end of each I/O operation, the Disk Controller issues a kill(ppid, SIGUSR1) system call to send a SIGUSR1 signal to the parent process, which acts as an interrupt to the virtual CPU. As usual, the virtual CPU may mask out/in disk interrupts (signals) in Critical Regions. To prevent race conditions, the Disk Controller must receive an interrupt ACK from the CPU before it can interrupt again.

7. Virtual Disks

(4). Box#4: These are virtual disks, which are simulated by Linux files. Using the Linux system calls lseek(), read() and write(), we may support any block I/O operations on the virtual disks. For simplicity, the disk block size is set to 16 bytes. Since the data contents do not matter, they can be set to a fixed sequence of 16 chars.

8. Project Requirements

Implement two versions of buffer management algorithms:

Unix algorithm using sleep/wakeup

New algorithm using semaphores.

Compare their performances in terms of buffer cache hit-ratio, numbers of actual I/O operations, task switches, retries and total run time, etc. under different task/buffer ratios.

9. Sample Base Code

For the sake of brevity, we only show the main.c file of the CPU side and the controller.c file of the Disk Controller. The functions of other included files are only explained briefly. Also, not shown are the code segments used to collect performance statistics, but they can be added easily to the base code.

/************ CPU side: main.c file **********/
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <signal.h>
#include <string.h>

int main(int argc, char *argv[])
{
    if (argc != 6){
        printf("usage: a.out ntask nbuffer ndevice nblock nsample\n");
        exit(0);
    }
    task = ntask = atoi(argv[1]);
    nbuffer = atoi(argv[2]);
    ndevice = atoi(argv[3]);
    nblock  = atoi(argv[4]);
    sample  = atoi(argv[5]);
    if (ntask>NTASK || nbuffer>NBUFFER || ndevice>NDEVICE || nblock>NBLOCK)
        exit(1);
    fp = fopen("record", "a");    // record task activities
    printf("Welcome to UNIX Buffer Management System\n");
    printf("Initialize proc[] & buff[] & signal\n");
    initproc();                   // build tasks to compete for buffers
    initbuf();                    // build buffer link lists
    initsig();                    // set sigmask & init itimer for timing
    printf("Install signal handler\n");
    signal(SIGUSR1, catcher);     // install catcher for SIGUSR1 signal
    printf("Open pipes for communication\n");
    open_pipe();                  // create communication pipes
    printf("fork(): parent as CPU, child as Disk Controller\n");
    switch(fork()){
    case -1: perror("fork call"); exit(2); break;
    case  0:                      // child process as Disk Controller
        close_pipe(0);            // configure pipes at child side
        sprintf(buffer, "%d %d %d %d %d",
                CMDIN, WRITEIN, READOUT, STATOUT, ACKIN);
        execl("controller", "controller", buffer, argv[3], argv[4], 0);
        break;
    default:                      // parent process as virtual CPU
        close_pipe(1);            // configure pipes at parent side
        printf("MAIN: check device status\n");
        check_dev();              // wait until child has created devices
        printf("Generate command file for each task\n");
        gencmd();                 // generate command files of tasks
        gettimeofday(&tv0, 0);    // start recording run time
        while(1){
            INTON                        // enable interrupts
            while(task && readyQ == 0);  // wait for ready tasks
            if (readyQ)
                kswitch();
            else
                end_task();              // end processing
        }
    }
}

/******* task body function **********/
int body(int pid)
{
    char rw;            // opcode in command: read or write
    char cmd[CMLEN];    // command record
    int dev, blk;       // dev, blk in command
    int n, count = 0;   // task commands count

    while(1){
        n = read(cmdfile[running->pid], cmd, CMLEN);
        if (n==0){                   // if end of cmd file
            running->status = DEAD;  // mark task dead
            task--;                  // number of active tasks -1
            kswitch();               // switch to next task
        }
        count++;                     // commands count++
        sscanf(cmd, "%c%4d%5d", &rw, &dev, &blk);
        if (rw == 'r')
            readBlk(dev, blk);       // READ (dev, blk)
        else
            writeBlk(dev, blk);      // WRITE (dev, blk)
    }
}

int kswitch()
{
    INTOFF        // disable interrupts
    tswitch();    // switch task in assembly code
    INTON         // enable interrupts
}

The main.c file includes the following files, which are only explained briefly.

(1). main.h: This file defines system constants, PROC structure type and symbolic constants. It also defines the macros

#define INTON  sigprocmask(SIG_UNBLOCK, &sigmask, 0);
#define INTOFF sigprocmask(SIG_BLOCK,   &sigmask, 0);

for masking in/out interrupts, where sigmask is a 32-bit vector with the SIGUSR1 bit (signal number 10) set to 1.

(2). proc.c: This file initializes the multitasking system. It creates ntask tasks to do disk I/O operations through the buffer cache.

(3). buf.c: This file defines semaphore, buffer and device structures. It initializes the buffer and device data structures, and it contains code for buffer link lists operations.

(4). utilyti.c: This file contains functions common to both the CPU side and the Disk Controller side.

Examples are the creation and configuration of pipes, generation of input command files and data files for the simulated disks, etc. It also implements the end_task() function, which collects and displays the simulation statistics when all tasks have ended.

(5). manager.c: This file implements the actual buffer management functions, such as readBlk(dev,blk), writeBlk(dev, blk), getblk() and brelse() which are the core functions of buffer management.

In the simulator, all the files are identical except the manager.c file, which has two versions; one version implements the Unix algorithm and the other one implements the PV algorithm using semaphores.

The following shows the Disk Controller source file, which is executed by a child process of the main process. The information needed by the child process is passed in via command-line parameters in argv[]. After initialization, the Disk Controller process executes an infinite loop until it gets the "END" command from the CPU, indicating that all tasks on the CPU side have ended. Then it closes all the virtual disk files and terminates.

/******* Pseudo Code of Disk Controller *******/
initialization;
while(1){
    read command from CMDIN;
    if command == "END", break;
    decode and execute the command;
    interrupt CPU;
    read ACK from ACKIN;
}
close all virtual disk files and exit;

/******** Disk Controller: controller.c file ********/
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

#define DEVICE   128      // max number of devices
#define BLOCK     16      // disk BLOCK size
#define CMLEN     10      // command record length

int main(int argc, char *argv[])
{
    char opcode, cmd[CMLEN], buf[BLOCK], status[CMLEN];
    int CMDIN, READIN, WRITEOUT, STATOUT, ACKIN;  // PIPEs
    int i, ndevice, dev, blk;
    int CPU = getppid();   // ppid for sending signals
    int fd[DEVICE];        // files simulate ndevice devices
    char datafile[8];      // device file name

    // get command-line parameters
    sscanf(argv[1], "%d %d %d %d %d",
           &CMDIN, &READIN, &WRITEOUT, &STATOUT, &ACKIN);
    ndevice = atoi(argv[2]);
    printf("Controller: ndevice=%d\n", ndevice);
    gendata(ndevice);          // generate data files as virtual disks
    for (i=0; i<ndevice; i++){ // open files to simulate devices
        sprintf(datafile, "data%d", i%128);
        if ((fd[i] = open(datafile, O_RDWR)) < 0)
            sprintf(status, "%dfail", i);
        else
            sprintf(status, " %d ok", i);
        write(STATOUT, status, CMLEN);  // send device status to CPU
    }
    // Disk Controller processing loop
    while(1){
        read(CMDIN, cmd, CMLEN);        // read command from pipe
        if (!strcmp(cmd, "END"))        // END command from CPU: break
            break;
        sscanf(cmd, "%c%4d%5d", &opcode, &dev, &blk);
        lseek(fd[dev], blk*BLOCK, SEEK_SET); // seek to block blk of dev
        if (opcode == 'r'){
            read(fd[dev], buf, BLOCK);   // read data of (dev, blk)
            write(WRITEOUT, buf, BLOCK); // write data to WRITEOUT pipe
        }
        else if (opcode == 'w'){         // write cmd
            read(READIN, buf, BLOCK);    // read data from READIN pipe
            write(fd[dev], buf, BLOCK);  // write data to (dev, blk)
        }
        else
            strcpy(status, "IOerr");
        write(STATOUT, cmd, CMLEN);      // write INTstatus to STATOUT
        kill(CPU, SIGUSR1);              // interrupt CPU by SIGUSR1
        read(ACKIN, buf, CMLEN);         // read ACK from CPU
    }
    // Disk Controller end processing
    for (i=0; i<ndevice; i++)
        close(fd[i]);
}

int gendata(int ndev)  // generate data files as virtual disks
{
    char cmd[2048], name[8];
    int fd[128], i, j;
    printf("generate data files\n");
    for (i=0; i<ndev; i++){
        sprintf(name, "data%d", i);
        fd[i] = creat(name, 0644);  // create data file
        for (j=0; j<2048; j++){     // 2 KB of '0'-'9' chars
            cmd[j] = i%10 + '0';
        }
        write(fd[i], cmd, 2048);
        close(fd[i]);
    }
}

The simulator system is compiled and linked into two binary executables, a.out as the CPU side and controller as the Disk Controller side.

cc main.c s.s                   # a.out as the CPU side

cc -o controller controller.c   # controller as Disk Controller

Then run the simulator system as

a.out ntask nbuffer ndevice nblock nsample

In order to run the simulator with different input parameters, a sh script can be used, as in

#!/bin/bash
# run: sh script to run the simulator system
for TASK in 4 8 16 64; do
    for BUF in 4 16 64 128; do
        a.out $TASK $BUF 16 64 1000   # ndev=16, nblk=64, sample=1000
    done
    echo ------------------------
done

10. Sample Solutions

Figure 12.2 shows the outputs of the simulator using the Unix algorithm with

4 tasks, 4 buffers, 4 devices, 16 blocks per device and 100 input samples per task.

In the figure, the performance metrics are shown as

run-time = total running time

rIO = number of actual read operations

wIO = number of actual write operations

intr = number of disk interrupts

hits = number of buffer cache hits

swtch = number of task switches

dirty = number of delayed-write buffers

retry = number of task retries in getblk()

The last row of the outputs shows the percentages of the various metrics. The most important performance metrics are the total run time of the simulation and the buffer cache hit ratio. The other metrics may be used as guidance to explain the performance differences between the algorithms.

Figure 12.3 shows the outputs of the simulator using the PV algorithm with the same input parameters as in the Unix algorithm. It shows that the run-time (24 msec) is shorter than the Unix algorithm (29 msec) and the buffer cache hit ratio (6%) is higher than the Unix algorithm (2%). It also shows that, whereas the Unix algorithm has a large number of task retries, there are no task retries in the PV algorithm, which may account for the differences in performance between the two algorithms.

Source: Wang K.C. (2018), Systems Programming in Unix/Linux, Springer; 1st ed. 2018 edition.
