The programming project is to implement a simulation system to compare the performance of the Unix I/O buffer management algorithm and the PV algorithm using semaphores. Although the immediate goal is I/O buffer performance, the real goal is for the reader to understand the I/O operations of file systems. The following describes the project.
1. System Organization
Figure 12.1 shows the organization of the simulation system.
(1). Box#1: User Interface: This is the user interface part of the simulation system. It prompts for input commands, shows command executions, displays system status and execution results, etc. During development, the reader may input commands manually for tasks to execute. During final testing, tasks should have their own sequences of input commands. For example, each task may read an input file containing its commands. After a task switch, task inputs will be read from a different file.
Fig. 12.1 System Organization Diagram
2. Multitasking System
(2). Box#2: This is the CPU side of a multitasking system, which simulates the kernel mode of a uniprocessor (single CPU) file system. It is essentially the same multitasking system described in Chap. 4 for user-level threads, except for the following modifications. When the system starts, it creates and runs a main task with the lowest priority. The main task creates ntask working tasks, all with priority 1, and enters them into the readyQueue. Then the main task executes the following code, which switches task to run working tasks from the readyQueue.
/******* Main Task Code *******/
// after system initialization
while(1){
    while(task && readyQ == 0); // loop if no task runnable
    if (readyQ)                 // if readyQueue nonempty
        kswitch();              // switch to run a working task
    else
        end_task();             // all working tasks have ended
}
Since the main task has the lowest priority, it will run again only when there are no runnable tasks or all tasks have ended. In the latter case, the main task executes end_task(), in which it collects and displays the simulation results and then terminates, ending the simulation run.
All working tasks execute the same body() function, in which each task reads commands from an input file to execute read or write disk block operations until end of the command file.
#define CMDLEN 10
int cmdfile[NTASK];  // opened command file descriptors
int task = ntask;    // number of active tasks
int body()
{
    int dev, blk;
    char opcode, cmd[CMDLEN];
    while(1){
        if (read(cmdfile[running->pid], cmd, CMDLEN)==0){
            running->status = DEAD; // task ends
            task--;                 // dec task count by 1
            tswitch();
        }
        sscanf(cmd, "%c%4d%5d", &opcode, &dev, &blk);
        if (opcode=='r')            // read (dev, blk)
            readBlk(dev, blk);
        if (opcode=='w')
            writeBlk(dev, blk);     // write (dev, blk)
    }
}
The commands of each task are generated randomly by the rand() function modulo a limit value. Each command is a triple of the form
[r xxx yyyy] or [w xxx yyyy], where xxx=dev and yyyy=blkno.
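Since each command is a fixed-width record of CMDLEN (10) characters, generation and parsing must agree on the field widths. A minimal sketch of the idea, with our own illustrative limits (NDEVICE=4, NBLOCK=16) and helper names (gencmd_one, parsecmd) that are not part of the book's base code:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CMDLEN  10   // fixed command record length
#define NDEVICE  4   // illustrative device limit
#define NBLOCK  16   // illustrative blocks per device

// Generate one random command record of exactly CMDLEN characters
void gencmd_one(char cmd[CMDLEN + 1])
{
    char op  = (rand() % 2) ? 'r' : 'w';
    int  dev = rand() % NDEVICE;
    int  blk = rand() % NBLOCK;
    sprintf(cmd, "%c%4d%5d", op, dev, blk);  // 1 + 4 + 5 = 10 chars
}

// Parse a command record back into its fields; returns 1 on success
int parsecmd(const char *cmd, char *op, int *dev, int *blk)
{
    return sscanf(cmd, "%c%4d%5d", op, dev, blk) == 3;
}
```

The same "%c%4d%5d" format is used again when the Disk Controller decodes commands, so both sides stay consistent.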
For a [r dev blk] command, the task calls
BUFFER *bread(dev, blk)
to get a buffer (pointer) containing valid data. After reading data from the buffer, it releases the buffer back to the buffer cache for reuse. In bread(), the task may have to wait for a buffer in getblk() or until the buffer's data becomes valid. If so, it sleeps (in the Unix algorithm) or becomes blocked (in the PV algorithm), which switches task to run the next task from the readyQueue.
For a [w dev blk] command, the task tries to get a buffer for (dev, blk). If it has to wait for a buffer in getblk(), it sleeps or becomes blocked, switching task to run the next task. After writing data to a buffer, it marks the buffer DIRTY for delayed write and releases the buffer to the buffer cache. Then it tries to execute the next command, etc.
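The per-command flow just described can be summarized in pseudocode, using the Buffer Manager interface listed in the next section:

```
readBlk(dev, blk):
    bp = bread(dev, blk)      // may sleep/block until buffer data valid
    read data from bp->data   // consume the block
    brelse(bp)                // release buffer back to the buffer cache

writeBlk(dev, blk):
    bp = getblk(dev, blk)     // may sleep/block waiting for a buffer
    write data into bp->data
    mark bp DIRTY             // delayed write: no immediate disk I/O
    brelse(bp)                // release; the actual write happens later
```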
3. Buffer Manager
The Buffer Manager implements buffer management functions, which consist of
BUFFER *bread(dev, blk)
dwrite(BUFFER *bp)
awrite(BUFFER *bp)
BUFFER *getblk(dev, blk)
brelse(BUFFER *bp)
These functions may invoke the Disk Driver to issue physical disk I/O.
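To illustrate the cache-lookup logic behind getblk()/brelse(), here is a greatly simplified, single-task sketch (our own illustrative code, not the simulator's: all synchronization, the device I/O and real disk reads are omitted, so a miss simply claims a free buffer and marks it valid):

```c
#include <assert.h>

#define NBUF 4

typedef struct buffer {
    int dev, blk;
    int valid, busy;
    char data[16];
} BUFFER;

static BUFFER buf[NBUF];
static int hits;   // cache-hit counter

// Simplified getblk: search the cache; on a miss, reuse a free buffer.
// A real version sleeps (Unix) or blocks (PV) instead of returning 0.
BUFFER *getblk(int dev, int blk)
{
    int i;
    for (i = 0; i < NBUF; i++) {       // search cache for (dev, blk)
        if (buf[i].valid && buf[i].dev == dev && buf[i].blk == blk) {
            buf[i].busy = 1;
            hits++;                    // cache hit: no disk I/O needed
            return &buf[i];
        }
    }
    for (i = 0; i < NBUF; i++) {       // miss: take a free buffer
        if (!buf[i].busy) {
            buf[i].busy = 1;
            buf[i].dev = dev; buf[i].blk = blk;
            buf[i].valid = 1;          // a real version starts disk I/O here
            return &buf[i];
        }
    }
    return 0;                          // no free buffer: real code waits here
}

void brelse(BUFFER *bp) { bp->busy = 0; }  // release buffer for reuse
```

The hit counter is exactly the kind of statistic the project asks the reader to collect for the performance comparison.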
4. Disk Driver
The Disk Driver consists of two parts:
(1). start_io(): maintains the device I/O queues and issues I/O operations for the buffers in the I/O queues.
(2). Interrupt Handler: At the end of each I/O operation the Disk Controller interrupts the CPU.
Upon receiving an interrupt, the Interrupt Handler first reads the Interrupt Status from IntStatus, which contains
[dev | R|W | StatusCode]
|<---- e.g. 10 chars ---->|
where dev identifies the device. The Interrupt Handler removes the buffer from the head of the device I/O queue and processes the interrupt for that buffer. For a disk READ interrupt, the Interrupt Handler first moves the block of data from DataIn into the buffer's data area. Then it marks the buffer data valid and wakes up or unblocks the task that is waiting on the buffer for valid data. The awakened task will read data from the buffer and release the buffer back to the buffer cache. For a disk WRITE interrupt, there is no task waiting on the buffer for I/O completion. Such an ASYNC write buffer is handled by the Interrupt Handler itself: it turns off the buffer's ASYNC write flag and releases the buffer to the buffer cache. Then it examines the device I/O queue. If the queue is nonempty, it issues the I/O command for the first buffer in the queue; before issuing a disk WRITE operation, it copies the buffer's data to DataOut for the Disk Controller to get. Finally, it writes an ACK to IntAck to acknowledge the current interrupt, allowing the Disk Controller to continue. When the Interrupt Handler finishes processing the current interrupt, normal task execution resumes from the last interrupted point.
5. Disk Controller
(3). Box#3: This is the Disk Controller, which is a child process of the main process. As such, it operates independently of the CPU side except for the communication channels between them, which are shown as the interface between the CPU and the Disk Controller. The communication channels are implemented by pipes between the main process and the child process.
Commands: I/O commands from CPU to Disk Controller
DataOut: data out from CPU to Disk Controller in write operations
DataIn: data in from Disk Controller to CPU in read operations
IntStatus: interrupt status from Disk Controller to CPU
IntAck: interrupt ACK from CPU to Disk Controller
6. Disk Interrupts
Interrupts from the Disk Controller to the CPU are implemented by the SIGUSR1 (#10) signal. At the end of each I/O operation, the Disk Controller issues a kill(ppid, SIGUSR1) system call to send a SIGUSR1 signal to the parent process, which acts as an interrupt to the virtual CPU. As usual, the virtual CPU may mask disk interrupts (signals) out and back in within Critical Regions. To prevent race conditions, the Disk Controller must receive an interrupt ACK from the CPU before it can interrupt again.
7. Virtual Disks
(4). Box#4: These are virtual disks, which are simulated by Linux files. Using the Linux system calls lseek(), read() and write(), we may support any block I/O operations on the virtual disks. For simplicity, the disk block size is set to 16 bytes. Since the data contents do not matter, they can be set to a fixed sequence of 16 chars.
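For example, block I/O on such a virtual disk amounts to an lseek() to offset blk*BLOCK followed by a read() or write() of BLOCK bytes. A minimal sketch (the function names get_block/put_block are our own, not the book's):

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define BLOCK 16   // simulated disk block size

// Read block blk of a virtual disk (an ordinary file) into buf
int get_block(int fd, int blk, char buf[BLOCK])
{
    if (lseek(fd, (off_t)blk * BLOCK, SEEK_SET) < 0)
        return -1;
    return read(fd, buf, BLOCK) == BLOCK ? 0 : -1;
}

// Write block blk of a virtual disk from buf
int put_block(int fd, int blk, const char buf[BLOCK])
{
    if (lseek(fd, (off_t)blk * BLOCK, SEEK_SET) < 0)
        return -1;
    return write(fd, buf, BLOCK) == BLOCK ? 0 : -1;
}
```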
8. Project Requirements
Implement two versions of buffer management algorithms:
Unix algorithm using sleep/wakeup
New algorithm using semaphores.
Compare their performance in terms of buffer cache hit ratio, number of actual I/O operations, task switches, retries and total run time, etc., under different task/buffer ratios.
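In the semaphore version, counting semaphores with queues of blocked tasks replace sleep/wakeup. A minimal sketch of the P and V bookkeeping (our own illustrative code, assuming a simple FIFO queue of task pids; the simulator itself would switch tasks whenever P() reports that the caller must block):

```c
#include <assert.h>

#define NPROC 16

// A counting semaphore with a FIFO queue of blocked task pids.
// This models only the bookkeeping, not the task switching.
typedef struct semaphore {
    int value;
    int queue[NPROC];   // FIFO of blocked pids
    int head, tail, count;
} SEMAPHORE;

void sinit(SEMAPHORE *s, int value)
{
    s->value = value;
    s->head = s->tail = s->count = 0;
}

// P: returns 1 if the caller may proceed, 0 if it must block
int P(SEMAPHORE *s, int pid)
{
    if (--s->value < 0) {
        s->queue[s->tail] = pid;         // enqueue the caller
        s->tail = (s->tail + 1) % NPROC;
        s->count++;
        return 0;                        // caller blocks; task switch here
    }
    return 1;
}

// V: returns the pid of the unblocked task, or -1 if none was waiting
int V(SEMAPHORE *s)
{
    if (++s->value <= 0) {
        int pid = s->queue[s->head];     // dequeue the first blocked task
        s->head = (s->head + 1) % NPROC;
        s->count--;
        return pid;                      // move this task to the readyQueue
    }
    return -1;
}
```

Because a blocked task is unblocked in FIFO order and handed the resource directly, there are no retries, which is exactly the behavioral difference the simulation is designed to measure.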
9. Sample Base Code
For the sake of brevity, we only show the main.c file of the CPU side and the controller.c file of the Disk Controller. The functions of other included files are only explained briefly. Also, not shown are the code segments used to collect performance statistics, but they can be added easily to the base code.
/************ CPU side: main.c file **********/
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <signal.h>
#include <string.h>
int main(int argc, char *argv[ ])
{
if(argc != 6){
    printf("usage: a.out ntask nbuffer ndevice nblock nsample\n");
    exit(0);
}
task = ntask = atoi(argv[1]);
nbuffer = atoi(argv[2]);
ndevice = atoi(argv[3]);
nblock  = atoi(argv[4]);
sample  = atoi(argv[5]);
if (ntask>NTASK || nbuffer>NBUFFER || ndevice>NDEVICE || nblock>NBLOCK)
    exit(1);
fp = fopen("record", "a");        // record task activities
printf("Welcome to UNIX Buffer Management System\n");
printf("Initialize proc[] & buff[] & signal\n");
initproc();                       // build tasks to compete for buffers
initbuf();                        // build buffer link lists
initsig();                        // set sigmask & init itimer for timing
printf("Install signal handler\n");
signal(SIGUSR1, catcher);         // install catcher for SIGUSR1 signal
printf("Open pipes for communication\n");
open_pipe();                      // create communication pipes
printf("fork(): parent as CPU, child as Disk Controller\n");
switch(fork()){
case -1: perror("fork call"); exit(2); break;
case  0:                          // child process as disk controller
    close_pipe(0);                // configure pipes at child side
    sprintf(buffer, "%d %d %d %d %d",
            CMDIN, WRITEIN, READOUT, STATOUT, ACKIN);
    execl("controller", "controller", buffer, argv[3], argv[4], 0);
    break;
default:                          // parent process as virtual CPU
    close_pipe(1);                // configure pipes at parent side
    printf("MAIN: check device status\n");
    check_dev();                  // wait until child has created devices
    printf("Generate command file for each task\n");
    gencmd();                     // generate command files of tasks
    gettimeofday(&tv0, 0);        // start recording run time
    while(1){
        INTON                     // enable interrupts
        while(task && readyQ == 0); // loop if no task runnable
        if (readyQ)
            kswitch();
        else
            end_task();           // end processing
    }
}
}
/******* task body function **********/
int body(int pid)
{
    char rw;                      // opcode in command: read or write
    char cmd[CMLEN];              // command record buffer
    int dev, blk;                 // dev, blk in command
    int n, count = 0;             // task commands count
    while(1){
        n = read(cmdfile[running->pid], cmd, CMLEN);
        if (n == 0){              // if end of cmd file
            running->status = DEAD; // mark task dead
            task--;               // number of active tasks -1
            kswitch();            // switch to next task
        }
        count++;                  // commands count++
        sscanf(cmd, "%c%4d%5d", &rw, &dev, &blk);
        if (rw == 'r')
            readBlk(dev, blk);    // READ (dev, blk)
        else
            writeBlk(dev, blk);   // WRITE (dev, blk)
    }
}
int kswitch()
{
INTOFF // disable interrupts
tswitch(); // switch task in assembly code
INTON // enable interrupts
}
The main.c file includes the following files, which are only explained briefly.
(1). main.h: This file defines system constants, PROC structure type and symbolic constants. It also defines the macros
#define INTON sigprocmask(SIG_UNBLOCK, &sigmask, 0);
#define INTOFF sigprocmask(SIG_BLOCK, &sigmask, 0);
for masking in/out interrupts, where sigmask is a 32-bit vector with SIGUSR1 bit (10)=1.
(2). proc.c: This file initializes the multitasking system. It creates ntask tasks to do disk I/O operations through the buffer cache.
(3). buf.c: This file defines semaphore, buffer and device structures. It initializes the buffer and device data structures, and it contains code for buffer link lists operations.
(4). utility.c: This file contains functions common to both the CPU side and the Disk Controller side. Examples are the creation and configuration of pipes, generation of input command files and data files for the simulated disks, etc. It also implements the end_task() function, which collects and displays the simulation statistics when all tasks have ended.
(5). manager.c: This file implements the actual buffer management functions, such as readBlk(dev,blk), writeBlk(dev, blk), getblk() and brelse() which are the core functions of buffer management.
In the simulator, all the files are identical except the manager.c file, which has two versions: one implements the Unix algorithm and the other implements the PV algorithm using semaphores.
The following shows the Disk Controller source file, which is executed by a child process of the main process. The information needed by the child process is passed in via command-line parameters in argv[]. After initialization, the Disk Controller process executes an infinite loop until it gets the "END" command from the CPU, indicating that all tasks on the CPU side have ended. Then it closes all the virtual disk files and terminates.
/******* Pseudo Code of Disk Controller *******/
initialization;
while(1){
    read command from CMDIN;
    if command == "END", break;
    decode and execute the command;
    interrupt CPU;
    read ACK from ACKIN;
}
close all virtual disk files and exit
/******** Disk Controller: controller.c file: *******/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>
#define DEVICE 128  // max number of devices
#define BLOCK   16  // disk BLOCK size
#define CMLEN   10  // command record length
int main(int argc, char *argv[ ])
{
    char opcode, cmd[CMLEN], buf[BLOCK], status[CMLEN];
    int CMDIN, READIN, WRITEOUT, STATOUT, ACKIN; // PIPEs
    int i, ndevice, dev, blk;
    int CPU = getppid();          // ppid for sending signal
    int fd[DEVICE];               // files simulate n devices
    char datafile[8];             // saved file name
    // get command line parameters
    sscanf(argv[1], "%d %d %d %d %d",
           &CMDIN, &READIN, &WRITEOUT, &STATOUT, &ACKIN);
    ndevice = atoi(argv[2]);
    printf("Controller: ndevice=%d\n", ndevice);
    gendata(ndevice);             // generate data files as virtual disks
    for (i=0; i<ndevice; i++){    // open files to simulate devices
        sprintf(datafile, "data%d", i%128);
        if ((fd[i] = open(datafile, O_RDWR)) < 0)
            sprintf(status, "%dfail", i);
        else
            sprintf(status, " %d ok", i);
        write(STATOUT, status, CMLEN); // send device status to CPU
    }
    // Disk Controller processing loop
    while(1){
        read(CMDIN, cmd, CMLEN);       // read command from pipe
        if (!strcmp(cmd, "END"))       // END command from CPU: break
            break;
        sscanf(cmd, "%c%4d%5d", &opcode, &dev, &blk);
        lseek(fd[dev], (long)blk*BLOCK, SEEK_SET); // seek to block blk
        if (opcode == 'r'){
            read(fd[dev], buf, BLOCK);   // read data of (dev, blk)
            write(WRITEOUT, buf, BLOCK); // write data to WRITEOUT pipe
        }
        else if (opcode == 'w'){         // write cmd
            read(READIN, buf, BLOCK);    // read data from READIN pipe
            write(fd[dev], buf, BLOCK);  // write data to (dev, blk)
        }
        else
            strcpy(status, "IOerr");
        write(STATOUT, cmd, CMLEN);  // write INTstatus to STATOUT
        kill(CPU, SIGUSR1);          // interrupt CPU by SIGUSR1
        read(ACKIN, buf, CMLEN);     // read ACK from CPU
    }
    // Disk Controller end processing
    for (i=0; i<ndevice; i++)
        close(fd[i]);
}
int gendata(int ndev) // generate data files as virtual disks
{
    char cmd[2048], name[8];
    int fd[128], i, j;
    printf("generate data files\n");
    for (i=0; i<ndev; i++){
        sprintf(name, "data%d", i);
        fd[i] = creat(name, 0644);  // create data files
        for (j=0; j<2048; j++){     // 2 KB of '0'-'9' chars
            cmd[j] = i%10 + '0';
        }
        write(fd[i], cmd, 2048);
        close(fd[i]);
    }
}
The simulator system is compiled and linked into two binary executables, a.out as the CPU side and controller as the Disk Controller side.
cc main.c s.s # a.out as the CPU side
cc -o controller controller.c # controller as Disk Controller
Then run the simulator system as
a.out ntask nbuffer ndevice nblock nsample
In order to run the simulator with different input parameters, a sh script can be used, as in
#!/bin/bash
# run: sh script to run the simulator systems
for TASK in 4 8 16 64; do
    for BUF in 4 16 64 128; do
        a.out $TASK $BUF 16 64 1000  # ndev=16, nblk=64, sample=1000
    done
    echo ----------------
done
10. Sample Solutions
Figure 12.2 shows the outputs of the simulator using the Unix algorithm with
4 tasks, 4 buffers, 4 devices, 16 blocks per device and 100 input samples per task.
In the figure, the performance metrics are shown as
run-time = total running time
rIO = number of actual read operations
wIO = number of actual write operations
intr = number of disk interrupts
hits = number of buffer cache hits
swtch = number of task switches
dirty = number of delay write buffers
retry = number of task retries in getblk()
The last row of the outputs shows the percentages of the various metrics. The most important metrics are the total run time of the simulation and the buffer cache hit ratio. The other metrics may serve as guidance in explaining the performance of the algorithms.
Figure 12.3 shows the outputs of the simulator using the PV algorithm with the same input parameters as in the Unix algorithm. It shows that the run-time (24 msec) is shorter than the Unix algorithm (29 msec) and the buffer cache hit ratio (6%) is higher than the Unix algorithm (2%). It also shows that, whereas the Unix algorithm has a large number of task retries, there are no task retries in the PV algorithm, which may account for the differences in performance between the two algorithms.
Source: Wang K.C. (2018), Systems Programming in Unix/Linux, Springer; 1st ed. 2018 edition.