Summarizing the background information on C structures, link list processing and binary trees, we are ready to integrate these concepts and techniques into a programming project to solve a practical problem. The programming project is to design and implement a Unix/Linux file system tree simulator.
1. Unix/Linux File System Tree
The logical organization of a Unix file system is a general tree, as shown in the Fig. 2.28. Linux file systems are organized in the same way, so the discussions here apply to Linux files systems also. The file system tree is usually drawn upside down, with the root node / at the top. For simplicity, we shall assume that the file system contains only directories (DIRs) and regular FILEs, i.e. no special files, which are I/O devices. A DIR node may have a variable number of children nodes. Children nodes of the same parent are called siblings. In a Unix/Linux file system, each node is represented by a unique pathname of the form /a/b/c or a/b/c. A pathname is absolute if it begins with /, indicating that it starts from the root. Otherwise, it is relative to the Current Working Directory (CWD).
2. Implement General Tree by Binary Tree
A general tree can be implemented as a binary tree. For each node, let childPtr point to the oldest child, and siblingPtr point to the oldest sibling. For convenience, each node also has a parentPtr pointing to its parent node. For the root node, both parentPtr and siblingPtr point to itself. As an example, Fig. 2.31 shows a binary tree which is equivalent to the general tree of Fig. 2.28. In Fig. 2.31, thin lines represent childPtr pointers and thick lines represent siblingPtr. For clarify, NULL pointers are not shown.
3. Project Specification and Requirements
The project is to design and implement a C program to simulate the Unix/Linux file system tree. The program should work as follows.
- . Start with a / node, which is also the Current Working Directory (CWD).
- . Prompt the user for a command. Valid commands are:
mkdir, rmdir, cd, ls, pwd, creat, rm, save, reload, menu, quit
- . Execute the command, with appropriate tracing messages.
- . Repeat (2) until the “quit” command, which terminates the program.
4. Commands Specification
5. Program Organization
There are many ways to design and implement such a simulator program. The following outlines the suggested organization of the program.
(1) NODE type: Define a C structure for the NODE type containing
64 chars : name string of the node;
char : node type: ‘D’ for directory or ‘F’ for file
node pointers : *childPtr, *siblingPtr, *parentPtr;
(2) Global Variables:
NODE *root, *cwd; // root and CWD pointers
char line[128]; // user input command line
char command[16], pathname[64]; // command and pathname strings
char dname[64], bname[64]; // dirname and basename string holders
(Others as needed)
(3) The main() function: The main function of the program can be sketched as follows.
int main()
{
initialize(); //initialize root node of the file system tree while(1){
get user input line = [command pathname];
identify the command; execute the command;
break if command=”quit”;
}
(4) Get user inputs: Assume that each user input line contains a command with an optional pathname. The reader may use scanf() to read user inputs from stdin. A better technique is as follows
fgets(line, 128, stdin); // get at most 128 chars from stdin
line[strlen(line)-1] = 0; // kill \n at end of line
sscanf(line, ”%s %s”, command, pathname);
The sscanf() function extracts items from the line[ ] area by format, which can be either chars, strings or integers. It is therefore more flexible than scanf().
(5) Identify command: Since each command is a string, most readers would probably try to identify the command by using strcmp() in a series of if-else-if statements, as in
if (!strcmp(command, “mkdir)
mkdir(pathname);
else if (!strcmp(command, “rmdir”
rmdir(pathname);
else if . . .
This requires many lines of string comparisons. A better technique is to use a command table containing command strings (pointers), which ends with a NULL pointer.
char *cmd[] = {“mkdir”, “rmdir”, “ls”, “cd”, “pwd”, “creat”, “rm”,
“reload”, “save”, “menu”, “quit”, NULL};
For a given command, search the command table for the command string and return its index, as shown by the following findCmd() function.
int findCmd(char *command)
{
int i = 0; while(cmd[i]){
if (!strcmp(command, cmd[i]))
return i; // found command: return index i
i + +;
}
return -1; // not found: return -1
}
As an example, for the command = “creat”,
int index = findCmd(“creat”);
returns the index 5, which can be used to invoke a corresponding creat() function.
(6) The main() function: Assume that, for each command, we have written a corresponding action function, e.g. mkdir(), rmdir(), ls(), cd(), etc. The main() function can be refined as shown below.
int main()
{
int index;
char line[128], command[16], pathname[64];
initialize(); //initialize root node of the file system tree while(1){
printf(“input a commad line : “);
fgets(line,128,stdin);
line[strlen(line)-1] = 0;
sscanf(line, “%s %s”, command, pathname);
index = fidnCmd(command); switch(index){
case 0 : mkdir(pathname); break;
case 1 : rmdir(pathname); break;
case 2 : ls(pathname); break;
etc.
default: printf(“invalid command %s\n”, command);
}
}
}
The program uses the command index as different cases in a switch table. This works fine if the number of commands is small. For large number of commands, it would be preferable to use a table of function pointers. Assume that we have implemented the command functions
int mkdir(char *pathname){………. }
int rmdir(char *pathname){………. }
etc.
Define a table of function pointers containing function names in the same order as their indices, as in
0 1 2 3 4 5 6
int (*fptr[ ])(char *)={(int (*)())mkdir,rmdir,ls,cd,pwd,creat,rm,. .};
The linker will populate the table with the entry addresses of the functions. Given a command index, we may call the corresponding function directly, passing as parameter pathname to the function, as in
int r = fptr[index](pathname);
6. Command Algorithms
Each user command invokes a corresponding action function, which implements the command. The following describes the algorithms of the action functions
mkdir pathname
(1) . Break up pathname into dirname and basename, e.g.
ABSOLUTE: pathname=/a/b/c/d. Then dirname=/a/b/c, basename=d
RELATIVE: pathname= a/b/c/d. Then dirname=a/b/c, basename=d
(2). Search for the dirname node:
ASSOLUTE pathname: start from /
RELATIVE pathname: start from CWD.
if nonexist : error messages and return FAIL if exist but not DIR: error message and return FAIL
(3) . (dirname exists and is a DIR):
Search for basename in (under) the dirname node:
if already exists: error message and return FAIL;
ADD a new DIR node under dirname;
Return SUCCESS
rmdir pathname
(1). if pathname is absolute, start = /
else start = CWD, which points CWD node
(2). search for pathname node:
tokenize pathname into components strings; begin from start, search for each component; return ERROR if fails
(3). pathname exists:
check it’s a DIR type;
check DIR is empty; can’t rmdir if NOT empty;
(4). delete node from parent’s child list;
creat pathname
SAME AS mkdir except the node type is ‘F’
rm pathname
SAME AS rmdir except check it’s a file, no need to check for EMPTY.
cd pathname
- find pathname node;
- check it’s a DIR;
- change CWD to point at DIR
Is pathname
- . find pathname node
- . list all children nodes in the form of [TYPE NAME] [TYPE NAME] …
pwd
Start from CWD, implement pwd by recursion:
- Save the name (string) of the current node
- Follow parentPtr to the parent node until the root node;
- Print the names by adding / to each name string
Save Tree to a FILE The simulator program builds a file system tree in memory. When the program exits, all memory contents will be lost. Rather than building a new tree every time, we may save the current tree as a file, which can be used to restore the tree later. In order to save the current tree as a file, we need to open a file for write mode. The following code segment shows how to open a file stream for writing, and then write (text) lines to it.
FILE *fp = fopen(“myfile”, “w+”); // fopen a FILE stream for WRITE
fprintf(fp, “%c %s”, ‘D’, “string\n”); // print a line to file
fclose(fp); // close FILE stream when done
save(filename) This function save the absolute pathnames of the tree as (text) lines in a file opened for WRITE. Assume that the file system tree is as shown in Fig. 2.32.
The pathnames are generated by PRE-ORDER traversal of a binary tree:
Each print function prints the absolute pathname of a node, which is essentially the same as pwd(). Since the root node always exists, it can be omitted from the save file.
reload(filename) The function reconstructs a tree from a file. First, initialize the tree as empty, i.e with only the root node. Then read each line of the file. If the line contains “D pathname”, call
mkdir(pathname) to make a directory.
If the line contains “F pathname”, call
creat(pathname) to create a file.
These will reconstruct the tree saved earlier.
Quit Command save the current tree to a file. Then terminate the program execution.
On subsequent runs of the simulator program, the user may use the reload command to restore the tree saved earlier.
(8). Additional Programming HELP
(8).1. Tokenize pathname into components: Given a pathname, e.g. “/a/b/c/d”, the following code segment shows how to tokenize pathname into component strings.
int tokenize(char *pathname)
{
char *s;
s = strtok(path, “/”); // first call to strtok()
while(s){
printf(“%s “, s);
s = strtok(0, “/”); // call strtok() until it returns NULL
}
}
The strtok() function divides a string into substrings by the specified delimiter char “/”. The substrings reside in the original string, thus destroying the original string. In order to preserve the original pathname, the user must pass a copy of pathname to the strtok() function. In order to access the tokenized substrings, the user must ensure that they are accessible by one of the following schemes.
. The copied pathname is a global variable, e.g. char path[128], which contains the tokenized substrings.
. If the copied pathname path[ ] is local in a function, access the substrings only in the function
(8).2. dir_name and base_name: For the simulator program, it is also often necessary to decompose a pathname into dir_name, and base_name, where dir_name is the directory part of pathname and base_name is the last component of pathname. As an example, if pathname=”/a/b/c”, then dir_name=”/a/b” and base_name=”c”. This can be done by the library functions dirname() and basename(), both of which destroy the pathname also. The following code segments show how to decompose a pathname into dir_name and base_name.
#include <libgen.h>
char dname[64], bname[64]; // for decomposed dir_name and base_name
int dbname(char *pathname)
{
char temp[128]; // dirname(), basename() destroy original pathname
strcpy(temp, pathname);
strcpy(dname, dirname(temp));
strcpy(temp, pathname);
strcpy(bname, basename(temp));
}
7. Sample Solution
Sample solution of the programming project is available online at the book’s website for download. Source code of the project solution is available to instructors upon request from the author.
Source: Wang K.C. (2018), Systems Programming in Unix/Linux, Springer; 1st ed. 2018 edition.