Process Management in Unix/Linux: System Calls for Process Management

In this section, we shall discuss the following system calls in Linux, which are related to process management (Stallings 2011; Tanenbaum and Woodhull 2006).

fork(), wait(), exec(), exit()

Each is a library function that issues an actual syscall of the form

int syscall(int a, int b, int c, int d);

where the first parameter a is the syscall number and b, c, d are parameters to the corresponding kernel function. In 32-bit Intel x86 based Linux, syscall is implemented by the assembly instruction INT 0x80, which causes the CPU to enter the Linux kernel to execute the kernel function identified by the syscall number a. (64-bit x86 Linux uses the dedicated SYSCALL instruction for the same purpose.)

1. fork()

Usage:   int pid = fork();

fork() creates a child process. It returns the child’s pid to the parent, or -1 if fork() fails. In Linux, each user can only have a finite number of processes at the same time. User resource limits are set in the /etc/security/limits.conf file. A user may run the command ulimit -a to see the various resource limit values. Figure 3.5 shows the actions of fork(), which are explained below.

  • The left-hand side of Fig. 3.5 shows the images of a process Pi, which issues the syscall pid=fork() in Umode.
  • Pi goes to Kmode to execute the corresponding kfork() function in kernel, in which it creates a child process PROCj with its own Kmode stack and Umode image, as shown in the right-hand side of the figure. The Umode image of Pj is an IDENTICAL copy of Pi’s Umode image. Therefore, Pj’s code section also has the statement

pid=fork();

Furthermore, kfork() lets the child inherit all opened files of the parent. Thus, both the parent and child can get inputs from stdin and display to the same terminal of stdout and stderr.

  •  After creating a child, Pi returns to the statement

pid = fork();      // parent returns child PID

in its own Umode image with the child’s pid = j. It returns -1 if fork() failed, in which case no child is created.

  • When the child process Pj runs, it exits Kmode and returns to the same statement

pid = fork();      // child returns 0

in its own Umode image with a 0 return value.

After a successful fork(), both the parent and child execute their own Umode images, which are identical immediately after fork(). From the program code point of view, the only way to tell which process is executing now is by the returned pid. Thus the program code should be written as

int pid = fork();
if (pid){
    // parent executes this part
}
else{
    // child executes this part
}

We illustrate fork() by an example.

Example 3.1: The example program C3.1 demonstrates fork().

/********************* C3.1.c: fork() ************************/
#include <stdio.h>
#include <unistd.h>    // fork(), getpid(), getppid()

int main()
{
    int pid;
    printf("THIS IS %d MY PARENT=%d\n", getpid(), getppid());
    pid = fork();      // (1) fork syscall; parent returns child pid, child returns 0
    if (pid){          // PARENT EXECUTES THIS PART
        printf("THIS IS PROCESS %d CHILD PID=%d\n", getpid(), pid);     // (2)
    }
    else{              // child executes this part
        printf("this is process %d parent=%d\n", getpid(), getppid());  // (3)
    }
}

The program is run by a child process of sh. In the example code, getpid() and getppid() are system calls. getpid() returns the calling process PID and getppid() returns the parent process PID.

Line (1) forks a child process.

Line (2) prints (in uppercase for easier identification) the PID of the executing process and the newly forked child PID.

Line (3) prints (in lowercase) the child process PID, which should be the same child PID in Line (2), and its parent PID, which should be the same process PID in Line (2).

2. Process Execution Order

After fork(), the child process competes with the parent and all other processes in the system for CPU time to run. Which process will run next depends on their scheduling priorities, which change dynamically. The following example illustrates the possible different execution orders of processes.

Example 3.2: The example program C3.2 demonstrates process execution orders.

/***************** C3.2.c file ********************/
#include <stdio.h>
#include <unistd.h>    // fork(), getpid(), getppid(), sleep()

int main()
{
    int pid = fork();   // fork a child
    if (pid){           // PARENT
        printf("PARENT %d CHILD=%d\n", getpid(), pid);
        // sleep(1);    // (1) sleep 1 second ==> let child run next
        printf("PARENT %d EXIT\n", getpid());
    }
    else{               // child
        printf("child %d start my parent=%d\n", getpid(), getppid());
        // sleep(2);    // (2) sleep 2 seconds => let parent die first
        printf("child %d exit my parent=%d\n", getpid(), getppid());
    }
}

In the Example 3.2 code, the parent process forks a child. After fork(), which process will run next depends on their priorities. The child may run and terminate first, but it may also be the other way around. If the processes execute very lengthy code, they may take turns to run, so that their outputs may be interleaved on the screen. In order to see the different process execution orders, the reader may perform the following experiments.

  1.  Uncomment Line (1) to let the parent sleep for one second. Then the child should run to completion first.
  2.  Uncomment Line (2) but not Line (1) to let the child sleep for 2 seconds. Then the parent will run to completion first. If the parent process terminates first, the child’s ppid will change to 1 or to some other PID number. The reason for this will be explained later.
  3.  Uncomment both Line (1) and Line (2). The results should be the same as in case (2).

In addition to sleep(seconds), which suspends a calling process for a number of seconds, Unix/Linux also provide the following syscalls, which may affect the execution order of processes.

  • nice(int inc): nice() increases the process priority value by a specified value, which lowers the process scheduling priority (a larger priority value means lower priority). If there are processes with higher priority, this will trigger a process switch to run the higher priority process first. In a non-preemptive kernel, the process switch may not occur immediately. It occurs only when the executing process is about to exit Kmode to return to Umode.

  • sched_yield(void): sched_yield() causes the calling process to relinquish the CPU, allowing other processes of higher priority to run first. However, if the calling process still has the highest priority, it will continue to run.

3. Process Termination

As pointed out in Chap. 2 (Sect. 2.3.8), a process executing a program image may terminate in two possible ways.

(1). Normal Termination: Recall that the main() function of every C program is called by the C startup code crt0.o. If the program executes successfully, main() eventually returns to crt0.o, which calls the library function exit(0) to terminate the process. The exit(value) function does some clean-up work first, such as flushing stdout and closing I/O streams. Then it issues an _exit(value) system call, which causes the process to enter the OS kernel to terminate. A 0 exit value usually means normal termination. If desired, a process may call exit(value) directly from anywhere inside a program without going back to crt0.o. Even more drastically, a process may issue an _exit(value) system call to terminate immediately without doing the clean-up work first. When a process terminates in kernel, it records the value in the _exit(value) system call as the exit status in the process PROC structure, notifies its parent and becomes a ZOMBIE. The parent process can find the ZOMBIE child and get its pid and exit status by the

pid = wait(int *status);

system call, which also releases the ZOMBIE child PROC structure as FREE, allowing it to be reused for another process.

(2). Abnormal Termination: While executing a program, the process may encounter an error condition, such as illegal address, privilege violation, divide by zero, etc., which is recognized by the CPU as an exception. When a process encounters an exception, it is forced into the OS kernel by a trap. The kernel’s trap handler converts the trap error type to a magic number, called SIGNAL, and delivers the signal to the process, causing it to terminate. In this case, the process terminates abnormally and the exit status of the ZOMBIE process is the signal number. In addition to trap errors, signals may also originate from hardware or from other processes. For example, pressing the Control_C key generates a hardware interrupt, which sends a number 2 signal (SIGINT) to the foreground processes on the terminal, causing them to terminate. Alternatively, a user may use the command

kill -s signal_number pid          # signal_number=1 to 31

to send a signal to a target process identified by pid. For most signal numbers, the default action of a process is to terminate. Signals and signal handling will be covered later in Chap. 6.

In either case, when a process terminates, it eventually calls a kexit() function in the OS kernel. The general algorithm of kexit() was described in Sect. 3.5.1. The only difference is that the Unix/Linux kernel will erase the user mode image of the terminating process.

In Linux, each PROC has a 2-byte exitCode field, which records the process exit status. The high byte of exitCode is the exitValue in the _exit(exitValue) syscall, if the process terminated normally. The low byte is the signal number that caused it to terminate abnormally. Since a process can only die once, only one of the bytes has meaning.

4. Wait for Child Process Termination

At any time, a process may use the

int pid = wait(int *status);

system call to wait for a ZOMBIE child process. If successful, wait() returns the ZOMBIE child PID and status contains the exitCode of the ZOMBIE child. In addition, wait() also releases the ZOMBIE child PROC as FREE for reuse. The wait() syscall invokes the kwait() function in kernel. The algorithm of kwait() is exactly the same as that described in Sect. 3.5.3.

Example 3.3: The example program C3.3 demonstrates the wait and exit system calls.

/************** C3.3.c: wait() and exit() ***************/
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>     // fork(), getpid()
#include <sys/wait.h>   // wait()

int main()
{
    int pid, status;
    pid = fork();
    if (pid){   // PARENT:
        printf("PARENT %d WAITS FOR CHILD %d TO DIE\n", getpid(), pid);
        pid = wait(&status);   // wait for ZOMBIE child process
        printf("DEAD CHILD=%d, status=0x%04x\n", pid, status);
    }
    else{       // child:
        printf("child %d dies by exit(VALUE)\n", getpid());
        exit(100);             // (1)
    }
}

When running the Example 3.3 program, the child termination status will be 0x6400, in which the high byte is the child’s exit value 100.

The reason why wait() waits for any ZOMBIE child can be justified as follows. After forking several login processes, P1 waits for any ZOMBIE children. As soon as a user logs out from a terminal, P1 must respond immediately to fork another login process on that terminal. Since P1 does not know which login process will terminate first, it must wait for any ZOMBIE login child, rather than waiting for a specific one. Alternatively, a process may use the syscall

int pid = waitpid(int pid, int *status, int options);

to wait for a specific ZOMBIE child specified by the pid parameter with several options. For instance, wait(&status) is equivalent to waitpid(-1, &status, 0). The reader may consult the Linux man pages of wait for more details.

5. Subreaper Process in Linux

Since kernel version 3.4, Linux handles orphan processes in a slightly different way. A process may define itself as a subreaper by the syscall

prctl(PR_SET_CHILD_SUBREAPER, 1);   // a nonzero second argument sets the attribute

If so, the init process P1 will no longer be the parent of orphan processes. Instead, the nearest living ancestor process that is marked as a subreaper will become the new parent. If there is no living subreaper process, orphans still go to the init process as usual.

The reason for implementing this mechanism is as follows. Many user space service managers, such as upstart and systemd, need to track the services they start. Such services usually create daemons by forking twice and letting the intermediate child exit immediately, which elevates the grandchild to be a child of P1. The drawback of this scheme is that the service manager can no longer receive the SIGCHLD (death_of_child) signals from the service daemons, nor can it wait for any ZOMBIE children. All information about the children is lost when P1 cleans up the re-parented processes. With the subreaper mechanism, a service manager can mark itself as a “sub-init” and stay as the parent of all orphaned processes created by the started services. This also reduces the workload of P1, which no longer has to handle all orphaned processes in the system.

A good analogy is the following. Whenever a corporation gets too big, it’s time to break it up to prevent monopoly, as happened to AT&T in the early 1980s. In the original Linux, P1 is the only process that is authorized to operate an orphanage. The subreaper mechanism essentially breaks the monopoly of P1, allowing any process to operate a local orphanage by declaring itself as a subreaper (even without a license!). As an example, in Ubuntu-15.10 and later, the per user init process is marked as a subreaper. It runs in Umode and belongs to the user. The reader may use the sh command

ps fxau | grep USERNAME | grep "/sbin/upstart"

to display the PID and information of the subreaper process. Instead of P1, it will be the parent of all orphaned processes of the user. We demonstrate the subreaper process capability of Linux by an example.

Example 3.4: The example program C3.4 demonstrates subreaper processes in Linux.

/************** C3.4.c: Subreaper Process ***************/
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>
#include <sys/prctl.h>

int main()
{
    int pid, r, status;
    printf("mark process %d as a subreaper\n", getpid());
    r = prctl(PR_SET_CHILD_SUBREAPER, 1);   // nonzero 2nd argument sets the attribute
    if (r < 0)
        perror("prctl");
    pid = fork();
    if (pid){        // parent
        printf("subreaper %d child=%d\n", getpid(), pid);
        while(1){
            pid = wait(&status);   // wait for ZOMBIE children
            if (pid > 0)
                printf("subreaper %d waited a ZOMBIE=%d\n", getpid(), pid);
            else                   // no more children
                break;
        }
    }
    else{            // child
        printf("child %d parent=%d\n", getpid(), getppid());
        pid = fork();              // child forks a grandchild
        if (pid){    // child
            printf("child=%d start: grandchild=%d\n", getpid(), pid);
            printf("child=%d EXIT : grandchild=%d\n", getpid(), pid);
        }
        else{        // grandchild
            printf("grandchild=%d start: myparent=%d\n", getpid(), getppid());
            printf("grandchild=%d EXIT : myparent=%d\n", getpid(), getppid());
        }
    }
}

Figure 3.6 shows the sample outputs of running the Example 3.4 program.

In the Example 3.4 program, the process (9620) first marks itself as a subreaper. Then it forks a child (9621) and uses a while loop to wait for ZOMBIE children until there is none. The child process forks a child of its own, which is the grandchild (9622) of the first process. When the program runs, either the child (9621) or the grandchild (9622) may terminate first. If the grandchild terminates first, its parent would still be the same (9621). However, if the child terminates first, the grandchild would become a child of P1 if there is no living ancestor marked as a subreaper. Since the first process (9620) is a subreaper, it will adopt the grandchild as an orphan if its parent died first. The outputs show that the parent of the grandchild was 9621 when it starts to run, but changed to 9620 when it exits since its original parent has already died. The outputs also show that the subreaper process 9620 has reaped both 9621 and 9622 as ZOMBIE children. If the user kills the per user init process, it would amount to a user logout. In that case, P1 would fork another user init process, asking the user to log in again.

6. exec(): Change Process Execution Image

A process may use exec() to change its Umode image to a different (executable) file. The exec() library functions have several members:

int execl( const char *path, const char *arg, ...);
int execlp(const char *file, const char *arg, ...);
int execle(const char *path, const char *arg, ..., char *const envp[]);
int execv( const char *path, char *const argv[]);
int execvp(const char *file, char *const argv[]);

All of these are wrapper functions, which prepare the parameters and eventually issue the syscall

int execve(const char *filename, char *const argv[ ], char *const envp[ ]);

In the execve() syscall, the first parameter filename is either relative to the Current Working Directory (CWD) or an absolute pathname. The parameter argv[ ] is a NULL terminated array of string pointers, each points to a command line parameter string. By convention, argv[0] is the program name and other argv[ ] entries are command line parameters to the program. As an example, for the command line

a.out one two three

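argv[ ] has the following layout, in which each entry points to one string and the last entry is a NULL pointer:

argv[0] = "a.out"
argv[1] = "one"
argv[2] = "two"
argv[3] = "three"
argv[4] = NULL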

7. Environment Variables

Environment variables are variables that are defined for the current sh and inherited by child sh or other child processes. They are set in the login profiles and .bashrc script files when sh starts, and they define the execution environment of subsequent programs. Each environment variable is defined as

KEYWORD=string

Within a sh session the user can view the environment variables by using the env or printenv command. The following lists some of the important environment variables

SHELL=/bin/bash

TERM=xterm

USER=kcw

PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:./

HOME=/home/kcw

SHELL: This specifies the sh that will be interpreting any user commands.

TERM: This specifies the type of terminal to emulate when running the sh.

USER: The current logged in user.

PATH: A list of directories that the system will check when looking for commands.

HOME: The home directory of the user. In Linux, home directories of ordinary users are normally under /home.

While in a sh session, environment variables can be set to new (string) values, as in

HOME=/home/newhome

which can be passed to descendant sh by the export command, as in

export HOME

They can also be cleared by setting them to null strings, or removed entirely with the unset command. Within a process, environment variables are passed to C programs via the env[ ] parameter, which is a NULL terminated array of string pointers, each pointing to an environment variable string of the form KEYWORD=string.

Environment variables define the execution environment of subsequent programs. For example, when sh sees a command, it searches for the executable command (file) in the directories of the PATH environment variable. Most full screen text editors must know the terminal type they are running on, which is set in the TERM environment variable. Without the TERM information, a text editor may misbehave. Therefore, both command line parameters and environment variables must be passed to an executing program. This is the basis of the main() function in all C programs, which can be written as

int main(int argc, char *argv[ ], char *env[ ])

Exercise 3.1. Inside the main() function of a C program, write C code to print all the command line parameters and environment variables.

Exercise 3.2. Inside the main() function of a C program, find the PATH environment variable, which is of the form

PATH=/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games:./

Then write C code to tokenize the PATH variable string into individual directories.

If successful, exec("filename", …) replaces the process Umode image with a new image from the executable filename. It’s still the same process but with a new Umode image. The old Umode image is abandoned and therefore never returned to, unless exec() fails, e.g. filename does not exist or is non-executable.

In general, after exec(), all opened files of the process remain open. Opened file descriptors that are marked as close-on-exec are closed. Most signals of the process are reset to default. If the executable file has the setuid bit turned on, the process effective uid/gid are changed to the owner of the executable file, which will be reset back to the saved process uid/gid when the execution finishes.

Example 3.5: The example program C3.5 demonstrates change process image by execl(), which is of the form

execl("a.out", "a.out", arg1, arg2, …, (char *)0);

The library function execl() assembles the parameters into argv[ ] form first before calling execve("a.out", argv[ ], env[ ]). The parameter list must end with a null pointer.

/********** C3.5 program files ***************/
// (1). -------- b.c file: gcc to b.out --------
#include <stdio.h>
#include <unistd.h>    // getpid()

int main(int argc, char *argv[])
{
    printf("this is %d in %s\n", getpid(), argv[0]);
}

// (2). -------- a.c file: gcc to a.out --------
#include <stdio.h>
#include <unistd.h>    // getpid(), execl()

int main(int argc, char *argv[])
{
    printf("THIS IS %d IN %s\n", getpid(), argv[0]);
    int r = execl("b.out", "b.out", "hi", (char *)0);   // list ends with a null pointer
    printf("SEE THIS LINE ONLY IF execl() FAILED\n");
}

The reader may compile b.c into b.out first. Then compile a.c and run a.out, which will change the execution image from a.out to b.out, but the process PID does not change, indicating that it’s still the same process.

Example 3.6: The example program C3.6 demonstrates change process image by execve().

In this example program, we shall demonstrate how to run Linux commands in the /bin directory by execve(). The program should be run as

a.out command [options]

where command is any Linux command in the /bin directory and [options] are optional parameters to the command program, e.g.

a.out ls -l; a.out cat filename; etc.

The program assembles command and [options] into myargv[ ] and issues the syscall

execve("/bin/command", myargv, env);

to execute the /bin/command file. The process returns to the old image if execve() fails.

/*********** C3.6.c file: compile and run a.out ***********/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>    // getpid(), execve()

char *dir[64], *myargv[64];   // assume at most 64 parameters
char cmd[128];

int main(int argc, char *argv[], char *env[])
{
    int i, r;
    printf("THIS IS PROCESS %d IN %s\n", getpid(), argv[0]);
    if (argc < 2){
        printf("Usage: a.out command [options]\n");
        exit(0);
    }
    printf("argc = %d\n", argc);
    for (i=0; i<argc; i++)        // print argv[ ] strings
        printf("argv[%d] = %s\n", i, argv[i]);
    for (i=0; i<argc-1; i++)      // create myargv[ ]
        myargv[i] = argv[i+1];
    myargv[i] = 0;                // NULL terminated array
    strcpy(cmd, "/bin/");         // create /bin/command
    strcat(cmd, myargv[0]);       // assumes the command name fits in cmd[ ]
    printf("cmd = %s\n", cmd);    // show filename to be executed
    r = execve(cmd, myargv, env);
    // come to here only if execve() failed
    printf("execve() failed: r = %d\n", r);
}

Source: Wang K.C. (2018), Systems Programming in Unix/Linux, Springer; 1st ed. 2018 edition.
