1. Threads
1.1. Principle of Threads
An operating system (OS) comprises many concurrent processes. In the process model, processes are independent execution units. Each process executes in either kernel mode or user mode. While in user mode, each process executes in a unique address space, which is separate from those of other processes. Although each process is an independent unit, it has only one execution path. Whenever a process must wait for something, e.g. an I/O completion event, it becomes suspended and the entire process execution stops.

Threads are independent execution units in the same address space of a process. When a process is created, it is created in a unique address space with a main thread. When a process begins, it executes the main thread of the process. With only a main thread, there is virtually no difference between a process and a thread. However, the main thread may create other threads, and each thread may create yet more threads, etc. All threads in a process execute in the same address space of the process, but each thread is an independent execution unit. In the threads model, if a thread becomes suspended, other threads may continue to execute. In addition to sharing a common address space, threads also share many other resources of a process, such as user id, opened file descriptors and signals, etc. A simple analogy is that a process is a house with a house master (the main thread). Threads are people living in the same house of a process. Each person in a house can carry on his/her activities independently, but they share some common facilities, such as the same mailbox, kitchen and bathroom, etc.

Historically, most computer vendors used to support threads in their own proprietary operating systems, and the implementations varied considerably from system to system. Currently, almost all operating systems support Pthreads, the threads standard of IEEE POSIX 1003.1c (POSIX 1995). For more information, the reader may consult numerous books (Buttlar et al. 1996) and online articles on Pthreads programming (Pthreads 2017).
1.2. Advantages of Threads
Threads have many advantages over processes.
- Thread creation and switching are faster: The context of a process is complex and large. The complexity stems mainly from the need to manage the process image. For example, in a system with virtual memory, a process image may be composed of many units of memory called pages. During execution, some of the pages are in memory while others may not be. The OS kernel must use several page tables and levels of hardware assistance to keep track of the pages of each process. To create a new process, the OS must allocate memory and build the page tables for the process. To create a thread within a process, the OS does not have to allocate memory or build page tables for the new thread, since the thread shares the address space of the process. So, creating a thread is faster than creating a process. Thread switching is also faster than process switching, for the following reason. Process switching involves replacing the complex paging environment of one process with that of another, which requires many operations and much time. In contrast, switching among threads in the same process is much simpler and faster, because the OS kernel only needs to switch the execution points without changing the process image.
- Threads are more responsive: A process has only a single execution path. When a process becomes suspended, the entire process execution stops. In contrast, when a thread becomes suspended, other threads in the same process can continue to execute. This allows a program with threads to be more responsive. For example, in a process with multiple threads, while one thread is blocked waiting for I/O, other threads can still do computations in the background. A server built with threads can serve multiple clients concurrently.
- Threads are better suited to parallel computing: The goal of parallel computing is to use multiple execution paths to solve problems faster. Algorithms based on the principle of divide and conquer, e.g. binary search and quicksort, etc. often exhibit a high degree of parallelism, which can be exploited by using parallel or concurrent executions to speed up the computation. Such algorithms often require the execution entities to share common data. In the process model, processes cannot share data efficiently because their address spaces are all distinct. To remedy this problem, processes must use Interprocess Communication (IPC) to exchange data or some other means to include a common data area in their address spaces. In contrast, threads in the same process share all the (global) data in the same address space. Thus, writing programs for parallel executions using threads is simpler and more natural than using processes.
1.3. Disadvantages of Threads
On the other hand, threads also have some disadvantages, which include
- Because threads share the same address space of a process, they need explicit synchronization from the user.
- Many library functions may not be thread-safe, e.g. the traditional strtok() function, which divides a string into tokens in place. In general, any function which uses global variables or relies on the contents of static memory is not thread-safe. Considerable effort is needed to adapt library functions to the threads environment (a thread-safe alternative to strtok() is sketched after this list).
- On single CPU systems, using threads to solve problems is actually slower than using a sequential program, due to the overhead of thread creation and context switching at run time.
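For example, strtok() keeps its scanning position in static memory, so two threads tokenizing different strings would interfere with each other. The POSIX reentrant version strtok_r() avoids this by keeping the scanning state in a caller-supplied pointer. A minimal sketch (the string used here is only illustrative):
char line[] = "this is a test"; // each thread tokenizes its own string
char *saveptr, *tok; // per-thread scanning state for strtok_r()
for (tok = strtok_r(line, " ", &saveptr); tok != NULL; tok = strtok_r(NULL, " ", &saveptr))
  printf("token = %s\n", tok);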
2. Threads Operations
The execution locus of a thread is similar to that of a process. A thread can execute in either kernel mode or user mode. While in user mode, threads execute in the same address space of a process, but each has its own execution stack. As an independent execution unit, a thread can make system calls to the OS kernel and, subject to the kernel's scheduling policy, be suspended and later activated to resume execution, etc. To take advantage of the shared address space of threads, the OS kernel's scheduling policy may favor threads of the same process over those in different processes. As of now, almost all operating systems support POSIX Pthreads, which defines a standard set of Application Programming Interfaces (APIs) to support threads programming. In the following, we shall discuss and demonstrate concurrent programming by Pthreads in Linux (Goldt et al. 1995; IBM; Love 2005; Linux Man Page Project 2017).
3. Threads Management Functions
The Pthreads library offers the following APIs for threads management.
pthread_create(thread, attr, function, arg): create thread
pthread_exit(status) : terminate thread
pthread_cancel(thread) : cancel thread
pthread_attr_init(attr) : initialize thread attributes
pthread_attr_destroy(attr): destroy thread attribute
3.1. Create Thread
Threads are created by the pthread_create() function.
int pthread_create (pthread_t *pthread_id, pthread_attr_t *attr,
void *(*func)(void *), void *arg);
which returns 0 on success or an error number on failure. Parameters to the pthread_create() function are
- pthread_id is a pointer to a variable of the pthread_t type. It will be filled with the unique thread ID assigned by the OS kernel. In POSIX, pthread_t is an opaque type. The programmer should not rely on the contents of an opaque object, because they may depend on the implementation. A thread may get its own ID by the pthread_self() function. In Linux, the pthread_t type is defined as unsigned long, so a thread ID can be printed as %lu.
- attr is a pointer to another opaque data type, which specifies the thread attributes; these are explained in more detail below.
- func is the entry address of a function for the new thread to execute.
- arg is a pointer to a parameter for the thread function, which can be written as
void *func(void *arg)
Among these, the attr parameter is the most complex one. The steps of using an attr parameter are as follows.
- Define a pthread attribute variable pthread_attr_t attr
- Initialize the attribute variable with pthread_attr_init (&attr)
- Set the attribute variable and use it in pthread_create() call
- If desired, free the attr resource by pthread_attr_destroy (&attr)
The following shows some examples of using the attribute parameter. By default, every thread is created to be joinable with other threads. If desired, a thread can be created with the detached attribute, which makes it non-joinable with other threads. The following code segment shows how to create a detached thread.
pthread_attr_t attr; // define an attr variable
pthread_attr_init(&attr); // initialize attr
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED); // set attr
pthread_create(&thread_id, &attr, func, NULL); // create thread with attr
pthread_attr_destroy(&attr); // optional: destroy attr
Every thread is created with a default stack size. The stack size recorded in a thread attributes object can be obtained by the function
int pthread_attr_getstacksize (const pthread_attr_t *attr, size_t *stacksize);
which stores the stack size in *stacksize; the companion function pthread_attr_setstacksize() sets it. The following code segment sketches how to create a thread with a specific stack size (the 1 MB value used here is only an example).
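pthread_attr_t attr; // define an attr variable
size_t stacksize = 0x100000; // desired stack size: 1 MB (an arbitrary example value)
pthread_attr_init(&attr); // initialize attr
pthread_attr_setstacksize(&attr, stacksize); // set the stack size attribute
pthread_create(&thread_id, &attr, func, NULL); // create thread with the specified stack size
pthread_attr_destroy(&attr); // optional: destroy attr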
If the attr parameter is NULL, threads will be created with default attributes. In fact, this is the recommended way of creating threads, which should be followed unless there is a compelling reason to alter the thread attributes. In the following, we shall always use the default attributes by setting attr to NULL.
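As a minimal sketch (the thread function name work() and the integer argument passed to it are only illustrative), creating a thread with default attributes and a simple argument looks as follows.
#include <stdio.h>
#include <pthread.h>
void *work(void *arg) // thread function
{
  long n = (long)arg; // recover the integer passed by the creator
  printf("thread %lu started with arg=%ld\n", (unsigned long)pthread_self(), n);
  return NULL; // returning from the thread function terminates the thread
}
int main()
{
  pthread_t tid;
  pthread_create(&tid, NULL, work, (void *)1L); // attr=NULL: default attributes
  pthread_join(tid, NULL); // wait for the thread to terminate (Section 3.4)
  return 0;
}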
3.2. Thread ID
Thread ID is an opaque data type, which depends on implementation. Therefore, thread IDs should not be compared directly. If needed, they can be compared by the pthread_equal () function.
int pthread_equal (pthread_t t1, pthread_t t2);
which returns zero if the threads are different threads, non-zero otherwise.
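For example, a thread may check whether it is a particular thread whose ID was saved earlier (saved_tid here is assumed to have been filled in by a previous pthread_create() call):
pthread_t saved_tid; // assumed to be set earlier, e.g. by pthread_create()
if (pthread_equal(pthread_self(), saved_tid))
  printf("this is the saved thread\n");
else
  printf("this is a different thread\n");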
3.3. Thread Termination
A thread terminates when the thread function finishes. Alternatively, a thread may call the function
void pthread_exit (void *status);
to terminate explicitly, where status is the exit status of the thread. As usual, a 0 exit value means normal termination, and non-zero values mean abnormal termination.
3.4. Thread Join
A thread can wait for the termination of another thread by
int pthread_join (pthread_t thread, void **status_ptr);
The exit status of the terminated thread is returned in status_ptr.
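As a brief sketch (the exit value 123 is arbitrary), a terminating thread may pass back a status value, which the joining thread retrieves through status_ptr; Example C4.1 below uses the same pattern in a complete program.
void *task(void *arg)
{
  pthread_exit((void *)123); // terminate with exit status 123
}
// in the thread that created it:
void *status;
pthread_t tid;
pthread_create(&tid, NULL, task, NULL);
pthread_join(tid, &status); // blocks until task() terminates
printf("thread exited with status %ld\n", (long)status); // prints 123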
4. Threads Example Programs
4.1. Sum of Matrix by Threads
Example 4.1: Assume that we want to compute the sum of all the elements in an N x N matrix of integers. The problem can be solved by a concurrent algorithm using threads. In this example, the main thread first generates an N x N matrix of integers. Then it creates N working threads, passing as parameter a unique row number to each working thread, and waits for all the working threads to terminate. Each working thread computes the partial sum of a distinct row, and deposits the partial sum in a corresponding row of a global array int sum[N]. When all the working threads have finished, the main thread resumes. It computes the total sum by adding the partial sums generated by the working threads. The following shows the complete C code of the example program C4.1. Under Linux, the program must be compiled as
gcc C4.1.c -pthread
/**** C4.1.c file: compute matrix sum by threads ***/
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#define N 4
int A[N][N], sum[N];
void *func(void *arg) // threads function
{
int j, row;
pthread_t tid = pthread_self(); // get thread ID number
row = (int)(long)arg; // get row number from arg
printf("Thread %d [%lu] computes sum of row %d\n", row, tid, row);
for (j=0; j<N; j++) // compute sum of the row in global sum[row]
sum[row] += A[row][j];
printf("Thread %d [%lu] done: sum[%d] = %d\n", row, tid, row, sum[row]);
pthread_exit((void*)0); // thread exit: 0=normal termination
}
int main (int argc, char *argv[])
{
pthread_t thread[N]; // thread IDs
int i, j, r, total = 0; void *status;
printf("Main: initialize A matrix\n");
for (i=0; i<N; i++){
sum[i] = 0;
for (j=0; j<N; j++){
A[i][j] = i*N + j + 1;
printf("%4d ", A[i][j]);
}
printf("\n");
}
printf("Main: create %d threads\n", N);
for(i=0; i<N; i++) {
pthread_create(&thread[i], NULL, func, (void *)(long)i);
}
printf("Main: try to join with threads\n");
for(i=0; i<N; i++) {
pthread_join(thread[i], &status);
printf("Main: joined with %d [%lu]: status=%d\n",
i, thread[i], (int)(long)status);
}
printf("Main: compute and print total sum: ");
for (i=0; i<N; i++) total += sum[i];
printf("total = %d\n", total);
pthread_exit(NULL);
}
Figure 4.1 shows the outputs of running the Example Program C4.1. It shows the executions of individual threads and their computed partial sums. It also demonstrates the thread join operation.
4.2. Quicksort by Threads
Example 4.2: Quicksort by Concurrent Threads
In this example, we shall implement a parallel quicksort program by threads. When the program starts, it runs as the main thread of a process. The main thread calls qsort(&arg), with arg =[lowerbound=0, upperbound=N-1]. The qsort() function implements quicksort of an array of N integers. In qsort(), the thread picks a pivot element to divide the array into two parts such that all elements in the left part are less than the pivot and all elements in the right part are greater than the pivot. Then it creates two subthreads to sort each of the two parts, and waits for the subthreads to finish. Each subthread sorts its own range by the same algorithm recursively. When all the subthreads have finished, the main thread resumes. It prints the sorted array and terminates. As is well known, the number of sorting steps of quicksort depends on the order of the unsorted data, which affects the number of threads needed in the qsort program.
/****** C4.2.c: quicksort by threads *****/
#include <stdio.h>
#include <pthread.h>
typedef struct{
int upperbound;
int lowerbound;
}PARM;
#define N 10
int a[N] = {5,1,6,4,7,2,9,8,0,3}; // unsorted data
void print() // print current a[] contents
{
int i;
printf("[ ");
for (i=0; i<N; i++) printf("%d ", a[i]);
printf("]\n");
}
void *qsort(void *aptr)
{
PARM *ap, aleft, aright;
int pivot, pivotIndex, left, right, temp;
int upperbound, lowerbound;
pthread_t me, leftThread, rightThread;
me = pthread_self();
ap = (PARM *)aptr;
upperbound = ap->upperbound;
lowerbound = ap->lowerbound;
if (lowerbound >= upperbound) pthread_exit(NULL); // empty or single-element range: done
pivot = a[upperbound]; // pick the rightmost element as pivot
left = lowerbound - 1;
right = upperbound;
while (left < right) {
do { left++;} while (a[left] < pivot);
do { right--;} while (a[right] > pivot);
if (left < right ) {
temp = a[left];
a[left] = a[right];
a[right] = temp;
}
}
print();
pivotIndex = left; // put pivot back
temp = a[pivotIndex];
a[pivotIndex] = pivot;
a[upperbound] = temp;
// start the "recursive" left and right threads
aleft.upperbound = pivotIndex - 1;
aleft.lowerbound = lowerbound;
aright.upperbound = upperbound;
aright.lowerbound = pivotIndex + 1;
printf("%lu: create left and right threads\n", me);
pthread_create(&leftThread, NULL, qsort, (void *)&aleft);
pthread_create(&rightThread, NULL, qsort, (void *)&aright);
// wait for left and right threads to finish
pthread_join(leftThread, NULL);
pthread_join(rightThread, NULL);
printf("%lu: joined with left & right threads\n", me);
pthread_exit(NULL); // this thread terminates after its subthreads have finished
}
int main(int argc, char *argv[])
{
PARM arg;
pthread_t me, thread;
me = pthread_self();
printf("main %lu: unsorted array = ", me);
print();
arg.upperbound = N-1;
arg.lowerbound = 0;
printf("main %lu create a thread to do QS\n", me);
pthread_create(&thread, NULL, qsort, (void *)&arg);
// wait for QS thread to finish
pthread_join(thread, NULL);
printf("main %lu sorted array = ", me);
print();
}
Figure 4.2 shows the outputs of running the Example Program C4.2, which demonstrates parallel quicksort by concurrent threads.
Source: Wang K.C. (2018), Systems Programming in Unix/Linux, Springer; 1st ed. 2018 edition.