Link List Processing in Unix/Linux

Structures and pointers are often used to construct and manipulate dynamic data structures, such as link lists, queues and trees, etc. The most basic type of dynamic data structures is the link list. In this section, we shall explain the concepts of link list, list operations, and demonstrate list processing by example programs.

1. Link Lists

Assume that nodes are structures of NODE type, as in

A (singly) link list is a data structure consisting of a sequence of nodes, which are linked together by the next pointers of the nodes, i.e. each node’s next pointer points to a next node in the list. Link lists are represented by NODE pointers. For example, the following C statements define two link lists, denoted by list and head.

NODE *list, *head;          // define list and head as link lists

A link list is empty if it contains no nodes, i.e. an empty link list is just a NULL pointer. In C programming, the symbol NULL is defined by the macro (in stddef.h)

#define NULL (void *)0

as a pointer with a 0 value. For convenience, we shall use either NULL or 0 to denote the null pointer. Thus, we may initialize link lists as empty by assigning them with null pointers, as in

list = NULL;    // list = null pointer

head =0;        // head = null pointer

2. Link List Operations

The most common types of operations on link lists are

Build: initialize and build list with a set of nodes

Traversal: step through the elements of a list

Search: search for an element by key

Insertion: insert new elements into a list

Deletion: delete an existing element from a list

Reorder: reorder a link list by (changed) element key or priority value

In the following, we shall develop C programs to demonstrate list processing.

3. Build Link List

In many C programming books, dynamic data structures are usually constructed with data objects that are dynamically allocated by the C library function malloc(). In these cases, data objects are allocated from the program’s heap area. This is fine, but some books seem to over emphasize the notion that dynamic data structures must use malloc(), which is false. Unlike static data structures, such as arrays, in which the number of elements is fixed, dynamic data structures consists of data objects that can be modified with ease, such as to insert new data objects, delete existing data objects and reorder the data objects, etc. It has nothing to do with where the data objects are located, which can be in either the program’s data area or heap area. We shall clarify and demonstrate this point in the following example programs.

3.1. Link List in Data Area

Example Program C2.1: This program builds a link list from an array of node structures. The program consists of a type.h header file and a C2.1.c file, which contains the main() function.

(1). type.h file: First, we write a type.h file, which includes the standard header files, defines the

constant N and the NODE type. The same type.h file will be used as an included file in all list processing example programs.

/********** type.h file ***********/

#include <stdio.h>

#include <stdlib.h>

#include <string.h>

#define N 10

typedef struct node{

struct node *next;

int id;

char name[64];

}NODE;

/******* end of type.h file *******/

(2). C2.1 Program code: The program constructs a link list and prints the link list.

/*********************** C2.1.c file **********************/

#include “type.h”

NODE *mylist, node[N]; // in bss section of program run-time image

int printlist(NODE *p) // print list function

{

while(p){

printf(“[%s %d]->”, p->name, p->id);

p = p->next;

}

printf(“NULL\n”) ;

}

int main()

{

int i;
NODE *p;
for (i=0; i<N; i++){

p = &node[i];
sprintf(p->name, “%s%d”, “node”, i); // node0, node1, etc.

p->id = i;  // used node index as ID
p->next = p+1;  // node[i].next = &node[i+1];

}

node[N-1].next = 0;

mylist = &node[0];        // mylist points to node[0]

printlist(mylist);        // print mylist elements

}

(4). Explanation of the C2.1.c code

(4).1. Memory location of variables: The following diagram shows the memory location of the variables, which are all in the program’s Data area. More precisely, they are in the bss section of the combined program Data area.

(4).2. The main() function: The main() function builds a link list with node names = “node0” to “node9” and node id values=0 to 9. The link list ends with a null pointer. Note that the list pointer and list elements are all in the array of node[N], which is the program’s data area.

(5). Run Results: The reader may compile and run the C2.1 program. It should print the list as

[nodeO 0]->[ndoe1 1]-> … [node9 9]->NULL

3.2. Link List in Heap Area

Example C2.2: The next example program, C2.2, builds a link list in the program’s heap area.

/**************** C2.2.c file *****************/

#include “type.h”

NODE *mylist, *node; // pointers, no NODE area yet

int printlist(NODE *p){ // same as in C2.1 }

int main()

{

int i;

NODE *p;

node = (NODE *)malloc(N*sizeof(NODE)); // node->N*72 bytes in HEAP

for (i=0; i < N; i++){

p = &node[i];         // access each NODE area in HEAP

sprintf(p->name, “%s%d”, “node”,i);

p->id = i;

p->next = p+1;        // node[i].next = &node[i+1];

}

node[N-1].next = 0;

mylist = &node[0];

printlist(mylist);

}

(1). Memory location of variables: In the C2.2 program, both the variables mylist and node are NODE pointers, which are uninitialized globals. As such they are in the program’s bss section, as shown in Fig. 2.21. The program does not have any memory locations for the actual node structures.

NODE *mylist, *node; // pointers, no NODE area yet

When the program starts, it uses the statement

node = (NODE *)malloc(N*sizeof(NODE));

to allocate N*72 bytes in the program’s HEAP area, which will be used as nodes of the list. Since node is a NODE pointer, which points to a memory area containing N adjacent nodes, we may regard the memory area as an array and use node[i] (i=0 to N-1) to access each node element. This is a general principle. Whenever we have a pointer pointing to a memory area containing N adjacent data objects, we may access the memory area as a sequence of array elements. As a further example, assume

int *p = (int *)malloc(N*sizeof(int));

Then the memory area pointed by p can be accessed as *(p + i) or p[i], i=0 to N-1.

Example C2.3: The example program C2.3 builds a link list in the program’s heap area but with individually allocated nodes.

/**************** C2.3.c file *****************/

#include “type.h”

NODE *mylist, *node; // pointers, no NODE area yet

int printlist(NODE *p){ // same as in L1.c }

int main()

{

int i;

NODE *p,  *q;

for (i = 0; i < N; i++){

p = (NODE *)malloc(sizeof(NODE)); // allocate a node in heap

sprintf(p->name, “%s%d”, “node”,i); p->id = i;

p->next =0;        // node[i].next = 0;

if (i==0){

mylist = q = p; // mylist -> node0; q->current node

}

q->next = p;

q = p;

}

printlist(mylist);

}

4. Link List Traversal

Link list traversal is to step through the list elements in sequential order. The simplest link list traversal is to print the list elements. To do this, we may implement a generic printlist() function as follows.

int printlist(char *listname, NODE *list)

{

printf(“%s = “, name);

while(list){

printf(“[%s %d]->”, list->name, list->id);

list = list->next;

}

printf(“NULL\n”);

}

Alternatively, the printlist() function can be implemented by recursion.

void rplist(NODE *p)

{

if (p == 0){

printf(“NULL\n”);

return;

}

printf(“[%s %d]->”, p->name, p->id);

rplist(p->next);         // recursive call to rplist(p->next)

}

int printlist(char *listname, NODE *list)

{

printf(“%s = “, name); // print list name

rplist(list);          // recursively print each element

}

In addition to printing a link list, there are many other list operations that are essentially variations of the list traversal problem. Here we cite such a problem, which is a favorite question during interviews for Computer Science jobs.

This example implements a function which computes the SUM of the node values in a link list.

int sum(NODE *list) // return SUM of node values in a link list

{

int sum = 0; while(list){

sum += list->value; // add value of current node

list = list->next;      // step to next node

}

return sum;

}

Alternatively, the sum() function can be implemented recursively, as in

int sum(NODE *list) // return SUM of node values in a link list

{

if (list == 0)
return 0;

return list->value + sum(list->next);

}

Using the ? operator of C, this can be simplified further as

int sum(NODE *list)

{ return (list==0)? 0: list->value + sum(list->next); }

When traversing a singly link list using a single node pointer, we can only access the current node but not the previous node. A simple way to remedy this problem is to use two pointers while traversing a link list. We illustrate this technique by an example.

Traversing a (singly) link list with two pointers.

NODE *p,  *q;

p = q = list;         // both p and q start form list

while(p){

// access current node by p AND previous node by q

q = p;             // let q point at current node

p = p->next;       // advance p to next node

}

In the above code segment, p points at the current node and q points at the previous node, if any. This allows us to access both the current and previous nodes. This technique can be used in such operations as to delete the current node, insert before the current node, etc.

5. Search Link List

The search operation searches a link list for an element with a given key, which can be either an integer value or a string. It returns a pointer to the element if it exists, or NULL if not. Assuming that the key is an integer value, we can implement a search() function as follows.

NODE *search(NODE *list, int key)
{

NODE *p = list;
while(p){

if (p->key == key)
return p;
p = p->next;
// found element with key
// return node pointer
// advance p to next node

}
return 0;
// not found, return 0

}

6. Insert Operation

The insert operation inserts a new element into a link list by a specified criterion. For singly link lists, the simplest insertion is to insert to the end of a list. This amounts to traversing the list to the last element and then adding a new element to the end. We illustrate this by an example.

Example C2.4: The example program C2.4 builds a link list by inserting new nodes to the end of list.

(1) The insertion function: The insert() function enters a new node pointer p into a list. Since the insert operation may modify the list itself, we must pass the list parameter by reference, i.e. passing the address of the list. Normally, all elements in a link list should have unique keys. To ensure no two elements with the same key, the insert function may check this first. It should reject the insertion request if the node key already exists. We leave this as an exercise for the reader.

 int insert(NODE **list, NODE *p) // insert p to end of *list
{

NODE *q = *list;
if (q == 0) // if list is empty:
*list = p; // insert p as first element
else{      // otherwise, insert p to end of list

while(q->next)  // step to LAST element
q = q->next;
q->next = p;  // let LAST element point to p

}

p->next = 0;  // p is the new last element.

}

(2) The C2.4.c file:

/***************** c2 4 c file **************/ #include “type.h”

#include “type.h”
NODE *mylist;
int main()
{

char line[128], name[64];
int id;
NODE *p;
mylist = 0;
while(1){
// initialize mylist to empty list
printf(“enter node name and id value : “);
fgets(line, 128, stdin);  // get an input line
line[strlen(line)-1] = 0;  // kill \n at end
if (line[0] == 0)  // break out on empty input line

break;
sscanf(“%s %d”, name, &id); // extract name string and id value
p = (NODE *)malloc(sizeof(NODE));
if (p==0) exit(-1);
// out of HEAP memory
strcpy(p->name, name);
p->id = id;
insert(&mylist, p); // insert p to list end
printlist(mylist);
}

}

7. Priority Queue

A priority queue is a (singly) link list ordered by priority, with high priority values in front. In a priority queue, nodes with the same priority are ordered First-In-First-Out (FIFO). We can modify the insert() function to an enqueue() function, which inserts nodes into a priority queue by priority. Assume that each node has a priority field. The following shows the enqueue() function.

int enqueue(NODE **queue, NODE *p) // insert p into queue by priority

{

NODE *q = *queue;

if (q==0 || p->priority > q->priority){

*queue = p;

p->next = q;

}

else{

while(q->next && p->priority <= q->next->priority)

q = q->next;

p->next = q->next;

q->next = p;

}

}

8. Delete Operation

Given a link list and a list element key, delete the element from the list if it exists. Since the delete operation may modify the list, the list parameter must be passed by reference. Here we assume the element key is an integer. The delete() function returns the deleted node pointer, which can be used to free the node if it was dynamically allocated earlier.

NODE *delete(NODE **list, int key)

{

NODE *p, *q;
if (*list == 0)  // empty list

return 0;        // return 0 if deletion failed
p = *list;
if (p->key == key){  // found element at list beginning

list = p->next;    // modify *list
return p;

}

// element to be deleted is not the first one;

try to find it
q = p->next;
while(q){

if (q->key == key){

p->next = q->next;
return q;
// delete q from list

}
p = q;
q = q->next;

}
return 0; // failed to find element

}

// empty list

// return 0 if deletion failed

// found element at list beginning // modify *list

} // element to be deleted is not the first one; try to find it

q = p->next;

while(q){

if (q->key == key){

p->next = q->next; // delete q from list

return q;

} p = q;

q = q->next;

}

return 0;                // failed to find element

}

9. Circular Link List

A circular link list is one in which the last element points to the first element. As usual, an empty circular link list is a NULL pointer. Figure 2.22 shows a circular link list.

When traversing a circular list, we must detect the “end of list” condition properly to terminate the traversal. Assume that list is a non-empty circular list.

NODE *list, *p = list;

while(p->next != list){ // stop after last element

// access current node p;

p = p->next;

}

10. Open-Ended C Structures

In the original C language, all members of a structure must be fully specified so that the size of every structure is fixed at compile time. In practice, there are situations in which the size of a structure may vary. For convenience, the C language is extended to support open-ended structures containing an incompletely specified field. As an example, consider the following structure, in which the last member is an array of unspecified size.

struct node{

struct node *next;

int ID;

char name[ ];    // unspecified array size

};

In the above open-ended structure, name[ ] denotes an incompletely specified field, which must be the last entry. The size of an open-ended structure is determined by the specified fields. For the above example, the structure size is 8 bytes. To use such a structure, the user must allocate the needed memory for the actual structure, as in

struct node *sp = malloc(sizeof(struct node) + 32);

strcpy(sp->name, “this is a test string”);

The first line allocates a memory area of 8+32 bytes, with 32 bytes intended for the field name. The next line copies a string into the name field in the allocated memory area. What if the user does not allocate memory for an open-ended structure and use them directly? The following code lines illustrate the possible consequences.

struct node x;               // define a variable x of size only 8 bytes

strcpy(x.name, “test”);       // copy “test” into x.name field

In this case, x has only 8 bytes but there is no memory area for the name field. The second statement would copy the string “test” into the memory area immediately following x at run-time, overwriting whatever was in it. This may corrupt other data objects of the program, causing it to crash later.

11. Doubly Link Lists

A doubly link list, or dlist for short, is one in which the list elements are linked together in two directions, both forward and backward. In a dlist, each element has two pointers, denoted by next and prev, as shown below.

typedef struct node{

struct node *next; // pointer to next node

struct node *prev; // pointer to previous node

// other fields, e.g. key, name, etc.

}NODE;                  // list NODE type

In the node structure of dlist, the next pointer points to a next node, allowing forward traversal of the dlist from left to right. The prev pointer points to a previous node, allowing backward traversal of the dlist from right to left. Unlike a singly link list, which is represented by a single NODE pointer, doubly link lists may be represented in three different ways, each with its advantages and disadvantages. In this section, we shall use example programs to show doubly link list operations, which include

  • . Build dlist by inserting new elements to the end of list.
  • . Build dlist by inserting new elements to the front of list.
  • . Print dlist in both forward and backward directions.
  • . Search dlist for an element with a given key.
  • . Delete element from dlist.

12. Doubly Link Lists Example Programs

Example Program C2.5: Doubly Link List Version 1

In the first dlist example program, we shall assume that a dlist is represented by a single NODE pointer, which points to the first list element, if any. Figure 2.23 shows such a dlist.

Since the first element has no predecessor, its prev pointer must be NULL. Likewise, the last element has no successor, its next pointer must be NULL also. An empty dlist is just a NULL pointer. The following lists the C code of the example program C2.5.

/********** dlist Program C2.5.c **********/

#include <stdio.h>

#include <stdlib.h>

#include <string.h>

typedef struct node{

struct node *next;

struct node *prev;

int key;

}NODE;

NODE *dlist;        // dlist is a NODE pointer

int insert2end(NODE **list, int key) // insert to list END

{

NODE *p,  *q;

//printf(“insert2end: key=%d “, key);

p = (NODE *)malloc(sizeof(NODE));

p->key = key;

p->next = 0;

q = *list;

if (q==0){           // list empty

*list = p;

p->next = p->prev = 0;

}

else{

while(q->next)     // step to LAST element

q = q->next;

q->next = p;       // add p as last element

p->prev = q;

}

}

int insert2front(NODE **list, int key) // insert to list FRONT

{

NODE *p,  *q;

//printf(“insert2front: key=%d “, key); p = (NODE *)malloc(sizeof(NODE)); p->key = key;

p->prev =0;          //no previous node

q = *list;

if (q==0){           // list empty

*list = p;

p->next = 0;

}

else{

p->next = *list;

q->prev = p;

*list = p;

}

}

void printForward(NODE *list) // print list forward

{

printf(“list forward = “); while(list){

printf(“[%d]->”, list->key);

list = list->next;

}

printf(“NULL\n”);

}

void printBackward(NODE *list) // print list backward

{

printf(“list backward=”);

NODE *p = list; if (p){

while (p->next) // step to last element p = p->next;

while(p){

printf(“[%d]->”, p->key); p = p->prev;

}

}

printf(“NULL\n”);

}

NODE *search(NODE *list, int key) // search for an element with a key

{

NODE *p = list; while(p){

if (p->key==key){

printf(“found %d at %x\n”, key, (unsigned int)p);

return p;

}

p = p->next;

}

return 0;

}

int delete(NODE **list, NODE *p) // delete an element pointed by p

{

NODE *q = *list;

if (p->next==0 && p->prev==0){         // p is the only node

*list = 0;

}

else if (p->next==0 && p->prev != 0){ // last but NOT first

p->prev->next = 0;

}

else if (p->prev==0 && p->next != 0){ // first but NOT last

*list = p->next;

p->next->prev = 0;

}

else{

p->prev->next p->next->prev

}

free(p);

}

int main()

{

int i, key;

NODE *p; printf(“dlist program #1\n”);

printf(“insert to END\n”);

dlist = 0;                  // initialize dlist to empty

for (i=0; i<8; i++){

insert2end(&dlist, i);

}

printForward(dlist);

printBackward(dlist);

printf(“insert to FRONT\n”);

dlist = 0;                  // initialize dlist to empty

for (i=0; i<8; i++){

insert2front(&dlist, i);

}

printForward(dlist);

printBackward(dlist);

printf(“test delete\n”);

while(1){

printf(“enter a key to delete:   “);

scanf(“%d”, &key);

if (key < 0) exit(0);       // exit if negative key

p = search(dlist, key); if (p==0){

printf(“key %d not found\n”, key);

continue;

}

delete(&dlist, p);

printForward(dlist);

printBackward(dlist);

}

}

Figure 2.24 show the outputs of running the Example program C2.5. First, it constructs a dlist by inserting new elements to the list end, and prints the resulting list in both forward and backward directions. Next, it constructs a dlist by inserting new elements to the list front, and prints the resulting list in both directions. Then, it shows how to search for an element with a given key. Lastly, it shows how to delete elements from the list, and print the list after each delete operation to verify the results.

Discussions of Example C2.5: The primary advantage of the first version of dlist is that it only requires a single NODE pointer to represent a dlist. The disadvantages of this kind of dlist are:

  • . It is hard to access the last element of the list. In order to access the last element, as in the cases of

inserting to the list end and print the list backward, we must use the next pointer to step to the last element first, which is time-consuming if the list has many elements.

  • . When inserting a new element, we must detect and handle the various cases of whether the list is

empty or the new element is the first or last element, etc. These make the code non-uniform, which does not fully realize the capability of doubly link lists.

  • . Similarly, when deleting an element, we must detect and handle the different cases also.

As can be seen, the complexity of the insertion/deletion code stems mainly from two cases:

. the list is empty, in which case we must insert the first element.

. the inserted/deleted node is the first or last node, in which case we must modify the list pointer.

Alternatively, a doubly link list may be regarded as two singly link lists merged into one. Accordingly, we may represent doubly link lists by a listhead structure containing two pointers, as in

typedef struct listhead{
   struct node *next; // head node pointer
   struct node *prev;  // tail node pointer
}DLIST;  // doubly link list type
DLIST dlist;  // dlsit is a structure
dlist.next = dlist.prev = 0; // initialize dlist as empty

Figure 2.25 show the second version of doubly link lists.

The only advantage of using a list head structure is that it allows direct access to the last element via the list’s prev pointer. The disadvantages are exactly the same as those in the first version of dlist. Specifically,

  • Since dlist is a structure, it must be passed by reference to all list operation functions.
  • In both the insertion and deletion functions, we must detect and handle the various cases of whether the list is empty, the element is the first or last element, etc. In these cases, doubly link lists are no better than singly link list in terms of processing time.

In order to realize the capability of doubly link lists, we need a better way to represent doubly link lists. In the next example, we shall define doubly lists as follows. A dlist is represented by a listhead, which is a NODE structure but a dlist is initialized with both pointers to the listhead, as in

NODE dlist;                          // dlist is a NODE structure

dlist.next = dlist.prev = &dlist;   // initialize both pointers to the NODE structure

Such a dlist may be regarded as two circular link lists merged into one, with the listhead as the initial dummy element. Figure 2.26 shows such a doubly link list.

Since there are no null pointers in the Version 3 dlist, every node can be treated as an interior node, which greatly simplifies the list operation code. We demonstrate this kind of dlist by an example.

Example Program C2.6: This example program assumes a dlist is represented by a NODE structure, which is initialized with both pointers to the NODE structure itself.

/************** C2.6.c Program Code *************/

#include <stdio.h>

#include <stdlib.h>

#include <string.h>

typedef struct node{

struct node *next;

struct node *prev;

int key;

}NODE;

NODE dlist; // dlist is NODE struct using only next & prev pointers

int insert2end(NODE *list, int key)

{

NODE *p,  *q;

//printf(“insert2end: key=%d\n”, key);

p = (NODE *)malloc(sizeof(NODE));

p->key = key; p->prev = 0;

q = list->prev;      // to LAST element

p->next = q->next;

q->next->prev = p;

q->next = p;

p->prev = q;

}

int insert2front(NODE *list, int key)

{

NODE *p,  *q;

//printf(“insertFront key=%d\n”, key);

p = (NODE *)malloc(sizeof(NODE));

p->key = key; p->prev = 0;

q = list->next;      // to first element

p->prev = q->prev;

q->prev->next = p;

q->prev = p;

p->next = q;

}

void printForward(NODE *list)

{

NODE *p = list->next; // use dlist’s next pointer printf(“list forward =”);

while(p != list){         // detect end of list

printf(“[%d]->”, p->key); p = p->next;

}

printf(“NULL\n”);

void printBackward(NODE *list)

{

printf(“list backward=”);

NODE *p = list->prev; // use dlist’s prev pointer while(p != list){     // detect end of list

printf(“[%d]->”, p->key);

p = p->prev;

}

printf(“NULL\n”);

}

NODE *search(NODE *list, int key)

{

NODE *p = list->next;

while(p != list){         // detect end of list

if (p->key==key){

printf(“found %d at %x\n”, key, (unsigned int)p);

return p;

}

p = p->next;

}

return 0;

}

int delete(NODE *list, NODE *p)

{

p->prev->next = p->next;

p->next->prev = p->prev;

free(p);

}

int main()

{

int i, key;

NODE *p;

printf(“dlist program #3\n”);

printf(“insert to END\n”);

dlist.next = dlist.prev = &dlist; // empty dlist

for (i=0; i<8; i++){ insert2end(&dlist, i);

}

printForward(&dlist);

printBackward(&dlist);

printf(“insert to front\n”);

dlist.next = dlist.prev = &dlist; // empty dlist to begin

for (i=0; i<8; i++){

insert2front(&dlist, i);

printForward(&dlist);

printBackward(&dlist);

printf(“do deletion\n”);

while(1){

printf(“enter key to delete:   “);

scanf(“%d”, &key);

if (key < 0) exit(0); // exit if key negative p = search(&dlist, key);

if (p==0){

printf(“key %d not found\n”, key); continue;

}

delete(&dlist, p);

printForward(&dlist);

printBackward(&dlist);

}

}

Figure 2.27 shows the outputs of running the example program C2.6. Although the outputs are identical to those of Fig. 2.24, their processing codes are very different.

The primary advantages of the Example program C2.6 are as follows. Since every node can be treated as an interior node, it is no longer necessary to detect and handle the various cases of empty list, first and last elements, etc. As a result, both insertion and deletion operations are greatly simplified. The only modification needed is to detect the end of list condition during search or list traversal by the code segment

NODE *p = list.next;

while (p != &list){

p = p->next;

}

Source: Wang K.C. (2018), Systems Programming in Unix/Linux, Springer; 1st ed. 2018 edition.

Leave a Reply

Your email address will not be published. Required fields are marked *