Structures in C in Unix/Linux

A structure is a composite data type containing a collection of variables or data objects. Structure types in C are defined by the struct keyword. Assume that we need a node structure containing the following fields.

next : a pointer to the next node structure;

key : an integer;

name : an array of 64 chars;

Such a structure can be defined as

struct node{

struct node *next;

int key;

char name[64];

};

Then, “struct node” can be used as a derived type to define variables of that type, as in

struct node x, *nodePtr;

This defines x as a node structure and nodePtr as a pointer to a node structure. Alternatively, we may define “struct node” as a derived type by the typedef statement in C.

typedef struct node{

struct node *next;

int key;

char name[64];

}NODE;

Then, NODE is a derived type, which can be used to define variables of that type, as in

NODE x, *nodePtr;

The following summarizes the properties of C structures.

(1) When defining a C structure, every field of the structure must have a type already known to the compiler, except for self-referencing pointers. This is because all pointers to a given structure type have the same, known size, e.g. 4 bytes on a 32-bit architecture. As an example, in the above NODE type, the field next is a

struct node *next;

which is correct because the compiler knows that struct node is a type (even though it is still incomplete at this point) and how many bytes to allocate for the next pointer. In contrast, the following statements

typedef struct node{

NODE *next;      // error

int key; char name[64];

} NODE;

would cause a compile-time error because the compiler does not yet know what the NODE type is, even though next is only a pointer.

(2) Each C structure data object is allocated a piece of contiguous memory. The individual fields of a C structure are accessed by using the . operator, which identifies a specific field, as in

NODE x; // x is a structure of NODE type

Then the individual fields of x are accessed as

x.next; which is a pointer to another NODE type object.

x.key; which is an integer

x.name; which is an array of 64 chars

At run time, each field is accessed as an offset from the beginning address of the structure.

(3) The size of a structure can be determined by sizeof(struct type). The C compiler will calculate the size of the structure in bytes. Due to memory alignment constraints, the C compiler may pad a structure with extra bytes between or after its fields. If needed, the user may define C structures with the packed attribute (a GCC extension, also supported by Clang), which prevents the C compiler from padding the fields with extra bytes, as in

typedef struct node{

struct node *next;

int key;

char name[2];

}__attribute__((packed, aligned(1))) NODE;

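In this case, assuming a 32-bit architecture with 4-byte pointers and 4-byte ints, the size of the NODE structure will be 10 bytes. Without the packed attribute, it would be 12 bytes because the C compiler would add 2 bytes of padding after the name field, making the size of every NODE object a multiple of 4 bytes for memory alignment. (On a typical 64-bit system with 8-byte pointers, the corresponding sizes are 14 and 16 bytes.)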

(4) Assume that NODE x, y; are two structures of the same type. Rather than copying the individual fields of a structure, we can assign x to y by the C statement y = x. The compiler-generated code copies the entire structure, as if by the library function call memcpy(&y, &x, sizeof(NODE)).

(5) Unions in C are similar to structures. To define a union, simply replace the keyword struct with the keyword union.

Members in unions are accessed in exactly the same way as in structures. The major difference between structures and unions is that, whereas each member in a structure has its own memory area, all members of a union share the same memory area, which is interpreted according to the type of the member used to access it. The size of a union is determined by its largest member. For example, suppose x is a union with three members: ptr (a 4-byte pointer), ID (a 4-byte integer) and name (an array of 32 chars). The member name requires 32 bytes, while the other members require only 4 bytes each, so the size of the union x is 32 bytes. The following C statements show how to access the individual members of a union.

x.ptr = 0x12345678;           // use first 4 bytes of x

x.ID = 12345;                 // use first 4 bytes of x also

strcpy(x.name, "1234567890"); // uses first 11 bytes of x

1. Structure and Pointers

In C, pointers are variables which point to other data objects, i.e. they contain the addresses of other data objects. In C programs, pointers are defined with the * attribute, as in

TYPE *ptr;

which defines ptr as a pointer to a TYPE data object, where TYPE can be either a base type or a derived type, such as struct, in C. In C programming, structures are often accessed by pointers to the structures. As an example, assume that NODE is a structure type. The following C statements

NODE x, *p;

p = &x;

define x as a NODE type data object and p as a pointer to NODE objects. The statement p = &x; assigns the address of x to p, so that p points at the data object x. Then *p denotes the object x, and the members of x can be accessed as

(*p).name, (*p).key, (*p).next

Alternatively, the C language allows us to reference the members of x by using the “point at” operator ->, as in

p->name, p->key, p->next;

which is more convenient than combining the * and . operators. In fact, using the -> operator to access members of structures through pointers has become standard practice in C programming.

2. Typecast in C

Typecast is a way to convert a variable from one data type to another data type by using the cast operator (TYPE) variable. Consider the following code segments.

char *cp, c = 'a';        // c is 1 byte

int  *ip, i = 0x12345678; // i is 4 bytes

(1). i = c;             // i = 0x00000061; lowest byte = c
(2). c = i;             // c = 0x78 (c = lowest byte of i)
(3). cp = (char *)&i;   // typecast to suppress compiler warning
(4). ip = (int *)&c;    // typecast to suppress compiler warning
(5). c = *(char *)ip;   // use ip as a char *
(6). i = *(int *)cp;    // use cp as an int *

Lines (1) and (2) do not need typecasting even though the assignments involve different data types. The resulting values of the assignments are shown in the comments. Lines (3) and (4) need typecasting in order to suppress compiler warnings. After the assignments of Lines (3) and (4), *cp is still a single byte, which is the lowest byte of the (4-byte) integer i on a little-endian machine such as x86. *ip is treated as a 4-byte integer whose lowest byte is c; the other 3 bytes are whatever happens to follow c in memory, so the value of *ip as a whole is not well defined. Line (5) forces the compiler to use ip as a char *, so *(char *)ip dereferences to a single byte. Line (6) forces the compiler to use cp as an int *, so *(int *)cp dereferences to a 4-byte value, beginning from where cp points at.

Typecasting is especially useful with pointers, which allows the same pointer to point to data objects of different sizes. The following shows a more practical example of typecasting. In an Ext2/3 file system, the contents of a directory are dir_entries defined as

struct dir_entry{

int ino;  // inode number

int entry_len;  // entry length in bytes

int name_len;  // name length in bytes

char name[ ];  // name_len chars

};

The contents of a directory consist of a linear list of dir_entries of the form

| ino elen nlen NAME | ino elen nlen NAME |   . . .

in which the entries are of variable length due to the different name_len of the entries. Assume that char buf[ ] contains a list of dir_entries. The problem is how to traverse the dir_entries in buf[ ]. In order to step through the dir_entries sequentially, we define

struct dir_entry *dp = (struct dir_entry *)buf; // typecasting

char *cp = buf;                      // no need for typecasting

// Use dp to access the current dir_entry;

// advance dp to next dir_entry:

cp += dp->entry_len;          // advance cp by entry_len

dp = (struct dir_entry *)cp; // pull dp to where cp points at

With proper typecasting, the last two lines of C code can be simplified as

dp = (struct dir_entry *)((char *)dp + dp->entry_len);

which eliminates the need for a char *cp.

Source: Wang K.C. (2018), Systems Programming in Unix/Linux, Springer; 1st ed. 2018 edition.
