Next, we consider the run-time behavior of a.out during execution. The run-time behavior of a program stems mainly from function calls. The following discussions apply to running C programs on 32-bit Intel x86 processors. On these machines, the code generated by the C compiler passes parameters on the stack in function calls. During execution, it uses a special CPU register (ebp) to point at the stack frame of the currently executing function.
1. Run-Time Stack Usage in 32-Bit GCC
Consider the following C program, which consists of a main() function shown on the left-hand side, which calls a sub() function shown on the right-hand side.
- When executing a.out, a process image is created in memory, which looks (logically) like the diagram shown in Fig. 2.7, where Data includes both initialized data and bss.
- Every CPU has the following registers or equivalent, where the entries in parentheses denote registers of the x86 CPU:
PC (IP): points to the next instruction to be executed by the CPU.
SP (SP): points to the top of the stack.
FP (BP): points to the stack frame of the currently active function.
Return Value Register (AX): register for the function return value.
- In every C program, main() is called by the C startup code crt0.o. When crt0.o calls main(), it pushes the return address (the current PC register) onto the stack and replaces PC with the entry address of main(), causing the CPU to enter main(). For convenience, we shall show the stack contents from left to right. When control enters main(), the stack contains the saved return PC on top, as shown in Fig. 2.8, in which XXX denotes the stack contents before crt0.o calls main(), and SP points to the saved return PC from where crt0.o calls main().
- Upon entry, the compiled code of every C function does the following:
. push FP onto stack # this saves the CPU’s FP register on stack.
. let FP point at the saved FP # establish stack frame
. shift SP downward to allocate space for automatic local variables on stack
. the compiled code may shift SP farther down to allocate some temporary working space on the stack, denoted by temps.
For this example, there are 3 automatic local variables, int a, b, c, each of sizeof(int) = 4 bytes. After entering main(), the stack contents become as shown in Fig. 2.9, in which the spaces of a, b, c are allocated but their contents are as yet undefined.
- Then the CPU starts to execute the code a=1; b=2; c=3; which puts the values 1, 2, 3 into the memory locations of a, b, c, respectively. Since sizeof(int) is 4 bytes, the locations of a, b, c are at -4, -8, -12 bytes from where FP points. These are expressed as -4(FP), -8(FP), -12(FP) in assembly code, where FP is the stack frame pointer. For example, in 32-bit Linux the assembly code for b=2 in C is
movl $2, -8(%ebp) # b=2 in C
where $2 means the value of 2 and %ebp is the ebp register.
- main() calls sub() by c = sub(a, b); The compiled code of the function call consists of
. Push parameters in reverse order, i.e. push the values of b=2 and a=1 onto the stack.
. Call sub, which pushes the current PC onto the stack and replaces PC with the entry address of sub, causing the CPU to enter sub().
When control first enters sub(), the stack contains a return address at the top, preceded by the parameters, a, b, of the caller, as shown in Fig. 2.10.
- Since sub() is written in C, its actions are exactly the same as those of main(), i.e. it
. Pushes FP and lets FP point at the saved FP.
. Shifts SP downward to allocate space for local variables u, v.
. The compiled code may shift SP farther down for some temp space on the stack.
The stack contents become as shown in Fig. 2.11.
2. Stack Frames
While execution is inside a function, such as sub(), it can only access global variables, parameters passed in by the caller and local variables, but nothing else. Global and static local variables are in the combined Data section, which can be referenced by a fixed base register. Parameters and automatic locals have different copies on each invocation of the function. So the problem is: how to reference parameters and automatic locals? For this example, the parameters x, y, which receive the values of the arguments a, b, are at 8(FP) and 12(FP). Similarly, the automatic local variables u, v are at -4(FP) and -8(FP). The stack area visible to a function, i.e. parameters and automatic locals, is called the Stack Frame of the function, like a frame of a movie to a viewer. Thus, FP is called the Stack Frame Pointer. To a function, the stack frame looks like the following (Fig. 2.12).
From the above discussions, the reader should be able to deduce what would happen if we have a sequence of function calls, e.g.
crt0.o -> main() -> A(par_a) -> B(par_b) -> C(par_c)
For each function call, the stack would grow (toward low address) one more frame for the called function. The frame at the stack top is the stack frame of the currently executing function, which is pointed to by the CPU's frame pointer. The saved FP points (backward) to the frame of its caller, whose saved FP points back at the caller's caller, etc. Thus, the function call sequence is maintained in the stack as a linked list, as shown in Fig. 2.13.
By convention, the CPU's FP = 0 when crt0.o is entered from the OS kernel. So the stack frame linked list ends with a 0. When a function returns, its stack frame is deallocated and the stack shrinks back.
3. Return From Function Call
When sub() executes the C statement return x+y+u+v, it evaluates the expression and puts the resulting value in the return value register (AX). Then it deallocates the local variables by
. copy FP into SP; # SP now points to the saved FP on the stack.
. pop stack into FP; # this restores FP, which now points to the caller's stack frame,
# leaving the return PC on the stack top.
(On the x86 CPU, the above two operations are equivalent to the leave instruction.)
. Then, it executes the RET instruction, which pops the stack top into the PC register, causing the CPU to execute from the saved return address of the caller.
- Upon return, the caller function catches the return value in the return register (AX). Then it cleans the parameters a, b, from the stack (by adding 8 to SP). This restores the stack to the original situation before the function call. Then it continues to execute the next instruction.
It is noted that some compilers, e.g. GCC Version 4, allocate automatic local variables in increasing address order. For instance, int a, b; implies (address of a) < (address of b). With this kind of allocation scheme, the stack contents may look like the following (Fig. 2.14).
In this case, automatic local variables are also allocated in “reverse order”, which makes them consistent with the parameter order, but the concept and usage of stack frames remain the same.
4. Long Jump
In a sequence of function calls, such as
main() -> A() -> B() -> C();
when a called function finishes, it normally returns to the calling function, e.g. C() returns to B(), which returns to A(), etc. It is also possible to return directly to an earlier function in the calling sequence by a long jump. The following program demonstrates long jump in Unix/Linux.
/** longjump.c file: demonstrate long jump in Linux **/
#include <stdio.h>
#include <setjmp.h>

jmp_buf env;   // for saving longjmp environment
int A(void);   // prototypes: A() and B() are defined after main()
int B(void);

int main()
{
    int r, a=100;
    printf("call setjmp to save environment\n");
    if ((r=setjmp(env)) == 0){
        A();
        printf("normal return\n");
    }
    else
        printf("back to main() via long jump, r=%d a=%d\n", r, a);
    return 0;
}

int A(void)
{
    printf("enter A()\n");
    B();
    printf("exit A()\n");
    return 0;
}

int B(void)
{
    printf("enter B()\n");
    printf("long jump? (y|n) ");
    if (getchar()=='y')
        longjmp(env, 1234);   // return to setjmp() in main() with value 1234
    printf("exit B()\n");
    return 0;
}
In the longjump program, setjmp() saves the current execution environment in a jmp_buf structure and returns 0. The program proceeds to call A(), which calls B(). While in the function B(), if the user chooses not to return by long jump, the functions will show the normal return sequence. If the user chooses to return by longjmp(env, value), execution will return to the last saved environment with a nonzero value. In this case, it causes B() to return to main() directly, bypassing A(). The principle of long jump is very simple. When a function finishes, it returns by the (callerPC, callerFP) in the current stack frame, as shown in Fig. 2.15.
If we replace (callerPC, callerFP) with (savedPC, savedFP) of an earlier function in the calling sequence, execution would return to that function directly. In addition to the (savedPC, savedFP), setjmp() may also save CPU’s general registers and the original SP, so that longjmp() can restore the complete environment of the returned function. Long jump can be used to abort a function in a calling sequence, causing execution to resume from a known environment saved earlier. Although rarely used in user mode programs, it is a common technique in systems programming. For example, it may be used in a signal catcher to bypass a user mode function that caused an exception or trap error. We shall demonstrate this technique later in Chap. 6 on signals and signal processing.
5. Run-Time Stack Usage in 64-Bit GCC
In 64-bit mode, the CPU registers are expanded to rax, rbx, rcx, rdx, rbp, rsp, rsi, rdi, r8 to r15, all 64-bit wide. The function call convention differs slightly from 32-bit mode. When calling a function, the first 6 parameters are passed in rdi, rsi, rdx, rcx, r8, r9, in that order. Any extra parameters are passed through the stack as they are in 32-bit mode. Upon entry, a called function first establishes the stack frame (using rbp) as usual. Then it may shift the stack pointer (rsp) downward for local variables and working spaces on the stack. The GCC compiler generated code may keep the stack pointer fixed, with a default reserved Red Zone stack area of 128 bytes, while execution is inside a function, making it possible to access stack contents by using rsp as the base register. However, the GCC compiler generated code still uses the stack frame pointer rbp to access both parameters and locals. We illustrate the function call convention in 64-bit mode by an example.
Example: Function Call Convention in 64-Bit Mode
- The following t.c file contains a main() function in C, which defines 9 local int (32-bit) variables, a to i. It calls a sub() function with 8 int parameters.
/********* t.c file ********/
#include <stdio.h>
int sub(int a, int b, int c, int d, int e, int f, int g, int h)
{
    int u, v, w;
    u = 9;
    v = 10;
    w = 11;
    return a+g+u+v; // use first and extra parameter, locals
}
int main()
{
    int a, b, c, d, e, f, g, h, i;
    a = 1;
    b = 2;
    c = 3;
    d = 4;
    e = 5;
    f = 6;
    g = 7;
    h = 8;
    i = sub(a,b,c,d,e,f,g,h);
}
- Under 64-bit Linux, compile t.c to generate a t.s file in 64-bit assembly by
gcc -S t.c # generate t.s file
Then edit the t.s file to delete the nonessential lines generated by the compiler and add comments to explain the code actions. The following shows the simplified t.s file with added comment lines.
#------ t.s file generated by 64-bit GCC compiler ------
.globl sub
sub: # int sub(int a,b,c,d,e,f,g,h)
# first 6 parameters a, b, c, d, e, f are in registers
# edi, esi, edx, ecx, r8d, r9d (the low 32 bits of rdi, rsi, rdx, rcx, r8, r9)
# 2 extra parameters g,h are on stack.
# establish stack frame
pushq %rbp
movq %rsp, %rbp
# no need to shift rsp down: sub() calls no other function, so it can
# use the 128-byte reserved stack area (Red Zone) below rsp.
# rsp will be shifted down if the function defines more locals than fit there
# save first 6 parameters in registers on stack
movl %edi, -20(%rbp) # a
movl %esi, -24(%rbp) # b
movl %edx, -28(%rbp) # c
movl %ecx, -32(%rbp) # d
movl %r8d, -36(%rbp) # e
movl %r9d, -40(%rbp) # f
# access locals u, v, w at rbp -4 to -12
movl $9, -4(%rbp)
movl $10, -8(%rbp)
movl $11, -12(%rbp)
# compute a + g + u + v:
movl -20(%rbp), %edx # saved a on stack
movl 16(%rbp), %eax # g at 16(rbp)
addl %eax, %edx
movl -4(%rbp), %eax # u at -4(rbp)
addl %eax, %edx
movl -8(%rbp), %eax # v at -8(rbp)
addl %edx, %eax
# did not shift rsp down, so just popq to restore rbp
popq %rbp
ret
#====== main function code in assembly ======
.globl main
main:
# establish stack frame
pushq %rbp
movq %rsp, %rbp
# shift rsp down 48 bytes for locals
subq $48, %rsp
# locals are at rbp -4 to -32
# call sub(a,b,c,d,e,f,g,h): first 6 parameters in registers
# push 2 extra parameters h,g on stack
Source: Wang K.C. (2018), Systems Programming in Unix/Linux, Springer; 1st ed. 2018 edition.