Things That Happen Around Function Calls

Registers: call-clobbered vs. call-preserved

  1. Call-clobbered registers
    • a.k.a. caller-saved or volatile registers; “scratch register/temporary register” in AAPCS32, “scratch” registers in x86
    • Registers whose values may be changed (or “clobbered”) by a function call, so the caller must save their values beforehand if it needs them after the call. The callee may use these registers freely, without saving or restoring them.
  2. Call-preserved registers
    • a.k.a. callee-saved or non-volatile registers; “preserved” registers in x86
    • Registers whose values are guaranteed (by the calling convention) to be intact across a function call. The caller doesn’t need to do anything about them, but a callee that uses any of these registers must save their original values first and restore them before returning.


For instance:

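Arch                       Call-clobbered                          Call-preserved
ARM64 (AAPCS64)            x0-x17                                  x19-x29, sp
x86_64 (System V ABI)      rdi, rsi, rax, rcx, rdx, r8-r11         rbx, rsp, rbp, r12-r15

On ARM64, x18 is the platform register, whose role depends on the platform, and x30 (lr) is overwritten by the call instruction itself (see below).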
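To make the contract concrete, here is a toy C simulation (it does not touch real registers) using three of the x86_64 System V names from the table; `struct cpu`, `callee` and `caller` are hypothetical names. The callee saves and restores the call-preserved rbx but overwrites the call-clobbered rcx, so only rcx has to be saved by the caller:

```c
#include <assert.h>
#include <stdint.h>

/* Toy register file using three x86_64 (System V) names.
 * This simulates the convention in plain C; it does not touch real registers. */
enum reg { RAX, RCX, RBX, NREGS };

struct cpu {
    uint64_t r[NREGS];
};

/* A well-behaved callee: it overwrites the call-clobbered RCX freely,
 * but saves the call-preserved RBX before using it and restores it
 * before returning. The result goes in RAX. */
static void callee(struct cpu *c) {
    uint64_t saved_rbx = c->r[RBX];  /* like `push rbx` in a prologue */
    c->r[RBX] = 0xdead;              /* use rbx as scratch */
    c->r[RCX] = 0xbeef;              /* clobber rcx, no save needed */
    c->r[RAX] = c->r[RBX] + 1;       /* return value */
    c->r[RBX] = saved_rbx;           /* like `pop rbx` in an epilogue */
}

/* A caller with one live value in RBX and one in RCX. Only the RCX
 * value has to be saved by the caller around the call. */
static uint64_t caller(struct cpu *c) {
    c->r[RBX] = 40;                  /* survives the call by contract */
    c->r[RCX] = 2;                   /* would be lost without a spill */
    uint64_t spill = c->r[RCX];      /* caller-side save */
    callee(c);
    c->r[RCX] = spill;               /* caller-side restore */
    return c->r[RBX] + c->r[RCX];    /* 40 + 2 */
}
```

In real code the caller's "spill" would go to the stack or into a call-preserved register; the point is only that the responsibility for rcx lies with the caller, while rbx is the callee's job.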


Stack frame organization

ARM64

For ARM64, a frame record is a pair of 64-bit values in a function’s stack frame: the saved fp (x29) at the lower address and the saved lr (x30) at the higher address. In code that maintains frame records, the fp register holds the address of the most recent frame record on the stack; more specifically, it points to the saved-fp half.

Leaf functions don’t need to save a frame record on the stack, whereas non-leaf functions do (explained in the next paragraph). The frame record is pushed onto and popped from the stack in a function’s prologue and epilogue. In the prologue, right after the new frame record is pushed, fp is updated to point to it. Because the saved fp in each record points to the caller’s record, a “chain” is formed, as illustrated by the figure below. Any function can walk this chain to retrieve its caller’s frame record (and its caller’s caller’s, and so on), which is very helpful for tracing the call path or dumping the stack when debugging.

[Figure: chain of frame records on the ARM64 stack, linked through saved fp values]
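The chain can be sketched in C. This is a simulation over hand-built records rather than a real unwinder: `struct frame_record` mirrors the fp-then-lr layout described above, `walk_frame_chain` is a hypothetical helper, and the sketch assumes that a NULL saved fp marks the outermost frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout of an ARM64 frame record: saved fp at the lower address,
 * saved lr 8 bytes above it. fp (x29) points at the `fp` field. */
struct frame_record {
    const struct frame_record *fp;  /* caller's frame record */
    uintptr_t lr;                   /* return address into the caller */
};

/* Follow the chain starting from the current fp, storing up to `max`
 * return addresses in `out`. Assumption of this sketch: a NULL saved
 * fp terminates the chain. Returns the number of addresses collected. */
static size_t walk_frame_chain(const struct frame_record *fp,
                               uintptr_t *out, size_t max) {
    size_t n = 0;
    while (fp != NULL && n < max) {
        out[n++] = fp->lr;   /* where this frame will return to */
        fp = fp->fp;         /* step to the caller's record */
    }
    return n;
}
```

On a real system the starting fp could come from GCC/Clang's `__builtin_frame_address(0)`, which is only reliable for this purpose when the code is built with frame pointers (e.g. `-fno-omit-frame-pointer`). The same loop shape also fits x86_64, where [rbp] holds the saved rbp and [rbp+8] the return address.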

ARM64 uses bl{,r} (bl or blr) to make function calls; these instructions automatically store the address of the instruction following the bl{,r} in the lr register. That’s why a leaf function doesn’t need to save its return address in a frame record: its return address is already in lr, and since it makes no function calls, nothing will overwrite lr. A non-leaf function, on the other hand, has its return address in lr overwritten whenever it calls a subroutine (i.e. executes bl{,r}), so it must save lr (as part of its frame record) beforehand.
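The leaf/non-leaf argument can be checked with a toy model in C that tracks only lr. The functions `bl`, `leaf`, `nonleaf_no_save` and `nonleaf_saves_lr` are hypothetical, the addresses are made up, and each function returns the address its final `ret` would branch to:

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of x30 (lr). Addresses are made-up numbers; each A64
 * instruction is 4 bytes, so the return address is bl's address + 4. */
static uint64_t lr;

typedef uint64_t (*func_t)(void);

/* Model of `bl target` located at address `pc`: write lr, then branch. */
static uint64_t bl(func_t target, uint64_t pc) {
    lr = pc + 4;
    return target();
}

/* Each function returns the address its `ret` would branch to,
 * i.e. the value of lr at that point. */
static uint64_t leaf(void) {
    return lr;               /* no bl inside, so lr is still intact */
}

static uint64_t nonleaf_no_save(void) {
    bl(leaf, 0x2000);        /* overwrites lr with 0x2004 */
    return lr;               /* ret would branch back into this function */
}

static uint64_t nonleaf_saves_lr(void) {
    uint64_t saved = lr;     /* prologue: lr goes into the frame record */
    bl(leaf, 0x3000);
    lr = saved;              /* epilogue: reload lr from the frame record */
    return lr;
}
```

Called via `bl(f, 0x1000)`, `leaf` and `nonleaf_saves_lr` both return 0x1004 (the caller's next instruction), while `nonleaf_no_save` returns 0x2004, its own post-call address.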


x86_64

Analogous to the fp register on ARM64, on x86_64 it’s the rbp register (commonly called the stack base pointer) that helps locate the caller’s stack frame, forming a “chain” with the rbp values saved in the frames above it (i.e. the callers’ frames):

[Figure: chain of x86_64 stack frames, linked through saved rbp values]

As the “base pointer” of each stack frame:

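  • ARM64’s fp sits at, and points to, a low address in the frame (in the layout used here, the frame record is at the bottom with locals above it);
  • x86_64’s rbp sits at, and points to, a high address in the frame (the saved rbp is just below the return address, with locals below it).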

The two architectures also share a similar pattern for setting up and tearing down stack frames.

[Figure: annotated ARM64 and x86_64 function prologue/epilogue code, with labels a.1 to a.3 and x.1 to x.5]

Explanation of labeled instructions for ARM64:

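[a.1]: Make space for the new frame and store the frame record in it.
       Note the `32` here: ARM64 requires sp to be 16-byte aligned, so the frame size (16 bytes for the frame record plus room for locals and saved arguments) is rounded up to a multiple of 16.
[a.2]: Point fp at the new frame record, making it the base of the new stack frame.
[a.3]: Save the input arguments to the stack.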
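The steps above can be traced with a rough C simulation, under assumed details: a 32-byte frame, a single 64-bit argument in x0 saved at [x29, #16], and a word-addressed toy stack. The names `struct a64`, `slot`, `prologue` and `epilogue` are hypothetical, and the assembly in the comments is an illustrative sequence, not the exact listing above.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 256

/* Toy ARM64 state: "addresses" are byte offsets into `mem`,
 * and every slot is 8 bytes wide. */
struct a64 {
    uint64_t mem[STACK_BYTES / 8];
    uint64_t sp, x29, x30, x0;
};

static uint64_t *slot(struct a64 *m, uint64_t addr) {
    assert(addr % 8 == 0 && addr < STACK_BYTES);
    return &m->mem[addr / 8];
}

/* Prologue for a hypothetical 32-byte frame:
 *   stp x29, x30, [sp, #-32]!   // [a.1]
 *   mov x29, sp                 // [a.2]
 *   str x0, [x29, #16]          // [a.3] (one 64-bit argument)
 */
static void prologue(struct a64 *m) {
    m->sp -= 32;                    /* pre-index writeback */
    assert(m->sp % 16 == 0);        /* sp must stay 16-byte aligned */
    *slot(m, m->sp) = m->x29;       /* frame record: fp at the lower address */
    *slot(m, m->sp + 8) = m->x30;   /* ... and lr 8 bytes above it */
    m->x29 = m->sp;                 /* fp -> new frame record */
    *slot(m, m->x29 + 16) = m->x0;  /* save the argument above the record */
}

/* Epilogue:
 *   ldp x29, x30, [sp], #32
 *   (ret would then branch to x30)
 */
static void epilogue(struct a64 *m) {
    m->x29 = *slot(m, m->sp);
    m->x30 = *slot(m, m->sp + 8);
    m->sp += 32;                    /* post-index writeback */
}
```

After `prologue`, x29 equals sp and points at the lowest slot of the 32-byte frame, with the saved argument above it; `epilogue` restores the caller's x29, x30 and sp.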

For x86_64:

[x.1]: Push the base of the old (i.e. caller's) frame onto the stack.
[x.2]: Set `rbp` to the base of the new (i.e. callee's) frame.
[x.3]: Make space for the new frame.
[x.4]: Save the input arguments to the stack.
[x.5]: `leave` is equivalent to `mov rsp, rbp; pop rbp`; a subsequent `ret` then pops the return address.
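The x86_64 steps can be traced with a rough C simulation too, under assumed details: a 16-byte frame, one 64-bit argument in rdi saved at [rbp-8], and a word-addressed toy stack. The names `struct x64`, `slot`, `push`, `pop`, `prologue` and `epilogue` are hypothetical, and the assembly in the comments (Intel syntax) is illustrative rather than the exact listing above.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 256

/* Toy x86_64 state: "addresses" are byte offsets into `mem`,
 * and every slot is 8 bytes wide. */
struct x64 {
    uint64_t mem[STACK_BYTES / 8];
    uint64_t rsp, rbp, rdi;
};

static uint64_t *slot(struct x64 *m, uint64_t addr) {
    assert(addr % 8 == 0 && addr < STACK_BYTES);
    return &m->mem[addr / 8];
}

static void push(struct x64 *m, uint64_t v) {
    m->rsp -= 8;
    *slot(m, m->rsp) = v;
}

static uint64_t pop(struct x64 *m) {
    uint64_t v = *slot(m, m->rsp);
    m->rsp += 8;
    return v;
}

/* Prologue for a hypothetical 16-byte frame:
 *   push rbp               ; [x.1]
 *   mov  rbp, rsp          ; [x.2]
 *   sub  rsp, 16           ; [x.3]
 *   mov  [rbp-8], rdi      ; [x.4] (one 64-bit argument)
 */
static void prologue(struct x64 *m) {
    push(m, m->rbp);
    m->rbp = m->rsp;
    m->rsp -= 16;
    *slot(m, m->rbp - 8) = m->rdi;
}

/* Epilogue:
 *   leave                  ; [x.5]: mov rsp, rbp; pop rbp
 *   ret                    ; pops the return address
 * Returns the address ret would jump to. */
static uint64_t epilogue(struct x64 *m) {
    m->rsp = m->rbp;
    m->rbp = pop(m);
    return pop(m);
}
```

To model a full call, push a made-up return address (standing in for `call`) before `prologue`. Afterwards [rbp] holds the caller's rbp and [rbp+8] the return address, so rbp sits at the top of the frame with the saved argument below it; `epilogue` then restores rsp and rbp and yields that return address.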
