Registers: call-clobbered vs. call-preserved
- Call-clobbered registers
  - a.k.a. caller-saved or volatile registers; “scratch register/temporary register” in AAPCS32, “scratch” registers in x86
  - Registers whose values may be changed (or “clobbered”) across a function call, so the caller must save their values beforehand if it needs them after the call. The callee may use these registers freely, without preserving them.
- Call-preserved registers
  - a.k.a. callee-saved or non-volatile registers; “preserved” registers in x86
  - Registers whose values are promised (by the compiler or calling convention) to be intact across a function call. The caller doesn’t need to do anything about them, but a callee that uses these registers must save their original values and restore them before returning.
For instance:
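Arch | Call-clobbered | Call-preserved |
---|---|---|
ARM64 | x0-x17 | x19-x29, sp |
x86_64 (System V) | rdi, rsi, r{a,c,d}x, r8-r11 | rbx, rsp, rbp, r12-r15 |

On ARM64, x18 is the platform register and is reserved on some platforms, while x30 (lr) is overwritten by the call instruction itself. The x86_64 row follows the System V ABI; the Microsoft x64 convention additionally treats rdi and rsi as preserved.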
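The division of responsibility between caller and callee can be sketched with a toy model. This is only an illustration, not real machine behavior: registers are dictionary entries, a "call" is a plain Python call, and the functions `callee` and `caller` are hypothetical names chosen for this sketch.

```python
# Toy model only: registers are dict entries, and a "call" is a Python call.

def callee(regs):
    """Hypothetical callee using one call-preserved and two scratch registers."""
    saved_rbx = regs["rbx"]   # callee saves a call-preserved register before using it
    regs["rbx"] = 0xDEAD      # ...uses it freely...
    regs["rax"] = 0xBEEF      # scratch registers: overwritten, no duty to restore
    regs["rcx"] = 0
    regs["rbx"] = saved_rbx   # ...and restores it before returning


def caller(regs):
    """Hypothetical caller that needs rcx and rbx to survive the call."""
    regs["rcx"] = 7
    regs["rbx"] = 42
    spilled_rcx = regs["rcx"]  # caller saves the call-clobbered register itself
    callee(regs)
    regs["rcx"] = spilled_rcx  # caller restores it after the call
    return regs["rcx"], regs["rbx"]  # rbx needed no action from the caller


regs = {name: 0 for name in ("rax", "rcx", "rdx", "rbx", "rbp", "r12")}
result = caller(regs)
print(result)  # (7, 42)
```

Without the spill in `caller`, `rcx` would read back as 0 after the call, whereas `rbx` survives purely because `callee` kept its side of the contract.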
References:
- These two answers: 1 and 2 of the same question on Stack Overflow
- ARM[64]:
  - Meaning of scratch register in ARM series
  - Overview of ARM64 & ARM ABI conventions on Microsoft Learn
  - ARM Architecture Procedure Call Standard (AAPCS)
- x86[_64]:
  - x86_64 NASM Assembly Quick Reference
  - About System V ABI and calling conventions on OSDev.org
  - About x86 and x86_64 and their calling conventions from Microsoft
Stack frame organization
ARM64
For ARM64, a frame record is a pair of values on a function’s stack frame: an `fp` (`x29`) value at the lower address and an `lr` (`x30`) value at the higher address. Meanwhile, the `fp` register always contains the address of the latest frame record on the stack; more specifically, it points to the saved `fp` part.
Leaf functions don’t need to save a frame record on the stack, while non-leaf functions do (explained in the next paragraph). The frame record is pushed to and popped from the stack in the function’s prolog and epilog. In the prolog, right after a new frame record is pushed to the stack, `fp` is updated to point to the new frame record. In this way, as illustrated by the figure below, a “chain” is formed. Any function can follow this chain to retrieve its caller’s frame record, which is really helpful for tracing the call path and/or dumping the stack when debugging.
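Walking the chain can be sketched in a few lines. This is a toy model under stated assumptions: memory is a dictionary from address to value, the helper names `push_frame_record` and `walk_frame_chain` are made up for this sketch, the return addresses are arbitrary, and the outermost record is assumed to store a saved `fp` of zero to end the chain.

```python
# Toy model: memory is a dict mapping an address to a 64-bit value.

def push_frame_record(mem, sp, fp, lr, frame_size=32):
    """Mimic a prolog: allocate a frame and store {fp, lr} at its lowest address."""
    sp -= frame_size
    mem[sp] = fp        # saved fp at the lower address
    mem[sp + 8] = lr    # saved lr at the higher address
    return sp, sp       # new sp, and new fp pointing at the frame record


def walk_frame_chain(mem, fp):
    """Follow saved fp values, collecting saved return addresses (innermost first)."""
    return_addresses = []
    while fp != 0:      # assumption: a saved fp of zero marks the end of the chain
        return_addresses.append(mem[fp + 8])
        fp = mem[fp]
    return return_addresses


mem = {}
sp, fp = 0x1000, 0
for lr in (0x400100, 0x400200, 0x400300):  # hypothetical return addresses
    sp, fp = push_frame_record(mem, sp, fp, lr)

trace = walk_frame_chain(mem, fp)
print([hex(a) for a in trace])  # ['0x400300', '0x400200', '0x400100']
```

A real unwinder does essentially this loop over actual stack memory, then maps each return address back to a function name.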
ARM64 uses `bl{,r}` to make function calls, which automatically stores the address of the instruction following the `bl{,r}` in the `lr` register. That’s why a leaf function doesn’t need to save its return address in a frame record: its return address is already in `lr`, and it makes no function calls, so nothing will overwrite `lr`. On the other hand, a non-leaf function’s return address, stored in `lr`, would be overwritten at subroutine invocation (i.e. when executing `bl{,r}`), so non-leaf functions need to save `lr` (in the form of a frame record).
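The difference between leaf and non-leaf functions can be made concrete with a toy model. Everything here is hypothetical: `ToyCpu` tracks only `lr` and a list standing in for the stack, and the addresses are arbitrary.

```python
class ToyCpu:
    """Toy model: tracks only lr and a list standing in for the stack."""

    def __init__(self):
        self.lr = None
        self.stack = []

    def bl(self, return_address):
        # bl writes the address of the following instruction into lr
        self.lr = return_address


def leaf(cpu):
    # Makes no calls, so lr still holds its return address; nothing to save.
    return cpu.lr


def non_leaf(cpu):
    cpu.stack.append(cpu.lr)   # prolog: save lr (as part of the frame record)
    cpu.bl(0x2004)             # calling leaf() overwrites lr
    leaf_returns_to = leaf(cpu)
    cpu.lr = cpu.stack.pop()   # epilog: restore lr before returning
    return leaf_returns_to, cpu.lr


cpu = ToyCpu()
cpu.bl(0x1004)                 # some caller invokes non_leaf()
print(non_leaf(cpu))           # (8196, 4100), i.e. (0x2004, 0x1004)
```

If `non_leaf` skipped the save and restore, `lr` would still hold `0x2004` at its own return, and it would jump back into itself instead of to its caller.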
References:
- See sections 6.4.5, 6.4.6 and 6.5 of AAPCS64.
- Related: Android’s and LLVM’s docs about Shadow Call Stack
x86_64
Analogous to the `fp` register in ARM64, in x86_64 it’s the `rbp` register (commonly referred to as the stack base pointer) that helps locate the caller’s stack frame and forms the “chain” with the `rbp` values stored in upper stack frames.
As the “base pointer” of each stack frame:
- ARM64’s `fp` stays at and points to a lower address in the frame;
- x86_64’s `rbp` stays at and points to a higher address in the frame.
These two architectures also share a similar pattern for setting up and releasing stack frames.
Explanation of labeled instructions for ARM64:
- [a.1]: Make space for the new frame and store the frame record in it. Note the `32` here: ARM64 requires `sp` to be 16-byte aligned.
- [a.2]: Set the base of the stack frame.
- [a.3]: Save input arguments.
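The ARM64 steps can be replayed on a toy model to check that the prolog and epilog are mirror images. The instructions named in the comments are assumptions about a typical compiler-generated listing, not taken from the text above, and `align_up`, `arm64_prolog` and `arm64_epilog` are hypothetical helpers.

```python
def align_up(n, alignment=16):
    """Round n up to the next multiple of alignment (a power of two)."""
    return (n + alignment - 1) & ~(alignment - 1)


def arm64_prolog(mem, regs, frame_size):
    # [a.1] assumed to be like `stp x29, x30, [sp, #-32]!`
    regs["sp"] -= frame_size
    mem[regs["sp"]] = regs["x29"]        # saved fp at the lower address
    mem[regs["sp"] + 8] = regs["x30"]    # saved lr at the higher address
    # [a.2] assumed to be like `mov x29, sp`
    regs["x29"] = regs["sp"]
    # [a.3] assumed to be like `str w0, [sp, #28]`
    mem[regs["sp"] + 28] = regs["x0"]


def arm64_epilog(mem, regs, frame_size):
    # assumed to be like `ldp x29, x30, [sp], #32`
    regs["x29"] = mem[regs["sp"]]
    regs["x30"] = mem[regs["sp"] + 8]
    regs["sp"] += frame_size


# A 16-byte frame record plus one 4-byte argument needs 20 bytes, rounded up to 32.
frame_size = align_up(16 + 4)
mem = {}
regs = {"sp": 0x8000, "x29": 0x8040, "x30": 0x400123, "x0": 9}
arm64_prolog(mem, regs, frame_size)
print(frame_size, hex(regs["sp"]), hex(regs["x29"]))  # 32 0x7fe0 0x7fe0
arm64_epilog(mem, regs, frame_size)
```

The rounding is why a frame that holds only 20 bytes of data still moves `sp` by 32.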
For x86_64:
- [x.1]: Push the base of the old (i.e. caller's) frame to the stack.
- [x.2]: Set `rbp` to the base of the new (i.e. callee's) frame.
- [x.3]: Make space for the new frame.
- [x.4]: Save input arguments.
- [x.5]: `leave` is equivalent to `mov rsp, rbp; pop rbp`.
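The x86_64 steps can be replayed the same way to confirm that `leave` undoes the prolog. Again this is a toy model: the instructions in the comments are assumptions about a typical listing, the frame size of 16 bytes is arbitrary, and `push`, `pop`, `x86_prolog` and `leave` are hypothetical helpers.

```python
def push(mem, regs, value):
    regs["rsp"] -= 8
    mem[regs["rsp"]] = value


def pop(mem, regs):
    value = mem[regs["rsp"]]
    regs["rsp"] += 8
    return value


def x86_prolog(mem, regs, locals_size=16):
    push(mem, regs, regs["rbp"])          # [x.1] push rbp
    regs["rbp"] = regs["rsp"]             # [x.2] mov rbp, rsp
    regs["rsp"] -= locals_size            # [x.3] assumed: sub rsp, 16
    mem[regs["rbp"] - 4] = regs["rdi"]    # [x.4] assumed: mov DWORD PTR [rbp-4], edi


def leave(mem, regs):
    regs["rsp"] = regs["rbp"]             # mov rsp, rbp
    regs["rbp"] = pop(mem, regs)          # pop rbp


mem = {}
regs = {"rsp": 0x7000, "rbp": 0x7100, "rdi": 5}
push(mem, regs, 0x401000)                 # `call` pushes the return address
rsp_at_entry = regs["rsp"]

x86_prolog(mem, regs)
print(hex(regs["rbp"]), hex(regs["rsp"]))  # 0x6ff0 0x6fe0
leave(mem, regs)
print(hex(regs["rbp"]), hex(regs["rsp"]))  # 0x7100 0x6ff8
```

After `leave`, `rsp` points at the return address again, which is exactly what `ret` expects.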