System V Application Binary Interface: AMD64 Architecture Processor Supplement (With LP64 and ILP32 Programming Models)
Edited by
H.J. Lu, Michael Matz, Milind Girkar, Jan Hubička
1 Introduction
2 Software Installation
AMD64 ABI 1.0 – October 16, 2023 – 13:28
3.5.7 Variable Argument Lists
3.6 DWARF Definition
3.6.1 DWARF Release Number
3.6.2 DWARF Register Number Mapping
3.7 Stack Unwind Algorithm
4 Object Files
4.1 ELF Header
4.1.1 Machine Information
4.1.2 Number of Program Headers
4.2 Sections
4.2.1 Section Flags
4.2.2 Section types
4.2.3 Special Sections
4.2.4 EH_FRAME sections
4.3 Symbol Table
4.4 Relocation
4.4.1 Relocation Types
4.4.2 Large Models
6 Libraries
6.1 C Library
6.1.1 Global Data Symbols
6.1.2 Floating Point Environment Functions
6.2 Unwind Library Interface
6.2.1 Exception Handler Framework
6.2.2 Data Structures
6.2.3 Throwing an Exception
6.2.4 Exception Object Management
6.2.5 Context Management
6.2.6 Personality Routine
6.3 Unwinding Through Assembler Code
7 Development Environment
9 Conventions
9.1 C++
9.2 Fortran
9.2.1 Names
9.2.2 Representation of Fortran Types
9.2.3 Argument Passing
9.2.4 Functions
9.2.5 COMMON blocks
9.2.6 Intrinsics
B Linker Optimization
B.1 Combine GOTPLT and GOT Slots
B.2 Optimize GOTPCRELX Relocations
List of Tables
10.2 General Dynamic Model Code Sequence with TLSDESC
10.3 Initial Exec Model Code Sequence
10.4 Initial Exec Model Code Sequence, II
10.5 Local Dynamic Model Code Sequence With Lea
10.6 Local Dynamic Model Code Sequence With Add
10.7 General Dynamic Model Code Sequence with TLSDESC
10.8 Local Dynamic Model Code Sequence, II
10.9 Local Exec Model Code Sequence With Lea
10.10 Local Exec Model Code Sequence With Add
10.11 Local Exec Model Code Sequence, II
10.12 Local Exec Model Code Sequence, III
10.13 GD -> IE Code Transition
10.14 GDesc -> IE Code Transition
10.15 GD -> LE Code Transition
10.16 GDesc -> LE Code Transition
10.17 IE -> LE Code Transition With Lea
10.18 IE -> LE Code Transition With Add
10.19 IE -> LE Code Transition, II
10.20 LD -> LE Code Transition With Lea
10.21 LD -> LE Code Transition With Add
10.22 LD -> LE Code Transition, II
10.23 Indirect Branch
List of Figures
3.30 Position-Independent Switch Code
3.31 Parameter Passing Example with Variable-Argument List
3.32 Register Allocation Example for Variable-Argument List
3.33 Register Save Area
3.34 va_list Type Declaration
3.35 Sample Implementation of va_arg(l, int)
3.36 DWARF Register Number Mapping
3.37 Pointer Encoding Specification Byte
11.1 Function Call without PLT (Small and Medium Models)
11.2 Function Address without PLT (Small and Medium Models)
11.3 __tls_get_addr Call
Revision History
Intel Changes to this document made by Intel Corporation are copyright Intel Corporation
and licensed under CC BY 4.0: https://creativecommons.org/licenses/by/4.0/legalcode.
1.0 Document _Unwind_GetIPInfo. Update to match C++ ABI. Add chapter for alternate
code sequences for security.
0.99 Add description of TLS relocations (thanks to Alexandre Oliva) and mention the
decimal floating point and AVX types (thanks to H.J. Lu).
0.98 Various clarifications and fixes according to feedback from Sun, thanks to Terrence
Miller. DWARF register numbers for some system registers, thanks to Jan Beulich.
Add R_X86_64_SIZE32 and R_X86_64_SIZE64 relocations; extend meaning of e_phnum
to handle more than 0xffff program headers, thanks to Rod Evans. Add footnote
about passing of decimal datatypes. Specify that _Bool is booleanized at the caller.
0.95 Include description of the medium PIC memory model (thanks to Jan Hubička) and
large model (thanks to Evandro Menezes).
0.93 Add sections about program headers, new section types and special sections for un-
winding information. Thanks to Michael Walker.
0.92 Fix some typos (thanks to Bryan Ford), add section about stack layout in the Linux
kernel. Fix example in figure 3.5 (thanks to Tom Horsley). Add section on unwind-
ing through assembler (written by Michal Ludvig). Remove mmxext feature (thanks
to Evandro Menezes). Add section on Fortran (by Steven Bosscher) and stack un-
winding (by Jan Hubička).
0.91 Clarify that x87 is default mode, not MMX (by Hans Peter Anvin).
0.90 Change DWARF register numbers again; mention that __m128 needs alignment; fix
typo in figure 3.3; add some comments on kernel expectations; mention TLS ex-
tensions; add example for passing of variable-argument lists; change semantics of
%rax in variable-argument lists; improve formatting; mention that X87 class is not
used for passing; make /lib64 a Linux specific section; rename x86-64 to AMD64;
describe passing of complex types. Special thanks to Andi Kleen, Michal Ludvig,
Michael Matz, David O’Brien and Eric Young for their comments.
0.21 Define __int128 as class INTEGER in register passing. Mention that %al is used for
variadic argument lists. Fix some textual problems. Thanks to H. Peter Anvin, Bo
Thorsen, and Michael Matz.
0.20 — 2002-07-11 Change DWARF register number values of %rbx, %rsi, %rsi (thanks
to Michal Ludvig). Fix footnotes for fundamental types (thanks to H. Peter Anvin).
Specify size_t (thanks to Bo Thorsen and Andreas Schwab). Add new section on
floating point environment functions.
0.19 — 2002-03-27 Set name of Linux dynamic linker, mention %fs. Incorporate changes
from H. Peter Anvin for booleans and define handling of sub-64-bit integer types in
registers.
Chapter 1
Introduction
The AMD64¹ architecture² is an extension of the x86 architecture. Any processor
implementing the AMD64 architecture specification will also provide compatibility modes for
previous descendants of the Intel 8086 architecture, including 32-bit processors such as
the Intel 386, Intel Pentium, and AMD K6-2 processor. Operating systems conforming to
the AMD64 ABI may provide support for executing programs that are designed to execute
in these compatibility modes. The AMD64 ABI does not apply to such programs; this
document applies only to programs running in the “long” mode provided by the AMD64
architecture.
Binaries using the AMD64 instruction set may program to either a 32-bit model, in
which the C data types int, long and all pointer types are 32-bit objects (ILP32); or to a
64-bit model, in which the C int type is 32 bits wide but the C long type and all pointer
types are 64-bit objects (LP64). This specification covers both the LP64 and ILP32
programming models.
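The two models can be told apart from C by looking only at the sizes named above. The following is a minimal sketch; the function name data_model_name and the "other" fallback are illustrative and not part of the ABI.

```c
/* Names the data model implied by the sizes of int, long and pointers.
 * The function name and the "other" fallback are illustrative only. */
const char *data_model_name(void)
{
    if (sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void *) == 8)
        return "LP64";   /* int is 32-bit, long and pointers are 64-bit */
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 4)
        return "ILP32";  /* int, long and pointers are all 32-bit */
    return "other";      /* not one of the two models covered here */
}
```

A build for the LP64 model is expected to yield "LP64", and a build for the ILP32 model "ILP32".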
Except where otherwise noted, the AMD64 architecture ABI follows the conventions
described in the Intel386 ABI. Rather than replicate the entire contents of the Intel386
ABI, the AMD64 ABI indicates only those places where changes have been made to the
Intel386 ABI.
This specification is mostly written using terms and concepts of the C programming
language, and so provides only an ABI for C. However, it is assumed that many programming
languages will wish to link with code written in C, so that the ABI specifications
documented here apply there too.³
¹ AMD64 was previously called x86-64. The latter name is still used in a number of places
for historical reasons instead of AMD64.
² The architecture specification is available on the web at
http://www.x86-64.org/documentation.
³ See e.g. section 9.1 for details on C++ ABI and section 9.2 for Fortran.
Chapter 2
Software Installation
This document does not specify how software must be installed on an AMD64 architecture
machine.
Chapter 3
Table 3.1: Micro-Architecture Levels
shared object with the -march=x86-64-v3 GCC flag. The resulting shared object
needs to be installed into the directory /usr/lib64/glibc-hwcaps/x86-64-v3
or /usr/lib/x86_64-linux-gnu/glibc-hwcaps/x86-64-v3 (in case of dis-
tributions with a multi-arch file system layout). In order to support systems that only
implement the K8 baseline, a fallback implementation must be installed into the default
locations, /usr/lib64 or /usr/lib/x86_64-linux-gnu. It has to be built with
-march=x86-64 (the upstream GCC default). If this guideline is not followed, loading
the library will fail on systems that do not support the level for which the optimized shared
object was built.
Shared objects that are installed under the matching glibc-hwcaps subdirectory
can use the CPU features for this level and earlier levels without further detection logic.
Run-time detection for other CPU features not listed in this section, or listed only under
later levels, is still required (even if all current CPUs implement certain CPU features
together).
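As an illustration of such run-time detection, the sketch below checks for AVX2 before a code path that needs it is taken. It assumes a GCC or Clang compiler, whose __builtin_cpu_init and __builtin_cpu_supports built-ins are used here; the wrapper name cpu_has_avx2 and its -1 "unknown" result are hypothetical.

```c
/* Returns 1 if the running CPU reports AVX2, 0 if it does not, and -1 if
 * the check is unavailable with this compiler or target (assumption: the
 * GCC/Clang x86 built-ins are the detection mechanism). */
int cpu_has_avx2(void)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();                         /* set up CPU feature data */
    return __builtin_cpu_supports("avx2") ? 1 : 0;
#else
    return -1;
#endif
}
```

A shared object installed under a lower level's subdirectory could call such a helper once and select between a baseline routine and an AVX2 routine accordingly.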
If a distribution requires support for a certain level, it builds everything with the
appropriate -march= option and installs the built binaries in the default file system
locations. When targeting such distributions, programmers can build their binaries with the
same -march= option and install them into the default locations. Optimized shared
objects for later levels can still be installed into subdirectories with the appropriate name.
Fundamental Types
Figure 3.1 shows the correspondence between ISO C’s scalar types and the processor’s.
__int128, _Float16, __float80, __float128, __m64, __m128, __m256 and __m512 types are
optional.
The __float128 type uses a 15-bit exponent, a 113-bit mantissa (the high order significant
bit is implicit) and an exponent bias of 16383.²
¹ The Intel386 ABI uses the term halfword for a 16-bit object, the term word for a 32-bit
object, and the term doubleword for a 64-bit object. But most IA-32 processor specific
documentation defines a word as a 16-bit object, a doubleword as a 32-bit object, a quadword
as a 64-bit object and a double quadword as a 128-bit object.
² Initial implementations of the AMD64 architecture are expected to support operations on the
__float128 type only via software emulation.
Figure 3.1: Scalar Types
Type            C                                   sizeof  Alignment  AMD64 Architecture
                                                            (bytes)
Integral        _Bool†                              1       1          boolean
                char, signed char                   1       1          signed byte
                unsigned char                       1       1          unsigned byte
                signed short                        2       2          signed twobyte
                unsigned short                      2       2          unsigned twobyte
                signed int, enum†††                 4       4          signed fourbyte
                unsigned int                        4       4          unsigned fourbyte
                signed long (LP64)                  8       8          signed eightbyte
                unsigned long (LP64)                8       8          unsigned eightbyte
                signed long (ILP32)                 4       4          signed fourbyte
                unsigned long (ILP32)               4       4          unsigned fourbyte
                signed long long                    8       8††††      signed eightbyte
                unsigned long long                  8       8††††      unsigned eightbyte
                __int128††                          16      16         signed sixteenbyte
                signed __int128††                   16      16         signed sixteenbyte
                unsigned __int128††                 16      16         unsigned sixteenbyte
Pointer         any-type *, any-type (*)() (LP64)   8       8          unsigned eightbyte
                any-type *, any-type (*)() (ILP32)  4       4          unsigned fourbyte
Floating-point  _Float16††††††                      2       2          16-bit (IEEE-754)
                float                               4       4          single (IEEE-754)
                double                              8       8††††      double (IEEE-754)
                __float80††                         16      16         80-bit extended (IEEE-754)
                long double†††††                    16      16         80-bit extended (IEEE-754)
                __float128††                        16      16         128-bit extended (IEEE-754)
                long double†††††                    16      16         128-bit extended (IEEE-754)
Decimal-        _Decimal32                          4       4          32bit BID (IEEE-754R)
floating-point  _Decimal64                          8       8          64bit BID (IEEE-754R)
                _Decimal128                         16      16         128bit BID (IEEE-754R)
Packed          __m64††                             8       8          MMX and 3DNow!
                __m128††                            16      16         SSE and SSE-2
                __m256††                            32      32         AVX
                __m512††                            64      64         AVX-512
† This type is called bool in C++.
†† These types are optional.
††† C++ and some implementations of C permit enums larger than an int. The underlying type is
bumped to an unsigned int, long int or unsigned long int, in that order.
†††† The long long, signed long long, unsigned long long and double types have
4-byte alignment in the Intel386 ABI.
††††† The long double type is 128-bit, the same as the __float128 type, on the Android™
platform. More information on the Android™ platform is available from
http://www.android.com/.
†††††† The _Float16 type, from ISO/IEC TS 18661-3:2015, is optional.
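Several rows of Figure 3.1 can be checked at compile time. The sketch below does so for the LP64 model only; it assumes a C11 compiler and the __x86_64__ and __LP64__ macros predefined by GCC and Clang, and the function name scalar_layout_checked is hypothetical.

```c
#if defined(__x86_64__) && defined(__LP64__)
/* Sizes and alignments taken from Figure 3.1 for the LP64 model. */
_Static_assert(sizeof(_Bool) == 1 && _Alignof(_Bool) == 1, "_Bool");
_Static_assert(sizeof(short) == 2 && _Alignof(short) == 2, "short");
_Static_assert(sizeof(int) == 4 && _Alignof(int) == 4, "int");
_Static_assert(sizeof(long) == 8 && _Alignof(long) == 8, "long");
_Static_assert(sizeof(long long) == 8 && _Alignof(long long) == 8, "long long");
_Static_assert(sizeof(void *) == 8 && _Alignof(void *) == 8, "pointer");
_Static_assert(sizeof(float) == 4 && _Alignof(float) == 4, "float");
_Static_assert(sizeof(double) == 8 && _Alignof(double) == 8, "double");
_Static_assert(sizeof(long double) == 16 && _Alignof(long double) == 16,
               "long double");
#endif

/* Returns 1 when the assertions above were compiled in, 0 otherwise. */
int scalar_layout_checked(void)
{
#if defined(__x86_64__) && defined(__LP64__)
    return 1;
#else
    return 0;
#endif
}
```

If any row disagreed with the compiler's layout, the translation unit would fail to compile rather than misbehave at run time.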
The long double type uses a 15-bit exponent, a 64-bit mantissa with an explicit high
order significant bit and an exponent bias of 16383.³ Although a long double requires 16
bytes of storage, only the first 10 bytes are significant. The remaining six bytes are tail
padding, and the contents of these bytes are undefined.
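The arithmetic behind the 10 significant bytes can be sketched as follows. The helper names are hypothetical; the check against LDBL_MANT_DIG and LDBL_MAX_EXP from <float.h> is an assumption used to recognise the x87 double extended format.

```c
#include <float.h>

/* Bytes of a long double that carry value bits when the format is the x87
 * double extended type: 64 mantissa bits + 15 exponent bits + 1 sign bit
 * = 80 bits = 10 bytes. Returns 0 when long double has another format. */
int long_double_significant_bytes(void)
{
    if (LDBL_MANT_DIG == 64 && LDBL_MAX_EXP == 16384)
        return (64 + 15 + 1) / 8;
    return 0;
}

/* Tail padding: storage size minus the significant bytes (0 if unknown). */
int long_double_padding_bytes(void)
{
    int significant = long_double_significant_bytes();
    return significant ? (int)sizeof(long double) - significant : 0;
}
```

With the 16-byte storage size given above, the padding computed this way is six bytes.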
The __int128 type is stored in little-endian order in memory, i.e., the 64 low-order bits
are stored at a lower address than the 64 high-order bits.
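The storage order can be observed by copying each eightbyte out of a 128-bit object. This is a sketch assuming a compiler that provides unsigned __int128 (GCC and Clang do on this target); both function names are illustrative.

```c
#include <stdint.h>
#include <string.h>

/* Figure 3.1 gives __int128 a 16-byte alignment. */
_Static_assert(_Alignof(unsigned __int128) == 16, "__int128 alignment");

/* Builds a 128-bit value from explicit high and low 64-bit parts. */
unsigned __int128 make_u128(uint64_t high, uint64_t low)
{
    return ((unsigned __int128)high << 64) | low;
}

/* Returns the eightbyte found at byte offset 8 * index (index 0 or 1)
 * inside the stored object, i.e. index 0 is at the lower address. */
uint64_t u128_eightbyte_at(unsigned __int128 value, int index)
{
    uint64_t part;
    memcpy(&part, (const unsigned char *)&value + 8 * (index & 1), sizeof part);
    return part;
}
```

For a value built with make_u128(high, low), index 0 yields the low part and index 1 the high part.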
The value of _Alignof(max_align_t) is 16.
A null pointer (for all types) has the value zero.
The type size_t is defined as unsigned long for LP64 and unsigned int for ILP32.
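The three statements above can be probed from C11. In this sketch all function names are hypothetical, and the codes returned by size_t_kind are an illustrative convention.

```c
#include <stddef.h>
#include <string.h>

/* Alignment of max_align_t as seen by the compiler. */
int max_alignment(void)
{
    return (int)_Alignof(max_align_t);
}

/* Returns 1 when a null pointer's object representation is all zero bytes. */
int null_pointer_is_all_zero(void)
{
    void *p = NULL;
    unsigned char zero[sizeof p] = {0};
    return memcmp(&p, zero, sizeof p) == 0;
}

/* Returns 1 if size_t is unsigned long (expected for LP64), 2 if it is
 * unsigned int (expected for ILP32), and 0 for any other type. */
int size_t_kind(void)
{
    return _Generic((size_t)0, unsigned long: 1, unsigned int: 2, default: 0);
}
```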
Booleans, when stored in a memory object, are stored as single byte objects the value
of which is always 0 (false) or 1 (true). When stored in integer registers (except for
passing as arguments), all 8 bytes of the register are significant; any nonzero value is
considered true.
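The in-memory rule can be illustrated by converting an arbitrary integer to a boolean and reading back the stored byte. The helper name bool_memory_byte is hypothetical.

```c
#include <stdbool.h>
#include <string.h>

_Static_assert(sizeof(bool) == 1, "booleans are single byte objects");

/* Converts any integer to bool and returns the byte as stored in memory. */
unsigned char bool_memory_byte(int value)
{
    bool b = (value != 0);   /* conversion yields exactly 0 or 1 */
    unsigned char byte;
    memcpy(&byte, &b, 1);
    return byte;
}
```

Any nonzero input, such as 42 or -7, is stored as the byte 1.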
Like the Intel386 architecture, the AMD64 architecture in general does not require all
data accesses to be properly aligned. Misaligned data accesses are slower than aligned
accesses but otherwise behave identically. The only exceptions are that __m128, __m256
and __m512 must always be aligned properly.
• _BitInt(N) types are stored in little-endian order in memory. Bits in each byte
are allocated from right to left.
• For N <= 64, they have the same size and alignment as the smallest of (signed and
unsigned) char, short, int, long and long long types that can contain them.
• For N > 64, they are treated as struct of 64-bit integer chunks. The number of
chunks is the smallest number that can contain the type. _BitInt(N) types are
byte-aligned to 64 bits. The size of these types is the smallest multiple of the 64-bit
chunks greater than or equal to N.
³ This type is the x87 double extended precision data type.
• The values of the unused bits beyond the width of the _BitInt(N) value but within
the size of the _BitInt(N) are unspecified when stored in memory or in a register.
This permits the use of these types in allocated arrays using the common
sizeof(Array)/sizeof(ElementType) pattern.
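The size and alignment rules for _BitInt(N) reduce to simple arithmetic, sketched below without requiring compiler support for _BitInt itself. Both function names are hypothetical, and returning 0 for N = 0 is an illustrative way of flagging invalid input.

```c
#include <stddef.h>

/* Size in bytes of _BitInt(n) under the rules above. */
size_t bitint_size_bytes(unsigned n)
{
    if (n == 0)  return 0;   /* invalid width, flagged with 0 */
    if (n <= 8)  return 1;   /* fits char */
    if (n <= 16) return 2;   /* fits short */
    if (n <= 32) return 4;   /* fits int */
    if (n <= 64) return 8;   /* fits the smallest 64-bit integer type */
    return (size_t)((n + 63) / 64) * 8;   /* whole 64-bit chunks */
}

/* Alignment in bytes: that of the containing integer type for n <= 64,
 * and 64 bits (8 bytes) for larger widths. */
size_t bitint_align_bytes(unsigned n)
{
    size_t size = bitint_size_bytes(n);
    return size < 8 ? size : 8;
}
```

For example, a 129-bit type needs three chunks and so occupies 24 bytes with 8-byte alignment.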
Special Types
The __bf16⁴ type is an alternate encoding format for 16-bit float with 8-bit exponent and
7-bit mantissa. __bf16 represents Brain Floating Point Format⁵ which is a truncated (16-bit)
version of the 32-bit IEEE 754 single-precision floating-point format. It has the same
size, alignment, parameter passing and return rules as _Float16.
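The relationship to the single-precision format can be shown on raw bit patterns. The sketch below assumes that float is the IEEE 754 binary32 type and converts by plain truncation; real conversion instructions may round instead, so this only illustrates the bit layout. The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Keeps the sign, the 8 exponent bits and the top 7 mantissa bits of a
 * binary32 value, i.e. the upper 16 bits of its encoding. */
uint16_t float_to_bf16_truncate(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (uint16_t)(bits >> 16);
}

/* Widens a bf16 bit pattern back to float by appending 16 zero bits. */
float bf16_to_float(uint16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Values whose low 16 mantissa bits are already zero, such as 1.0f (pattern 0x3F80), survive the round trip unchanged.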
Bit-Fields
C struct and union definitions may include bit-fields that define integral values of a speci-
fied size.
The ABI does not permit bit-fields having the type __m64, __m128, __m256 or
__m512. Programs using bit-fields of these types are not portable.
Bit-fields that are neither signed nor unsigned always have non-negative values. Al-
though they may have type char, short, int, or long (which can have negative values), these
bit-fields have the same range as a bit-field of the same size with the corresponding un-
signed type. Bit-fields obey the same size and alignment rules as other structure and union
members.
⁴ This type is used in BF16 intrinsics.
⁵ This type is used to accelerate machine learning (deep learning training, in particular) algorithms.
⁶ The alignment requirement allows the use of SSE instructions when operating on the array. The
compiler cannot in general calculate the size of a variable-length array (VLA), but it is expected that most VLAs
will require at least 16 bytes, so it is logical to mandate that VLAs have at least a 16-byte alignment.
Figure 3.2: Bit-Field Ranges
Also:
• bit-fields must be contained in a storage unit appropriate for their declared type
• bit-fields may share a storage unit with other struct / union members
The standard calling sequence requirements apply only to global functions. Local
functions that are not reachable from other compilation units may use different conven-
tions. Nevertheless, it is recommended that all functions use the standard calling sequence
when possible.
3.2.1 Registers
The AMD64 architecture provides 16 general purpose 64-bit registers. In addition the
architecture provides 16 SSE registers, each 128 bits wide and 8 x87 floating point reg-
isters, each 80 bits wide. Each of the x87 floating point registers may be referred to in
MMX/3DNow! mode as a 64-bit register. All of these registers are global to all procedures
active for a given thread.
Intel AVX (Advanced Vector Extensions) provides 16 256-bit wide AVX registers
(%ymm0 - %ymm15). The lower 128 bits of %ymm0 - %ymm15 are aliased to the respective 128-bit
SSE registers (%xmm0 - %xmm15). Intel AVX-512 provides 32 512-bit wide SIMD registers
(%zmm0 - %zmm31). The lower 128 bits of %zmm0 - %zmm31 are aliased to the respective 128-bit
SSE registers (%xmm0 - %xmm31⁷). The lower 256 bits of %zmm0 - %zmm31 are aliased to the
respective 256-bit AVX registers (%ymm0 - %ymm31⁸). For purposes of parameter passing and
function return, %xmmN, %ymmN and %zmmN refer to the same register. Only one of them can
be used at the same time. We use the term vector register to refer to an SSE, AVX or AVX-512
register. In addition, Intel AVX-512 also provides 8 vector mask registers (%k0 - %k7), each
64 bits wide.
Intel Advanced Matrix Extensions (Intel AMX) is a programming paradigm consisting
of two components: a set of 2-dimensional registers (tiles) representing sub-arrays from a
larger 2-dimensional memory image, and accelerators able to operate on tiles. Capability
of Intel AMX implementation is enumerated by palettes. Two palettes are supported:
palette 0 represents the initialized state and palette 1 consists of 8 tile registers (%tmm0 -
%tmm7) of up to 1 KB size, which is controlled by a tile control register.
Intel APX (Advanced Performance Extensions) provides 16 additional general purpose 64-bit
registers (%r16 - %r31).
This subsection discusses usage of each register. Registers %rbp, %rbx and %r12 through
%r15 “belong” to the calling function and the called function is required to preserve their
values. In other words, a called function must preserve these registers’ values for its
caller. Remaining registers “belong” to the called function.⁹ If a calling function wants
to preserve such a register value across a function call, it must save the value in its local
stack frame.
⁷ %xmm16 - %xmm31 are only available with Intel AVX-512.
⁸ %ymm16 - %ymm31 are only available with Intel AVX-512.
⁹ Note that in contrast to the Intel386 ABI, %rdi, and %rsi belong to the called function, not the caller.
Figure 3.3: Stack Frame with Base Pointer
The CPU shall be in x87 mode upon entry to a function. Therefore, every function
that uses the MMX registers is required to issue an emms or femms instruction after using
MMX registers, before returning or calling another function.¹⁰ The direction flag DF
in the %rFLAGS register must be clear (set to “forward” direction) on function entry and
return. Other user flags have no specified role in the standard calling sequence and are not
preserved across calls.
The control bits of the MXCSR register are callee-saved (preserved across calls), while
the status bits are caller-saved (not preserved). The x87 status word register is caller-saved,
whereas the x87 control word is callee-saved.
been transferred to the function entry point, i.e. immediately after the return address has
been pushed, %rsp points to the return address, and the value of (%rsp + 8) is a multiple of
16 (32 or 64).¹²
The 128-byte area beyond the location pointed to by %rsp is considered to be reserved
and shall not be modified by signal or interrupt handlers.¹³ Therefore, functions may use
this area for temporary data that is not needed across function calls. In particular, leaf
functions may use this area for their entire stack frame, rather than adjusting the stack
pointer in the prologue and epilogue. This area is known as the red zone.
Definitions We first define a number of classes to classify arguments. The classes
correspond to AMD64 register classes and are defined as:
INTEGER This class consists of integral types that fit into one of the general purpose
registers.
SSE The class consists of types that fit into a vector register.
SSEUP The class consists of types that fit into a vector register and can be passed and
returned in the upper bytes of it.
X87, X87UP These classes consist of types that will be returned via the x87 FPU.
COMPLEX_X87 This class consists of types that will be returned via the x87 FPU.
NO_CLASS This class is used as initializer in the algorithms. It will be used for padding
and empty structures and unions.
MEMORY This class consists of types that will be passed and returned in memory via
the stack.
¹² The conventional use of %rbp as a frame pointer for the stack frame may be avoided by using %rsp
(the stack pointer) to index into the stack frame. This technique saves two instructions in the prologue and
epilogue and makes one additional general-purpose register (%rbp) available.
¹³ Locations within 128 bytes can be addressed using one-byte displacements.
Classification The size of each argument gets rounded up to eightbytes.¹⁴
The basic types are assigned their natural classes:
• Arguments of types (signed and unsigned) _Bool, char, short, int, long, long long,
and pointers are in the INTEGER class.
• Arguments of types _Float16, float, double, _Decimal32, _Decimal64 and __m64 are
in class SSE.
• Arguments of types __float128, _Decimal128 and __m128 are split into two halves.
The least significant half belongs to class SSE, the most significant one to class
SSEUP.
• Arguments of type __m256 are split into four eightbyte chunks. The least significant
one belongs to class SSE and all the others to class SSEUP.
• Arguments of type __m512 are split into eight eightbyte chunks. The least significant
one belongs to class SSE and all the others to class SSEUP.
• The 64-bit mantissa of arguments of type long double belongs to class X87, the
16-bit exponent plus 6 bytes of padding belongs to class X87UP.
• Arguments of type __int128 offer the same operations as INTEGERs, yet they do
not fit into one general purpose register but require two registers. For classification
purposes __int128 is treated as if it were implemented as:
typedef struct {
long low, high;
} __int128;
with the exception that arguments of type __int128 that are stored in memory must
be aligned on a 16-byte boundary.
• Arguments of type _BitInt(N) with N > 64 are classified as if they were imple-
mented as struct of 64-bit integer fields.
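The per-type rules above can be written down as a small lookup. This is a sketch under stated assumptions: the enum scalar_kind tags and both function names are hypothetical, and only the scalar types named in the list are covered (aggregates and _BitInt are left out).

```c
enum arg_class { NO_CLASS, INTEGER, SSE, SSEUP, X87, X87UP, COMPLEX_X87, MEMORY };

/* Hypothetical tags for the scalar types named in the list above. */
enum scalar_kind {
    K_BOOL, K_CHAR, K_SHORT, K_INT, K_LONG, K_LONG_LONG, K_POINTER,
    K_FLOAT16, K_FLOAT, K_DOUBLE, K_DECIMAL32, K_DECIMAL64, K_M64,
    K_FLOAT128, K_DECIMAL128, K_M128, K_M256, K_M512,
    K_LONG_DOUBLE, K_INT128
};

/* Number of eightbytes the type is split into for classification. */
int scalar_eightbytes(enum scalar_kind kind)
{
    switch (kind) {
    case K_FLOAT128: case K_DECIMAL128: case K_M128:
    case K_LONG_DOUBLE: case K_INT128:
        return 2;
    case K_M256:
        return 4;
    case K_M512:
        return 8;
    default:
        return 1;
    }
}

/* Class of the eightbyte at `index`; NO_CLASS if index is out of range. */
enum arg_class scalar_class(enum scalar_kind kind, int index)
{
    if (index < 0 || index >= scalar_eightbytes(kind))
        return NO_CLASS;
    switch (kind) {
    case K_BOOL: case K_CHAR: case K_SHORT: case K_INT:
    case K_LONG: case K_LONG_LONG: case K_POINTER:
    case K_INT128:                    /* treated as struct { long low, high; } */
        return INTEGER;
    case K_LONG_DOUBLE:               /* mantissa, then exponent plus padding */
        return index == 0 ? X87 : X87UP;
    default:                          /* SSE family: first SSE, rest SSEUP */
        return index == 0 ? SSE : SSEUP;
    }
}
```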
¹⁴ Therefore the stack will always be eightbyte aligned.
struct complexT {
T real;
T imag;
};
1. If the size of an object is larger than eight eightbytes, or it contains unaligned fields,
it has class MEMORY.¹⁵
2. If a C++ object is non-trivial for the purpose of calls, as specified in the C++ ABI,¹⁶
it is passed by invisible reference (the object is replaced in the parameter list by
a pointer that has class INTEGER).¹⁷
3. If the size of the aggregate exceeds a single eightbyte, each eightbyte is classified
separately. Each eightbyte gets initialized to class NO_CLASS.
4. Each field of an object is classified recursively so that always two fields¹⁸ are
considered. The resulting class is calculated according to the classes of the fields in the
eightbyte:
(e) If one of the classes is X87, X87UP, COMPLEX_X87 class, MEMORY is
used as class.
(f) Otherwise class SSE is used.
(a) If one of the classes is MEMORY, the whole argument is passed in memory.
(b) If X87UP is not preceded by X87, the whole argument is passed in memory.
(c) If the size of the aggregate exceeds two eightbytes and the first eightbyte isn’t
SSE or any other eightbyte isn’t SSEUP, the whole argument is passed in mem-
ory.
(d) If SSEUP is not preceded by SSE or SSEUP, it is converted to SSE.
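The merge of two field classes and the cleanup rules (a) to (d) above can be sketched as two functions. The first four merge cases are not reproduced in this excerpt; the sketch assumes the order given in the published ABI (equal classes, NO_CLASS, MEMORY, INTEGER), so treat that part as an assumption. All function names are hypothetical.

```c
enum arg_class { NO_CLASS, INTEGER, SSE, SSEUP, X87, X87UP, COMPLEX_X87, MEMORY };

static int is_x87_family(enum arg_class c)
{
    return c == X87 || c == X87UP || c == COMPLEX_X87;
}

/* Merges the classes of two fields that share an eightbyte. */
enum arg_class merge_classes(enum arg_class a, enum arg_class b)
{
    if (a == b) return a;                              /* assumed case */
    if (a == NO_CLASS) return b;                       /* assumed case */
    if (b == NO_CLASS) return a;
    if (a == MEMORY || b == MEMORY) return MEMORY;     /* assumed case */
    if (a == INTEGER || b == INTEGER) return INTEGER;  /* assumed case */
    if (is_x87_family(a) || is_x87_family(b)) return MEMORY;   /* (e) */
    return SSE;                                                /* (f) */
}

/* Applies cleanup rules (a)-(d) to the n eightbyte classes of one argument.
 * Returns 1 if the whole argument is passed in memory, else 0; in the
 * latter case stray SSEUP entries are converted to SSE in place. */
int post_merge_cleanup(enum arg_class c[], int n)
{
    int i;
    for (i = 0; i < n; i++) {
        if (c[i] == MEMORY) return 1;                                   /* (a) */
        if (c[i] == X87UP && (i == 0 || c[i - 1] != X87)) return 1;     /* (b) */
    }
    if (n > 2) {                                                        /* (c) */
        if (c[0] != SSE) return 1;
        for (i = 1; i < n; i++)
            if (c[i] != SSEUP) return 1;
    }
    for (i = 0; i < n; i++)                                             /* (d) */
        if (c[i] == SSEUP && (i == 0 || (c[i - 1] != SSE && c[i - 1] != SSEUP)))
            c[i] = SSE;
    return 0;
}
```

For instance, an aggregate of three INTEGER eightbytes falls under rule (c) and is passed in memory, while the four eightbytes of an __m256 (SSE followed by three SSEUP) are not.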
Passing Once arguments are classified, the registers get assigned (in left-to-right order)
for passing as follows:
1. If the class is MEMORY, pass the argument on the stack at an address respecting the
argument’s alignment (which might be more than its natural alignment).
2. If the class is INTEGER, the next available register of the sequence %rdi, %rsi, %rdx,
%rcx, %r8 and %r9 is used.¹⁹
3. If the class is SSE, the next available vector register is used, the registers are taken
in the order from %xmm0 to %xmm7.
4. If the class is SSEUP, the eightbyte is passed in the next available eightbyte chunk
of the last used vector register.
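The left-to-right assignment can be simulated for the simple case in which every argument occupies a single eightbyte. The sketch below makes that simplifying assumption (no SSEUP chunks and no multi-eightbyte aggregates); the function name arg_location and the "stack" label are hypothetical.

```c
#include <stddef.h>

enum arg_class { NO_CLASS, INTEGER, SSE, SSEUP, X87, X87UP, COMPLEX_X87, MEMORY };

static const char *const int_regs[] = {
    "%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9"
};
static const char *const sse_regs[] = {
    "%xmm0", "%xmm1", "%xmm2", "%xmm3", "%xmm4", "%xmm5", "%xmm6", "%xmm7"
};

/* Returns where argument `index` of a left-to-right argument list goes:
 * a register name, or "stack" once the matching sequence is exhausted or
 * the class is not passed in registers. Returns NULL for a bad index. */
const char *arg_location(const enum arg_class classes[], size_t count, size_t index)
{
    size_t next_int = 0, next_sse = 0, i;
    const char *where = "stack";
    if (index >= count)
        return NULL;
    for (i = 0; i <= index; i++) {
        where = "stack";
        if (classes[i] == INTEGER && next_int < 6)
            where = int_regs[next_int++];
        else if (classes[i] == SSE && next_sse < 8)
            where = sse_regs[next_sse++];
    }
    return where;
}
```

The two register sequences advance independently, so for the classes INTEGER, SSE, INTEGER the third argument lands in %rsi rather than %rdx.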
When a value of type _Bool is returned or passed in a register or on the stack, bit 0
contains the truth value and bits 1 to 7 shall be zero.²⁰
¹⁹ Note that %r11 is neither required to be preserved, nor is it used to pass arguments. Making this register
available as scratch register means that code in the PLT need not spill any registers when computing the
address to which control needs to be transferred. %al is used to indicate the number of vector arguments
passed to a function requiring a variable number of arguments. %r10 is used for passing a function’s static
chain pointer.
²⁰ Other bits are left unspecified, hence the consumer side of those values can rely on it being 0 or 1 when
truncated to 8 bits.
Figure 3.4: Register Usage
Register        Usage                                                             callee saved
%rax            temporary register; with variable arguments passes information    No
                about the number of vector registers used; 1st return register
%rbx            callee-saved register                                             Yes
%rcx            used to pass 4th integer argument to functions                    No
%rdx            used to pass 3rd argument to functions; 2nd return register       No
%rsp            stack pointer                                                     Yes
%rbp            callee-saved register; optionally used as frame pointer           Yes
%rsi            used to pass 2nd argument to functions                            No
%rdi            used to pass 1st argument to functions                            No
%r8             used to pass 5th argument to functions                            No
%r9             used to pass 6th argument to functions                            No
%r10            temporary register, used for passing a function’s static chain    No
                pointer
%r11            temporary register                                                No
%r12–%r14       callee-saved registers                                            Yes
%r15            callee-saved register; optionally used as GOT base pointer        Yes
%r16–%r31       temporary registers                                               No
%xmm0–%xmm1     used to pass and return floating point arguments                  No
%xmm2–%xmm7     used to pass floating point arguments                             No
%xmm8–%xmm15    temporary registers                                               No
%xmm16–%xmm31   temporary registers                                               No
%tmm0–%tmm7     temporary registers                                               No
%mm0–%mm7       temporary registers                                               No
%k0–%k7         temporary registers                                               No
%st0,%st1       temporary registers, used to return long double arguments         No
%st2–%st7       temporary registers                                               No
%fs             thread pointer                                                    Yes
mxcsr           SSE2 control and status word                                      partial
x87 SW          x87 status word                                                   No
x87 CW          x87 control word                                                  Yes
tilecfg         Tile control register                                             No
If there are no registers available for any eightbyte of an argument, the whole argument
is passed on the stack. If registers have already been assigned for some eightbytes of such
an argument, the assignments get reverted.
Once registers are assigned, the arguments passed in memory are pushed on the stack
in reversed (right-to-left²¹) order.
For calls that may call functions that use varargs or stdarg (prototype-less calls or calls
to functions containing ellipsis (...) in the declaration) %al²² is used as hidden argument
to specify the number of vector registers used. The contents of %al do not need to match
exactly the number of registers, but must be an upper bound on the number of vector
registers used and must be in the range 0–8 inclusive.
When passing __m256 or __m512 arguments to functions that use varargs or stdarg,
function prototypes must be provided. Otherwise, the run-time behavior is undefined.
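From the C side the hidden %al argument is invisible; the compiler sets it at each call site. The sketch below is an ordinary <stdarg.h> function, given only to make the situation concrete: the function name sum_doubles is illustrative, and the remark about the value placed in %al describes what a conforming compiler is expected to do, not something the code itself controls.

```c
#include <stdarg.h>

/* Sums `count` double arguments passed through the ellipsis. For a call
 * such as sum_doubles(3, 1.0, 2.0, 3.0) the three doubles have class SSE,
 * so the caller is expected to place an upper bound of at least 3 in %al. */
double sum_doubles(int count, ...)
{
    va_list ap;
    double total = 0.0;
    int i;

    va_start(ap, count);
    for (i = 0; i < count; i++)
        total += va_arg(ap, double);
    va_end(ap);
    return total;
}
```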
Returning of Values The returning of values is done according to the following algo-
rithm:
2. If the type has class MEMORY, then the caller provides space for the return value
and passes the address of this storage in %rdi as if it were the first argument to the
function. In effect, this address becomes a “hidden” first argument. This storage
must not overlap any data visible to the callee through other names than this argu-
ment.
On return %rax will contain the address that has been passed in by the caller in %rdi.
3. If the class is INTEGER, the next available register of the sequence %rax, %rdx is
used.
4. If the class is SSE, the next available vector register of the sequence %xmm0, %xmm1 is
used.
5. If the class is SSEUP, the eightbyte is returned in the next available eightbyte chunk
of the last used vector register.
²¹ Right-to-left order on the stack makes the handling of functions that take a variable number of arguments
simpler. The location of the first argument can always be computed statically, based on the type of that
argument. It would be difficult to compute the address of the first argument if the arguments were pushed in
left-to-right order.
²² Note that the rest of %rax is undefined; only the contents of %al are defined.
6. If the class is X87, the value is returned on the X87 stack in %st0 as 80-bit x87
number.
7. If the class is X87UP, the value is returned together with the previous X87 value in
%st0.
8. If the class is COMPLEX_X87, the real part of the value is returned in %st0 and the
imaginary part in %st1.
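The return rules above can be illustrated with a few C declarations. This is only a sketch: the type and function names are invented, and the class of each type is assumed from the classification algorithm of section 3.2.3, which is not repeated here.

```c
/* Two INTEGER eightbytes: returned in %rax (first) and %rdx (second). */
struct pair { long first; long second; };

/* Three eightbytes: assumed to be class MEMORY, so the caller passes the
   address of the result storage in %rdi and gets it back in %rax. */
struct triple { long a; long b; long c; };

struct pair make_pair(long x, long y)
{
    struct pair p = { x, y };
    return p;
}

struct triple make_triple(long x)
{
    struct triple t = { x, x + 1, x + 2 };
    return t;
}

/* long double has classes X87 and X87UP: returned in %st0. */
long double half(long double v)
{
    return v / 2;
}
```

Compiling such a file with an option like -S and inspecting the assembly is one way to observe which registers a given compiler actually uses.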
As an example of the register passing conventions, consider the declarations and the
function call shown in Figure 3.5. The corresponding register allocation is given in Figure 3.6;
the stack frame offsets shown describe the frame before the function is called.
Figure 3.6: Register Allocation Example
Table 3.2: Hardware Exceptions and Signals
Code Reason
FPE_FLTDIV floating-point divide by zero
FPE_FLTOVF floating-point overflow
FPE_FLTUND floating-point underflow
FPE_FLTRES floating-point inexact result
FPE_FLTINV invalid floating-point operation
Processes begin with three logical segments, commonly called text, data, and stack.
The use of shared libraries adds other segments, and a process may dynamically create
segments.
• A process whose size exceeds the system’s available combined physical memory
and secondary storage cannot run. Although some physical memory must be present
to run any process, the system can execute processes that are bigger than physical
memory, paging them to and from secondary storage. Nonetheless, both physical
memory and secondary storage are shared resources. System load, which can vary
from one program execution to the next, affects the available amount.
Programs that dereference null pointers are erroneous and a process should not expect
0x0 to be a valid address.
Although applications may control their memory assignments, the typical arrangement
appears in figure 3.8.
(Typical segment layout)
                 ...
0x80000000000    Dynamic segments
                 Stack segment
                 ...
                 ...
                 Data segments
                 ...
0x400000         Text segments
0                Unmapped
Table 3.4: x87 Floating-Point Control Word
The rFLAGS register contains the system flags, such as the direction flag and the carry
flag. The low 16 bits (FLAGS portion) of rFLAGS are accessible by application software.
Their state at process initialization is shown in table 3.6.
Table 3.6: rFLAGS Bits
Stack State
This section describes the machine state that exec (BA_OS) creates for new processes.
Various language implementations transform this initial program state to the state required
by the language standard.
For example, a C program begins executing at a function named main declared as:
where
When main() returns, its value is passed to exit(); if exit() has been overridden and
returns, the value is passed to _exit() (which must be immune to user interposition).
The initial state of the process stack, i.e. when _start is called, is shown in figure 3.9.
Figure 3.9: Initial Process Stack
Argument strings, environment strings, and the auxiliary information appear in no spe-
cific order within the information block and they need not be compactly allocated.
Only the registers listed below have specified values at process entry:
%rbp The content of this register is unspecified at process initialization time, but the user
code should mark the deepest stack frame by setting the frame pointer to zero.
%rsp The stack pointer holds the address of the byte with lowest address which is part of
the stack. It is guaranteed to be 16-byte aligned at process entry.
%rdx a function pointer that the application should register with atexit (BA_OS).
It is unspecified whether the data and stack segments are initially mapped with execute
permissions or not. Applications which need to execute code on the stack or data segments
should take proper precautions, e.g., by calling mprotect().
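Auxiliary vector: a set of type/value entries that the operating system passes to a
new process to describe its execution environment, for example the location of the
program header table, the system page size, and the user and group ids of the process.
On Linux, the kernel passes the auxiliary vector to the user-space program so that the
program can obtain information about its execution environment.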
typedef struct
{
int a_type;
union {
long a_val;
void *a_ptr;
void (*a_fnc)();
} a_un;
} auxv_t;
The AMD64 ABI uses the auxiliary vector types defined in figure 3.11.
Figure 3.11: Auxiliary Vector Types
AT_NULL The auxiliary vector has no fixed length; instead its last entry’s a_type mem-
ber has this value.
AT_IGNORE This type indicates the entry has no meaning. The corresponding value of
a_un is undefined.
AT_EXECFD At process creation the system may pass control to an interpreter program.
When this happens, the system places either an entry of type AT_EXECFD or one of
type AT_PHDR in the auxiliary vector. The entry for type AT_EXECFD uses the a_val
member to contain a file descriptor open to read the application program’s object
file.
AT_PHDR The system may create the memory image of the application program before
passing control to the interpreter program. When this happens, the a_ptr member of
the AT_PHDR entry tells the interpreter where to find the program header table in the
memory image.
AT_PHENT The a_val member of this entry holds the size, in bytes, of one entry in the
program header table to which the AT_PHDR entry points.
AT_PHNUM The a_val member of this entry holds the number of entries in the program
header table to which the AT_PHDR entry points.
AT_PAGESZ If present, this entry’s a_val member gives the system page size, in bytes.
AT_BASE The a_ptr member of this entry holds the base address at which the interpreter
program was loaded into memory. See “Program Header” in the System V ABI for
more information about the base address.
AT_FLAGS If present, the a_val member of this entry holds one-bit flags. Bits with
undefined semantics are set to zero.
AT_ENTRY The a_ptr member of this entry holds the entry point of the application
program to which the interpreter program should transfer control.
AT_NOTELF The a_val member of this entry is non-zero if the program is in another
format than ELF.
AT_UID The a_val member of this entry holds the real user id of the process.
AT_EUID The a_val member of this entry holds the effective user id of the process.
AT_GID The a_val member of this entry holds the real group id of the process.
AT_EGID The a_val member of this entry holds the effective group id of the process.
AT_PLATFORM The a_ptr member of this entry points to a string containing the plat-
form name.
AT_HWCAP The a_val member of this entry contains a bitmask of CPU features. It
maps to the value returned by CPUID 1.EDX.
AT_CLKTCK The a_val member of this entry contains the frequency at which times()
increments.
AT_SECURE The a_val member of this entry contains one if the program is in secure
mode (for example started with suid); otherwise it contains zero.
AT_BASE_PLATFORM The a_ptr member of this entry points to a string identifying
the base architecture platform (which may be different from the platform).
AT_RANDOM The a_ptr member of this entry points to 16 securely generated random
bytes.
AT_HWCAP2 The a_val member of this entry contains the extended hardware feature
mask. Currently it is 0, but may contain additional feature bits in the future.
AT_EXECFN The a_ptr member of this entry is a pointer to the file name of the executed
program.
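As a sketch of how a program might consume these entries, the following C code walks an auxiliary vector until it reaches the AT_NULL terminator. The auxv_t layout is the one declared earlier in this section. The numeric values of AT_NULL and AT_PAGESZ are assumed from the usual definitions, and the helper names and the sample vector are invented for illustration.

```c
#include <stddef.h>

/* Entry layout as in the auxv_t declaration of this section. */
typedef struct {
    int a_type;
    union {
        long a_val;
        void *a_ptr;
        void (*a_fnc)();
    } a_un;
} auxv_t;

/* Assumed numeric values of the two types used below. */
enum { AT_NULL = 0, AT_PAGESZ = 6 };

/* Returns the first entry of the given type, or NULL if the
   AT_NULL terminator is reached first. */
const auxv_t *find_auxv(const auxv_t *vec, int type)
{
    for (; vec->a_type != AT_NULL; vec++)
        if (vec->a_type == type)
            return vec;
    return NULL;
}

/* Page size from the vector, or `fallback` if AT_PAGESZ is absent. */
long auxv_page_size(const auxv_t *vec, long fallback)
{
    const auxv_t *entry = find_auxv(vec, AT_PAGESZ);
    return entry ? entry->a_un.a_val : fallback;
}

/* Synthetic vector for experimentation; a real one comes from the system. */
static const auxv_t sample_auxv[] = {
    { AT_PAGESZ, { .a_val = 4096 } },
    { AT_NULL,   { .a_val = 0 } },
};
```

Because several entries are described as optional ("if present"), returning a fallback rather than assuming the entry exists seems the safer design.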
Small code model The virtual address of code executed is known at link time. Additionally,
all symbols are known to be located in the virtual addresses in the range from 0
to $2^{31} - 2^{24} - 1$ or from 0x00000000 to 0x7effffff²⁴.
This allows the compiler to encode symbolic references with offsets in the range
from $-(2^{31})$ to $2^{24}$ or from 0x80000000 to 0x01000000 directly in the sign-extended
immediate operands, with offsets in the range from 0 to $2^{31} - 2^{24}$ or from
0x00000000 to 0x7f000000 in the zero-extended immediate operands, and to use
instruction pointer relative addressing for the symbols with offsets in the range $-(2^{24})$
to $2^{24}$ or 0xff000000 to 0x01000000.
This is the fastest code model and we expect it to be suitable for the vast majority of
programs.
Kernel code model The kernel of an operating system is usually rather small but
runs in the negative half of the address space. So we define all symbols to be
in the range from $2^{64} - 2^{31}$ to $2^{64} - 2^{24}$ or from 0xffffffff80000000 to
0xffffffffff000000.
This code model has advantages similar to those of the small model, but allows
encoding of zero-extended symbolic references only for offsets from $2^{31}$ to $2^{31} + 2^{24}$
or from 0x80000000 to 0x81000000. The range of offsets for sign-extended references
changes to 0 to $2^{31} + 2^{24}$ or 0x00000000 to 0x81000000.
Medium code model The data sections are split into two parts — the regular data sec-
tions still limited in the same way as in the small code model and the large data
sections having no limits except for available addressing space. The program layout
must be set in a way so that large data sections (.ldata, .lrodata, .lbss) are not
between regular text and data sections.
This model requires the compiler to use movabs instructions to access large static
data and to load addresses into registers, but keeps the advantages of the small code
model for manipulation of addresses in the small data and text sections (especially
needed for branches).
By default only data larger than 65535 bytes will be placed in the large data section.
Large code model The large code model makes no assumptions about addresses and
sizes of sections.
²⁴ The number 24 is chosen arbitrarily. It allows for all memory of objects of size up to $2^{24}$ or 16M bytes
to be addressed directly because the base address of such objects is constrained to be less than $2^{31} - 2^{24}$
or 0x7f000000. Without such a constraint only the base address would be accessible directly, but not any
offset variant of it.
Although not strictly necessary, the data sections can be split into normal and large
parts like in the medium model, to improve interoperability.
The compiler is required to use the movabs instruction, as in the medium code
model, even for dealing with addresses inside the text section. Additionally, indi-
rect branches are needed when branching to addresses whose offset from the current
instruction pointer is unknown.
It is possible to avoid the limitation on the text section in the small and medium
models by breaking up the program into multiple shared libraries, so this model is
strictly only required if the text of a single function becomes larger than what the
medium model allows.
Small position independent code model (PIC) Unlike the previous models, the virtual
addresses of instructions and data are not known until dynamic link time. So all
addresses have to be relative to the instruction pointer.
Additionally the maximum distance between a symbol and the end of an instruction
is limited to $2^{31} - 2^{24} - 1$ or 0x7effffff, allowing the compiler to use instruction
pointer relative branches and addressing modes supported by the hardware for every
symbol with an offset in the range $-(2^{24})$ to $2^{24}$ or 0xff000000 to 0x01000000.
Medium position independent code model (PIC) This model is like the previous
model, but similarly to the medium static model adds large data sections at the end
of object files.
In the medium PIC model, the instruction pointer relative addressing can not be used
directly for accessing large static data, since the offset can exceed the limitations on
the size of the displacement field in the instruction. Instead an unwind sequence
consisting of movabs, lea and add needs to be used.
Large position independent code model (PIC) This model is like the previous model,
but makes no assumptions about the distance of symbols.
The large PIC model implies the same limitation as the medium PIC model regarding
addressing of static data. Additionally, references to the global offset table and to
the procedure linkage table and branch destinations need to be calculated in a similar
way. Further, the text segment is allowed to be up to 16EB in size, hence
similar restrictions apply to all address references into the text segments, including
branches.
Only small code model and small position independent code model (PIC) are used in
ILP32 binaries.
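The address bounds quoted for the small and kernel code models can be checked with a little arithmetic. The C sketch below recomputes them from the powers of two given in the text. The macro and function names are invented, and treating both ends of the kernel range as inclusive is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* 2^31 - 2^24 - 1, the highest symbol address in the small code model. */
#define SMALL_MODEL_MAX  ((UINT64_C(1) << 31) - (UINT64_C(1) << 24) - 1)

/* 2^64 - 2^31 and 2^64 - 2^24, computed with unsigned wraparound. */
#define KERNEL_MODEL_MIN (UINT64_C(0) - (UINT64_C(1) << 31))
#define KERNEL_MODEL_MAX (UINT64_C(0) - (UINT64_C(1) << 24))

bool in_small_model_range(uint64_t addr)
{
    return addr <= SMALL_MODEL_MAX;
}

bool in_kernel_model_range(uint64_t addr)
{
    return addr >= KERNEL_MODEL_MIN && addr <= KERNEL_MODEL_MAX;
}
```

The 16M byte gap below $2^{31}$ is what lets an object of up to $2^{24}$ bytes be addressed with a 32-bit offset from its base, as footnote 24 explains.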
3.5.2 Conventions
In this document some special assembler symbols are used in the coding examples and
discussion. They are:
• name@GOT: specifies the offset to the GOT entry for the symbol name from the base of
the GOT.
• name@GOTOFF: specifies the offset to the location of the symbol name from the base of
the GOT.
• name@GOTPCREL: specifies the offset to the GOT entry for the symbol name from the
current code location.
• name@PLT: specifies the offset to the PLT entry of symbol name from the current code
location.
• name@PLTOFF: specifies the offset to the PLT entry of symbol name from the base of
the GOT.
• _GLOBAL_OFFSET_TABLE_: specifies the offset to the base of the GOT from the current
code location.
Figure 3.12: Position-Independent Function Prolog Code
medium model:
leaq _GLOBAL_OFFSET_TABLE_(%rip),%r15 # GOTPC32 reloc
large model:
pushq %r15 # save %r15
leaq 1f(%rip),%r11 # absolute %rip
1: movabs $_GLOBAL_OFFSET_TABLE_,%r15 # offset to the GOT (R_X86_64_GOTPC64)
leaq (%r11,%r15),%r15 # absolute address of the GOT
For the medium model the GOT pointer is loaded directly; for the large model the
absolute value of %rip is added to the relative offset to the base of the GOT in order to
obtain its absolute address (see figure 3.12).
Small models
Figure 3.14: Position-Independent Load and Store (Small PIC Model)
extern int src[65536]; .extern src
extern int dst[65536]; .extern dst
extern int *ptr; .extern ptr
static int lsrc[65536]; .local lsrc
.comm lsrc,262144,4
static int ldst[65536]; .local ldst
.comm ldst,262144,4
static int *lptr; .local lptr
.comm lptr,8,8
.text
dst[0] = src[0]; movq src@GOTPCREL(%rip), %rax
movl (%rax), %edx
movq dst@GOTPCREL(%rip), %rax
movl %edx, (%rax)
Medium models
Figure 3.16: Position-Independent Load and Store (Medium PIC Model)
extern int src[65536]; .extern src
extern int dst[65536]; .extern dst
extern int *ptr; .extern ptr
static int lsrc[65536]; .local lsrc
.comm lsrc,262144,4
static int ldst[65536]; .local ldst
.comm ldst,262144,4
static int *lptr; .local lptr
.comm lptr,8,8
.text
dst[0] = src[0]; movq src@GOTPCREL(%rip), %rax
movl (%rax), %edx
movq dst@GOTPCREL(%rip), %rax
movl %edx, (%rax)
Figure 3.17: Position-Independent Load and Store (Medium PIC Model), continued
ldst[0] = lsrc[0]; movabsq lsrc@GOTOFF, %rax
movl (%rax,%r15), %eax
movabsq ldst@GOTOFF, %rdx
movl %eax, (%rdx,%r15)
Large Models
Again, in order to access data at any position in the 64-bit addressing space, it is necessary
to calculate the address explicitly²⁷, not unlike the medium code model.
For position-independent code, access to both static and external global data assumes
that the GOT address is stored in a dedicated register. In these examples we assume it is
in %r15²⁸ (see Function Prologue):
²⁷ If, at code generation time, it is determined that the address of a referenced global data object resolves
within 2GB, the %rip-relative addressing mode can be used instead. See example in figure 3.19.
²⁸ If, at code generation time, it is determined that the address of a referenced global data object resolves
within 2GB, the %rip-relative addressing mode can be used instead. See example in figure 3.21.
Figure 3.20: Position-Independent Global Data Load and Store
Figure 3.22: Position-Independent Direct Function Call (Small and Medium Model)
extern void function (); .globl function
function (); call function@PLT
Figure 3.23: Position-Independent Indirect Function Call
extern void (*ptr) (); .globl ptr, name
extern void name ();
ptr = name; movq ptr@GOTPCREL(%rip), %rax
movq name@GOTPCREL(%rip), %rdx
movq %rdx, (%rax)
Large models
It cannot be assumed that a function is within 2GB in general. Therefore, it is necessary
to explicitly calculate the desired address reaching the whole 64-bit address space.
Figure 3.25: Position-Independent Direct and Indirect Function Call
static (*ptr) (void); Lptr: .quad
extern foo (void); .globl foo
static bar (void); Lbar: ...
foo (); movabs $foo@PLTOFF,%r11 ;R_X86_64_PLTOFF64
call *(%r11,%r15)
bar (); movabs $Lbar@GOTOFF,%r11 ;R_X86_64_GOTOFF64
leaq (%r11,%r15),%r11
call *%r11
ptr = foo; movabs $Lptr@GOTOFF,%rax ;R_X86_64_GOTOFF64
movabs $foo@PLTOFF,%r11 ;R_X86_64_PLTOFF64
leaq (%r11,%r15),%r11
movq %r11,(%rax,%r15)
ptr = bar; movabs $Lbar@GOTOFF,%r11 ;R_X86_64_GOTOFF64
leaq (%r11,%r15),%r11
movq %r11,(%rax,%r15)
(*ptr) (); movabs $Lptr@GOTOFF,%r11 ;R_X86_64_GOTOFF64
call *(%r11,%r15)
Implementation advice
If, at code generation time, certain conditions are determined to hold, it is possible to generate
faster or smaller code sequences than the large model normally requires. When:
(absolute) target of function call is within 2GB, a direct call or %rip-relative addressing
might be used:
bar (); call Lbar
ptr = bar; movabs $Lptr,%rax ;R_X86_64_64
leaq Lbar(%rip),%r11
movq %r11,(%rax)
(PIC) the base of GOT is within 2GB, an indirect call to the GOT entry might be implemented
like so:
foo (); call *(foo@GOT) ;R_X86_64_GOTPCREL
(PIC) the base of PLT is within 2GB, the PLT entry may be referred to relative to
%rip:
ptr = foo; movabs $Lptr@GOTOFF,%rax ;R_X86_64_GOTOFF64
leaq foo@PLT(%rip),%r11 ;R_X86_64_PLT32
movq %r11,(%rax,%r15)
(PIC) target of function call is within 2GB and is either not global or bound locally, a
direct call to the symbol may be used or it may be referred to relatively to %rip:
bar (); call Lbar
ptr = bar; movabs $Lptr@GOTOFF,%rax ;R_X86_64_GOTOFF64
leaq Lbar(%rip),%r11
movq %r11,(%rax,%r15)
3.5.6 Branching
Small and Medium Models
As all labels are within 2GB no special care has to be taken when implementing branches.
The full AMD64 ISA is usable.
Large Models
Because functions can be theoretically up to 16EB long, the maximum 32-bit displacement
of conditional and unconditional branches in the AMD64 ISA is not enough to
address the branch target. Therefore, a branch target address is calculated explicitly³⁰.
For absolute objects:
Figure 3.27: Implicit Calculation of Target Address
if (!a) testl %eax,%eax
{ jz 2f
... 1: ...
} 2:
goto Label; jmp Label
... ...
Label: Label:
Figure 3.29: Absolute Switch Code
switch (a) cmpl $0,%eax
{ jl .Ldefault
cmpl $2,%eax
jg .Ldefault
movabs $.Ltable,%r11 ;R_X86_64_64
jmpq *(%r11,%eax,8)
.section .lrodata,"aLM",@progbits,8
.align 8
.Ltable:
.quad .Lcase0 ;R_X86_64_64
.quad .Ldefault ;R_X86_64_64
.quad .Lcase2 ;R_X86_64_64
.previous
default: .Ldefault:
... ...
case 0: .Lcase0:
... ...
case 2: .Lcase2:
... ...
}
Figure 3.30: Position-Independent Switch Code
switch (a) cmpl $0,%eax
{ jl .Ldefault
cmpl $2,%eax
jg .Ldefault
movabs $.Ltable@GOTOFF,%r11 ;R_X86_64_GOTOFF64
leaq (%r11,%r15),%r11
movq *(%r11,%eax,8),%r11
leaq (%r11,%r15),%r11
jmpq *%r11
.section .lrodata,"aLM",@progbits,8
.align 8
.Ltable:
.quad .Lcase0@GOTOFF ;R_X86_64_GOTOFF64
.quad .Ldefault@GOTOFF ;R_X86_64_GOTOFF64
.quad .Lcase2@GOTOFF ;R_X86_64_GOTOFF64
.previous
default: .Ldefault:
... ...
case 0: .Lcase0:
... ...
case 2: .Lcase2:
... ...
}
When __m256 or __m512 is passed as a variable argument, it should always be passed
on the stack. Only named __m256 and __m512 arguments may be passed in registers as
specified in section 3.2.3.
Figure 3.33: Register Save Area
Register Offset
%rdi 0
%rsi 8
%rdx 16
%rcx 24
%r8 32
%r9 40
%xmm0 48
%xmm1 64
...
%xmm15 288
gp_offset The element holds the offset in bytes from reg_save_area to the place where
the next available general purpose argument register is saved. In case all argument
registers have been exhausted, it is set to the value 48 (6 ∗ 8).
fp_offset The element holds the offset in bytes from reg_save_area to the place where
the next available floating point argument register is saved. In case all argument
registers have been exhausted, it is set to the value 304 (6 ∗ 8 + 16 ∗ 16).
2. Compute num_gp to hold the number of general purpose registers needed to pass type
and num_fp to hold the number of floating point registers needed.
or
l->fp_offset > 304 − num_fp ∗ 16
go to step 7.
5. Set:
l->gp_offset = l->gp_offset + num_gp ∗ 8
l->fp_offset = l->fp_offset + num_fp ∗ 16.
9. Set l->overflow_arg_area to:
l->overflow_arg_area + sizeof(type)
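The register-availability test and the offset update in the steps above can be sketched in C as follows. The field names come from this section; the field types, the structure name, and the function name are assumptions made for the sketch, and the gp_offset comparison is assumed to mirror the fp_offset comparison shown in the text.

```c
#include <stdbool.h>

/* Field names follow the va_list description; the types are assumed. */
struct va_list_sketch {
    unsigned int gp_offset;
    unsigned int fp_offset;
    void *overflow_arg_area;
    void *reg_save_area;
};

enum {
    GP_LIMIT = 6 * 8,            /* 48: all general purpose registers used */
    FP_LIMIT = 6 * 8 + 16 * 16   /* 304: all vector registers used */
};

/* Returns true and advances the offsets when the next argument, needing
   num_gp general purpose and num_fp floating point registers, can still be
   fetched from the register save area; returns false when it has to come
   from the overflow area instead. */
bool take_from_registers(struct va_list_sketch *l,
                         unsigned num_gp, unsigned num_fp)
{
    if (l->gp_offset > GP_LIMIT - num_gp * 8 ||
        l->fp_offset > FP_LIMIT - num_fp * 16)
        return false;

    l->gp_offset += num_gp * 8;
    l->fp_offset += num_fp * 16;
    return true;
}
```

A real va_arg implementation would also load the value from reg_save_area or overflow_arg_area; the sketch covers only the bookkeeping.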
3.6.1 DWARF Release Number
The DWARF definition requires some machine-specific definitions. The register number
mapping needs to be specified for the AMD64 registers. In addition, starting with version
3 the DWARF specification requires processor-specific address class codes to be defined.
Position independence In order to avoid load time relocations for position independent
code, the FDE CIE offset pointer should be stored relative to the start of CIE ta-
ble entry. Frames using this extension of the DWARF standard must set the CIE
identifier tag to 1.
Outgoing arguments area delta To maintain the size of the temporarily allocated outgo-
ing arguments area present on the end of the stack (when using push instructions),
operation GNU_ARGS_SIZE (0x2e) can be used. This operation takes a single uleb128
argument specifying the current size. This information is used to adjust the stack
frame when jumping into the exception handler of the function after unwinding the
stack frame. Additionally the CIE Augmentation shall contain an exact specification
of the encoding used. It is recommended to use a PC relative encoding whenever
possible and adjust the size according to the code model used.
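Since the GNU_ARGS_SIZE operand is a uleb128 value, a reader of the unwind tables needs a decoder for that format. The following C function is a minimal sketch of standard unsigned LEB128 decoding; the function name is invented and the code performs no bounds checking on the input buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Decodes one unsigned LEB128 value starting at buf and stores the number
   of bytes consumed in *len. Each byte carries 7 payload bits; the high
   bit is set on every byte except the last. */
uint64_t decode_uleb128(const uint8_t *buf, size_t *len)
{
    uint64_t result = 0;
    unsigned shift = 0;
    size_t i = 0;
    uint8_t byte;

    do {
        byte = buf[i++];
        if (shift < 64)   /* ignore bits that do not fit in 64 bits */
            result |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);

    *len = i;
    return result;
}
```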
Figure 3.36: DWARF Register Number Mapping
Figure 3.37: Pointer Encoding Specification Byte
Mask Meaning
0x1 Values are stored as uleb128 or sleb128 type (according to flag 0x8)
0x2 Values are stored as 2 bytes wide integers (udata2 or sdata2)
0x3 Values are stored as 4 bytes wide integers (udata4 or sdata4)
0x4 Values are stored as 8 bytes wide integers (udata8 or sdata8)
0x8 Values are signed
0x10 Values are PC relative
0x20 Values are text section relative
0x30 Values are data section relative
0x40 Values are relative to the start of function
z Indicates that a uleb128 is present determining the size of the augmentation sec-
tion.
L Indicates the encoding (and thus presence) of an LSDA pointer in the FDE aug-
mentation.
The data field consists of a single byte specifying the way pointers are encoded.
It is a mask of the values specified in figure 3.37.
The default DWARF pointer encoding (direct 4-byte absolute pointers) is represented
by value 0.
R Indicates a non-default pointer encoding for FDE code pointers. The formatting
is represented by a single byte in the same way as in the ‘L’ command.
P Indicates the presence and encoding of a language personality routine in the
CIE augmentation. The encoding is represented by a single byte in the same
way as in the ‘L’ command followed by a pointer to the personality function
encoded by the specified encoding.
When the augmentation is present, the first command must always be ‘z’ to allow
easy skipping of the information.
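The pointer encoding byte used by the ‘L’, ‘R’ and ‘P’ commands can be split into its parts as sketched below. The mask values are those of figure 3.37; the structure and function names are invented. The sketch assumes the low three bits select the value format, bit 0x8 the signedness, and bits 0x70 the base that the value is relative to.

```c
#include <stdbool.h>
#include <stdint.h>

struct ptr_encoding {
    unsigned size;        /* bytes per value; 0 for LEB128 or the default */
    bool is_leb128;       /* format 0x1: uleb128 or sleb128 */
    bool is_signed;       /* flag 0x8 */
    uint8_t relative_to;  /* 0x10 PC, 0x20 text, 0x30 data, 0x40 function */
};

struct ptr_encoding decode_ptr_encoding(uint8_t enc)
{
    struct ptr_encoding e = { 0, false, (enc & 0x08) != 0,
                              (uint8_t)(enc & 0x70) };

    switch (enc & 0x07) {
    case 0x1: e.is_leb128 = true; break;
    case 0x2: e.size = 2; break;
    case 0x3: e.size = 4; break;
    case 0x4: e.size = 8; break;
    default:  break;  /* 0 is the default encoding; others are not listed */
    }
    return e;
}
```

For instance, the byte 0x1b combines 0x10, 0x8 and 0x3, which under these assumptions describes a PC relative signed 4-byte value.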
In order to simplify manipulation of the unwind tables, the runtime library provides a
higher-level API to the stack unwinding mechanism; for details see section 6.2.
Chapter 4
Object Files
File Class
For AMD64 ILP32 objects, the file class value in e_ident[EI_CLASS] must be
ELFCLASS32. For AMD64 LP64 objects, the file class value must be ELFCLASS64.
Data Encoding
For the data encoding in e_ident[EI_DATA], AMD64 objects use ELFDATA2LSB.
Processor identification
Processor identification resides in the ELF header’s e_machine member and must have
the value EM_X86_64.¹
¹ The value of this identifier is 62.
4.1.2 Number of Program Headers
The e_phnum member contains the number of entries in the program header table. The
product of e_phentsize and e_phnum gives the table’s size in bytes. If a file has no
program header table, e_phnum holds the value zero.
If the number of program headers is greater than or equal to PN_XNUM (0xffff), this mem-
ber has the value PN_XNUM (0xffff). The actual number of program header table entries is
contained in the sh_info field of the section header at index 0. Otherwise, the sh_info
member of the initial entry contains the value zero.
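A reader of ELF files might apply the PN_XNUM rule as in the following C sketch. The function and parameter names are invented; section0_sh_info stands for the sh_info member of the section header at index 0.

```c
#include <stdint.h>

#define PN_XNUM 0xffff

/* Returns the real number of program header table entries, taking the
   PN_XNUM escape value in e_phnum into account. */
uint32_t program_header_count(uint16_t e_phnum, uint32_t section0_sh_info)
{
    return e_phnum == PN_XNUM ? section0_sh_info : e_phnum;
}
```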
4.2 Sections
4.2.1 Section Flags
In order to allow linking object files of different code models, it is necessary to provide
a way to differentiate those sections which may hold more than 2GB from those which
may not. This is accomplished by defining a processor-specific section attribute flag for
sh_flags (see table 4.1).
Name Value
SHF_X86_64_LARGE 0x10000000
SHF_X86_64_LARGE If an object file section does not have this flag set, then it may not hold
more than 2GB and can be freely referred to in objects using smaller code models.
Otherwise, only objects using larger code models can refer to them. For example,
a medium code model object can refer to data in a section that sets this flag besides
being able to refer to data in a section that does not set it; likewise, a small code
model object can refer only to code in a section that does not set this flag.
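The rule for SHF_X86_64_LARGE can be expressed as a small predicate. This C sketch is a simplification under stated assumptions: the enum and function names are invented, and it ignores the distinction between code and data references that the example in the text draws for the medium model.

```c
#include <stdbool.h>
#include <stdint.h>

#define SHF_X86_64_LARGE 0x10000000

enum code_model { MODEL_SMALL, MODEL_MEDIUM, MODEL_LARGE };

/* Returns true if an object built for `model` may refer to a section
   whose sh_flags value is `sh_flags`. */
bool may_reference_section(enum code_model model, uint64_t sh_flags)
{
    bool is_large = (sh_flags & SHF_X86_64_LARGE) != 0;

    /* Sections without the flag are reachable from every code model;
       flagged sections only from models larger than the small one. */
    return !is_large || model != MODEL_SMALL;
}
```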
4.2.2 Section types
SHT_X86_64_UNWIND This section contains unwind function table entries for stack
unwinding. The contents are described in Section 4.2.4 of this document.
.eh_frame This section holds the unwind function table. The contents are described in
Section 4.2.4 of this document.
The additional sections defined in table 4.4 are used by a system supporting the large
code model.
Table 4.4: Additional Special Sections for the Medium and Large Code Models
Name Type Attributes
.lbss SHT_NOBITS SHF_ALLOC+SHF_WRITE+SHF_X86_64_LARGE
.ldata SHT_PROGBITS SHF_ALLOC+SHF_WRITE+SHF_X86_64_LARGE
.ldata1 SHT_PROGBITS SHF_ALLOC+SHF_WRITE+SHF_X86_64_LARGE
.lgot SHT_PROGBITS SHF_ALLOC+SHF_WRITE+SHF_X86_64_LARGE
.lplt SHT_PROGBITS SHF_ALLOC+SHF_EXECINSTR+SHF_X86_64_LARGE
.lrodata SHT_PROGBITS SHF_ALLOC+SHF_X86_64_LARGE
.lrodata1 SHT_PROGBITS SHF_ALLOC+SHF_X86_64_LARGE
.ltext SHT_PROGBITS SHF_ALLOC+SHF_EXECINSTR+SHF_X86_64_LARGE
In order to enable static linking of objects using different code models, the following
section ordering is suggested:
.plt .init .fini .text .got .rodata .rodata1 .data .data1 .bss These sections
can have a combined size of up to 2GB.
.lplt .ltext .lgot .lrodata .lrodata1 .ldata .ldata1 .lbss These sections plus
the above can have a combined size of up to 16EB.
Table 4.5: Common Information Entry (CIE)
Table 4.6: CIE Augmentation Section Content
Table 4.7: Frame Descriptor Entry (FDE)
Table 4.8: FDE Augmentation Section Content
The existence and size of the optional call frame instruction area must be computed
based on the overall size and the offset reached while scanning the preceding fields of the
CIE or FDE.
The overall size of a .eh_frame section is given in the ELF section header. The only
way to determine the number of entries is to scan the section until the end, counting entries
as they are encountered.
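Counting the entries by scanning, as described above, might look like the following C sketch. It assumes the usual entry layout of the CIE and FDE tables: a 4-byte length, the value 0xffffffff announcing an 8-byte extended length, and a zero length acting as terminator. The function name is invented, and the length fields are read in host byte order, which matches the file only on a little-endian host.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Counts the CIE and FDE entries in an .eh_frame image of `size` bytes.
   Returns -1 if an entry claims to extend past the end of the section. */
long count_eh_frame_entries(const uint8_t *sec, size_t size)
{
    size_t pos = 0;
    long count = 0;

    while (pos + 4 <= size) {
        uint32_t len32;
        uint64_t len;

        memcpy(&len32, sec + pos, 4);
        pos += 4;
        if (len32 == 0)                 /* zero length: terminator */
            break;
        if (len32 == 0xffffffffu) {     /* extended 8-byte length follows */
            if (pos + 8 > size)
                return -1;
            memcpy(&len, sec + pos, 8);
            pos += 8;
        } else {
            len = len32;
        }
        if (len > size - pos)           /* entry runs past the section */
            return -1;
        pos += (size_t)len;             /* skip the entry body */
        count++;
    }
    return count;
}
```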
4.4 Relocation
4.4.1 Relocation Types
Figure 4.1 shows the allowed relocatable fields.
Field    Bits
word8    bits 7 to 0
word16   bits 15 to 0
word32   bits 31 to 0
word64   bits 63 to 0
B Represents the base address at which a shared object has been loaded into memory
during execution. Generally, a shared object is built with a 0 base virtual address,
but the execution address will be different.
G Represents the offset into the global offset table at which the relocation entry’s symbol
will reside during execution.
L Represents the place (section offset or address) of the Procedure Linkage Table entry
for a symbol.
P Represents the place (section offset or address) of the storage unit being relocated (com-
puted using r_offset).
S Represents the value of the symbol whose index resides in the relocation entry.
Z Represents the size of the symbol whose index resides in the relocation entry.
The AMD64 LP64 ABI architecture uses only Elf64_Rela relocation entries with
explicit addends. The r_addend member serves as the relocation addend.
The AMD64 ILP32 ABI architecture uses only Elf32_Rela relocation entries in
relocatable files. Executable files or shared objects may use either Elf32_Rela or
Elf32_Rel relocation entries.
Table 4.9: Relocation Types
Name                              Value  Field      Calculation
R_X86_64_NONE                     0      none       none
R_X86_64_64                       1      word64     S + A
R_X86_64_PC32                     2      word32     S + A - P
R_X86_64_GOT32                    3      word32     G + A
R_X86_64_PLT32                    4      word32     L + A - P
R_X86_64_COPY                     5      none       none
R_X86_64_GLOB_DAT                 6      wordclass  S
R_X86_64_JUMP_SLOT                7      wordclass  S
R_X86_64_RELATIVE                 8      wordclass  B + A
R_X86_64_GOTPCREL                 9      word32     G + GOT + A - P
R_X86_64_32                       10     word32     S + A
R_X86_64_32S                      11     word32     S + A
R_X86_64_16                       12     word16     S + A
R_X86_64_PC16                     13     word16     S + A - P
R_X86_64_8                        14     word8      S + A
R_X86_64_PC8                      15     word8      S + A - P
R_X86_64_DTPMOD64                 16     word64
R_X86_64_DTPOFF64                 17     word64
R_X86_64_TPOFF64                  18     word64
R_X86_64_TLSGD                    19     word32
R_X86_64_TLSLD                    20     word32
R_X86_64_DTPOFF32                 21     word32
R_X86_64_GOTTPOFF                 22     word32
R_X86_64_TPOFF32                  23     word32
R_X86_64_PC64 †                   24     word64     S + A - P
R_X86_64_GOTOFF64 †               25     word64     S + A - GOT
R_X86_64_GOTPC32                  26     word32     GOT + A - P
R_X86_64_SIZE32                   32     word32     Z + A
R_X86_64_SIZE64 †                 33     word64     Z + A
R_X86_64_GOTPC32_TLSDESC          34     word32
R_X86_64_TLSDESC_CALL             35     none
R_X86_64_TLSDESC                  36     word64×2
R_X86_64_IRELATIVE                37     wordclass  indirect (B + A)
R_X86_64_RELATIVE64 ††            38     word64     B + A
Deprecated                        39
Deprecated                        40
R_X86_64_GOTPCRELX                41     word32     G + GOT + A - P
R_X86_64_REX_GOTPCRELX            42     word32     G + GOT + A - P
R_X86_64_CODE_4_GOTPCRELX         43     word32     G + GOT + A - P
R_X86_64_CODE_4_GOTTPOFF          44     word32
R_X86_64_CODE_4_GOTPC32_TLSDESC   45     word32
† This relocation is used only for LP64.
†† This relocation only appears in ILP32 executable files or shared objects.
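As an illustration of how the Calculation column is read, the following minimal C sketch evaluates three of the formulas using the notation defined before the table (S, A, P, B). The type reloc_inputs and the function names are hypothetical helpers introduced only for this example; they are not part of the ABI, and a real linker would additionally truncate and range-check the result according to the Field column.

```c
#include <stdint.h>

/* Inputs named after the notation used in Table 4.9.
 * Hypothetical helper type, not defined by the ABI. */
typedef struct {
    uint64_t S;  /* value of the referenced symbol */
    int64_t  A;  /* addend, taken from r_addend */
    uint64_t P;  /* address of the storage unit being relocated */
    uint64_t B;  /* base address at which the shared object was loaded */
} reloc_inputs;

/* R_X86_64_64: field word64, calculation S + A. */
static uint64_t calc_r_x86_64_64(const reloc_inputs *in) {
    return in->S + (uint64_t)in->A;
}

/* R_X86_64_PC32: field word32, calculation S + A - P.
 * Returned at full width, before truncation to 32 bits. */
static int64_t calc_r_x86_64_pc32(const reloc_inputs *in) {
    return (int64_t)(in->S + (uint64_t)in->A - in->P);
}

/* R_X86_64_RELATIVE: field wordclass, calculation B + A. */
static uint64_t calc_r_x86_64_relative(const reloc_inputs *in) {
    return in->B + (uint64_t)in->A;
}
```

The arithmetic is done on unsigned 64-bit values so that a negative addend wraps as expected instead of overflowing a signed type.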
The special semantics for most of these relocation types are identical to those used for
the Intel386 ABI.³ ⁴
The R_X86_64_GOTPCREL relocation has different semantics from the
R_X86_64_GOT32 or equivalent i386 R_386_GOTPC relocation. In particular,
because the AMD64 architecture has an addressing mode relative to the instruction
pointer, it is possible to load an address from the GOT using a single instruction. The
calculation done by the R_X86_64_GOTPCREL relocation gives the difference between
the location in the GOT where the symbol’s address is given and the location where the
relocation is applied.
For the occurrence of name@GOTPCREL in the following assembler instructions:
call *name@GOTPCREL(%rip)
jmp *name@GOTPCREL(%rip)
mov name@GOTPCREL(%rip), %reg
test %reg, name@GOTPCREL(%rip)
binop name@GOTPCREL(%rip), %reg
where binop is one of adc, add, and, cmp, or, sbb, sub, xor instructions, the
R_X86_64_GOTPCRELX relocation, the R_X86_64_REX_GOTPCRELX relocation if
the REX prefix is present, or the R_X86_64_CODE_4_GOTPCRELX relocation if the in-
struction starts at 4 bytes before the relocation offset, should be generated, instead of the
R_X86_64_GOTPCREL relocation. See also section B.2.
The R_X86_64_32 and R_X86_64_32S relocations truncate the computed value
to 32 bits. The linker must verify that the generated value for the R_X86_64_32
(R_X86_64_32S) relocation zero-extends (sign-extends) to the original 64-bit value.
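The verification described above can be sketched in C as two predicates over the full 64-bit result of S + A. The function names are hypothetical; the sketch assumes the usual two's-complement conversion from uint64_t to int64_t.

```c
#include <stdbool.h>
#include <stdint.h>

/* R_X86_64_32: the truncated 32-bit field must zero-extend
 * back to the original 64-bit value. */
static bool fits_r_x86_64_32(uint64_t value) {
    return (uint64_t)(uint32_t)value == value;
}

/* R_X86_64_32S: the truncated 32-bit field must sign-extend
 * back to the original 64-bit value. */
static bool fits_r_x86_64_32s(uint64_t value) {
    int64_t v = (int64_t)value;  /* assumes two's-complement conversion */
    return v >= INT32_MIN && v <= INT32_MAX;
}
```

A linker following this rule would report an overflow error whenever the relevant predicate is false, rather than silently storing the truncated value.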
A program or object file using R_X86_64_8, R_X86_64_16, R_X86_64_PC16
or R_X86_64_PC8 relocations is not conformant to this ABI; these relocations are
listed for documentation purposes only. The R_X86_64_16 and R_X86_64_8 relocations
truncate the computed value to 16 bits and 8 bits, respectively.
The relocations R_X86_64_DTPMOD64, R_X86_64_DTPOFF64,
R_X86_64_TPOFF64, R_X86_64_TLSGD, R_X86_64_TLSLD,
R_X86_64_DTPOFF32, R_X86_64_GOTTPOFF and R_X86_64_TPOFF32
are listed for completeness. They are part of the Thread-Local Storage ABI extensions
and are documented in the document called “ELF Handling for Thread-Local Storage”⁵.
R_X86_64_GOTTPOFF should be generated only if the instruction starts 3 bytes before
the relocation offset, where the linker expects a REX byte.
R_X86_64_CODE_4_GOTTPOFF should be generated, instead of R_X86_64_GOTTPOFF,
if the instruction starts 4 bytes before the relocation offset, and linker optimization
must take the different instruction encoding into account.
3
Even though the AMD64 architecture supports IP-relative addressing modes, a GOT is still required
since the offset from a particular instruction to a particular data item cannot be known by the static linker.
4
Note that the AMD64 architecture assumes that offsets into GOT are 32-bit values, not 64-bit values.
This choice means that a maximum of 2^32 / 8 = 2^29 entries can be placed in the GOT. However, that should
be more than enough for most programs. In the event that it is not enough, the linker could create multiple
GOTs. Because 32-bit offsets are used, loads of global data do not require loading the offset into a
displacement register; the base plus immediate displacement addressing form can be used.
The relocations R_X86_64_GOTPC32_TLSDESC, R_X86_64_TLSDESC_CALL
and R_X86_64_TLSDESC are also used for Thread-Local Storage, but are
not documented there as of this writing. A description can be found in the
document “Thread-Local Storage Descriptors for IA32 and AMD64/EM64T”6 .
R_X86_64_CODE_4_GOTPC32_TLSDESC should be generated, instead of
R_X86_64_GOTPC32_TLSDESC, if the instruction starts at 4 bytes before the
relocation offset and linker optimization must take the different instruction encoding into
account.
In order to make this document self-contained, a description of the TLS relocations
follows.
The %fs segment register is used to implement the thread pointer. The linear address of
the thread pointer is stored at offset 0 relative to the %fs segment register. The following
code loads the thread pointer in the %rax register:
movq %fs:0, %rax
R_X86_64_DTPMOD64 resolves to the index of the dynamic thread vector entry that
points to the base address of the TLS block corresponding to the module that defines
the referenced symbol. R_X86_64_DTPOFF64 and R_X86_64_DTPOFF32 compute
the offset from the pointer in that entry to the referenced symbol. The linker generates
such relocations in adjacent entries in the GOT, in response to R_X86_64_TLSGD and
R_X86_64_TLSLD relocations. If the linker can compute the offset itself, because the
referenced symbol binds locally, the relocations R_X86_64_64 and R_X86_64_32
may be used instead. Otherwise, such relocations are always in pairs, such that the
R_X86_64_DTPOFF64 relocation applies to the word64 right past the corresponding
R_X86_64_DTPMOD64 relocation.
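To make the roles of the two adjacent GOT words concrete, here is a minimal C sketch of how a runtime might turn such a pair into an address. The type tls_got_pair, the function tls_address and the representation of the dynamic thread vector as an array of block base pointers are assumptions made for illustration; they are not interfaces defined by this document.

```c
#include <stddef.h>
#include <stdint.h>

/* Two adjacent GOT words, as filled in by R_X86_64_DTPMOD64 and
 * R_X86_64_DTPOFF64. Hypothetical type name. */
typedef struct {
    uint64_t module;  /* index of the dynamic thread vector entry */
    uint64_t offset;  /* offset of the symbol from that entry's pointer */
} tls_got_pair;

/* Hypothetical lookup: dtv[i] is assumed to hold the base address of
 * module i's TLS block for the current thread. */
static void *tls_address(uint8_t *const *dtv, const tls_got_pair *pair) {
    return dtv[pair->module] + pair->offset;
}
```

The sketch ignores lazily allocated TLS blocks and any generation checks a real runtime would need.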
R_X86_64_TPOFF64 and R_X86_64_TPOFF32 resolve to the offset from
the thread pointer to a thread-local variable. The former is generated in response
to R_X86_64_GOTTPOFF, which resolves to a PC-relative address of a GOT entry
containing such a 64-bit offset.
5
This document is currently available via http://www.akkadia.org/drepper/tls.pdf
6
This document is currently available via
http://www.fsfla.org/~lxoliva/writeups/TLS/RFC-TLSDESC-x86.txt
R_X86_64_TLSGD and R_X86_64_TLSLD both resolve to PC-relative offsets to
a DTPMOD GOT entry. The difference between them is that, for R_X86_64_TLSGD,
the following GOT entry will contain the offset of the referenced symbol into its TLS
block, whereas, for R_X86_64_TLSLD, the following GOT entry will contain the offset
for the base address of the TLS block. The idea is that adding this offset to the result
of R_X86_64_DTPOFF32 for a symbol ought to yield the same as the result of
R_X86_64_DTPOFF64 for the same symbol.
R_X86_64_TLSDESC resolves to a pair of word64s, called TLS Descriptor, the first
of which is a pointer to a function, followed by an argument. The function is passed
a pointer to this pair of entries in %rax and, using the argument in the second entry,
it must compute and return in %rax the offset from the thread pointer to the symbol
referenced in the relocation, without modifying any registers other than processor flags.
R_X86_64_GOTPC32_TLSDESC resolves to the PC-relative address of a TLS descrip-
tor corresponding to the named symbol. R_X86_64_TLSDESC_CALL must annotate the
instruction used to call the TLS Descriptor resolver function, so as to enable relaxation of
that instruction.
R_X86_64_IRELATIVE is similar to R_X86_64_RELATIVE except that the value
used in this relocation is the program address returned by the function, which takes no
arguments, at the address of the result of the corresponding R_X86_64_RELATIVE re-
location.
One use of the R_X86_64_IRELATIVE relocation is to avoid name lookup for the
locally defined STT_GNU_IFUNC symbols at load-time. Support for this relocation is
optional, but is required for the STT_GNU_IFUNC symbols.
Chapter 5
ual processes, it maintains the segments’ relative positions. Because position-independent
code uses relative addressing between segments, the difference between virtual addresses
in memory must match the difference between virtual addresses in the file.
Name Value
PT_GNU_EH_FRAME 0x6474e550
PT_SUNW_EH_FRAME 0x6474e550
PT_SUNW_UNWIND 0x6464e550
Table 5.2: Optional Dynamic Array Tags, d_tag
• The procedure linkage table entries transfer control to external functions via indirect
branch over the corresponding entry in the global offset table referenced by the
r_offset field of the R_X86_64_JUMP_SLOT relocation. The r_addend field of
the R_X86_64_JUMP_SLOT relocation stores the procedure linkage table offset
of indirect branch in the corresponding procedure linkage table entry.
• All such procedure linkage table entries have the same layout. Each entry starts
with an indirect branch over the global offset table entry. An ENDBR64 instruction
may precede the indirect branch. There should be no functional side effect when the
indirect branch is replaced by a direct branch.
• The only entry point of a procedure linkage table entry is the first byte of the entry.
• All entries have the same size and are aligned to the entry size.
Global Offset Table (GOT)
Position-independent code cannot, in general, contain absolute virtual addresses. Global
offset tables hold absolute addresses in private data, thus making the addresses available
without compromising the position-independence and shareability of a program’s text.
A program references its global offset table using position-independent addressing and
extracts absolute values, thus redirecting position-independent references to absolute lo-
cations.
If a program requires direct access to the absolute address of a symbol, that symbol
will have a global offset table entry. Because the executable file and shared objects have
separate global offset tables, a symbol’s address may appear in several tables. The dynamic
linker processes all the global offset table relocations before giving control to any code in
the process image, thus ensuring the absolute addresses are available during execution.
The table's first entry (number zero) is reserved to hold the address of the dynamic
structure, referenced with the symbol _DYNAMIC. This allows a program, such as the dy-
namic linker, to find its own dynamic structure without having yet processed its relocation
entries. This is especially important for the dynamic linker, because it must initialize it-
self without relying on other programs to relocate its memory image. On the AMD64
architecture, entries one and two in the global offset table also are reserved.
The global offset table contains 64-bit addresses.
For the large models the GOT is allowed to be up to 16EB in size.
The symbol _GLOBAL_OFFSET_TABLE_ may reside in the middle of the .got section,
allowing both negative and non-negative offsets into the array of addresses.
Function Addresses
References to the address of a function from an executable file and the shared objects asso-
ciated with it might not resolve to the same value. References from within shared objects
will normally be resolved by the dynamic linker to the virtual address of the function it-
self. References from within the executable file to a function defined in a shared object
will normally be resolved by the link editor to the address of the procedure linkage table
entry for that function within the executable file.
To allow comparisons of function addresses to work as expected, if an executable file
references a function defined in a shared object, the link editor will place the address of
the procedure linkage table entry for that function in its associated symbol table entry.
This will result in symbol table entries with section index of SHN_UNDEF but a type of
STT_FUNC and a non-zero st_value. A reference to the address of a function from
within a shared library will be satisfied by such a definition in the executable.
Some relocations are associated with procedure linkage table entries. These entries
are used for direct function calls rather than for references to function addresses. These
relocations do not use the special symbol value described above. Otherwise a very tight
endless loop would be created.
Following the steps below, the dynamic linker and the program “cooperate” to resolve
symbolic references through the procedure linkage table and the global offset table.
1. When first creating the memory image of the program, the dynamic linker sets the
second and the third entries in the global offset table to special values. Steps below
explain more about these values.
2. Each shared object file in the process image has its own procedure linkage table, and
control transfers to a procedure linkage table entry only from within the same object
file.
3. For illustration, assume the program calls name1, which transfers control to the label
.PLT1.
4. The first instruction jumps to the address in the global offset table entry for name1.
Initially the global offset table holds the address of the following pushq instruction,
not the real address of name1.
5. Now the program pushes a relocation index (index) on the stack. The relocation
index is a 32-bit, non-negative index into the relocation table addressed by the
DT_JMPREL dynamic section entry. The designated relocation entry will have type
R_X86_64_JUMP_SLOT, and its offset will specify the global offset table entry
used in the previous jmp instruction. The relocation entry contains a symbol table
index that will reference the appropriate symbol, name1 in the example.
6. After pushing the relocation index, the program then jumps to .PLT0, the first entry
in the procedure linkage table. The pushq instruction places the value of the second
global offset table entry (GOT+8) on the stack, thus giving the dynamic linker one
word of identifying information. The program then jumps to the address in the third
global offset table entry (GOT+16), which transfers control to the dynamic linker.
7. When the dynamic linker receives control, it unwinds the stack, looks at the desig-
nated relocation entry, finds the symbol’s value, stores the “real” address for name1
in its global offset table entry, and transfers control to the desired destination.
8. Subsequent executions of the procedure linkage table entry will transfer directly to
name1, without calling the dynamic linker a second time. That is, the jmp instruction
at .PLT1 will transfer to name1, instead of “falling through” to the pushq instruction.
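The lazy-binding steps above can be mimicked in portable C with a function pointer standing in for the global offset table entry. This is only a sketch of the control flow: all names (got_name1, plt1_resolve, call_name1, name1_real, resolver_calls) are hypothetical, and the real mechanism operates on machine code, the stack and relocation indices rather than on C function pointers.

```c
typedef int (*func_ptr)(int);

/* Stand-in for the real definition of name1 in some shared object. */
static int name1_real(int x) { return x + 1; }

static int resolver_calls;       /* counts trips into the mock dynamic linker */
static int plt1_resolve(int x);  /* mock of the pushq / jmp .PLT0 slow path */

/* Mock GOT entry for name1. Initially it leads back into the slow path
 * of the PLT entry (step 4), not to name1 itself. */
static func_ptr got_name1 = plt1_resolve;

/* Mock dynamic linker (step 7): find the symbol, store the real address
 * in the GOT entry, then transfer control to the destination. */
static int plt1_resolve(int x) {
    resolver_calls++;
    got_name1 = name1_real;
    return got_name1(x);
}

/* Mock .PLT1: an indirect jump through the GOT entry. */
static int call_name1(int x) { return got_name1(x); }
```

Calling call_name1 twice goes through plt1_resolve only on the first call; the second call reaches name1_real directly, which corresponds to step 8.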
The LD_BIND_NOW environment variable can change the dynamic linking behavior. If
its value is non-null, the dynamic linker evaluates procedure linkage table entries before
transferring control to the program. That is, the dynamic linker processes relocation entries
of type R_X86_64_JUMP_SLOT during process initialization. Otherwise, the dynamic linker
evaluates procedure linkage table entries lazily, delaying symbol resolution and relocation
until the first execution of a table entry.
Relocation entries of type R_X86_64_TLSDESC may also be subject to lazy relocation,
using a single entry in the procedure linkage table and in the global offset table, at loca-
tions given by DT_TLSDESC_PLT and DT_TLSDESC_GOT, respectively, as described
in “Thread-Local Storage Descriptors for IA32 and AMD64/EM64T”2 .
For self-containment, DT_TLSDESC_GOT specifies a GOT entry in which the dy-
namic loader should store the address of its internal TLS Descriptor resolver function,
whereas DT_TLSDESC_PLT specifies the address of a PLT entry to be used as the TLS
descriptor resolver function for lazy resolution from within this module. The PLT entry
must push the linkmap of the module onto the stack and tail-call the internal TLS Descrip-
tor resolver function.
Large Models
In the small and medium code models the size of both the PLT and the GOT is limited by
the maximum 32-bit displacement size. Consequently, the base of the PLT and the top of
the GOT can be at most 2GB apart.
Therefore, in order to support the available addressing space of 16EB, it is necessary
to extend both the PLT and the GOT. Moreover, the PLT needs to support the GOT being
over 2GB away and the GOT can be over 2GB in size.3
The PLT is extended as shown in figure 5.3 with the assumption that the GOT address
is in %r15⁴.
2
This document is currently available via
http://www.fsfla.org/~lxoliva/writeups/TLS/RFC-TLSDESC-x86.txt
3
If it is determined that the base of the PLT is within 2GB of the top of the GOT, it is also allowed to use
the same PLT layout for a large code model object as that of the small and medium code models.
4
See Function Prologue.
Figure 5.3: Final Large Code Model PLT
This way, for the first 102261125 entries, each PLT entry besides .PLT0 uses only 21
bytes. Afterwards, the PLT entry code changes by repeating that of .PLT0, when each PLT
entry is 27 bytes long. Notice that any alignment consideration is dropped in order to keep
the PLT size down.
Each extended PLT entry is thus 5 to 11 bytes larger than the small and medium code
model PLT entries.
The functionality of entry .PLT0 remains unchanged from the small and medium code
models.
Note that the symbol index is still limited to 32 bits, which would allow for up to 4G
global and external functions.
Typically, UNIX compilers support two types of PLT, generally through the options
-fpic and -fPIC. When building position-independent objects using the large code model,
only -fPIC is allowed. Using the option -fpic with the large code model remains reserved
for future use.
Figure 5.4: AMD64 Program Interpreter
Chapter 6
Libraries
6.1 C Library
6.1.1 Global Data Symbols
The symbols _fp_hw, __flt_rounds and __huge_val are not provided by the AMD64 ABI.
This section is meant to specify a language-independent interface that can be used to
provide higher level exception-handling facilities such as those defined by C++.
The unwind library interface consists of at least the following routines:
_Unwind_RaiseException,
_Unwind_Resume,
_Unwind_DeleteException,
_Unwind_GetGR,
_Unwind_SetGR,
_Unwind_GetIP,
_Unwind_GetIPInfo,
_Unwind_SetIP,
_Unwind_GetRegionStart,
_Unwind_GetLanguageSpecificData,
_Unwind_ForcedUnwind,
_Unwind_GetCFA
In addition, two data types are defined (_Unwind_Context and _Unwind_Exception)
to interface a calling runtime (such as the C++ runtime) and the above routines. All
routines and interfaces behave as if defined extern "C". In particular, the names are not
mangled. All names defined as part of this interface have a "_Unwind_" prefix.
Lastly, a language and vendor specific personality routine will be stored by the com-
piler in the unwind descriptor for the stack frames requiring exception processing. The
personality routine is called by the unwinder to handle language-specific tasks such as
identifying the frame handling a particular exception.
The interface described here tries to keep both similar. There is a major difference,
however.
• In the case where an exception is thrown, the stack is unwound while the exception
propagates, but it is expected that the personality routine for each stack frame knows
whether it wants to catch the exception or pass it through. This choice is thus del-
egated to the personality routine, which is expected to act properly for any type of
exception, whether “native” or “foreign”. Some guidelines for “acting properly” are
given below.
• During “forced unwinding”, on the other hand, an external agent is driving the un-
winding. For instance, this can be the longjmp routine. This external agent, not
each personality routine, knows when to stop unwinding. The fact that a personality
routine is not given a choice about whether unwinding will proceed is indicated by
the _UA_FORCE_UNWIND flag.
• In the search phase, the framework repeatedly calls the personality routine, with the
_UA_SEARCH_PHASE flag as described below, first for the current %rip and register
state, and then unwinding a frame to a new %rip at each step, until the personal-
ity routine reports either success (a handler found in the queried frame) or failure
(no handler) in all frames. It does not actually restore the unwound state, and the
personality routine must access the state through the API.
• If the search phase reports a failure, e.g. because no handler was found, it will call
terminate() rather than commence phase 2.
If the search phase reports success, the framework restarts in the cleanup phase.
Again, it repeatedly calls the personality routine, with the _UA_CLEANUP_PHASE flag
as described below, first for the current %rip and register state, and then unwinding
a frame to a new %rip at each step, until it gets to the frame with an identified
handler. At that point, it restores the register state, and control is transferred to the
user landing pad code.
Each of these two phases uses both the unwind library and the personality routines,
since the validity of a given handler and the mechanism for transferring control to it are
language-dependent, but the method of locating and restoring previous stack frames is
language-independent.
A two-phase exception-handling model is not strictly necessary to implement C++
language semantics, but it does provide some benefits. For example, the first phase allows
an exception-handling mechanism to dismiss an exception before stack unwinding begins,
which allows resumptive exception handling (correcting the exceptional condition and
resuming execution at the point where it was raised). While C++ does not support
resumptive exception handling, other languages do, and the two-phase model allows C++ to
coexist with those languages on the stack.
Note that even with a two-phase model, we may execute each of the two phases more
than once for a single exception, as if the exception was being thrown more than once. For
instance, since it is not possible to determine if a given catch clause will re-throw or not
without executing it, the exception propagation effectively stops at each catch clause, and
if it needs to restart, restarts at phase 1. This process is not needed for destructors (cleanup
code), so the phase 1 can safely process all destructor-only frames at once and stop at the
next enclosing catch clause.
For example, if the first two frames unwound contain only cleanup code, and the third
frame contains a C++ catch clause, the personality routine in phase 1 does not indicate
that it found a handler for the first two frames. It must do so for the third frame, because it
is unknown how the exception will propagate out of this third frame, e.g. by re-throwing
the exception or throwing a new one in C++.
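The two-phase process and the three-frame example above can be modelled with a small C simulation. It is a sketch under simplifying assumptions: each frame is reduced to two flags that stand in for what its personality routine would report, and the names mock_frame, search_phase, cleanup_phase and raise_exception are hypothetical, not part of the unwind library interface.

```c
#include <stddef.h>

/* Hypothetical frame description. Frame 0 is the innermost frame. */
typedef struct {
    int has_cleanup;  /* frame has cleanup code (destructors) to run */
    int has_handler;  /* frame has a catch clause for the exception */
    int cleaned_up;   /* set when phase 2 runs this frame's cleanup */
} mock_frame;

/* Phase 1 (search): walk outward without modifying any frame.
 * Returns the index of the first frame with a handler, or -1. */
static int search_phase(const mock_frame *frames, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (frames[i].has_handler)
            return (int)i;
    return -1;
}

/* Phase 2 (cleanup): run cleanup code in every frame below the handler. */
static void cleanup_phase(mock_frame *frames, int handler_index) {
    for (int i = 0; i < handler_index; i++)
        if (frames[i].has_cleanup)
            frames[i].cleaned_up = 1;
}

/* Returns the handler frame index, or -1 when the search failed,
 * in which case phase 2 is never started. */
static int raise_exception(mock_frame *frames, size_t n) {
    int handler = search_phase(frames, n);
    if (handler >= 0)
        cleanup_phase(frames, handler);
    return handler;
}
```

With two cleanup-only frames followed by a frame that has a catch clause, the search stops at the third frame and only the first two frames are cleaned up. When no frame has a handler, the stack is left untouched.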
The API specified by the AMD64 psABI for implementing this framework is described
in the following sections.
typedef enum {
_URC_NO_REASON = 0,
_URC_FOREIGN_EXCEPTION_CAUGHT = 1,
_URC_FATAL_PHASE2_ERROR = 2,
_URC_FATAL_PHASE1_ERROR = 3,
_URC_NORMAL_STOP = 4,
_URC_END_OF_STACK = 5,
_URC_HANDLER_FOUND = 6,
_URC_INSTALL_CONTEXT = 7,
_URC_CONTINUE_UNWIND = 8
} _Unwind_Reason_Code;
The interpretations of these codes are described below.
Exception Header
The unwind interface uses a pointer to an exception header object as its representation
of an exception being thrown. In general, the full representation of an exception object
is language- and implementation-specific, but is prefixed by a header understood by the
unwind interface, defined as follows:
typedef void (*_Unwind_Exception_Cleanup_Fn)
(_Unwind_Reason_Code reason,
struct _Unwind_Exception *exc);
struct _Unwind_Exception {
uint64 exception_class;
_Unwind_Exception_Cleanup_Fn exception_cleanup;
uint64 private_1;
uint64 private_2;
};
An _Unwind_Exception object must be eightbyte aligned. The first two fields are set by
user code prior to raising the exception, and the latter two should never be touched except
by the runtime.
The exception_class field is a language- and implementation-specific identifier of the
kind of exception. It allows a personality routine to distinguish between native and foreign
exceptions, for example. By convention, the high 4 bytes indicate the vendor (for instance
AMD\0), and the low 4 bytes indicate the language. For the C++ ABI described in this
document, the low four bytes are C++\0.
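The convention for exception_class can be illustrated with a short C sketch that packs a vendor tag and a language tag into one 64-bit value. The helper names are hypothetical. Placing the first character of each tag in the most significant byte of its half is an assumption of this sketch; the text above fixes only which half holds which tag.

```c
#include <stdint.h>

/* Pack two 4-byte tags into a 64-bit exception class: vendor in the
 * high 4 bytes, language in the low 4 bytes. Hypothetical helper. */
static uint64_t make_exception_class(const char vendor[4],
                                     const char language[4]) {
    uint64_t cls = 0;
    for (int i = 0; i < 4; i++)
        cls = (cls << 8) | (uint8_t)vendor[i];
    for (int i = 0; i < 4; i++)
        cls = (cls << 8) | (uint8_t)language[i];
    return cls;
}

/* Extract the language tag (low 4 bytes). */
static uint32_t exception_language(uint64_t cls) {
    return (uint32_t)(cls & 0xffffffffu);
}
```

A personality routine could compare exception_language(cls) against its own language tag to tell native exceptions from foreign ones.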
The exception_cleanup routine is called whenever an exception object needs to be
destroyed by a different runtime than the runtime which created the exception object, for
instance if a Java exception is caught by a C++ catch handler. In such a case, a reason code
(see above) indicates why the exception object needs to be deleted:
_URC_FATAL_PHASE1_ERROR = 3 The personality routine encountered an error during phase
1, other than the specific error codes defined.
Unwind Context
The _Unwind_Context type is an opaque type used to refer to a system-specific data struc-
ture used by the system unwinder. This context is created and destroyed by the system,
and passed to the personality routine during unwinding.
struct _Unwind_Context
handler for the exception, bad stack format, etc.). In such a case, an _Unwind_Reason_Code
value is returned.
Possibilities are:
_URC_END_OF_STACK The unwinder encountered the end of the stack during phase 1, with-
out finding a handler. The unwind runtime will not have modified the stack. The
C++ runtime will normally call uncaught_exception() in this case.
_Unwind_ForcedUnwind
typedef _Unwind_Reason_Code (*_Unwind_Stop_Fn)
(int version,
_Unwind_Action actions,
uint64 exceptionClass,
struct _Unwind_Exception *exceptionObject,
struct _Unwind_Context *context,
void *stop_parameter );
_Unwind_Reason_Code _Unwind_ForcedUnwind
( struct _Unwind_Exception *exception_object,
_Unwind_Stop_Fn stop,
void *stop_parameter );
Raise an exception for forced unwinding, passing along the given exception object,
which should have its exception_class and exception_cleanup fields set. The exception
object has been allocated by the language-specific runtime, and has a language-specific
format, except that it must contain an _Unwind_Exception struct (see Exception Header
above).
Forced unwinding is a single-phase process (phase 2 of the normal exception-handling
process). The stop and stop_parameter parameters control the termination of the unwind
process, instead of the usual personality routine query. The stop function parameter is
called for each unwind frame, with the parameters described for the usual personality
routine below, plus an additional stop_parameter.
When the stop function identifies the destination frame, it transfers control (ac-
cording to its own, unspecified, conventions) to the user code as appropriate without
returning, normally after calling _Unwind_DeleteException. If not, it should return an
_Unwind_Reason_Code value as follows:
_URC_NO_REASON This is not the destination frame. The unwind runtime will call the
frame’s personality routine with the _UA_FORCE_UNWIND and _UA_CLEANUP_PHASE flags
set in actions, and then unwind to the next frame and call the stop function again.
_URC_FATAL_PHASE2_ERROR The stop function may return this code for other fatal condi-
tions, e.g. stack corruption.
If the stop function returns any reason code other than _URC_NO_REASON, the stack state
is indeterminate from the point of view of the caller of _Unwind_ForcedUnwind. Rather than
attempt to return, therefore, the unwind library should return _URC_FATAL_PHASE2_ERROR to
its caller.
Example: longjmp_unwind()
The expected implementation of longjmp_unwind() is as follows. The setjmp() routine
will have saved the state to be restored in its customary place, including the frame pointer.
The longjmp_unwind() routine will call _Unwind_ForcedUnwind with a stop function that
compares the frame pointer in the context record with the saved frame pointer. If equal, it
will restore the setjmp() state as customary, and otherwise it will return _URC_NO_REASON
or _URC_END_OF_STACK.
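The frame-pointer comparison performed by such a stop function can be sketched with mock types. Nothing below is the real unwind interface: mock_context, mock_jmp_state, longjmp_stop and forced_unwind are hypothetical stand-ins, the stop function takes a reduced parameter list, and "restoring the setjmp() state" is modelled by setting a flag instead of transferring control.

```c
#include <stdint.h>

typedef enum {
    MOCK_URC_NO_REASON = 0,
    MOCK_URC_END_OF_STACK = 5
} mock_reason;

/* Hypothetical stand-in for the opaque unwind context. */
typedef struct {
    uint64_t frame_pointer;  /* frame pointer recorded for this frame */
    int      is_last;        /* nonzero for the outermost frame */
} mock_context;

/* Hypothetical stand-in for the state saved by setjmp(). */
typedef struct {
    uint64_t saved_frame_pointer;
    int      restored;       /* set instead of really restoring the state */
} mock_jmp_state;

/* Stop function sketch: called once per frame during forced unwinding. */
static mock_reason longjmp_stop(const mock_context *ctx, void *stop_parameter) {
    mock_jmp_state *target = stop_parameter;
    if (ctx->frame_pointer == target->saved_frame_pointer) {
        target->restored = 1;  /* a real routine would not return from here */
        return MOCK_URC_NO_REASON;
    }
    return ctx->is_last ? MOCK_URC_END_OF_STACK : MOCK_URC_NO_REASON;
}

/* Mock driver: returns the index of the destination frame, or -1. */
static int forced_unwind(const mock_context *frames, int n,
                         mock_jmp_state *target) {
    for (int i = 0; i < n; i++) {
        mock_reason rc = longjmp_stop(&frames[i], target);
        if (target->restored)
            return i;
        if (rc != MOCK_URC_NO_REASON)
            return -1;
    }
    return -1;
}
```

In the mock driver, a saved frame pointer that matches the second frame stops the walk there, while a value that matches no frame ends with the end-of-stack code at the outermost frame.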
If a future requirement for two-phase forced unwinding were identified, an alternate
routine could be defined to request it, and an actions parameter flag defined to support it.
_Unwind_Resume
void _Unwind_Resume
(struct _Unwind_Exception *exception_object);
Resume propagation of an existing exception e.g. after executing cleanup code in a
partially unwound stack. A call to this routine is inserted at the end of a landing pad that
performed cleanup, but did not resume normal execution. It causes unwinding to proceed
further.
_Unwind_Resume should not be used to implement re-throwing. To the unwinding
runtime, the catch code that re-throws was a handler, and the previous unwinding
session was terminated before entering it. Re-throwing is implemented by calling
_Unwind_RaiseException again with the same exception object.
This is the only routine in the unwind library which is expected to be called directly
by generated code: it will be called at the end of a landing pad in a "landing-pad" model.
_Unwind_GetGR
uint64 _Unwind_GetGR
(struct _Unwind_Context *context, int index);
This function returns the 64-bit value of the given general register. The register is
identified by its index as given in figure 3.36.
During the two phases of unwinding, no registers have a guaranteed value.
_Unwind_SetGR
void _Unwind_SetGR
(struct _Unwind_Context *context,
int index,
uint64 new_value);
This function sets the 64-bit value of the given register, identified by its index as for
_Unwind_GetGR.
The behavior is guaranteed only if the function is called during phase 2 of unwinding,
and applied to an unwind context representing a handler frame, for which the personality
routine will return _URC_INSTALL_CONTEXT. In that case, only registers %rdi, %rsi, %rdx,
%rcx should be used. These scratch registers are reserved for passing arguments between
the personality routine and the landing pads.
_Unwind_GetIP
uint64 _Unwind_GetIP
(struct _Unwind_Context *context);
This function returns the 64-bit value of the instruction pointer (IP).
During unwinding, the value is guaranteed to be the address of the instruction imme-
diately following the call site in the function identified by the unwind context. This value
may be outside of the procedure fragment for a function call that is known to not return
(such as _Unwind_Resume).
Applications which unwind through asynchronous signals and other non-call locations
should use _Unwind_GetIPInfo below, and the additional flag that function provides.
_Unwind_GetIPInfo
uint64 _Unwind_GetIPInfo
(struct _Unwind_Context *context, int *ip_before_insn);
This function returns the same value as _Unwind_GetIP. In addition, the argument
ip_before_insn must not be null, and *ip_before_insn is updated with a flag which
indicates whether the returned pointer is at or after the first not yet fully executed instruc-
tion.
If *ip_before_insn is false, the application calling _Unwind_GetIPInfo should assume
that the instruction pointer provided points after a call instruction which has not yet re-
turned. In general, this means that the application should use the preceding call instruction
as the instruction pointer location of the unwind context. Typically, this can be approxi-
mated by subtracting one from the returned instruction pointer.
If *ip_before_insn is true, then the instruction pointer does not refer to an active call
site. Usually, this means that the instruction pointer refers to the point at which an asyn-
chronous signal arrived. In this case, the application should use the instruction pointer
returned from _Unwind_GetIPInfo as the instruction pointer location of the unwind con-
text, without adjustment.
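The adjustment rule described above can be sketched in C. This is only an illustration under stated assumptions: the helper name lookup_address is hypothetical and not part of this ABI, and it models only the "subtract one" approximation; a real caller would obtain both inputs from _Unwind_GetIPInfo.

```c
#include <stdint.h>

/* Hypothetical helper (not part of the ABI): given the value returned by
 * _Unwind_GetIPInfo and the flag it stored in *ip_before_insn, return the
 * address to use when searching unwind or call-site tables. */
uint64_t lookup_address(uint64_t ip, int ip_before_insn)
{
    /* Flag set: ip refers to the interrupted instruction, use it as is.
     * Flag clear: ip is a return address, so step back into the call
     * instruction (approximated by subtracting one). */
    return ip_before_insn ? ip : ip - 1;
}
```

Subtracting one does not yield the start of the call instruction; it only guarantees an address inside it, which is sufficient for range-based table lookups.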
_Unwind_SetIP
void _Unwind_SetIP
(struct _Unwind_Context *context,
uint64 new_value);
This function sets the value of the instruction pointer (IP) for the routine identified by
the unwind context.
The behavior is guaranteed only when this function is called for an unwind
context representing a handler frame, for which the personality routine will return
_URC_INSTALL_CONTEXT. In this case, control will be transferred to the given address, which
should be the address of a landing pad.
_Unwind_GetLanguageSpecificData
uint64 _Unwind_GetLanguageSpecificData
(struct _Unwind_Context *context);
This routine returns the address of the language-specific data area for the current stack
frame.
This routine is not strictly required: the language-specific data area could be located
through _Unwind_GetIP using the documented format of the DWARF Call Frame Information
Tables. However, since this work has already been done to find the personality routine in the
first place, it makes sense to cache the result in the context. It could also be passed as an
argument to the personality routine.
_Unwind_GetRegionStart
uint64 _Unwind_GetRegionStart
(struct _Unwind_Context *context);
This routine returns the address of the beginning of the procedure or code fragment
described by the current unwind descriptor block.
This information is required to access any data stored relative to the beginning of the
procedure fragment. For instance, a call site table might be stored relative to the beginning
of the procedure fragment that contains the calls. During unwinding, the function returns
the start of the procedure fragment containing the call site in the current stack frame.
_Unwind_GetCFA
uint64 _Unwind_GetCFA
(struct _Unwind_Context *context);
This function returns the 64-bit Canonical Frame Address which is defined as the value
of %rsp at the call site in the previous frame. This value is guaranteed to be correct any
time the context has been passed to a personality routine or a stop function.
6.2.6 Personality Routine
_Unwind_Reason_Code (*__personality_routine)
(int version,
_Unwind_Action actions,
uint64 exceptionClass,
struct _Unwind_Exception *exceptionObject,
struct _Unwind_Context *context);
The personality routine is the function in the C++ (or other language) runtime library
which serves as an interface between the system unwind library and language-specific
exception handling semantics. It is specific to the code fragment described by an unwind
info block, and it is always referenced via the pointer in the unwind info block, and hence
it has no psABI-specified name.
Parameters
The personality routine parameters are as follows:
version Version number of the unwinding runtime, used to detect a mis-match between
the unwinder conventions and the personality routine, or to provide backward com-
patibility. For the conventions described in this document, version will be 1.
actions Indicates what processing the personality routine is expected to perform, as a bit
mask. The possible actions are described below.
context Unwinder state information for use by the personality routine. This is an opaque
handle used by the personality routine in particular to access the frame’s registers
(see the Unwind Context section above).
return value The return value from the personality routine indicates how further unwind
should happen, as well as possible error conditions. See the following section.
Personality Routine Actions
The actions argument to the personality routine is a bitwise OR of one or more of the
following constants:
typedef int _Unwind_Action;
const _Unwind_Action _UA_SEARCH_PHASE = 1;
const _Unwind_Action _UA_CLEANUP_PHASE = 2;
const _Unwind_Action _UA_HANDLER_FRAME = 4;
const _Unwind_Action _UA_FORCE_UNWIND = 8;
_UA_SEARCH_PHASE Indicates that the personality routine should check if the current
frame contains a handler, and if so return _URC_HANDLER_FOUND, or otherwise
return _URC_CONTINUE_UNWIND. _UA_SEARCH_PHASE cannot be set at the same time as
_UA_CLEANUP_PHASE.
_UA_CLEANUP_PHASE Indicates that the personality routine should perform cleanup for the
current frame. The personality routine can perform this cleanup itself, by calling
nested procedures, and return _URC_CONTINUE_UNWIND. Alternatively, it can set up the
registers (including the IP) for transferring control to a "landing pad", and return
_URC_INSTALL_CONTEXT.
_UA_HANDLER_FRAME During phase 2, indicates to the personality routine that the current
frame is the one which was flagged as the handler frame during phase 1. The per-
sonality routine is not allowed to change its mind between phase 1 and phase 2, i.e.
it must handle the exception in this frame in phase 2.
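The way a personality routine might dispatch on the actions bit mask can be sketched as follows. This is a minimal sketch, not the routine of any particular runtime: the function personality_decision and its has_handler and has_cleanup inputs are hypothetical stand-ins for consulting the frame's language-specific data area, and the constants are redefined locally (with the values of the ABI's _UA_ and _URC_ definitions) so that the example is self-contained.

```c
#include <assert.h>

/* Local stand-ins for the ABI constants. The action values are those
 * listed above; the reason-code values follow _Unwind_Reason_Code. */
enum { UA_SEARCH_PHASE = 1, UA_CLEANUP_PHASE = 2,
       UA_HANDLER_FRAME = 4, UA_FORCE_UNWIND = 8 };
enum { URC_HANDLER_FOUND = 6, URC_INSTALL_CONTEXT = 7,
       URC_CONTINUE_UNWIND = 8 };

/* Hypothetical decision core of a personality routine. */
int personality_decision(int actions, int has_handler, int has_cleanup)
{
    /* The two phase flags are mutually exclusive. */
    assert(!((actions & UA_SEARCH_PHASE) && (actions & UA_CLEANUP_PHASE)));

    if (actions & UA_SEARCH_PHASE)
        return has_handler ? URC_HANDLER_FOUND : URC_CONTINUE_UNWIND;

    /* Cleanup phase: the frame flagged in phase 1 must handle the
     * exception, so a landing pad is always installed for it. */
    if (actions & UA_HANDLER_FRAME)
        return URC_INSTALL_CONTEXT;

    return has_cleanup ? URC_INSTALL_CONTEXT : URC_CONTINUE_UNWIND;
}
```

A real routine would also set the landing pad address and argument registers (for example with _Unwind_SetIP and _Unwind_SetGR) before returning _URC_INSTALL_CONTEXT.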
before the call that threw the exception, as follows. All registers specified as callee-saved
by the base ABI are restored, as well as scratch registers %rdi, %rsi, %rdx, %rcx (see below).
Except for those exceptions, scratch (or caller-saved) registers are not preserved, and their
contents are undefined on transfer.
The landing pad can either resume normal execution (as, for instance, at the end
of a C++ catch), or resume unwinding by calling _Unwind_Resume and passing it the
exceptionObject argument received by the personality routine. _Unwind_Resume will never
return.
_Unwind_Resume should be called if and only if the personality routine did not return
_URC_HANDLER_FOUND during phase 1. As a result, the unwinder can allocate resources
(for instance memory) and keep track of them in the exception object reserved words. It
should then free these resources before transferring control to the last (handler) landing
pad. It does not need to free the resources before entering non-handler landing-pads, since
_Unwind_Resume will ultimately be called.
The landing pad may receive arguments from the runtime, typically passed in registers
set using _Unwind_SetGR by the personality routine. For a landing pad that can call
_Unwind_Resume, one argument must be the exceptionObject pointer, which must be
preserved to be passed to _Unwind_Resume.
The landing pad may receive other arguments, for instance a switch value indicating
the type of the exception. Four scratch registers are reserved for this use (%rdi, %rsi, %rdx,
%rcx) 2 .
Example: Foreign Exceptions in C++. In C++, foreign exceptions can be caught by a
catch(...) statement. They can also be caught as if they were of a __foreign_exception
class, defined in <exception>. The __foreign_exception may have subclasses, such as
__java_exception and __ada_exception, if the runtime is capable of identifying some of
the foreign languages.
The behavior is undefined in the following cases:
All these cases might involve accessing C++ specific content of the thrown exception,
for instance to chain active exceptions.
Otherwise, a catch block catching a foreign exception is allowed:
• to re-throw the foreign exception. In that case, the original exception object must be
unaltered by the C++ runtime.
A catch-all block may be executed during forced unwinding. For instance, a longjmp
may execute code in a catch(...) during stack unwinding. However, if this happens,
unwinding will proceed at the end of the catch-all block, whether or not there is an explicit
re-throw.
Setting the low 4 bytes of exception class to C++\0 is reserved for use by C++ run-
times compatible with the common C++ ABI.
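A runtime could test for this reserved language tag as sketched below. The helper name is hypothetical, and the sketch assumes the usual packing of the exception class in which the first character of the 8-character name occupies the most significant byte, so that the low 4 bytes "C++\0" read as 0x432B2B00.

```c
#include <stdint.h>

/* 'C' = 0x43, '+' = 0x2B, '+' = 0x2B, '\0' = 0x00. */
#define CXX_LANGUAGE_TAG UINT32_C(0x432B2B00)

/* Hypothetical helper: nonzero if the low 4 bytes of the exception class
 * carry the C++ language tag, whatever the vendor in the high 4 bytes. */
int is_cxx_exception_class(uint64_t exception_class)
{
    return (uint32_t)(exception_class & UINT32_C(0xFFFFFFFF))
           == CXX_LANGUAGE_TAG;
}
```

For instance, the class value 0x474E5543432B2B00 (the characters "GNUCC++\0") passes this test, while a class whose low 4 bytes differ is treated as foreign.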
Fortran, Ada, ...) this information is generated by the compiler itself. However for hand-
written assembly routines the debug info must be provided by the author of the code. To
ease this task some new assembler directives are added:
.cfi_startproc is used at the beginning of each function that should have an entry in
.eh_frame . It initializes some internal data structures and emits architecture de-
pendent initial CFI instructions. Each .cfi_startproc directive has to be closed by
.cfi_endproc.
.cfi_endproc is used at the end of a function where it closes its unwind entry previously
opened by .cfi_startproc and emits it to .eh_frame.
.cfi_def_cfa_offset OFFSET modifies a rule for computing CFA. The register remains
the same, but OFFSET is new. Note that this is the absolute offset that will be added
to a defined register to compute the CFA address.
.cfi_offset REGISTER, OFFSET saves the previous value of REGISTER at offset OFF-
SET from CFA.
.cfi_escape EXPRESSION[, ...] allows the user to add arbitrary bytes to the unwind
info. One might use this to add OS-specific CFI opcodes, or generic CFI opcodes
that the assembler does not support.
Figure 6.1: Examples for Unwinding in Assembler
Chapter 7
Development Environment
During compilation of C or C++ code at least the symbols in table 7.1 are defined by the
pre-processor 1 .
1
__LP64 and __LP64__ were added to GCC 3.3 in March, 2003.
Chapter 8
Execution Environment
Chapter 9
Conventions
1
This chapter is used to document some features special to the AMD64 ABI. The different sections might
be moved to another place or removed completely.
9.1 C++
For the C++ ABI we will use the IA-64 C++ ABI and instantiate it appropriately. The
current draft of that ABI is available at:
http://mentorembedded.github.io/cxx-abi/
9.2 Fortran
A formal Fortran ABI does not exist. Most Fortran compilers are designed for very spe-
cific high performance computing applications, so Fortran compilers use different pass-
ing conventions and memory layouts optimized for their specific purpose. For example,
Fortran applications that must run on distributed memory machines need a different data
representation for array descriptors (also known as dope vectors, or fat pointers) than ap-
plications running on symmetric multiprocessor shared memory machines. A normative
ABI for Fortran is therefore not desirable. However, for interoperability of different For-
tran compilers, as well as for interoperability with other languages, this section provides
some guidelines for data type representation and argument passing. The guidelines
in this section are derived from the GNU Fortran 77 (G77) compiler, and are also followed
by the GNU Fortran 95 (gfortran) compiler (restricted to Fortran 77 features). Other For-
tran compilers already available for AMD64 at the time of this writing may use different
conventions, so compatibility is not guaranteed.
When this text uses the term Fortran procedure, the text applies to both Fortran
FUNCTION and SUBROUTINE subprograms as well as to alternate ENTRY points, unless
specifically stated otherwise.
Everything not explicitly defined in this ABI is left to the implementation.
9.2.1 Names
External names in Fortran are names of entities visible to all subprograms at link time.
This includes names of COMMON blocks and Fortran procedures. To avoid name space con-
flicts with linked-in libraries, all external names have to be mangled. And to avoid name
space conflicts of mangled external names with local names, all local names must also be
mangled. The mangling scheme is straightforward as follows:
• all names that do not contain any underscores should have one underscore appended
• all external names containing one or more underscores (at any position) should have
two underscores appended 2 .
• all external names should be mapped to lower case, following the traditional UNIX
model for Fortran compilers
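The mangling rules for external names can be sketched in C as follows. This is a minimal illustration, assuming plain ASCII names; the function name mangle_external_name and its buffer interface are hypothetical and not defined by this ABI.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes the mangled form of an external Fortran
 * name into out (capacity outsz bytes, including the terminator).
 * Returns 0 on success, -1 if out is too small. */
int mangle_external_name(const char *name, char *out, size_t outsz)
{
    size_t len = strlen(name);
    /* One underscore if the name has none, two if it has at least one. */
    size_t suffix = (strchr(name, '_') != NULL) ? 2 : 1;

    if (len + suffix + 1 > outsz)
        return -1;

    /* External names are mapped to lower case. */
    for (size_t i = 0; i < len; i++)
        out[i] = (char)tolower((unsigned char)name[i]);

    memset(out + len, '_', suffix);
    out[len + suffix] = '\0';
    return 0;
}
```

Under these rules a procedure named FOO is mapped to foo_, and MY_SUB is mapped to my_sub__.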
Figure 9.1: Example mapping of names
The entry point of the main program unit is called MAIN__. The symbol name for the
blank common block is __BLNK__. The external name of the unnamed BLOCK DATA routine
is __BLOCK_DATA__.
The values for type LOGICAL are .TRUE. implemented as 1 and .FALSE. implemented as
0.
3
G77 provides a header g2c.h with the equivalent C type definitions for all supported Fortran scalar
types.
Data objects with a CHARACTER type4 are represented as an array of characters of the
C char type (not guaranteed to be “\0” terminated) with a separate length counter to dis-
tinguish between CHARACTER data objects with a length parameter, and aggregate types of
CHARACTER data objects, possibly also with a length parameter.
Layout of other aggregate types is implementation defined. GNU Fortran puts all
arrays in contiguous memory in column-major order. GNU Fortran 95 builds an equivalent
C struct for derived types without reordering the type fields. Other compilers may use other
representations as needed. The representation and use of Fortran 90/95 array descriptors
is implementation defined. Note that array indices start at 1 by default.
Fortran 90/95 allow different kinds of each basic type using the kind type parameter of
a type. Kind type parameter values are implementation defined.
The layout of the commonly used Cray pointers is implementation defined.
This ABI does not define array functions (functions returning arrays). They are allowed
only in Fortran 90/95 and require the definition of array descriptors.
Note that Fortran 90/95 procedure arguments with the INTENT(IN) attribute should also
be passed by reference if the procedure is to be linked with code written in Fortran 77. Fortran
77 does not and can not support the INTENT attribute because it has no concept of explicit
interfaces. It is therefore not possible to declare the callee’s arguments as INTENT(IN). A
Fortran 77 compiler must assume that all procedure arguments are INTENT(INOUT) in the
Fortran 90/95 sense.
9.2.4 Functions
The calling of statement functions is implementation defined (as they are defined only
locally, the compiler has the freedom to apply any calling convention it likes).
Subroutines with alternate returns (e.g. "SUBROUTINE X(*,*)" called as "CALL
X(*10,*20)") are implemented as functions returning an INTEGER of the default kind. The
value of this returned integer is whatever integer is specified in the "RETURN" statement
for the subroutine 5 , or 0 for a RETURN statement without an argument. It is up to the caller
to jump to the corresponding alternate return label. The actual alternate-return arguments
are omitted from the calling sequence.
An example:
      SUBROUTINE SHOW_ALTERNATE_RETURN (N)
      INTEGER N
      CALL ALTERNATE_RETURN_EXAMPLE (N, *10, *20, *30)
      WRITE (*,*) 'OK - Normal Return'
      RETURN
 10   WRITE (*,*) '1st alternate return'
      RETURN
 20   WRITE (*,*) '2nd alternate return'
      RETURN
 30   WRITE (*,*) '3rd alternate return'
      RETURN
      END
Here the SUBROUTINE ALTERNATE_RETURN_EXAMPLE is implemented as a function return-
ing an INTEGER*4 with value 0 if N is 0, 1 if N is 1, 2 if N is 2 and 3 for all other values of N.
This return value is used by the caller as if the actual call were replaced by this sequence:
INTEGER X
X = CALL ALTERNATE_RETURN_EXAMPLE (N)
GOTO (10, 20, 30), X
All in all, the effect is that the index of the returned-to label (starting from 1) will be
contained in %rax after the call.
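Seen from C, the convention amounts to an integer-returning function plus a dispatch in the caller. The following sketch is illustrative only: it assumes the name mangling of section 9.2.1 (hence the two trailing underscores) and by-reference argument passing, and the C caller show_alternate_return is a hypothetical counterpart of the Fortran example.

```c
/* C view of SUBROUTINE ALTERNATE_RETURN_EXAMPLE (N, *, *, *): the
 * alternate-return arguments are omitted and the index of the label to
 * return to (0 = normal return) is the function result. */
int alternate_return_example__(const int *n)
{
    return (*n >= 0 && *n <= 2) ? *n : 3;
}

/* Hypothetical caller: the switch plays the role of the computed GOTO. */
const char *show_alternate_return(int n)
{
    switch (alternate_return_example__(&n)) {
    case 1:  return "1st alternate return";
    case 2:  return "2nd alternate return";
    case 3:  return "3rd alternate return";
    default: return "OK - Normal Return";
    }
}
```

A result of 0 falls through the dispatch, just as the computed GOTO continues with the statement after the call.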
Alternate ENTRY points of a SUBROUTINE or FUNCTION should be treated as separate sub-
programs, as mandated by the Fortran standard. I.e. arguments passed to an alternate
ENTRY should be passed as if the alternate ENTRY is a separate SUBROUTINE or FUNCTION. If a
FUNCTION has alternate ENTRY points, the result of each of the alternate ENTRY points must be
returned as if the alternate ENTRY is a separate FUNCTION with the result type of the alternate
ENTRY. The external naming of alternate ENTRY points follows section 9.2.1.
• the layout of the COMMON block must not change if one ignores the EQUIVALENCE, which
amongst other things means:
• If two arrays are equivalenced, the larger array must be named in the COMMON block,
and there must be complete inclusion, in particular the other array may not extend
the size of the equivalenced segment. It may also not change the alignment require-
ment.
• If an array element and a scalar are equivalenced, the array must be named in the
COMMON block and it must not be smaller than the scalar. The type of the scalar must
not require bigger alignment than the array.
• if two scalars are equivalenced they must have the same size and alignment require-
ments.
Other cases are implementation defined.
Because the Fortran standard allows the blank COMMON block to have different sizes in
different subprograms, it may be impossible to determine if it is small enough to fit in the
.bss section. When compiling for the medium or large code models the blank COMMON
block should therefore always be put in the .lbss section.
9.2.6 Intrinsics
This section lists the set of intrinsics which have to be supported at minimum by a
conforming compiler. They are separated by origin. They follow regular calling and naming
conventions.
The signature of intrinsics uses the syntax return-type(argtype1, argtype2, ...),
where the individual types can be the following characters: V (as in void) designates
a SUBROUTINE, L a LOGICAL, I an INTEGER, R a REAL, and C a CHARACTER. Hence I(R,L)
designates a FUNCTION returning an INTEGER and taking a REAL and a LOGICAL. If an argument
is an array, this is indicated using a trailing number, e.g. I13 is an INTEGER array with 13
elements. If a CHARACTER argument or return value has a fixed length, this is indicated
using an asterisk and a trailing number, for example C*16 is a CHARACTER(len=16). If a
CHARACTER argument of arbitrary length must be passed, the trailing number is replaced
with N, for example C*N.
BTest (I, Pos) Returns .TRUE. if bit Pos in I is set, returns .FALSE. otherwise.
IAnd (I, J) Returns value resulting from a boolean AND on each pair of bits in I and
J.
IOr (I, J) Returns value resulting from a boolean OR on each pair of bits in I and J.
IEOr (I, J) Returns value resulting from a boolean XOR on each pair of bits in I and
J.
Not (I) Returns value resulting from a boolean NOT on each bit in I.
IBClr (I, Pos) Returns the value of I with bit Pos cleared (set to zero).
IBits (I, Pos, Len) Extracts a subfield starting from bit position Pos and with a
length (towards the most significant bit) of Len bits from I. The result is right-
justified and the remaining bits are zeroed.
IBSet (I, Pos) Returns the value of I with the bit in position Pos set to one.
IShft (I, Shift) All bits of I are shifted Shift places. Shift.GT.0 indicates a left shift,
Shift.EQ.0 indicates no shift, and Shift.LT.0 indicates a right shift. Bits shifted out
from the least (when shifting right) or most (when shifting left) significant position
are lost. Bits shifted in at the opposite end are not set (i.e. zero).
IShftC (I, Shift, Size) The rightmost Size bits of the argument I are shifted circu-
larly Shift places. The unshifted bits of the result are the same as the unshifted bits
of I.
MvBits (From, FromPos, Len, To, ToPos) Move Len bits of From from bit po-
sitions FromPos through FromPos+Len-1 to bit positions ToPos through ToPos+Len-1
of To. The bit portions of To that are not affected by the movement of bits are un-
changed.
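The semantics of some of these bit intrinsics can be modelled in C for a 32-bit INTEGER. This is a sketch under stated assumptions, not a required implementation: the f_ prefixed names are hypothetical, bit 0 is taken to be the least significant bit, and the asserted argument ranges are assumptions of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* IBits: extract len bits starting at bit pos, right-justified. */
uint32_t f_ibits(uint32_t i, int pos, int len)
{
    assert(pos >= 0 && len >= 0 && pos + len <= 32);
    if (len == 0)
        return 0;
    uint32_t mask = (len == 32) ? UINT32_MAX : ((UINT32_C(1) << len) - 1);
    return (i >> pos) & mask;
}

/* IShft: logical shift, left for shift > 0, right for shift < 0;
 * vacated bits are zero. */
uint32_t f_ishft(uint32_t i, int shift)
{
    if (shift >= 32 || shift <= -32)
        return 0;                      /* every bit is shifted out */
    return shift >= 0 ? i << shift : i >> -shift;
}

/* IShftC: rotate the rightmost size bits by shift places (left for
 * positive shift); the remaining bits of i are unchanged. */
uint32_t f_ishftc(uint32_t i, int shift, int size)
{
    assert(size >= 1 && size <= 32);
    uint32_t mask = (size == 32) ? UINT32_MAX : ((UINT32_C(1) << size) - 1);
    uint32_t field = i & mask;
    int s = ((shift % size) + size) % size;   /* left rotation, 0..size-1 */
    if (s != 0)
        field = ((field << s) | (field >> (size - s))) & mask;
    return (i & ~mask) | field;
}
```

The explicit range checks avoid shift counts of 32 or more, which are undefined for 32-bit operands in C.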
Table 9.2: F77 intrinsics
Name Meaning
Abs Absolute value
ACos Arc cosine
AInt Truncate to whole number
ANInt Round to nearest whole number
ASin Arc sine
ATan Arc Tangent
ATan2 Arc Tangent
Char Character from code
Cmplx Construct COMPLEX(KIND=1) value
Conjg Complex conjugate
Cos Cosine
CosH Hyperbolic cosine
Dble Convert to double precision
DiM Difference magnitude (non-negative subtract)
DProd Double-precision product
Exp Exponential
IChar Code for character
Index Locate a CHARACTER substring
Int Convert to INTEGER value truncated to whole number
Len Length of character entity
LGe Lexically greater than or equal
LGt Lexically greater than
LLe Lexically less than or equal
LLt Lexically less than
Log Natural logarithm
Log10 Common logarithm
Max Maximum value
Min Minimum value
Mod Remainder
NInt Convert to INTEGER value rounded to nearest whole number
Real Convert value to type REAL(KIND=1)
Sin Sine
SinH Hyperbolic sine
SqRt Square root
Tan Tangent
TanH Hyperbolic tangent
Refer to the Fortran 77 language standard for signature and definition of the F77
intrinsics listed in table 9.2. These intrinsics can have a prefix as per the standard; hence
the table is not exhaustive.
Table 9.3: F90 intrinsics
Name Meaning
AChar ASCII character from code
Bit_Size Number of bits in argument’s type
CPU_Time Get current CPU time
IAChar ASCII code for character
Len_Trim Get position of last non-blank character in string
System_Clock Get current system clock value
Refer to the Fortran 90 language standard for signature and definition of the F90 in-
trinsics listed in table 9.3.
BesJ0 (X) Calculates the Bessel function of the first kind of order 0 of X. Returns a REAL
of the same kind as X.
BesJ1 (X) Calculates the Bessel function of the first kind of order 1 of X. Returns a REAL
of the same kind as X.
BesJN (N, X) Calculates the Bessel function of the first kind of order N of X. Returns a
REAL of the same kind as X.
BesY0 (X) Calculates the Bessel function of the second kind of order 0 of X. Returns a
REAL of the same kind as X.
BesY1 (X) Calculates the Bessel function of the second kind of order 1 of X. Returns a
REAL of the same kind as X.
BesYN (N, X) Calculates the Bessel function of the second kind of order N of X. Returns
a REAL of the same kind as X.
ErF (X) Calculates the error function of X. Returns a REAL of the same kind as X.
ErFC (X) Calculates the complementary error function of X, i.e. 1 - ERF(X). Returns a
REAL of the same kind as X.
Rand (Flag) Flag is optional. Returns a uniform quasi-random number between 0 and
1. If Flag .EQ. 0 or Flag is not passed, the next number in sequence is returned. If
Flag .EQ. 1, the generator is restarted. If Flag has any other value, the generator is
restarted with the value of Flag as the new seed.
SRand (Seed) Reinitializes the random number generator for IRand and Rand with the
seed in Seed.
Table 9.5: Unix intrinsics
DTime (TArray, Result) When called for the first time, returns the number of sec-
onds of runtime since the start of the program in Result, the user component of this
runtime in TArray(1), and the system time in TArray(2). Subsequent invocations
return values based on accumulations since the previous invocation.
ETime (TArray, Result) Returns the number of seconds of runtime since the start
of the program in Result, the user component of this runtime in TArray(1), and the
system time in TArray(2). Subsequent invocations return values based on accumulations
since the previous invocation.
Flush (Unit) Flushes the Fortran I/O unit with ID Unit. The unit must be open for
output. If the optional Unit argument is omitted, all open units are flushed.
FNum (Unit) Returns the UNIX(tm) file descriptor number corresponding to the Fortran
I/O unit Unit. The unit must be open.
FStat (Unit, SArray, Status) Obtains data about the file open on Fortran I/O
unit Unit and places it in the array SArray. The values in this array are as follows:
1. Device ID
2. Inode number
3. File mode
4. Number of links
5. Owner’s UID
6. Owner’s GID
7. ID of device containing directory entry for file
8. File size (bytes)
9. Last access time
10. Last modification time
11. Last file status change time
12. Preferred I/O block size (-1 if not available)
13. Number of blocks allocated (-1 if not available)
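On a POSIX system the thirteen values correspond closely to the fields of struct stat. The sketch below shows one possible mapping; the helper name fill_stat_array is hypothetical, the use of long long elements is a choice made for this sketch, and mapping element 7 to st_rdev is an assumption rather than something this document specifies.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: fills a 13-element array in the order listed
 * above. C arrays are 0-based, so Fortran element k is sarray[k - 1]. */
void fill_stat_array(const struct stat *st, long long sarray[13])
{
    sarray[0]  = (long long)st->st_dev;      /*  1. Device ID           */
    sarray[1]  = (long long)st->st_ino;      /*  2. Inode number        */
    sarray[2]  = (long long)st->st_mode;     /*  3. File mode           */
    sarray[3]  = (long long)st->st_nlink;    /*  4. Number of links     */
    sarray[4]  = (long long)st->st_uid;      /*  5. Owner's UID         */
    sarray[5]  = (long long)st->st_gid;      /*  6. Owner's GID         */
    sarray[6]  = (long long)st->st_rdev;     /*  7. assumed: st_rdev    */
    sarray[7]  = (long long)st->st_size;     /*  8. File size (bytes)   */
    sarray[8]  = (long long)st->st_atime;    /*  9. Last access time    */
    sarray[9]  = (long long)st->st_mtime;    /* 10. Last modification   */
    sarray[10] = (long long)st->st_ctime;    /* 11. Last status change  */
    sarray[11] = (long long)st->st_blksize;  /* 12. Preferred block size */
    sarray[12] = (long long)st->st_blocks;   /* 13. Blocks allocated    */
}
```

A runtime would typically call fstat on the descriptor behind the Fortran unit (or lstat on the file name for LStat) and then copy the fields in this order.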
Gerror (Message) Returns the system error message corresponding to the last system
error (errno in C). The message is returned in Message. If Message is longer than the
error message, it is padded with blanks after the message. If Message is not long
enough to hold the error message, the error message is truncated to the length of
Message.
GetArg (Pos, Value) Returns in Value the command-line argument in position Pos. If
there are fewer than Pos command-line arguments, Value is filled with blanks. If Pos
is 0, the name of the program is returned. If Value is longer than the command-line
argument, it is padded with blanks after the argument. If Value is not long enough to
hold the command-line argument, the argument is truncated to the length of Value.
GetCWD (Name, Status) Returns in Name the current working directory. If the optional
Status argument is supplied, it contains 0 on success or a nonzero error code upon
return.
GetEnv (Name, Value) Returns in Value the environment variable identified with Name.
If Name has not been set, Value is filled with blanks. A null character marks the end
of the name in Name. Trailing blanks in Name are ignored. If Value is longer than the
environment variable, it is padded with blanks after the variable. If Value is not long
enough to hold the environment variable, the variable is truncated to the length of
Value.
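Gerror, GetArg and GetEnv share the same rule for storing a result into a fixed-length CHARACTER argument: truncate if it is too long, otherwise pad with blanks. A minimal C sketch of that rule follows; the helper name copy_blank_padded is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the NUL-terminated C string src into the
 * Fortran CHARACTER buffer dst of length dstlen. The result is truncated
 * or padded with blanks and is not NUL-terminated. */
void copy_blank_padded(char *dst, size_t dstlen, const char *src)
{
    size_t n = strlen(src);
    if (n > dstlen)
        n = dstlen;                       /* truncate to the buffer length */
    memcpy(dst, src, n);
    memset(dst + n, ' ', dstlen - n);     /* pad the remainder with blanks */
}
```

Passing an empty string fills the whole buffer with blanks, which matches the behavior described for an unset environment variable or a missing command-line argument.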
IDate (TArray) Returns the current local date as day, month, and year in elements 1, 2,
and 3 of TArray, respectively. The year has four significant digits.
ITime (TArray) Returns the current local time as hour, minutes, and seconds in elements
1, 2, and 3 of TArray, respectively.
LStat (File, SArray, Status) Obtains data about a file named File and places
it in the array SArray. The values in this array are as follows:
1. Device ID
2. Inode number
3. File mode
4. Number of links
5. Owner’s UID
6. Owner’s GID
7. ID of device containing directory entry for file
8. File size (bytes)
9. Last access time
10. Last modification time
11. Last file status change time
12. Preferred I/O block size (-1 if not available)
13. Number of blocks allocated (-1 if not available)
Rename (Path1, Path2, Status) Renames the file named Path1 to Path2. A null
character marks the end of the names. Trailing blanks are ignored. If the optional
Status argument is supplied, it contains 0 on success or a nonzero error code upon
return.
Sleep (Seconds) Causes the program to pause for Seconds seconds.
Chapter 10
General Dynamic Model Load address of x into %rax
In the TLSDESC code sequence, the leal instruction must be encoded with a rex prefix even
if the destination register does not require it. If the leal encoding had a variable length,
the linker could not tell where it starts and could not safely perform the GDesc -> IE/LE
optimization.
Initial Exec Model Load address of x into %rax. The addl instruction must be encoded with
a rex prefix even if the destination register does not require it. Otherwise the linker cannot
safely perform the IE -> LE optimization.
Initial Exec Model, II Load value of x into %edi. The %fs:(%eax) memory operand cannot be
used for ILP32 since its effective address is the base address of %fs plus the value of %eax
zero-extended to a 64-bit result, which is incorrect with a negative value in %eax.
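The effect of zero-extension on a negative thread-pointer offset can be illustrated numerically. The two functions below are hypothetical models of the address arithmetic, not code that a compiler emits; they assume the offset is a signed 32-bit value.

```c
#include <stdint.h>

/* Models %fs:(%eax): the 32-bit register value is zero-extended before
 * it is added to the %fs base. */
uint64_t ea_zero_extended(uint64_t fs_base, int32_t offset)
{
    return fs_base + (uint64_t)(uint32_t)offset;
}

/* Models %fs:(%rax) with the full, sign-extended 64-bit offset. */
uint64_t ea_sign_extended(uint64_t fs_base, int32_t offset)
{
    return fs_base + (uint64_t)(int64_t)offset;
}
```

With a base of 0x10000 and an offset of -8, the sign-extended form yields 0xFFF8 as intended, while the zero-extended form yields 0x10000FFF8, about 4 GiB above the intended address.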
Table 10.4: Initial Exec Model Code Sequence, II
LP64 ILP32
0x00 movq x@gottpoff(%rip),%rax 0x00 movq x@gottpoff(%rip),%rax
0x07 movl %fs:(%rax),%edi 0x07 movl %fs:(%rax),%edi
or
For the code sequence with TLSDESC, the local dynamic model is similar to the general
dynamic model. The same encoding requirement for the leal instruction also applies.
Table 10.7: General Dynamic Model Code Sequence with TLSDESC
LP64 ILP32
0x00 leaq x@tlsdesc(%rip),%rax 0x00 rex leal x@tlsdesc(%rip),%eax
0x07 call *x@tlsdesc(%rax) 0x07 call *x@tlsdesc(%eax)
0x08 add %fs:0x0,%eax 0x09 add %fs:0x0,%eax
or
Table 10.11: Local Exec Model Code Sequence, II
LP64 ILP32
0x00 movq %fs:0,%rax 0x00 movl %fs:0,%eax
0x09 movl x@tpoff(%rax),%edi 0x08 movl x@tpoff(%rax),%edi
Table 10.15: GD -> LE Code Transition
GD LE
0x00 leaq x@tlsgd(%rip),%rdi 0x00 movl %fs:0, %eax
0x07 .word 0x6666 0x08 leal x@tpoff(%rax),%eax
0x09 rex64
0x0a call __tls_get_addr@plt
or
Table 10.19: IE -> LE Code Transition, II
IE LE
0x00 movq x@gottpoff(%rip),%rax 0x00 movq x@tpoff,%rax
0x07 movl %fs:(%rax),%edi 0x07 movl %fs:(%rax),%edi
or
10.4 Kernel Support
The kernel should limit the stack and the addresses returned from system calls to the
range 0x00000000 to 0xffffffff.
Chapter 11
Figure 11.1: Function Call without PLT (Small and Medium Models)
extern void func (void);        .globl func
func (void);                    call *func@GOTPCREL(%rip)
The direct branch is replaced by an indirect branch via the GOT slot, which is similar
to the first instruction in the PLT slot.
Figure 11.2: Function Address without PLT (Small and Medium Models)
extern void func (void);        .globl func
void* ptr (void)                func:
{                                   movq func@GOTPCREL(%rip), %rax
    return func;                    ret
}
Instead of using the PLT slot as the function address, the function address is retrieved
from the GOT slot.
If the linker determines that the function is defined locally, it converts the indirect branch
via the GOT slot to a direct branch with a nop prefix and converts the load via the GOT slot
to a load immediate or lea; see Section B.2 for details.
After the dynamic linker has resolved all symbols by updating GOT entries with symbol
addresses, the GOT can be made read-only, and overwriting the GOT becomes an immediate
hard error. Since the PLT is no longer used to call external functions, lazy symbol resolution
is disabled and a function can only be interposed during symbol resolution at startup. Tools
and features which depend on lazy symbol resolution will not work properly. However,
there are also a few side benefits:
No extra direct branch to PLT entry Since an indirect branch is 6 bytes long and a direct
branch is 5 bytes long, using an indirect branch via the GOT slot to call a local
function increases code size by one byte for each call. Since one PLT slot
has 16 bytes, code size increases only when an indirect branch via the GOT slot
is used to call an external function more than 16 times.
Custom calling convention Since an external function is called directly via the GOT slot,
instead of invoking the dynamic linker to look up the function symbol when called the first
time, parameters can be passed differently from what is specified in this document.
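The code size trade-off for calls to a single external function can be checked with a little arithmetic. The sketch below uses only the byte counts stated above (5-byte direct branch, 6-byte indirect branch, 16-byte PLT slot); the function names are hypothetical, and GOT entries, which both schemes need, are not counted.

```c
#include <stddef.h>

enum { DIRECT_CALL_BYTES = 5, INDIRECT_CALL_BYTES = 6, PLT_SLOT_BYTES = 16 };

/* Code bytes for `calls` call sites that go through one PLT slot. */
size_t bytes_via_plt(size_t calls)
{
    return calls ? (size_t)DIRECT_CALL_BYTES * calls + PLT_SLOT_BYTES : 0;
}

/* Code bytes for `calls` call sites that branch indirectly via the GOT. */
size_t bytes_via_got(size_t calls)
{
    return (size_t)INDIRECT_CALL_BYTES * calls;
}
```

Both schemes cost 96 bytes at 16 call sites; from the 17th call site onward the GOT form is larger (102 bytes against 101).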
Figure 11.3: __tls_get_addr Call
Direct via PLT Indirect via GOT
call __tls_get_addr@PLT call *__tls_get_addr@GOTPCREL(%rip)
the following alternate code sequence loads address of x into %rax without PLT:
static __thread int x;
the following alternate code sequence loads the address of the TLS block of the module,
which contains variable x, into %rax without PLT:
GD                                           IE
0x00 .byte 0x66                              0x00 movq %fs:0, %rax
0x01 leaq x@tlsgd(%rip),%rdi                 0x09 addq x@gottpoff(%rip),%rax
0x08 .byte 0x66
0x09 rex64
0x0a call *__tls_get_addr@GOTPCREL(%rip)
Table 11.6: GD -> IE Code Transition (ILP32)
GD                                           IE
0x00 leaq x@tlsgd(%rip),%rdi                 0x00 movl %fs:0, %eax
0x07 .byte 0x66                              0x08 addq x@gottpoff(%rip),%rax
0x08 rex64
0x09 call *__tls_get_addr@GOTPCREL(%rip)
Local Dynamic to Local Exec For the local dynamic model to local exec model transition,
the linker generates four 0x66 prefixes, instead of three, before the mov instruction for
LP64 and generates a 5-byte nop, instead of a 4-byte one, before the mov instruction for
ILP32. To load the address of the TLS block of the module, which contains variable x,
into %rax without PLT:
Table 11.9: LD -> LE Code Transition (LP64)
LD                                              LE
0x00 leaq x@tlsld(%rip),%rdi                    0x00 .long 0x66666666
0x07 call *__tls_get_addr@GOTPCREL(%rip)        0x04 movq %fs:0,%rax
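A linker can only rewrite the sequence in place if both forms occupy the same number of bytes. The sketch below adds up the encoded lengths of the instructions in Table 11.9 to show why four 0x66 prefixes are needed. The array and function names are hypothetical, and the per-instruction lengths are the usual x86-64 encodings, stated here as an assumption rather than taken from the table itself.

```c
#include <stddef.h>

/* Encoded lengths, in bytes, of the LD sequence (left column). */
static const size_t ld_seq[] = {
    7, /* leaq x@tlsld(%rip),%rdi: REX.W, opcode, ModRM, disp32 */
    6, /* call *__tls_get_addr@GOTPCREL(%rip): opcode, ModRM, disp32 */
};

/* Encoded lengths, in bytes, of the LE sequence (right column). */
static const size_t le_seq[] = {
    4, /* .long 0x66666666: four 0x66 prefixes */
    9, /* movq %fs:0,%rax: %fs prefix, REX.W, opcode, ModRM, SIB, disp32 */
};

/* Hypothetical helper: total size of a sequence given its instruction lengths. */
static size_t seq_bytes(const size_t *lens, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += lens[i];
    return total;
}
```

Both columns come to 13 bytes. The indirect call is one byte longer than a direct call through the PLT, which is consistent with the extra 0x66 prefix described in the text.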
Appendix A
Linux Conventions
This chapter describes some details that are only relevant to GNU/Linux systems and the
Linux kernel.
1. User-level applications pass integer arguments in the register sequence %rdi, %rsi,
%rdx, %rcx, %r8 and %r9. The kernel interface uses %rdi, %rsi, %rdx, %r10, %r8 and
%r9.
2. A system-call is made via the syscall instruction. The kernel clobbers registers %rcx
and %r11 but preserves all other registers except %rax.
5. On return from the syscall, register %rax contains the result of the system-call. A
value in the range from -4095 to -1 indicates an error; it is -errno.
6. Only values of class INTEGER or class MEMORY are passed to the kernel.
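The error convention in item 5 can be sketched as a pair of small C helpers. This is an illustration only: the names raw_syscall_failed and raw_syscall_errno are hypothetical, and the raw value is assumed to be the 64-bit content of %rax copied into an int64_t by whatever code issued the syscall instruction.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if the raw %rax value returned by the kernel
 * lies in the error range -4095..-1. */
static bool raw_syscall_failed(int64_t raw)
{
    return raw >= -4095 && raw <= -1;
}

/* Hypothetical helper: the errno value encoded in a failing return,
 * or 0 if the return value does not indicate an error. */
static int raw_syscall_errno(int64_t raw)
{
    return raw_syscall_failed(raw) ? (int)-raw : 0;
}
```

A wrapper in the style of a C library would typically store the decoded value in errno and return -1 to its caller. Values outside the range, including large negative ones such as -4096, are treated as ordinary results.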
Appendix B
Linker Optimization
with a GOT PLT entry with an indirect jump via the GOT slot:
Figure B.2: Procedure Linkage Table Entry Via GOT Slot
and resolves the PLT reference to the GOT PLT entry. Indirect jmp is a 6-byte instruction.
nop can be encoded as a 2-byte instruction or a 10-byte instruction for an 8-byte or 16-byte
PLT slot. A separate PLT with 8-byte slots may be used for this optimization.
This optimization isn’t applicable to the STT_GNU_IFUNC symbols since their GOT-
PLT slots are resolved to the selected implementation and their GOT slots are resolved to
their PLT entries.
This optimization must be avoided if pointer equality is needed since the symbol value
won’t be cleared in this case and the dynamic linker won’t update the GOT slot. Otherwise,
the resulting binary will get into an infinite loop at run-time.
Convert call and jmp Convert the memory operand of call and jmp into an immediate
operand.
Convert mov Convert the memory operand of mov into an immediate operand. When
position-independent code is disabled and foo is defined locally in the lower 32-bit address
space, the memory operand in mov can be converted into an immediate operand. Otherwise,
mov must be changed to lea.
Convert Test and Binop Convert the memory operand of test and binop into an immediate
operand, where binop is one of the adc, add, and, cmp, or, sbb, sub, xor instructions,
when position-independent code is disabled.
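The choice described under "Convert mov" can be sketched as a small decision function. This is one reading of the text, not a description of any particular linker: the type and function names are hypothetical, foo is assumed to be defined locally, and "lower 32-bit address space" is taken to mean that the address fits in an unsigned 32-bit value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type: which rewrite applies to the mov. */
typedef enum { MOV_TO_IMMEDIATE, MOV_TO_LEA } mov_relaxation;

/* Hypothetical helper: pick the rewrite for a mov whose memory operand
 * refers to a locally defined symbol at foo_address.
 * Assumption: "lower 32-bit address space" means foo_address <= UINT32_MAX. */
static mov_relaxation relax_mov(bool pic_enabled, uint64_t foo_address)
{
    bool in_low_32_bits = foo_address <= UINT32_MAX;

    if (!pic_enabled && in_low_32_bits)
        return MOV_TO_IMMEDIATE; /* the address becomes an immediate operand */
    return MOV_TO_LEA;           /* otherwise the mov is changed to lea */
}
```

With position-independent code enabled the function always selects lea, since the final address is not known at link time.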