x86_64
keywords: x86_64, x86, abi
- 64bit synonyms: x86_64, x64, amd64, intel 64
- 32bit synonyms: x86, ia32, i386
- ISA type: CISC
- Endianness: little
Registers
General purpose register
bytes
[7:0]    [3:0]    [1:0]    [1]    [0]     desc
----------------------------------------------------------
rax      eax      ax       ah     al      accumulator
rbx      ebx      bx       bh     bl      base register
rcx      ecx      cx       ch     cl      counter
rdx      edx      dx       dh     dl      data register
rsi      esi      si       -      sil     source index
rdi      edi      di       -      dil     destination index
rbp      ebp      bp       -      bpl     base pointer
rsp      esp      sp       -      spl     stack pointer
r8-15    rNd      rNw      -      rNb
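The sub-register names in the table are just views onto the low bytes of the 64 bit register. A minimal sketch in C, modelling a register as a plain `uint64_t`; the helper names (`reg_eax`, `reg_ax`, `reg_ah`, `reg_al`) are made up for illustration:

```c
#include <stdint.h>

/* Hypothetical helpers: slice a 64 bit register value the same way
 * the sub-register names in the table do. */
static inline uint32_t reg_eax(uint64_t rax) { return (uint32_t)rax; }        /* bytes [3:0] */
static inline uint16_t reg_ax(uint64_t rax)  { return (uint16_t)rax; }        /* bytes [1:0] */
static inline uint8_t  reg_ah(uint64_t rax)  { return (uint8_t)(rax >> 8); }  /* byte  [1]   */
static inline uint8_t  reg_al(uint64_t rax)  { return (uint8_t)rax; }         /* byte  [0]   */
```

With `rax = 0x1122334455667788` this gives `eax = 0x55667788`, `ax = 0x7788`, `ah = 0x77` and `al = 0x88`.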
Special register
bytes
[7:0]     [3:0]     [1:0]     desc
---------------------------------------------------
rflags    eflags    flags     flags register
rip       eip       ip        instruction pointer
FLAGS register
rflags
bits     desc                           instr     comment
--------------------------------------------------------------------------------------------------------------
[21]     ID   identification                      ability to set/clear -> indicates support for CPUID instr
[18]     AC   alignment check                     alignment exception for PL 3 (user), requires CR0.AM
[13:12]  IOPL io privilege level
[11]     OF   overflow flag
[10]     DF   direction flag            cld/std   increment (0) or decrement (1) registers in string operations
[9]      IF   interrupt enable          cli/sti
[7]      SF   sign flag
[6]      ZF   zero flag
[4]      AF   auxiliary carry flag
[2]      PF   parity flag
[0]      CF   carry flag
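The bit positions above can be used to decode a saved flags value. A small sketch in C; the enum and function names are hypothetical, and the sample value `0x246` used below is only an illustrative assumption (bit 1 is reserved and reads as 1):

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions taken from the table above. */
enum rflags_bit {
    RFLAGS_CF = 0,  RFLAGS_PF = 2,  RFLAGS_AF = 4,  RFLAGS_ZF = 6,
    RFLAGS_SF = 7,  RFLAGS_IF = 9,  RFLAGS_DF = 10, RFLAGS_OF = 11,
    RFLAGS_AC = 18, RFLAGS_ID = 21
};

/* Test a single flag bit in a saved rflags value. */
static inline bool rflags_test(uint64_t rflags, enum rflags_bit bit) {
    return ((rflags >> bit) & 1u) != 0;
}

/* Extract the two bit wide IOPL field [13:12]. */
static inline unsigned rflags_iopl(uint64_t rflags) {
    return (unsigned)((rflags >> 12) & 3u);
}
```

For `0x246` the sketch reports PF, ZF and IF as set, CF as clear and IOPL as 0.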
Change flag bits with pushf / popf instructions:
pushfd // push flags (4 bytes) onto stack
or dword ptr [esp], (1 << 18) // enable AC flag
popfd // pop flags (4 bytes) from stack
There is also pushfq / popfq to push and pop all 8 bytes of rflags.
Model Specific Register (MSR)
rdmsr // Read MSR register, effectively does EDX:EAX <- MSR[ECX]
wrmsr // Write MSR register, effectively does MSR[ECX] <- EDX:EAX
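The `EDX:EAX` notation means the 64 bit MSR value is split into a high half in `edx` and a low half in `eax`. A sketch of that split in C; it only models the value layout and does not execute `rdmsr`/`wrmsr` (which are privileged), and the helper names are made up:

```c
#include <stdint.h>

/* Combine the two halves as rdmsr returns them: EDX holds bits [63:32],
 * EAX holds bits [31:0]. */
static inline uint64_t msr_join(uint32_t edx, uint32_t eax) {
    return ((uint64_t)edx << 32) | eax;
}

/* Split a 64 bit value into the halves wrmsr expects in EDX and EAX. */
static inline void msr_split(uint64_t value, uint32_t *edx, uint32_t *eax) {
    *edx = (uint32_t)(value >> 32);
    *eax = (uint32_t)value;
}
```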
Size directives
Explicitly specify size of the operation.
mov byte ptr [rax], 0xff // save 1 byte(s) at [rax]
mov word ptr [rax], 0xff // save 2 byte(s) at [rax]
mov dword ptr [rax], 0xff // save 4 byte(s) at [rax]
mov qword ptr [rax], 0xff // save 8 byte(s) at [rax]
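To see what the size directive changes, the following C sketch models the four stores above on a byte buffer. `store_le` is a hypothetical helper, not a real API; it writes exactly `size` bytes, least significant byte first, as a little endian machine does:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of `mov <size> ptr [mem], imm`: only `size` bytes at `mem`
 * are overwritten, lowest byte first (little endian). */
static void store_le(uint8_t *mem, uint64_t value, size_t size) {
    assert(size == 1 || size == 2 || size == 4 || size == 8);
    for (size_t i = 0; i < size; ++i) {
        mem[i] = (uint8_t)(value >> (8 * i));
    }
}
```

Storing `0xff` with size 2 into a buffer pre-filled with `0xaa` leaves `ff 00 aa aa ...`: the `word ptr` variant also clears the second byte, while `byte ptr` would leave it untouched.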
Addressing
mov qword ptr [rax], rbx // save val in rbx at [rax]
mov qword ptr [imm], rbx // save val in rbx at [imm]
mov rax, qword ptr [rbx+4*rcx] // load val at [rbx+4*rcx] into rax
rip relative addressing:
lea rax, [rip+.my_str] // load addr of .my_str into rax
...
.my_str:
.asciz "Foo"
Load effective address:
mov rax, 2
lea r11, [rax + 3] // r11 <- 5
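Both `mov` with a memory operand and `lea` compute the same effective address `base + index*scale + disp`; `lea` just stores the address instead of accessing memory. A sketch of that computation in C, with a hypothetical function name:

```c
#include <assert.h>
#include <stdint.h>

/* Effective address as used by [base + index*scale + disp].
 * The scale must be 1, 2, 4 or 8; disp is a signed displacement. */
static uint64_t effective_address(uint64_t base, uint64_t index,
                                  unsigned scale, int32_t disp) {
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + (uint64_t)(int64_t)disp;
}
```

`effective_address(2, 0, 1, 3)` mirrors `lea r11, [rax + 3]` with `rax = 2` and yields 5.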
String instructions
The operand size of a string instruction is defined by the instruction suffix b | w | d | q.
Source and destination registers are modified according to the direction flag (DF) in the flags register:
- DF=0 increment src/dest registers
- DF=1 decrement src/dest registers
Following explanation assumes byte operands with DF=0:
movsb // move data from string to string
// ES:[DI] <- DS:[SI]
// DI <- DI + 1
// SI <- SI + 1
lodsb // load string
// AL <- DS:[SI]
// SI <- SI + 1
stosb // store string
// ES:[DI] <- AL
// DI <- DI + 1
cmpsb // compare string operands
// DS:[SI] - ES:[DI] ; set status flag (eg ZF)
// SI <- SI + 1
// DI <- DI + 1
scasb // scan string
// AL - ES:[DI] ; set status flag (eg ZF)
// DI <- DI + 1
String operations can be repeated:
rep // repeat until rcx = 0
repz // repeat while rcx != 0 and ZF = 1 (stops once rcx = 0 or ZF = 0)
repnz // repeat while rcx != 0 and ZF = 0 (stops once rcx = 0 or ZF = 1)
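As an illustration of the repeat prefixes, here is a C model of `repz cmpsb` with DF=0. It is a sketch, not an emulator: the function name is made up, and unlike the hardware (which leaves the flags unchanged when the count is 0 on entry) the model simply reports ZF=1 in that case:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Model of `repz cmpsb` (DF=0): compare bytes at *si and *di, advance
 * both pointers, decrement the count, and stop once the count reaches 0
 * or a comparison clears ZF. Returns the final ZF. */
static bool repz_cmpsb(const uint8_t **si, const uint8_t **di, size_t *cx) {
    bool zf = true; /* simplification for *cx == 0, see text above */
    while (*cx != 0) {
        zf = (**si == **di);
        ++*si;
        ++*di;
        --*cx;
        if (!zf) {
            break;
        }
    }
    return zf;
}
```

Comparing "abcd" with "abXd" and a count of 4 stops after the third byte with ZF=0 and a remaining count of 1.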
Example: Simple memset
// memset (dest, 0xaa /* char */, 0x10 /* len */)
lea di, [dest]
mov al, 0xaa
mov cx, 0x10
rep stosb
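The memset example can be modelled in C as follows. This is a sketch of what `rep stosb` does step by step; `rep_stosb` is a hypothetical name and the `df` parameter stands in for the direction flag:

```c
#include <stddef.h>
#include <stdint.h>

/* Model of `rep stosb`: store AL at [DI], step DI by +1 (DF=0) or
 * -1 (DF=1), and repeat until the count is 0. Returns the final DI. */
static uint8_t *rep_stosb(uint8_t *di, uint8_t al, size_t cx, int df) {
    while (cx != 0) {
        *di = al;
        di += df ? -1 : 1;
        --cx;
    }
    return di;
}
```

`rep_stosb(dest, 0xaa, 0x10, 0)` fills 16 bytes with `0xaa` and returns `dest + 0x10`, which matches the final value of the destination register after the assembly version.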
Time stamp counter - rdtsc
#include <stdint.h>

static inline uint64_t rdtsc(void) {
    uint32_t eax, edx;
    // rdtsc returns the 64 bit counter split across edx (high) and eax (low).
    asm volatile("rdtsc" : "=d"(edx), "=a"(eax)::);
    return (uint64_t)edx << 32 | eax;
}
Constant TSC behavior ensures that the duration of each clock tick is uniform and supports the use of the TSC as a wall clock timer even if the processor core changes frequency. This is the architectural behavior moving forward.
- 18.17 TIME-STAMP COUNTER - intel64-vol3
On Linux one can check the constant_tsc cpu flag to validate that the implemented TSC ticks with a constant frequency.
grep constant_tsc /proc/cpuinfo
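The same check can be done from a program by scanning the `flags` line of `/proc/cpuinfo`. A sketch in C that works on an already read line; `has_cpu_flag` is a made-up helper, and it matches whole words so that `tsc` does not match inside `nonstop_tsc`:

```c
#include <stdbool.h>
#include <string.h>

/* Return true if `flag` appears as a whole whitespace separated word
 * in `flags_line` (e.g. the "flags : ..." line from /proc/cpuinfo). */
static bool has_cpu_flag(const char *flags_line, const char *flag) {
    size_t len = strlen(flag);
    if (len == 0) {
        return false;
    }
    const char *p = flags_line;
    while ((p = strstr(p, flag)) != NULL) {
        bool starts_word = (p == flags_line) || p[-1] == ' ' || p[-1] == '\t';
        bool ends_word = p[len] == '\0' || p[len] == ' ' || p[len] == '\n';
        if (starts_word && ends_word) {
            return true;
        }
        p += len; /* keep searching after this partial match */
    }
    return false;
}
```

Reading the line itself (for example with `fgets` in a loop over `/proc/cpuinfo`) is left out to keep the sketch short.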
SysV x86_64 ABI
Passing arguments to functions
- Integer/Pointer arguments
reg     arg
-----------
rdi       1
rsi       2
rdx       3
rcx       4
r8        5
r9        6
- Floating point arguments
reg     arg
-----------
xmm0      1
..       ..
xmm7      8
- Additional arguments are passed on the stack. Arguments are pushed
right-to-left (RTL), so lower-numbered stack arguments end up closer to the
current rsp.
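The register assignment above can be written down as a small lookup. A sketch in C with hypothetical function names; integer and floating point arguments are counted separately, and `NULL` stands for "passed on the stack":

```c
#include <stddef.h>
#include <string.h> /* strcmp, handy when checking the returned names */

/* n-th (1-based) integer/pointer argument -> register name, NULL = stack. */
static const char *sysv_int_arg_reg(unsigned n) {
    static const char *const regs[] = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};
    return (n >= 1 && n <= 6) ? regs[n - 1] : NULL;
}

/* n-th (1-based) floating point argument -> register name, NULL = stack. */
static const char *sysv_fp_arg_reg(unsigned n) {
    static const char *const regs[] = {"xmm0", "xmm1", "xmm2", "xmm3",
                                       "xmm4", "xmm5", "xmm6", "xmm7"};
    return (n >= 1 && n <= 8) ? regs[n - 1] : NULL;
}
```

For a call like `f(int a, double b, int c)` this gives `a -> rdi`, `b -> xmm0` and `c -> rsi`, since `c` is the second integer argument.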
Return values from functions
- Integer/Pointer return values
reg          size
-----------------
rax        64 bit
rax+rdx   128 bit
- Floating point return values
reg            size
-------------------
xmm0         64 bit
xmm0+xmm1   128 bit
Caller saved registers
Caller must save these registers if they should be preserved across function calls.
- rax
- rcx
- rdx
- rsi
- rdi
- r8-r11
Callee saved registers
Caller can expect these registers to be preserved across function calls. Callee must save these registers in case they are used.
- rbx
- rbp
- rsp
- r12-r15
Stack
- grows downwards
- frames aligned on 16 byte boundary
Hi ADDR
 |                +------------+
 |                | prev frame |
 |                +------------+ <--- 16 byte aligned (X & ~0xf)
 |       [rbp+8]  | saved RIP  |
 |       [rbp]    | saved RBP  |
 |       [rbp-8]  | func stack |
 |                | ...        |
 v                +------------+
Lo ADDR
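The `X & ~0xf` expression in the diagram rounds an address down to the next 16 byte boundary. A short sketch in C with made-up helper names:

```c
#include <stdint.h>

/* Round an address down to a 16 byte boundary by clearing the low 4 bits. */
static inline uintptr_t align_down_16(uintptr_t x) {
    return x & ~(uintptr_t)0xf;
}

/* An address is 16 byte aligned if its low 4 bits are zero. */
static inline int is_16_aligned(uintptr_t x) {
    return (x & 0xf) == 0;
}
```

For example `0x7ffc1239` is rounded down to `0x7ffc1230`. Rounding down is the right direction here because the stack grows towards lower addresses.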
Function prologue & epilogue
- prologue
push rbp // save caller base pointer
mov rbp, rsp // save caller stack pointer
- epilogue
mov rsp, rbp // restore caller stack pointer
pop rbp // restore caller base pointer
The epilogue is equivalent to the leave instruction.
Windows x64 ABI
Passing arguments to functions (ref)
A single argument is never spread across multiple registers.
- Integer/Pointer arguments
reg     arg
-----------
rcx       1
rdx       2
r8        3
r9        4
- Floating point arguments
reg     arg
-----------
xmm0      1
..       ..
xmm3      4
- Additional arguments are passed on the stack. Arguments are pushed
right-to-left (RTL), so lower-numbered stack arguments end up closer to the
current rsp. See example.
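In contrast to SysV, the Windows convention picks the register by argument position: the first four positions are shared between integer and floating point arguments. A sketch in C under that reading; `win64_arg_reg` is a hypothetical name and `NULL` stands for "passed on the stack":

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h> /* strcmp, handy when checking the returned names */

/* pos is the 1-based argument position; the register class depends on
 * whether that argument is a floating point value. NULL = stack. */
static const char *win64_arg_reg(unsigned pos, bool is_float) {
    static const char *const int_regs[] = {"rcx", "rdx", "r8", "r9"};
    static const char *const fp_regs[] = {"xmm0", "xmm1", "xmm2", "xmm3"};
    if (pos < 1 || pos > 4) {
        return NULL;
    }
    return is_float ? fp_regs[pos - 1] : int_regs[pos - 1];
}
```

For `f(int a, double b, int c)` this gives `a -> rcx`, `b -> xmm1` and `c -> r8`; the slots `rdx` and `xmm0`/`xmm2` stay unused for this call.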
Return values from functions
- Integer/Pointer return values
reg          size
-----------------
rax        64 bit
- Floating point return values
reg            size
-------------------
xmm0         64 bit
Caller saved registers
Caller must save these registers if they should be preserved across function calls.
- rax
- rcx
- rdx
- r8-r11
- xmm0-xmm5
Callee saved registers
Caller can expect these registers to be preserved across function calls. Callee must save these registers in case they are used.
- rbx
- rbp
- rdi
- rsi
- rsp
- r12-r15
- xmm6-xmm15
ASM skeleton - linux userspace
Small assembler skeleton, ready to use, with the following properties:
- use raw Linux syscalls (man 2 syscall for ABI)
- no C runtime (crt)
- gnu assembler gas
- intel syntax
# file: greet.s
.intel_syntax noprefix
.section .text, "ax", @progbits
.global _start
_start:
mov rdi, 1 # fd
lea rsi, [rip + greeting] # buf
mov edx, [rip + greeting_len] # count (.int is 4 bytes; the 32 bit load zero-extends into rdx)
mov rax, 1 # write(2) syscall nr
syscall
mov rdi, 0 # exit code
mov rax, 60 # exit(2) syscall nr
syscall
.section .rdonly, "a", @progbits
greeting:
.asciz "Hi ASM-World!\n"
greeting_len:
.int .-greeting
Syscall numbers are defined in /usr/include/asm/unistd.h.
To compile and run:
> gcc -o greet greet.s -nostartfiles -nostdlib && ./greet
Hi ASM-World!
MBR boot sectors example
The following shows a non-minimal MBR boot sector, which transitions from 16-bit real mode to 32-bit protected mode by setting up a small global descriptor table (GDT). A string is printed in each mode.
.code16
.intel_syntax noprefix
.section .boot, "ax", @progbits
// Disable interrupts.
cli
// Clear segment selectors.
xor ax, ax
mov ds, ax
mov es, ax
mov ss, ax
mov fs, ax
mov gs, ax
// Set cs to 0x0000, as some BIOSes load the MBR to either 07c0:0000 or 0000:7c00.
jmp 0x0000:entry_rm16
entry_rm16:
// Set video mode 3, see [1].
// * 80x25 text mode
// * 640x200 pixel resolution (8x8 pixel per char)
// * 16 colors (4bit)
// * 4 pages
// * 0xB800 screen address
//
// [1] http://www.ctyme.com/intr/rb-0069.htm
mov ax, 0x3
int 0x10
// Move cursor to second row.
// http://www.ctyme.com/intr/rb-0087.htm
mov ah, 0x02
mov bh, 0 // page
mov dh, 1 // row
mov dl, 0 // col
int 0x10
// Clear direction flag for lodsb below.
cld
// Load pointer to msg_rm string (null terminated).
lea si, [msg_rm]
// Teletype output char at current cursor position.
// http://www.ctyme.com/intr/rb-0106.htm
mov ah, 0x0e
1:
lodsb // al <- ds:si ; si+=1 ; (al char to write)
test al,al // test for null terminator
jz 2f
int 0x10
jmp 1b
2:
// Enable A20 address line.
in al, 0x92
or al, 2
out 0x92, al
// Load GDT descriptor.
lgdt [gdt_desc]
// Enable protected mode (set CR0.PE bit).
mov eax, cr0
or eax, (1 << 0)
mov cr0, eax
// Far jump which loads segment selector (0x0008) into cs.
// 0x0008 -> RPL=0, TI=0(GDT), I=1
jmp 0x0008:entry_pm32
.code32
entry_pm32:
// Select data segment selector (0x0010) for ds and es.
mov ax, gdt_data - gdt
mov ds, ax
mov es, ax // stosw below stores through es
// Write through VGA interface (video memory).
// Each character is represented by 2 bytes.
// 4 bit bg | 4 bit fg | 8 bit ascii char
//
// Start writing at third line.
mov edi, 0xb8000 + (80 * 2 * 2)
lea esi, [msg_pm]
1:
lodsb // al <- ds:esi ; esi+=1
test al, al // test for null terminator
jz 2f
or eax, 0x1f00 // blue bg, white fg
stosw // es:[edi] <- ax; edi+=2
jmp 1b
2:
hlt
jmp 2b
// For simplicity keep data used by boot sector in the same section.
.balign 8
msg_rm:
.asciz "Hello from Real Mode!"
msg_pm:
.asciz "Hello from Protected Mode!"
.balign 8
gdt:
.8byte 0x0000000000000000 // 0x00 | null descriptor
.8byte 0x00cf9a000000ffff // 0x08 | 32 bit, code (rx), present, dpl=0, g=4K, base=0, limit=fffff
gdt_data:
.8byte 0x00cf92000000ffff // 0x10 | 32 bit, data (rw), present, dpl=0, g=4K, base=0, limit=fffff
gdt_desc:
.2byte .-gdt-1 // size
.4byte gdt // address
// Write MBR boot magic value.
.fill 510 - (. - .boot), 1, 0x00
.2byte 0xaa55
The linker script.
OUTPUT_FORMAT(elf32-i386)
OUTPUT_ARCH(i386)
SECTIONS {
. = 0x7c00;
.boot : { *(.boot) }
_boot_end = .;
/DISCARD/ : { *(.*) }
ASSERT(_boot_end - 0x7c00 == 512, "boot sector must be exact 512 bytes")
}
The build instructions.
mbr: mbr.ld mbr.o
	ld -o $@.elf -nostdlib -T $^
	objcopy -O binary $@.elf $@
mbr.o: mbr.S
	gcc -c -o $@ -m32 -ffreestanding $^
One can boot into the bootsector from legacy BIOS, either with qemu or by writing the mbr boot sector as first sector onto a usb stick.
qemu-system-i386 -hda mbr
The following gives some more detailed description for the segment selector registers, the segment descriptors in the GDT, and the GDT descriptor itself.
# Segment Selector (cs, ds, es, ss, fs, gs).
[15:3] I Descriptor Index
[2] TI Table Indicator (0=GDT | 1=LDT)
[1:0] RPL Requested Privilege Level
# Segment Descriptor (2 x 4 byte words).
0x4 [31:24] Base[31:24]
0x4 [23] G Granularity, scaling of limit (0=1B | 1=4K)
0x4 [22] D/B (0=16bit | 1=32bit)
0x4 [21] L (0=compatibility mode | 1=64bit code) if 1 -> D/B = 0
0x4 [20] AVL Free use for system sw
0x4 [19:16] Limit[19:16]
0x4 [15] P Present
0x4 [14:13] DPL Descriptor privilege level
0x4 [12] S (0=system segment | 1=code/data)
0x4 [11:8] Type Code or data and access information.
0x4 [7:0] Base[23:16]
0x0 [31:16] Base[15:0]
0x0 [15:0] Limit[15:0]
# GDT descriptor (32bit mode)
[47:16] Base address of GDT table.
[15:0] Size of GDT table in bytes minus 1 (limit).
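The selector and descriptor layouts can be cross-checked against the values used in the boot sector. A sketch in C; all names are made up, `access` is the byte holding P, DPL, S and Type (bits [15:8] of the upper word) and `flags` is the nibble holding G, D/B, L and AVL (bits [23:20] of the upper word):

```c
#include <stdint.h>

struct selector {
    unsigned index; /* [15:3] descriptor index */
    unsigned ti;    /* [2]    table indicator  */
    unsigned rpl;   /* [1:0]  requested privilege level */
};

/* Split a 16 bit segment selector into its three fields. */
static struct selector decode_selector(uint16_t sel) {
    struct selector s;
    s.index = (unsigned)sel >> 3;
    s.ti = ((unsigned)sel >> 2) & 1u;
    s.rpl = (unsigned)sel & 3u;
    return s;
}

/* Pack base, 20 bit limit, access byte and flags nibble into the
 * 8 byte segment descriptor format. */
static uint64_t encode_descriptor(uint32_t base, uint32_t limit,
                                  uint8_t access, uint8_t flags) {
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xffffu);            /* Limit[15:0]  */
    d |= (uint64_t)(base & 0xffffffu) << 16;     /* Base[23:0]   */
    d |= (uint64_t)access << 40;                 /* P, DPL, S, Type */
    d |= (uint64_t)((limit >> 16) & 0xfu) << 48; /* Limit[19:16] */
    d |= (uint64_t)(flags & 0xfu) << 52;         /* G, D/B, L, AVL */
    d |= (uint64_t)(base >> 24) << 56;           /* Base[31:24]  */
    return d;
}
```

With `base=0`, `limit=0xfffff`, `flags=0xc` (G=1, D/B=1) the access bytes `0x9a` and `0x92` reproduce the code and data descriptors `0x00cf9a000000ffff` and `0x00cf92000000ffff` from the GDT above, and selector `0x0008` decodes to index 1, TI=0, RPL=0.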
References
- SystemV AMD64 ABI
- AMD64 Vol1: Application Programming
- AMD64 Vol2: System Programming
- AMD64 Vol3: General-Purpose & System Instructions
- X86_64 Cheat-Sheet
- Intel 64 Vol1: Basic Architecture
- Intel 64 Vol2: Instruction Set Reference
- Intel 64 Vol3: System Programming Guide
- GNU Assembler
- GNU Assembler Directives
- GNU Assembler x86_64 dependent features
- juicebox-asm, an x86_64 jit assembler playground