x86_64

keywords: x86_64, x86, abi

  • 64bit synonyms: x86_64, x64, amd64, intel 64
  • 32bit synonyms: x86, ia32, i386
  • ISA type: CISC
  • Endianness: little

Registers

General purpose register

bytes
[7:0]      [3:0]   [1:0]   [1]   [0]     desc
----------------------------------------------------------
rax        eax     ax      ah    al      accumulator
rbx        ebx     bx      bh    bl      base register
rcx        ecx     cx      ch    cl      counter
rdx        edx     dx      dh    dl      data register
rsi        esi     si      -     sil     source index
rdi        edi     di      -     dil     destination index
rbp        ebp     bp      -     bpl     base pointer
rsp        esp     sp      -     spl     stack pointer
r8-15      rNd     rNw     -     rNb

Special register

bytes
[7:0]      [3:0]     [1:0]      desc
---------------------------------------------------
rflags     eflags    flags      flags register
rip        eip       ip         instruction pointer

FLAGS register

rflags
bits    desc                            instr        comment
--------------------------------------------------------------------------------------------------------------
   [21]   ID   identification                        ability to set/clear -> indicates support for CPUID instr
   [18]   AC   alignment check                       alignment exception for PL 3 (user), requires CR0.AM
[13:12] IOPL   io privilege level
   [11]   OF   overflow flag
   [10]   DF   direction flag           cld/std      increment (0) or decrement (1) registers in string operations
    [9]   IF   interrupt enable         cli/sti
    [7]   SF   sign flag
    [6]   ZF   zero flag
    [4]   AF   auxiliary carry flag
    [2]   PF   parity flag
    [0]   CF   carry flag

Change flag bits with pushf / popf instructions (32-bit mode shown):

pushfd                          // push eflags (4 bytes) onto stack
or dword ptr [esp], (1 << 18)   // set AC flag
popfd                           // pop eflags (4 bytes) from stack

In 64-bit mode use pushfq / popfq, which push and pop all 8 bytes of rflags.
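The bit positions from the rflags table can be written down as constants to decode a saved flags value. A minimal sketch in C; the names rflags_bit, rflags_test and rflags_iopl are made up for this note.

```c
#include <stdbool.h>
#include <stdint.h>

// Bit positions taken from the rflags table above.
enum rflags_bit {
  RFLAGS_CF = 0,
  RFLAGS_PF = 2,
  RFLAGS_AF = 4,
  RFLAGS_ZF = 6,
  RFLAGS_SF = 7,
  RFLAGS_IF = 9,
  RFLAGS_DF = 10,
  RFLAGS_OF = 11,
  RFLAGS_AC = 18,
  RFLAGS_ID = 21,
};

// Return true if the given flag bit is set in a saved rflags value.
static inline bool rflags_test(uint64_t rflags, enum rflags_bit bit) {
  return (rflags >> bit) & 1;
}

// Extract the 2 bit IO privilege level, bits [13:12].
static inline unsigned rflags_iopl(uint64_t rflags) {
  return (unsigned)((rflags >> 12) & 0x3);
}
```

For the value 0x246 this reports ZF, PF and IF as set and CF as clear.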

Model Specific Register (MSR)

rdmsr     // Read MSR register, effectively does EDX:EAX <- MSR[ECX]
wrmsr     // Write MSR register, effectively does MSR[ECX] <- EDX:EAX
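The EDX:EAX notation means that the 64 bit MSR value is split over two 32 bit registers, with edx holding the upper half. A small sketch of the split and join in C; the helper names are hypothetical, and the privileged rdmsr / wrmsr instructions themselves are not executed here.

```c
#include <stdint.h>

// Join the two halves as rdmsr delivers them: EDX = bits [63:32],
// EAX = bits [31:0].
static uint64_t join_edx_eax(uint32_t edx, uint32_t eax) {
  return ((uint64_t)edx << 32) | eax;
}

// Split a 64 bit value into the halves wrmsr expects.
static void split_edx_eax(uint64_t val, uint32_t *edx, uint32_t *eax) {
  *edx = (uint32_t)(val >> 32);
  *eax = (uint32_t)val;
}
```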

Size directives

Explicitly specify size of the operation.

mov  byte ptr [rax], 0xff    // save 1 byte(s) at [rax]
mov  word ptr [rax], 0xff    // save 2 byte(s) at [rax]
mov dword ptr [rax], 0xff    // save 4 byte(s) at [rax]
mov qword ptr [rax], 0xff    // save 8 byte(s) at [rax]
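The effect of the size directive can be modeled in C: the same immediate touches 1, 2, 4 or 8 bytes of memory, stored in little-endian order. This is only a sketch; store_le is a name made up for this note.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Model of `mov <size> ptr [p], val`: store the low `size` bytes of val
// at p in little-endian order, leaving all other bytes untouched.
static void store_le(uint8_t *p, uint64_t val, size_t size) {
  assert(size == 1 || size == 2 || size == 4 || size == 8);
  for (size_t i = 0; i < size; i++) {
    p[i] = (uint8_t)(val >> (8 * i));
  }
}
```

With an 8 byte buffer prefilled with 0xaa, store_le(buf, 0xff, 4) leaves ff 00 00 00 aa aa aa aa, which corresponds to the dword ptr variant.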

Addressing

mov qword ptr [rax], rbx         // save val in rbx at [rax]
mov qword ptr [imm], rbx         // save val in rbx at [imm]
mov rax, qword ptr [rbx+4*rcx]   // load val at [rbx+4*rcx] into rax

rip relative addressing:

lea rax, [rip+.my_str]       // load addr of .my_str into rax
...
.my_str:
.asciz "Foo"

Load effective address:

mov rax, 2
lea r11, [rax + 3]   // r11 <- 5
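The address computed by a memory operand, and therefore the value lea produces, is base + scale*index + displacement. A sketch in C, with effective_addr as a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

// Effective address of a memory operand [base + scale*index + disp].
// Only the scale factors 1, 2, 4 and 8 can be encoded.
static uint64_t effective_addr(uint64_t base, uint64_t index, unsigned scale,
                               int32_t disp) {
  assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
  // The displacement is sign-extended to 64 bit before it is added.
  return base + index * scale + (uint64_t)(int64_t)disp;
}
```

effective_addr(2, 0, 1, 3) gives 5, matching the lea example above.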

String instructions

The operand size of a string instruction is defined by the instruction suffix b | w | d | q.

The source and destination registers are modified according to the direction flag (DF) in the flags register:

  • DF=0 increment src/dest registers
  • DF=1 decrement src/dest registers

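The following explanation assumes byte operands with DF=0. Register names are given in their 16-bit form; in 64-bit mode the instructions use rsi and rdi.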

movsb   // move data from string to string
        // ES:[DI] <- DS:[SI]
        // DI <- DI + 1
        // SI <- SI + 1

lodsb   // load string
        // AL <- DS:[SI]
        // SI <- SI + 1

stosb   // store string
        // ES:[DI] <- AL
        // DI <- DI + 1

cmpsb   // compare string operands
        // DS:[SI] - ES:[DI]    ; set status flag (eg ZF)
        // SI <- SI + 1
        // DI <- DI + 1

scasb   // scan string
        // AL - ES:[DI]         ; set status flag (eg ZF)
        // DI <- DI + 1

String operations can be repeated:

rep     // repeat until rcx = 0
repz    // repeat while ZF = 1, ie until rcx = 0 or ZF = 0
repnz   // repeat while ZF = 0, ie until rcx = 0 or ZF = 1
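The termination rule of a repeated compare can be modeled in C. The sketch below mimics repnz scasb with DF=0: it stops when the count reaches 0 or when a compare sets ZF. The struct and function names are made up for this note.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Register state touched by `repnz scasb` (DF=0 assumed).
struct scas_state {
  const uint8_t *rdi;  // scan pointer
  size_t rcx;          // remaining count
  bool zf;             // zero flag, result of the last compare
};

// Model of `repnz scasb`: compare al with [rdi] until rcx = 0 or ZF = 1.
static void repnz_scasb(struct scas_state *s, uint8_t al) {
  while (s->rcx != 0) {
    s->zf = (al == *s->rdi);  // scasb: al - [rdi], sets ZF on equality
    s->rdi += 1;              // DF=0: increment
    s->rcx -= 1;
    if (s->zf) break;         // repnz stops once ZF = 1
  }
}
```

Scanning "Foo" for al = 0 with rcx = 16 stops after 4 compares with rcx = 12, so the string length is 16 - 12 - 1 = 3.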

Example: Simple memset

// memset (dest, 0xaa /* char */, 0x10 /* len */)
// Assumes DF=0, so that rdi is incremented.

lea rdi, [dest]
mov al, 0xaa
mov rcx, 0x10
rep stosb
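What rep stosb does to memory and to the destination register can be written as a plain loop. A sketch in C that also models the direction flag; rep_stosb is a hypothetical helper, not an intrinsic.

```c
#include <stddef.h>
#include <stdint.h>

// Model of `rep stosb`: store al at [rdi] rcx times, stepping rdi by +1
// (DF=0) or -1 (DF=1). Returns the final value of rdi.
static uint8_t *rep_stosb(uint8_t *rdi, uint8_t al, size_t rcx, int df) {
  while (rcx != 0) {
    *rdi = al;
    rdi += df ? -1 : 1;
    rcx -= 1;
  }
  return rdi;
}
```

rep_stosb(dest, 0xaa, 0x10, 0) fills 16 bytes and returns dest + 0x10, the same state the memset example leaves in the destination register.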

Time stamp counter - rdtsc

#include <stdint.h>

static inline uint64_t rdtsc(void) {
  uint32_t eax, edx;
  asm volatile("rdtsc" : "=d"(edx), "=a"(eax)::);
  return (uint64_t)edx << 32 | eax;
}

Constant TSC behavior ensures that the duration of each clock tick is uniform and supports the use of the TSC as a wall clock timer even if the processor core changes frequency. This is the architectural behavior moving forward.

On Linux, the constant_tsc CPU flag indicates whether the implemented TSC ticks at a constant frequency.

grep constant_tsc /proc/cpuinfo
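With a constant TSC, a tick delta converts to time by a fixed factor. A sketch of that conversion in C; the tick frequency tsc_hz is an assumed input that has to come from elsewhere (eg calibration against a known clock), and the function name is made up for this note.

```c
#include <stdint.h>

// Convert a TSC tick delta into nanoseconds, assuming a constant TSC that
// ticks at tsc_hz (assumed to be known by the caller).
// Splitting into whole seconds and remainder avoids overflowing
// ticks * 1e9 as long as tsc_hz is below 2^34 (~17 GHz).
static uint64_t tsc_ticks_to_ns(uint64_t ticks, uint64_t tsc_hz) {
  const uint64_t ns_per_s = 1000000000ull;
  uint64_t secs = ticks / tsc_hz;
  uint64_t rem = ticks % tsc_hz;
  return secs * ns_per_s + rem * ns_per_s / tsc_hz;
}
```

A measurement would take two rdtsc readings around the code of interest and pass their difference as ticks.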

SysV x86_64 ABI

Passing arguments to functions

  • Integer/Pointer arguments
    reg     arg
    -----------
    rdi       1
    rsi       2
    rdx       3
    rcx       4
    r8        5
    r9        6
    
  • Floating point arguments
    reg     arg
    -----------
    xmm0      1
      ..     ..
    xmm7      8
    
  • Additional arguments are passed on the stack. They are pushed right-to-left (RTL), so the first stack argument ends up closest to the current rsp.
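As an illustration of the integer argument rules, the following C function takes eight integer arguments; the comments note where each one is expected to live on function entry under the SysV ABI. The function itself is just an example made up for this note.

```c
// Six integer arguments are passed in registers, the remaining two on
// the stack. On function entry [rsp] holds the return address.
long sum8(long a1,    // rdi
          long a2,    // rsi
          long a3,    // rdx
          long a4,    // rcx
          long a5,    // r8
          long a6,    // r9
          long a7,    // stack, [rsp+8]
          long a8) {  // stack, [rsp+16]
  return a1 + a2 + a3 + a4 + a5 + a6 + a7 + a8;
}
```

Compiling this with gcc -S -O0 on a SysV target and reading the generated assembly is one way to check the mapping.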

Return values from functions

  • Integer/Pointer return values
    reg          size
    -----------------
    rax        64 bit
    rax+rdx   128 bit
    
  • Floating point return values
    reg            size
    -------------------
    xmm0         64 bit
    xmm0+xmm1   128 bit
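A 128 bit integer-class return value typically shows up as a struct of two 64 bit members. A sketch in C, with the struct and function names made up for this note: under the SysV ABI the first member is expected in rax and the second in rdx.

```c
#include <stdint.h>

// Two 64 bit integer members, returned in rax (quot) and rdx (rem).
struct divmod {
  uint64_t quot;
  uint64_t rem;
};

// Quotient and remainder of n / d; d must not be 0.
struct divmod divmod_u64(uint64_t n, uint64_t d) {
  struct divmod r = { n / d, n % d };
  return r;
}
```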
    

Caller saved registers

Caller must save these registers if they should be preserved across function calls.

  • rax
  • rcx
  • rdx
  • rsi
  • rdi
  • r8 - r11

Callee saved registers

Caller can expect these registers to be preserved across function calls. Callee must save these registers if it uses them.

  • rbx
  • rbp
  • rsp
  • r12 - r15

Stack

  • grows downwards
  • frames aligned on 16 byte boundary
    Hi ADDR
     |                +------------+
     |                | prev frame |
     |                +------------+ <--- 16 byte aligned (X & ~0xf)
     |       [rbp+8]  | saved RIP  |
     |       [rbp]    | saved RBP  |
     |       [rbp-8]  | func stack |
     |                | ...        |
     v                +------------+
    Lo ADDR
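The alignment expression (X & ~0xf) from the diagram clears the low 4 address bits, which rounds an address down to a 16 byte boundary. A sketch in C with hypothetical helper names:

```c
#include <stdint.h>

// Round an address down to a 16 byte boundary, ie (X & ~0xf).
static uint64_t align_down16(uint64_t addr) {
  return addr & ~(uint64_t)0xf;
}

// An address is 16 byte aligned if its low 4 bits are zero.
static int is_aligned16(uint64_t addr) {
  return (addr & 0xf) == 0;
}
```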
    

Function prologue & epilogue

  • prologue
    push rbp        // save caller base pointer
    mov rbp, rsp    // save caller stack pointer
    
  • epilogue
    mov rsp, rbp    // restore caller stack pointer
    pop rbp         // restore caller base pointer
    

    The epilogue is equivalent to the leave instruction.

Windows x64 ABI

Passing arguments to functions

A single argument is never spread across multiple registers.

  • Integer/Pointer arguments
    reg     arg
    -----------
    rcx       1
    rdx       2
    r8        3
    r9        4
    
  • Floating point arguments
    reg     arg
    -----------
    xmm0      1
      ..     ..
    xmm3      4
    
  • Additional arguments are passed on the stack. They are pushed right-to-left (RTL), so the first stack argument ends up closest to the current rsp.
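In the Windows x64 convention the register is selected by the argument position alone: position N uses either the Nth integer register or the Nth xmm register, depending on the argument type. This detail comes from the Microsoft calling convention documentation rather than from the tables above. A sketch in C, with all names made up for this note:

```c
#include <stddef.h>

enum arg_kind { ARG_INT, ARG_FP };

// Register used for the argument at 0-based position pos, or NULL if it
// is passed on the stack. The position selects the slot; the kind only
// picks between the integer and the xmm register of that slot.
static const char *win64_arg_reg(unsigned pos, enum arg_kind kind) {
  static const char *const int_regs[] = {"rcx", "rdx", "r8", "r9"};
  static const char *const fp_regs[] = {"xmm0", "xmm1", "xmm2", "xmm3"};
  if (pos >= 4) return NULL;
  return kind == ARG_FP ? fp_regs[pos] : int_regs[pos];
}
```

For f(int a, double b, int c, float d) this yields rcx, xmm1, r8 and xmm3.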

Return values from functions

  • Integer/Pointer return values
    reg          size
    -----------------
    rax        64 bit
    
  • Floating point return values
    reg            size
    -------------------
    xmm0         64 bit
    

Caller saved registers

Caller must save these registers if they should be preserved across function calls.

  • rax
  • rcx
  • rdx
  • r8 - r11
  • xmm0 - xmm5

Callee saved registers

Caller can expect these registers to be preserved across function calls. Callee must save these registers if it uses them.

  • rbx
  • rbp
  • rdi
  • rsi
  • rsp
  • r12 - r15
  • xmm6 - xmm15

ASM skeleton - linux userspace

Small assembler skeleton, ready to use, with the following properties:

  • use raw Linux syscalls (man 2 syscall for ABI)
  • no C runtime (crt)
  • gnu assembler gas
  • intel syntax

# file: greet.s

    .intel_syntax noprefix

    .section .text, "ax", @progbits
    .global _start
_start:
    mov rdi, 1                      # fd
    lea rsi, [rip + greeting]       # buf
    mov edx, [rip + greeting_len]   # count (4 byte .int; writing edx zero-extends into rdx)
    mov rax, 1                      # write(2) syscall nr
    syscall

    mov rdi, 0                      # exit code
    mov rax, 60                     # exit(2) syscall nr
    syscall

    .section .rodata, "a", @progbits
greeting:
    .ascii "Hi ASM-World!\n"        # .ascii: no trailing NUL byte in the output
greeting_len:
    .int .-greeting

Syscall numbers are defined in /usr/include/asm/unistd.h.

To compile and run:

> gcc -o greet greet.s -nostartfiles -nostdlib && ./greet
Hi ASM-World!

MBR boot sector example

The following shows a non-minimal MBR boot sector, which transitions from 16-bit real mode to 32-bit protected mode by setting up a small global descriptor table (GDT). A string is printed in each mode.

.code16
.intel_syntax noprefix

.section .boot, "ax", @progbits
    // Disable interrupts.
    cli

    // Clear segment selectors.
    xor ax, ax
    mov ds, ax
    mov es, ax
    mov ss, ax
    mov fs, ax
    mov gs, ax

    // Set cs to 0x0000, as BIOSes load the MBR to either 07c0:0000 or 0000:7c00.
    jmp 0x0000:entry_rm16

entry_rm16:
    // Set video mode 3, see [1].
    //   * 80x25 text mode
    //   * 640x200 pixel resolution (8x8 pixel per char)
    //   * 16 colors (4bit)
    //   * 4 pages
    //   * 0xB800 screen address
    //
    // [1] http://www.ctyme.com/intr/rb-0069.htm
    mov ax, 0x3
    int 0x10

    // Move cursor to second row.
    // http://www.ctyme.com/intr/rb-0087.htm
    mov ah, 0x02
    mov bh, 0  // page
    mov dh, 1  // row
    mov dl, 0  // col
    int 0x10

    // Clear direction flag for lodsb below.
    cld

    // Load pointer to msg_rm string (null terminated).
    lea si, [msg_rm]

    // Teletype output char at current cursor position.
    // http://www.ctyme.com/intr/rb-0106.htm
    mov ah, 0x0e
1:
    lodsb         // al <- ds:si ; si+=1 ; (al char to write)
    test al,al    // test for null terminator
    jz 2f
    int 0x10
    jmp 1b
2:

    // Enable A20 address line.
    in al, 0x92
    or al, 2
    out 0x92, al

    // Load GDT descriptor.
    lgdt [gdt_desc]

    // Enable protected mode (set CR0.PE bit).
    mov eax, cr0
    or  eax, (1 << 0)
    mov cr0, eax

    // Far jump which loads segment selector (0x0008) into cs.
    // 0x0008 -> RPL=0, TI=0(GDT), I=1
    jmp 0x0008:entry_pm32

.code32
entry_pm32:
    // Select data segment selector (0x0010) for ds and es.
    // lodsb reads through ds, stosw writes through es.
    mov ax, gdt_data - gdt
    mov ds, ax
    mov es, ax

    // Write through VGA interface (video memory).
    // Each character is represented by 2 bytes.
    //   4 bit bg | 4 bit fg | 8 bit ascii char
    //
    // Start writing at third line.
    mov edi, 0xb8000 + (80 * 2 * 2)

    lea esi, [msg_pm]
1:
    lodsb           // al <- ds:esi ; esi+=1
    test al, al     // test for null terminator
    jz 2f
    or eax, 0x1f00  // blue bg, white fg
    stosw           // es:[edi] <- ax; edi+=2
    jmp 1b
2:
    hlt
    jmp 2b

// For simplicity keep data used by boot sector in the same section.
.balign 8
msg_rm:
    .asciz "Hello from Real Mode!"
msg_pm:
    .asciz "Hello from Protected Mode!"

.balign 8
gdt:
    .8byte 0x0000000000000000 // 0x00 | null descriptor
    .8byte 0x00cf9a000000ffff // 0x08 | 32 bit, code (rx), present, dpl=0, g=4K, base=0, limit=fffff
gdt_data:
    .8byte 0x00cf92000000ffff // 0x10 | 32 bit, data (rw), present, dpl=0, g=4K, base=0, limit=fffff
gdt_desc:
    .2byte .-gdt-1  // size
    .4byte gdt      // address

// Write MBR boot magic value.
.fill 510 - (. - .boot), 1, 0x00
.2byte 0xaa55
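The protected mode part of the listing writes 2 byte cells (4 bit bg | 4 bit fg | 8 bit ascii char) into video memory starting at 0xb8000 + (80 * 2 * 2). The same arithmetic as a sketch in C; the helper names are made up for this note.

```c
#include <stdint.h>

#define VGA_TEXT_BASE 0xb8000u
#define VGA_COLS 80u

// Build a 16 bit text mode cell: [15:12] bg | [11:8] fg | [7:0] char.
static uint16_t vga_cell(uint8_t bg, uint8_t fg, char ch) {
  return (uint16_t)(((bg & 0xfu) << 12) | ((fg & 0xfu) << 8) | (uint8_t)ch);
}

// Physical address of the cell at (row, col); each cell takes 2 bytes.
static uint32_t vga_cell_addr(unsigned row, unsigned col) {
  return VGA_TEXT_BASE + (row * VGA_COLS + col) * 2;
}
```

vga_cell(0x1, 0xf, 'H') gives 0x1f48, ie the character or-ed with 0x1f00 as in the listing, and vga_cell_addr(2, 0) is the start of the third line.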

The linker script.

OUTPUT_FORMAT(elf32-i386)
OUTPUT_ARCH(i386)

SECTIONS {
    . = 0x7c00;
    .boot     : { *(.boot) }
    _boot_end = .;
    /DISCARD/ : { *(.*) }

    ASSERT(_boot_end - 0x7c00 == 512, "boot sector must be exactly 512 bytes")
}
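The origin 0x7c00 in the linker script is the linear address of the boot sector. In real mode a linear address is segment * 16 + offset, so both 07c0:0000 and 0000:7c00 name the same location. A sketch in C, with rm_linear as a hypothetical helper:

```c
#include <stdint.h>

// Real mode linear address: segment * 16 + offset.
static uint32_t rm_linear(uint16_t seg, uint16_t off) {
  return ((uint32_t)seg << 4) + off;
}
```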

The build instructions.

mbr: mbr.ld mbr.o
	ld -o $@.elf -nostdlib -T $^
	objcopy -O binary $@.elf $@

mbr.o: mbr.S
	gcc -c -o $@ -m32 -ffreestanding $^

The boot sector can be booted from a legacy BIOS, either with qemu or by writing it as the first sector onto a USB stick.

qemu-system-i386 -hda mbr
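Before writing the image to a device, it can be useful to check that the boot magic value ended up in the last two bytes of the sector. A sketch in C that checks a 512 byte buffer; reading the file into the buffer is left out and the function name is made up for this note.

```c
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SIZE 512

// `.2byte 0xaa55` is stored little-endian: 0x55 at offset 510 and 0xaa
// at offset 511.
static bool has_boot_signature(const uint8_t sector[SECTOR_SIZE]) {
  return sector[510] == 0x55 && sector[511] == 0xaa;
}
```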

The following describes the segment selector registers, the segment descriptors in the GDT, and the GDT descriptor itself in more detail.

# Segment Selector (cs, ds, es, ss, fs, gs).

[15:3] I   Descriptor Index
   [2] TI  Table Indicator (0=GDT | 1=LDT)
 [1:0] RPL Requested Privilege Level
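A selector value can be taken apart with shifts and masks. A sketch in C, assuming the architectural layout with the index in bits [15:3], TI in bit [2] and RPL in bits [1:0]; the struct and function names are made up for this note.

```c
#include <stdint.h>

struct selector {
  unsigned index;  // bits [15:3], descriptor index
  unsigned ti;     // bit  [2],    0=GDT | 1=LDT
  unsigned rpl;    // bits [1:0],  requested privilege level
};

static struct selector decode_selector(uint16_t sel) {
  struct selector s = { sel >> 3, (sel >> 2) & 0x1u, sel & 0x3u };
  return s;
}
```

decode_selector(0x0008) gives index 1, TI 0 and RPL 0, the code segment selector used for the far jump in the boot sector.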


# Segment Descriptor (2 x 4 byte words).

0x4 [31:24] Base[31:24]
0x4    [23] G            Granularity, scaling of limit (0=1B | 1=4K)
0x4    [22] D/B          (0=16bit | 1=32bit)
0x4    [21] L            (0=compatibility mode | 1=64bit code) if 1 -> D/B = 0
0x4    [20] AVL          Free use for system sw
0x4 [19:16] Limit[19:16]
0x4    [15] P            Present
0x4 [14:13] DPL          Descriptor privilege level
0x4    [12] S            (0=system segment | 1=code/data)
0x4  [11:8] Type         Code or data and access information.
0x4   [7:0] Base[23:16]

0x0 [31:16] Base[15:0]
0x0  [15:0] Limit[15:0]
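The descriptor layout can be checked by packing the fields and comparing against the constants used in the boot sector GDT. A sketch in C; make_descriptor is a hypothetical helper that groups P, DPL, S and Type into one access byte and G, D/B, L and AVL into one flags nibble.

```c
#include <stdint.h>

// Pack a segment descriptor.
//   base   : 32 bit segment base
//   limit  : 20 bit segment limit
//   access : P | DPL | S | Type   (bits [15:8] of the upper word)
//   flags  : G | D/B | L | AVL    (bits [23:20] of the upper word)
static uint64_t make_descriptor(uint32_t base, uint32_t limit, uint8_t access,
                                uint8_t flags) {
  uint32_t lo = ((base & 0xffffu) << 16) | (limit & 0xffffu);
  uint32_t hi = (base & 0xff000000u) | ((uint32_t)(flags & 0xfu) << 20) |
                (limit & 0xf0000u) | ((uint32_t)access << 8) |
                ((base >> 16) & 0xffu);
  return ((uint64_t)hi << 32) | lo;
}
```

make_descriptor(0, 0xfffff, 0x9a, 0xc) gives 0x00cf9a000000ffff and make_descriptor(0, 0xfffff, 0x92, 0xc) gives 0x00cf92000000ffff, the code and data entries of the GDT above.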


# GDT descriptor (32bit mode)

[47:16] Base address of GDT table.
 [15:0] Limit of GDT table (size in bytes - 1).
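The 6 byte operand of lgdt can be described as a packed struct. A sketch in C using the GCC/Clang packed attribute; the names are made up for this note and the base address in the usage line is a placeholder.

```c
#include <stdint.h>

// Operand of lgdt in 32 bit mode: 2 byte limit followed by 4 byte base.
// packed keeps the compiler from inserting padding after limit
// (GCC/Clang attribute syntax).
struct gdt_desc {
  uint16_t limit;  // [15:0]  size of the table in bytes - 1
  uint32_t base;   // [47:16] linear address of the table
} __attribute__((packed));

// Each GDT entry is 8 bytes, so n entries give a limit of n*8 - 1.
static struct gdt_desc make_gdt_desc(uint32_t base, unsigned num_entries) {
  struct gdt_desc d = { (uint16_t)(num_entries * 8 - 1), base };
  return d;
}
```

For the three entry GDT of the boot sector, make_gdt_desc(0x7c80, 3) has a limit of 23, the same value `.-gdt-1` evaluates to.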

References