Preface

This is a very opinionated alternative Instruction Set Reference for the x86-64 Architecture written in mdbook.

The goal of this reference is to prioritize readability and simplicity over accuracy or completeness. Thus, for any project with stakes or anything that requires more complicated instructions, grab the official Intel x86-64 Software Developer Manual instead.

This reference is nowhere near finished. Most Instructions are missing and some of the ones that are included have only been slightly altered from the original Intel Manual. The end goal is to give simple examples and simplify the Operation Pseudocode enough to make it easy and understandable for Assembly beginners, who, for example, wouldn’t care much about the specific opcode encoding of instructions.

From my personal experience using the Intel Manual, it is littered with inconsistencies and often annoyingly complicated tech-talk that will do nothing but confuse newbies, which is the reason I created this book.

Registers

x86-64

64-bit	32-bit	16-bit	8-bit high	8-bit low
`rax`	`eax`	`ax`	`ah`	`al`
`rbx`	`ebx`	`bx`	`bh`	`bl`
`rcx`	`ecx`	`cx`	`ch`	`cl`
`rdx`	`edx`	`dx`	`dh`	`dl`
`rdi`	`edi`	`di`		`dil`
`rsi`	`esi`	`si`		`sil`
`rbp`	`ebp`	`bp`		`bpl`
`rsp`	`esp`	`sp`		`spl`

`r8`	`r8d`	`r8w`		`r8b`
`r9`	`r9d`	`r9w`		`r9b`
`r10`	`r10d`	`r10w`		`r10b`
`r11`	`r11d`	`r11w`		`r11b`
`r12`	`r12d`	`r12w`		`r12b`
`r13`	`r13d`	`r13w`		`r13b`
`r14`	`r14d`	`r14w`		`r14b`
`r15`	`r15d`	`r15w`		`r15b`
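
The columns of the table are narrower views of the same 64-bit register, not separate storage. A minimal sketch of that relationship for the `rax` row, assuming the register value is held in a plain `u64`; the helper names are hypothetical and only model reading a sub-register:

```rust
/// Hypothetical helpers that read the narrower views of a 64-bit
/// register value, mirroring the `rax` row of the table above.
fn eax(rax: u64) -> u32 {
    rax as u32 // low 32 bits
}

fn ax(rax: u64) -> u16 {
    rax as u16 // low 16 bits
}

fn ah(rax: u64) -> u8 {
    (rax >> 8) as u8 // bits 8..=15
}

fn al(rax: u64) -> u8 {
    rax as u8 // bits 0..=7
}
```

With `rax = 0x1122_3344_5566_7788`, these helpers return `0x5566_7788`, `0x7788`, `0x77` and `0x88`.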

SSE

128-bit
`xmm0`
`xmm1`
`xmm2`
`xmm3`
`xmm4`
`xmm5`
`xmm6`
`xmm7`
`xmm8`
`xmm9`
`xmm10`
`xmm11`
`xmm12`
`xmm13`
`xmm14`
`xmm15`

Flags

CF: Carry flag
Set if an arithmetic operation generates a carry or a borrow out of the most-significant bit of the result; cleared otherwise.
This flag indicates an overflow condition for unsigned-integer arithmetic. It is also used in multiple-precision arithmetic.
PF: Parity flag
Set if the least-significant byte of the result contains an even number of 1 bits; cleared otherwise.
AF: Auxiliary Carry flag/Adjust flag
Set if an arithmetic operation generates a carry or a borrow out of bit 3 of the result; cleared otherwise.
This flag is used in binary-coded decimal (BCD) arithmetic.
ZF: Zero flag
Set if the result is zero; cleared otherwise.
SF: Sign flag
Set equal to the most-significant bit of the result, which is the sign bit of a signed integer. (0 indicates a positive value and 1 indicates a negative value.)
OF: Overflow flag
Set if the integer result is too large a positive number or too small a negative number (excluding the sign-bit) to fit in the destination operand; cleared otherwise.
This flag indicates an overflow condition for signed-integer (two’s complement) arithmetic.

Of these status flags, only the CF flag can be modified directly, using the STC, CLC, and CMC instructions. Also the bit instructions (BT, BTS, BTR, and BTC) copy a specified bit into the CF flag.
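
To make the flag definitions concrete, here is a rough sketch of how an 8-bit addition could derive all six status flags. The `Flags` struct and the `add8` function are hypothetical names made up for this illustration, not part of any real API:

```rust
/// The six status flags described above.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct Flags {
    cf: bool,
    pf: bool,
    af: bool,
    zf: bool,
    sf: bool,
    of: bool,
}

/// Hypothetical model of an 8-bit addition `a + b`, returning the
/// result together with the status flags it would produce.
fn add8(a: u8, b: u8) -> (u8, Flags) {
    // Unsigned view: a carry out of bit 7 sets CF.
    let (result, carry) = a.overflowing_add(b);
    // Signed view: the same bits, interpreted as two's complement, set OF.
    let (_, signed_overflow) = (a as i8).overflowing_add(b as i8);
    let flags = Flags {
        cf: carry,
        pf: result.count_ones() % 2 == 0,      // even number of 1 bits
        af: (a & 0x0F) + (b & 0x0F) > 0x0F,    // carry out of bit 3
        zf: result == 0,
        sf: result & 0x80 != 0,                // most-significant bit
        of: signed_overflow,
    };
    (result, flags)
}
```

For example, `add8(0x7F, 0x01)` yields `0x80` with OF and SF set but CF clear: the result is fine as an unsigned number, yet it overflowed as a signed one.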

Condition Codes

Condition Code	Name	Definition
`e`, `z`	Equal, Zero	`ZF == 1`
`ne`, `nz`	Not Equal, Not Zero	`ZF == 0`
`o`	Overflow	`OF == 1`
`no`	No Overflow	`OF == 0`
`s`	Signed	`SF == 1`
`ns`	Not Signed	`SF == 0`
`p`	Parity	`PF == 1`
`np`	No Parity	`PF == 0`

`c`, `b`, `nae`	Carry, Below, Not Above or Equal	`CF == 1`
`nc`, `nb`, `ae`	No Carry, Not Below, Above or Equal	`CF == 0`
`a`, `nbe`	Above, Not Below or Equal	`CF == 0` & `ZF == 0`
`na`, `be`	Not Above, Below or Equal	`CF == 1` | `ZF == 1`

`ge`, `nl`	Greater or Equal, Not Less	`SF == OF`
`nge`, `l`	Not Greater or Equal, Less	`SF != OF`
`g`, `nle`	Greater, Not Less or Equal	`ZF == 0` & `SF == OF`
`ng`, `le`	Not Greater, Less or Equal	`ZF == 1` | `SF != OF`

(adapted from https://riptutorial.com/x86/example/6976/flags-register)
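
The condition code table translates fairly directly into code. The following is only a sketch: the `Flags` struct and the `condition` function are hypothetical, and the condition code is passed as a lowercase string purely for readability:

```rust
/// Status flags used by the condition codes in the table above.
#[derive(Clone, Copy)]
struct Flags {
    cf: bool,
    zf: bool,
    sf: bool,
    of: bool,
    pf: bool,
}

/// Hypothetical evaluator: returns whether condition code `cc` holds
/// for the given flags, or `None` for a code that is not in the table.
fn condition(cc: &str, f: Flags) -> Option<bool> {
    let result = match cc {
        "e" | "z" => f.zf,
        "ne" | "nz" => !f.zf,
        "o" => f.of,
        "no" => !f.of,
        "s" => f.sf,
        "ns" => !f.sf,
        "p" => f.pf,
        "np" => !f.pf,
        // Unsigned comparisons are based on CF (and ZF).
        "c" | "b" | "nae" => f.cf,
        "nc" | "nb" | "ae" => !f.cf,
        "a" | "nbe" => !f.cf && !f.zf,
        "na" | "be" => f.cf || f.zf,
        // Signed comparisons are based on SF and OF (and ZF).
        "ge" | "nl" => f.sf == f.of,
        "nge" | "l" => f.sf != f.of,
        "g" | "nle" => !f.zf && f.sf == f.of,
        "ng" | "le" => f.zf || f.sf != f.of,
        _ => return None,
    };
    Some(result)
}
```

The "above/below" codes are the ones to use after comparing unsigned values, and the "greater/less" codes after comparing signed values.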

Instructions

How to read Instructions

Operation

MSB: Most Significant Bit
LSB: Least Significant Bit

The Syntax for the Operations closely follows the Rust syntax. Thus, Operators like ^ for Bitwise XOR, & for Bitwise AND and | for Bitwise OR are used.

Instruction Set Summary

Data Transfer Instructions

Mnemonic	Summary
`MOV`	Move
`XCHG`	Exchange
`PUSH`	Push Onto Stack
`POP`	Pop Off of Stack

Binary Arithmetic Instructions

Mnemonic	Summary
`ADD`	Integer Add
`SUB`	Subtract
`IMUL`	Signed Multiply
`MUL`	Unsigned Multiply
`IDIV`	Signed Divide
`DIV`	Unsigned Divide
`INC`	Increment
`DEC`	Decrement
`NEG`	Negate
`CMP`	Compare

Logical Instructions

Mnemonic	Summary
`AND`	Perform Bitwise Logical AND
`OR`	Perform Bitwise Logical OR
`XOR`	Perform Bitwise Logical Exclusive OR
`NOT`	Perform Bitwise Logical NOT

Shift and Rotate Instructions

Mnemonic	Summary
`SAR`	Shift Arithmetic Right
`SHR`	Shift Logical Right
`SAL`/`SHL`	Shift Arithmetic Left/Shift Logical Left

Bit and Byte Instructions

Mnemonic	Summary
`TEST`	Logical Compare

Control Transfer Instructions

Mnemonic	Summary
`JMP`	Jump
`Jcc`	Jump if `cc`
`CALL`	Call Procedure
`RET`	Return

Miscellaneous Instructions

Mnemonic	Summary
`LEA`	Load Effective Address
`NOP`	No Operation

Data Transfer Instructions

Mnemonic	Summary
`MOV`	Move
`XCHG`	Exchange
`PUSH`	Push Onto Stack
`POP`	Pop Off of Stack

`MOV`

Move

Instruction	Description
`MOV r/m8, r8`	Move `r8` to `r/m8`
`MOV r/m16, r16`	Move `r16` to `r/m16`
`MOV r/m32, r32`	Move `r32` to `r/m32`
`MOV r/m64, r64`	Move `r64` to `r/m64`

`MOV r8, r/m8`	Move `r/m8` to `r8`
`MOV r16, r/m16`	Move `r/m16` to `r16`
`MOV r32, r/m32`	Move `r/m32` to `r32`
`MOV r64, r/m64`	Move `r/m64` to `r64`

`MOV r/m8, imm8`	Move `imm8` to `r/m8`
`MOV r/m16, imm16`	Move `imm16` to `r/m16`
`MOV r/m32, imm32`	Move `imm32` to `r/m32`
`MOV r64, imm64`	Move `imm64` to `r64`
`MOV r/m64, imm32`	Move `imm32` sign-extended to `r/m64`

Description

Copies the second operand (source operand) to the first operand (destination operand). The source operand can be an immediate value, general-purpose register, segment register, or memory location; the destination operand can be a general-purpose register, segment register, or memory location. Both operands must be the same size, which can be a byte, a word, a doubleword, or a quadword.

The MOV instruction cannot be used to load the CS register. Attempting to do so results in an invalid opcode exception (#UD). To load the CS register, use the far JMP, CALL, or RET instruction.

If the destination operand is a segment register (DS, ES, FS, GS, or SS), the source operand must be a valid segment selector. In protected mode, moving a segment selector into a segment register automatically causes the segment descriptor information associated with that segment selector to be loaded into the hidden (shadow) part of the segment register. While loading this information, the segment selector and segment descriptor information is validated (see the Operation algorithm below). The segment descriptor data is obtained from the GDT or LDT entry for the specified segment selector.

A NULL segment selector (values 0000-0003) can be loaded into the DS, ES, FS, and GS registers without causing a protection exception. However, any subsequent attempt to reference a segment whose corresponding segment register is loaded with a NULL value causes a general protection exception (#GP) and no memory reference occurs.

Loading the SS register with a MOV instruction suppresses or inhibits some debug exceptions and inhibits interrupts on the following instruction boundary. (The inhibition ends after delivery of an exception or the execution of the next instruction.) This behavior allows a stack pointer to be loaded into the ESP register with the next instruction (MOV ESP, stack-pointer value) before an event can be delivered. See Section 6.8.3, “Masking Exceptions and Interrupts When Switching Stacks,” in Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A. Intel recommends that software use the LSS instruction to load the SS register and ESP together.

When executing MOV Reg, Sreg, the processor copies the content of Sreg to the 16 least significant bits of the general-purpose register. The upper bits of the destination register are zero for most IA-32 processors (Pentium Pro processors and later) and all Intel 64 processors, with the exception that bits 31:16 are undefined for Intel Quark X1000 processors, Pentium and earlier processors.

In 64-bit mode, the instruction’s default operation size is 32 bits. Use of the REX.R prefix permits access to additional registers (R8-R15). Use of the REX.W prefix promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.

Operation

DEST: first operand
SRC: second operand

DEST = SRC;

Flags Affected

None.
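
The `MOV r/m64, imm32` form in the table above is the one case where the operands differ in size, because the immediate is sign-extended first. A small sketch of that step, with a hypothetical function name chosen for this example:

```rust
/// Hypothetical model of `MOV r/m64, imm32`: the 32-bit immediate is
/// sign-extended to 64 bits before it is written to the destination.
fn mov_r64_imm32(imm32: u32) -> u64 {
    // u32 -> i32 reinterprets the bits, i32 -> i64 sign-extends,
    // i64 -> u64 reinterprets the bits again.
    imm32 as i32 as i64 as u64
}
```

An immediate of `0x8000_0000` therefore ends up as `0xFFFF_FFFF_8000_0000` in the destination. A constant whose upper 32 bits are not just copies of bit 31 needs the `MOV r64, imm64` form instead.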

`XCHG`

Exchange Register/Memory with Register

Instruction	Description
`XCHG r/m8, r8`	Exchange `r/m8` with `r8`
`XCHG r/m16, r16`	Exchange `r/m16` with `r16`
`XCHG r/m32, r32`	Exchange `r/m32` with `r32`
`XCHG r/m64, r64`	Exchange `r/m64` with `r64`

`XCHG r8, r/m8`	Exchange `r8` with `r/m8`
`XCHG r16, r/m16`	Exchange `r16` with `r/m16`
`XCHG r32, r/m32`	Exchange `r32` with `r/m32`
`XCHG r64, r/m64`	Exchange `r64` with `r/m64`

Description

Exchanges the contents of the destination (first) and source (second) operands. The operands can be two general-purpose registers or a register and a memory location. If a memory operand is referenced, the processor’s locking protocol is automatically implemented for the duration of the exchange operation, regardless of the presence or absence of the LOCK prefix or of the value of the IOPL. (See the LOCK prefix description in this chapter for more information on the locking protocol.)

This instruction is useful for implementing semaphores or similar data structures for process synchronization. (See “Bus Locking” in Chapter 8 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A, for more information on bus locking.)

The XCHG instruction can also be used instead of the BSWAP instruction for 16-bit operands.

In 64-bit mode, the instruction’s default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.

XCHG (E)AX, (E)AX (encoded instruction byte is 90H) is an alias for NOP regardless of data size prefixes, including REX.W.

Operation

DEST: first operand
SRC: second operand

TEMP = DEST;
DEST = SRC;
SRC = TEMP;

Flags Affected

None.
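
The description mentions that an atomic exchange with memory is useful for synchronization. As an illustration of the idea (not of `XCHG` itself), here is a minimal spin lock sketch built on Rust's `AtomicBool::swap`. The `SpinLock` type is hypothetical, and the assumption that a compiler lowers the swap to `XCHG` with a memory operand on x86-64 is common but not guaranteed:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical spin lock built on an atomic exchange.
struct SpinLock {
    locked: AtomicBool,
}

impl SpinLock {
    const fn new() -> Self {
        SpinLock { locked: AtomicBool::new(false) }
    }

    /// Tries once to take the lock. Atomically stores `true` and returns
    /// `true` if the previous value was `false` (the lock was free).
    fn try_lock(&self) -> bool {
        !self.locked.swap(true, Ordering::Acquire)
    }

    /// Spins until the lock is acquired.
    fn lock(&self) {
        while !self.try_lock() {
            std::hint::spin_loop();
        }
    }

    /// Releases the lock.
    fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }
}
```

The exchange does the "read the old value" and "write the new value" steps as one indivisible operation, so two threads can never both see the lock as free.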

`PUSH`

Push Word, Doubleword or Quadword Onto the Stack

Instruction	Description
`PUSH r/m16`	Push `r/m16`
`PUSH r/m32`	Push `r/m32`
`PUSH r/m64`	Push `r/m64`

`PUSH r16`	Push `r16`
`PUSH r32`	Push `r32`
`PUSH r64`	Push `r64`

`PUSH imm8`	Push `imm8`
`PUSH imm16`	Push `imm16`
`PUSH imm32`	Push `imm32`

Description

Decrements the stack pointer and then stores the source operand on the top of the stack. Address and operand sizes are determined and used as follows:

Address size. The D flag in the current code-segment descriptor determines the default address size; it may be overridden by an instruction prefix (0x67).

The address size is used only when referencing a source operand in memory.

Operand size. The D flag in the current code-segment descriptor determines the default operand size; it may be overridden by instruction prefixes (0x66 or REX.W).

The operand size (16, 32, or 64 bits) determines the amount by which the stack pointer is decremented (2, 4 or 8).

If the source operand is an immediate of size less than the operand size, a sign-extended value is pushed on the stack. If the source operand is a segment register (16 bits) and the operand size is 64-bits, a zero-extended value is pushed on the stack; if the operand size is 32-bits, either a zero-extended value is pushed on the stack or the segment selector is written on the stack using a 16-bit move. For the last case, all recent Core and Atom processors perform a 16-bit move, leaving the upper portion of the stack location unmodified.

Stack-address size. Outside of 64-bit mode, the B flag in the current stack-segment descriptor determines the size of the stack pointer (16 or 32 bits); in 64-bit mode, the size of the stack pointer is always 64 bits.

The stack-address size determines the width of the stack pointer when writing to the stack in memory and when decrementing the stack pointer. (As stated above, the amount by which the stack pointer is decremented is determined by the operand size.)

If the operand size is less than the stack-address size, the PUSH instruction may result in a misaligned stack pointer (a stack pointer that is not aligned on a doubleword or quadword boundary). The PUSH ESP instruction pushes the value of the ESP register as it existed before the instruction was executed. If a PUSH instruction uses a memory operand in which the ESP register is used for computing the operand address, the address of the operand is computed before the ESP register is decremented.

If the ESP or SP register is 1 when the PUSH instruction is executed in real-address mode, a stack-fault exception (#SS) is generated (because the limit of the stack segment is violated). Its delivery encounters a second stack-fault exception (for the same reason), causing generation of a double-fault exception (#DF). Delivery of the double-fault exception encounters a third stack-fault exception, and the logical processor enters shutdown mode. See the discussion of the double-fault exception in Chapter 6 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A.

Operation

See the Description section for possible sign-extension or zero-extension of the source operand and for a case in which the size of the memory store may be smaller than the instruction’s operand size.

SRC: operand

OperandSize = 16

// push word
RSP = RSP - 2;
Memory[SS:RSP] = SRC;

OperandSize = 32

// push dword
RSP = RSP - 4;
Memory[SS:RSP] = SRC;

OperandSize = 64

// push quadword
RSP = RSP - 8;
Memory[SS:RSP] = SRC;

Flags Affected

None.
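
The 64-bit case of the Operation above can be played through with a toy model. This is a sketch under simplifying assumptions: memory is a flat byte array, `rsp` is an index into it, and segmentation (`SS`) is ignored. The `Machine` type and its methods are hypothetical:

```rust
/// Hypothetical 64-bit stack model: `mem` is a flat byte array and
/// `rsp` an index into it. Segmentation (`SS`) is ignored.
struct Machine {
    mem: Vec<u8>,
    rsp: usize,
}

impl Machine {
    /// New machine with `size` bytes of memory and an empty stack
    /// (stack pointer just past the end of memory).
    fn new(size: usize) -> Self {
        Machine { mem: vec![0; size], rsp: size }
    }

    /// `PUSH r64`: decrement RSP by 8, then store the value (little-endian).
    fn push64(&mut self, src: u64) {
        self.rsp -= 8;
        self.mem[self.rsp..self.rsp + 8].copy_from_slice(&src.to_le_bytes());
    }

    /// `PUSH imm8` with a 64-bit operand size: the immediate is
    /// sign-extended to 64 bits before it is pushed.
    fn push_imm8(&mut self, imm: i8) {
        self.push64(imm as i64 as u64);
    }
}
```

Note that the stack grows towards lower addresses: every push makes `rsp` smaller, and pushing the immediate `-1` writes eight `0xFF` bytes, not one.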

`POP`

Pop a Value from the Stack

Instruction	Description
`POP r/m16`	Pop top of stack into `r/m16`; increment stack pointer
`POP r/m32`	Pop top of stack into `r/m32`; increment stack pointer
`POP r/m64`	Pop top of stack into `r/m64`; increment stack pointer

`POP r16`	Pop top of stack into `r16`; increment stack pointer
`POP r32`	Pop top of stack into `r32`; increment stack pointer
`POP r64`	Pop top of stack into `r64`; increment stack pointer

Description

Loads the value from the top of the stack to the location specified with the destination operand (or explicit opcode) and then increments the stack pointer. The destination operand can be a general-purpose register, memory location, or segment register.

Address and operand sizes are determined and used as follows:

Address size. The D flag in the current code-segment descriptor determines the default address size; it may be overridden by an instruction prefix (0x67).

The address size is used only when writing to a destination operand in memory.

Operand size. The D flag in the current code-segment descriptor determines the default operand size; it may be overridden by instruction prefixes (0x66 or REX.W).

The operand size (16, 32, or 64 bits) determines the amount by which the stack pointer is incremented (2, 4 or 8).

Stack-address size. Outside of 64-bit mode, the B flag in the current stack-segment descriptor determines the size of the stack pointer (16 or 32 bits); in 64-bit mode, the size of the stack pointer is always 64 bits.

The stack-address size determines the width of the stack pointer when reading from the stack in memory and when incrementing the stack pointer. (As stated above, the amount by which the stack pointer is incremented is determined by the operand size.)

If the destination operand is one of the segment registers DS, ES, FS, GS, or SS, the value loaded into the register must be a valid segment selector. In protected mode, popping a segment selector into a segment register automatically causes the descriptor information associated with that segment selector to be loaded into the hidden (shadow) part of the segment register and causes the selector and the descriptor information to be validated (see the Operation section below).

A NULL value (0000-0003) may be popped into the DS, ES, FS, or GS register without causing a general protection fault. However, any subsequent attempt to reference a segment whose corresponding segment register is loaded with a NULL value causes a general protection exception (#GP). In this situation, no memory reference occurs and the saved value of the segment register is NULL.

The POP instruction cannot pop a value into the CS register. To load the CS register from the stack, use the RET instruction.

If the ESP register is used as a base register for addressing a destination operand in memory, the POP instruction computes the effective address of the operand after it increments the ESP register. For the case of a 16-bit stack where ESP wraps to 0x0 as a result of the POP instruction, the resulting location of the memory write is processor-family-specific.

The POP ESP instruction increments the stack pointer (ESP) before data at the old top of stack is written into the destination.

Loading the SS register with a POP instruction suppresses or inhibits some debug exceptions and inhibits interrupts on the following instruction boundary. (The inhibition ends after delivery of an exception or the execution of the next instruction.) This behavior allows a stack pointer to be loaded into the ESP register with the next instruction (POP ESP) before an event can be delivered. See Section 6.8.3, “Masking Exceptions and Interrupts When Switching Stacks,” in Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A. Intel recommends that software use the LSS instruction to load the SS register and ESP together.

In 64-bit mode, using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). When in 64-bit mode, POPs using 32-bit operands are not encodable and POPs to DS, ES, SS are not valid. See the summary chart at the beginning of this section for encoding data and limits.

Operation

DEST: operand

OperandSize = 16

// pop word
DEST = Memory[SS:RSP];
RSP = RSP + 2;

OperandSize = 32

// pop dword
DEST = Memory[SS:RSP];
RSP = RSP + 4;

OperandSize = 64

// pop quadword
DEST = Memory[SS:RSP];
RSP = RSP + 8;

Flags Affected

None.
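
As a counterpart to the Operation above, here is a toy model of the 64-bit case. It is a sketch under simplifying assumptions: memory is a flat byte array, `rsp` is an index into it, and segmentation (`SS`) is ignored. The `Machine` type and `pop64` are hypothetical names:

```rust
/// Hypothetical 64-bit stack model: `mem` is a flat byte array and
/// `rsp` an index into it. Segmentation (`SS`) is ignored.
struct Machine {
    mem: Vec<u8>,
    rsp: usize,
}

impl Machine {
    /// `POP r64`: load 8 bytes from the top of the stack (little-endian),
    /// then increment RSP by 8. Returns the loaded value.
    fn pop64(&mut self) -> u64 {
        let mut bytes = [0u8; 8];
        bytes.copy_from_slice(&self.mem[self.rsp..self.rsp + 8]);
        self.rsp += 8;
        u64::from_le_bytes(bytes)
    }
}
```

The popped bytes are not erased from memory. They simply sit below the new stack pointer until a later push overwrites them.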

Binary Arithmetic Instructions

Mnemonic	Summary
`ADD`	Integer Add
`SUB`	Subtract
`IMUL`	Signed Multiply
`MUL`	Unsigned Multiply
`IDIV`	Signed Divide
`DIV`	Unsigned Divide
`INC`	Increment
`DEC`	Decrement
`NEG`	Negate
`CMP`	Compare

`ADD`

Add

Instruction	Description
`ADD r/m8, imm8`	Add `imm8` to `r/m8`
`ADD r/m16, imm16`	Add `imm16` to `r/m16`
`ADD r/m32, imm32`	Add `imm32` to `r/m32`
`ADD r/m64, imm32`	Add sign-extended `imm32` to `r/m64`

`ADD r/m16, imm8`	Add sign-extended `imm8` to `r/m16`
`ADD r/m32, imm8`	Add sign-extended `imm8` to `r/m32`
`ADD r/m64, imm8`	Add sign-extended `imm8` to `r/m64`

`ADD r/m8, r8`	Add `r8` to `r/m8`
`ADD r/m16, r16`	Add `r16` to `r/m16`
`ADD r/m32, r32`	Add `r32` to `r/m32`
`ADD r/m64, r64`	Add `r64` to `r/m64`

`ADD r8, r/m8`	Add `r/m8` to `r8`
`ADD r16, r/m16`	Add `r/m16` to `r16`
`ADD r32, r/m32`	Add `r/m32` to `r32`
`ADD r64, r/m64`	Add `r/m64` to `r64`

Description

Adds the destination operand (first operand) and the source operand (second operand) and then stores the result in the destination operand. The destination operand can be a register or a memory location; the source operand can be an immediate, a register, or a memory location. (However, two memory operands cannot be used in one instruction.) When an immediate value is used as an operand, it is sign-extended to the length of the destination operand format.

The ADD instruction performs integer addition. It evaluates the result for both signed and unsigned integer operands and sets the OF and CF flags to indicate a carry (overflow) in the signed or unsigned result, respectively. The SF flag indicates the sign of the signed result.

This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically.

Operation

DEST: first operand
SRC: second operand

DEST = DEST + SRC;

Flags Affected

OF, SF, ZF, AF, CF, PF
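
The description says that one addition is evaluated for both signed and unsigned operands. The sketch below illustrates this for the 32-bit case using Rust's `overflowing_add`; the function name `add32` is hypothetical, and only CF and OF are modelled:

```rust
/// Hypothetical model of a 32-bit `ADD`: returns the result together
/// with CF (unsigned overflow) and OF (signed overflow).
fn add32(dest: u32, src: u32) -> (u32, bool, bool) {
    // The result bits are the same for signed and unsigned operands.
    let (result, cf) = dest.overflowing_add(src);
    // Only the overflow check depends on the interpretation.
    let (_, of) = (dest as i32).overflowing_add(src as i32);
    (result, cf, of)
}
```

`add32(0xFFFF_FFFF, 1)` sets CF but not OF (unsigned wrap-around, but `-1 + 1 = 0` is fine as signed), while `add32(0x7FFF_FFFF, 1)` sets OF but not CF.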

`SUB`

Subtract

Instruction	Description
`SUB r/m8, imm8`	Subtract `imm8` from `r/m8`
`SUB r/m16, imm16`	Subtract `imm16` from `r/m16`
`SUB r/m32, imm32`	Subtract `imm32` from `r/m32`
`SUB r/m64, imm32`	Subtract sign-extended `imm32` from `r/m64`

`SUB r/m16, imm8`	Subtract sign-extended `imm8` from `r/m16`
`SUB r/m32, imm8`	Subtract sign-extended `imm8` from `r/m32`
`SUB r/m64, imm8`	Subtract sign-extended `imm8` from `r/m64`

`SUB r/m8, r8`	Subtract `r8` from `r/m8`
`SUB r/m16, r16`	Subtract `r16` from `r/m16`
`SUB r/m32, r32`	Subtract `r32` from `r/m32`
`SUB r/m64, r64`	Subtract `r64` from `r/m64`

`SUB r8, r/m8`	Subtract `r/m8` from `r8`
`SUB r16, r/m16`	Subtract `r/m16` from `r16`
`SUB r32, r/m32`	Subtract `r/m32` from `r32`
`SUB r64, r/m64`	Subtract `r/m64` from `r64`

Description

Subtracts the second operand (source operand) from the first operand (destination operand) and stores the result in the destination operand. The destination operand can be a register or a memory location; the source operand can be an immediate, register, or memory location. (However, two memory operands cannot be used in one instruction.) When an immediate value is used as an operand, it is sign-extended to the length of the destination operand format.

The SUB instruction performs integer subtraction. It evaluates the result for both signed and unsigned integer operands and sets the OF and CF flags to indicate an overflow in the signed or unsigned result, respectively. The SF flag indicates the sign of the signed result.

This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically.

Operation

DEST: first operand
SRC: second operand

DEST = DEST - SRC;

Flags Affected

OF, SF, ZF, AF, PF, CF
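
For subtraction, CF acts as a borrow, and SF together with OF tells whether the first operand was smaller as a signed number. A rough sketch for the 32-bit case, with the hypothetical function name `sub32` and only CF, SF and OF modelled:

```rust
/// Hypothetical model of a 32-bit `SUB`. Returns the result and the
/// CF, SF and OF flags, in that order.
fn sub32(dest: u32, src: u32) -> (u32, bool, bool, bool) {
    let (result, cf) = dest.overflowing_sub(src); // CF: unsigned borrow
    let (_, of) = (dest as i32).overflowing_sub(src as i32); // OF: signed overflow
    let sf = result & 0x8000_0000 != 0; // SF: most-significant bit of the result
    (result, cf, sf, of)
}
```

After the subtraction, `cf` is true exactly when `dest < src` as unsigned values, and `sf != of` is true exactly when `dest < src` as signed values. These are the `b` and `l` condition codes.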

`IMUL`

Signed Multiply

Instruction	Description
`IMUL r/m8`	`AX = AL * r/m8`
`IMUL r/m16`	`DX:AX = AX * r/m16`
`IMUL r/m32`	`EDX:EAX = EAX * r/m32`
`IMUL r/m64`	`RDX:RAX = RAX * r/m64`

`IMUL r16, r/m16`	`r16 = r16 * r/m16`
`IMUL r32, r/m32`	`r32 = r32 * r/m32`
`IMUL r64, r/m64`	`r64 = r64 * r/m64`

`IMUL r16, r/m16, imm16`*	`r16 = r/m16 * imm16`
`IMUL r32, r/m32, imm32`*	`r32 = r/m32 * imm32`
`IMUL r64, r/m64, imm32`*	`r64 = r/m64 * imm32`

`IMUL r16, r/m16, imm8`*	`r16 = r/m16 *` sign-extended `imm8`
`IMUL r32, r/m32, imm8`*	`r32 = r/m32 *` sign-extended `imm8`
`IMUL r64, r/m64, imm8`*	`r64 = r/m64 *` sign-extended `imm8`

* If the first two operands are the same, the second one can be left out when using nasm or .intel_syntax noprefix.

Description

Performs a signed multiplication of two operands. This instruction has three forms, depending on the number of operands.

One-operand form
This form is identical to that used by the MUL instruction. Here, the source operand (in a general-purpose register or memory location) is multiplied by the value in the AL, AX, EAX, or RAX register (depending on the operand size) and the product (twice the size of the input operand) is stored in the AX, DX:AX, EDX:EAX, or RDX:RAX registers, respectively.
Two-operand form
With this form the destination operand (the first operand) is multiplied by the source operand (second operand). The destination operand is a general-purpose register and the source operand is an immediate value, a general-purpose register, or a memory location. The intermediate product (twice the size of the input operand) is truncated and stored in the destination operand location.
Three-operand form
This form requires a destination operand (the first operand) and two source operands (the second and the third operands). Here, the first source operand (which can be a general-purpose register or a memory location) is multiplied by the second source operand (an immediate value). The intermediate product (twice the size of the first source operand) is truncated and stored in the destination operand (a general-purpose register). When an immediate value is used as an operand, it is sign-extended to the length of the destination operand format.

The CF and OF flags are set when the signed integer value of the intermediate product differs from the sign-extended operand-size-truncated product, otherwise the CF and OF flags are cleared.

The three forms of the IMUL instruction are similar in that the length of the product is calculated to twice the length of the operands. With the one-operand form, the product is stored exactly in the destination. With the two- and three- operand forms, however, the result is truncated to the length of the destination before it is stored in the destination register. Because of this truncation, the CF or OF flag should be tested to ensure that no significant bits are lost.

The two- and three-operand forms may also be used with unsigned operands because the lower half of the product is the same regardless of whether the operands are signed or unsigned. The CF and OF flags, however, cannot be used to determine if the upper half of the result is non-zero.

In 64-bit mode, the instruction’s default operation size is 32 bits. Use of the REX.R prefix permits access to additional registers (R8-R15). Use of the REX.W prefix promotes operation to 64 bits. Use of REX.W modifies the three forms of the instruction as follows.

One-operand form
The source operand (in a 64-bit general-purpose register or memory location) is multiplied by the value in the RAX register and the product is stored in the RDX:RAX registers.
Two-operand form
The source operand is promoted to 64 bits if it is a register or a memory location. The destination operand is promoted to 64 bits.
Three-operand form
The first source operand (either a register or a memory location) and destination operand are promoted to 64 bits. If the source operand is an immediate, it is sign-extended to 64 bits.

Operation

Single Operand

SRC: operand

8-bit

TMP_XP = AL * SRC; // Signed multiplication
// TMP_XP is a signed integer at twice the width of the SRC
AX = TMP_XP[0..=15];
if SignExtend(TMP_XP[0..=7]) == TMP_XP {
    CF = 0;
    OF = 0;
} else {
    CF = 1;
    OF = 1;
}

16-bit

TMP_XP = AX * SRC; // Signed multiplication
// TMP_XP is a signed integer at twice the width of the SRC
DX:AX = TMP_XP[0..=31];
if SignExtend(TMP_XP[0..=15]) == TMP_XP {
    CF = 0;
    OF = 0;
} else {
    CF = 1;
    OF = 1;
}

32-bit

TMP_XP = EAX * SRC; // Signed multiplication
// TMP_XP is a signed integer at twice the width of the SRC
EDX:EAX = TMP_XP[0..=63];
if SignExtend(TMP_XP[0..=31]) == TMP_XP {
    CF = 0;
    OF = 0;
} else {
    CF = 1;
    OF = 1;
}

64-bit

TMP_XP = RAX * SRC; // Signed multiplication
// TMP_XP is a signed integer at twice the width of the SRC
RDX:RAX = TMP_XP[0..=127];
if SignExtend(TMP_XP[0..=63]) == TMP_XP {
    CF = 0;
    OF = 0;
} else {
    CF = 1;
    OF = 1;
}

Two Operands

DEST: first operand
SRC: second operand

TMP_XP = DEST * SRC; // Signed multiplication
// TMP_XP is a signed integer at twice the width of the SRC
DEST = TruncateToOperandSize(TMP_XP);
if SignExtend(DEST) == TMP_XP {
    CF = 0;
    OF = 0;
} else {
    CF = 1;
    OF = 1;
}

Three Operands

DEST: first operand
SRC1: second operand
SRC2: third operand

TMP_XP = SRC1 * SRC2; // Signed multiplication
// TMP_XP is a signed integer at twice the width of the SRC1
DEST = TruncateToOperandSize(TMP_XP);
if SignExtend(DEST) == TMP_XP {
    CF = 0;
    OF = 0;
} else {
    CF = 1;
    OF = 1;
}

Flags Affected

For the one operand form of the instruction, the CF and OF flags are set when significant bits are carried into the upper half of the result and cleared when the result fits exactly in the lower half of the result. For the two- and three-operand forms of the instruction, the CF and OF flags are set when the result must be truncated to fit in the destination operand size and cleared when the result fits exactly in the destination operand size. The SF, ZF, AF, and PF flags are undefined.
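
The Operation pseudocode can be tried out with Rust's wider integer types. The sketch below covers the 64-bit one-operand form and the 32-bit two-operand form; the function names are hypothetical and the undefined flags are not modelled:

```rust
/// Hypothetical model of one-operand 64-bit `IMUL`.
/// Returns (RDX, RAX, flag), where `flag` is the shared value of CF and OF.
fn imul64(rax: i64, src: i64) -> (u64, u64, bool) {
    // TMP_XP: the full product at twice the operand width.
    let product = (rax as i128) * (src as i128);
    let low = product as u64; // goes to RAX
    let high = (product >> 64) as u64; // goes to RDX
    // CF/OF are set when sign-extending the low half does not
    // reproduce the full product.
    let flag = (low as i64) as i128 != product;
    (high, low, flag)
}

/// Hypothetical model of two-operand 32-bit `IMUL`.
/// Returns the truncated product and the shared CF/OF value.
fn imul32_two_operand(dest: i32, src: i32) -> (i32, bool) {
    dest.overflowing_mul(src)
}
```

For `imul64(-1, 1)` both halves are all ones, yet the flag stays clear, because the upper half is only the sign extension of the lower half.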

`MUL`

Unsigned Multiply

Instruction	Description
`MUL r/m8`	Unsigned multiply (`AX = AL * r/m8`)
`MUL r/m16`	Unsigned multiply (`DX:AX = AX * r/m16`)
`MUL r/m32`	Unsigned multiply (`EDX:EAX = EAX * r/m32`)
`MUL r/m64`	Unsigned multiply (`RDX:RAX = RAX * r/m64`)

Description

Performs an unsigned multiplication of the first operand (destination operand) and the second operand (source operand) and stores the result in the destination operand. The destination operand is an implied operand located in register AL, AX, EAX, or RAX (depending on the size of the operand); the source operand is located in a general-purpose register or a memory location. The action of this instruction and the location of the result depend on the opcode and the operand size as shown in the table below.

The result is stored in register AX, register pair DX:AX, register pair EDX:EAX, or register pair RDX:RAX (depending on the operand size), with the high-order bits of the product contained in register AH, DX, EDX, or RDX, respectively. If the high-order bits of the product are 0, the CF and OF flags are cleared; otherwise, the flags are set.

See the summary chart at the beginning of this section for encoding data and limits.

Operand Size	Source 1	Source 2	Destination
Byte	`AL`	`r/m8`	`AX`
Word	`AX`	`r/m16`	`DX:AX`
Doubleword	`EAX`	`r/m32`	`EDX:EAX`
Quadword	`RAX`	`r/m64`	`RDX:RAX`

Operation

SRC: operand

OperandSize = 8

AX = AL * SRC;

OperandSize = 16

DX:AX = AX * SRC;

OperandSize = 32

EDX:EAX = EAX * SRC;

OperandSize = 64

RDX:RAX = RAX * SRC;

Flags Affected

The OF and CF flags are set to 0 if the upper half of the result is 0; otherwise, they are set to 1. The SF, ZF, AF, and PF flags are undefined.
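
The 64-bit case can be sketched with a 128-bit intermediate value. The function name `mul64` is hypothetical and the undefined flags are not modelled:

```rust
/// Hypothetical model of 64-bit `MUL`.
/// Returns (RDX, RAX, flag), where `flag` is the shared value of CF and OF.
fn mul64(rax: u64, src: u64) -> (u64, u64, bool) {
    // The full product always fits in 128 bits.
    let product = (rax as u128) * (src as u128);
    let low = product as u64; // goes to RAX
    let high = (product >> 64) as u64; // goes to RDX
    // CF and OF are set when the upper half is non-zero.
    (high, low, high != 0)
}
```

Keep in mind that `MUL` always overwrites `RDX` (or `DX`/`EDX`), even when the product fits in the lower half.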

`IDIV`

Signed Divide

Instruction	Description
`IDIV r/m8`	Signed divide `AX` by `r/m8`; `AL` = Quotient, `AH` = Remainder
`IDIV r/m16`	Signed divide `DX:AX` by `r/m16`; `AX` = Quotient, `DX` = Remainder
`IDIV r/m32`	Signed divide `EDX:EAX` by `r/m32`; `EAX` = Quotient, `EDX` = Remainder
`IDIV r/m64`	Signed divide `RDX:RAX` by `r/m64`; `RAX` = Quotient, `RDX` = Remainder

$$
\begin{aligned}
AL &= \frac{AX}{r/m8} & AH &= AX \bmod r/m8 \\
AX &= \frac{DX:AX}{r/m16} & DX &= DX:AX \bmod r/m16 \\
EAX &= \frac{EDX:EAX}{r/m32} & EDX &= EDX:EAX \bmod r/m32 \\
RAX &= \frac{RDX:RAX}{r/m64} & RDX &= RDX:RAX \bmod r/m64
\end{aligned}
$$

Description

Divides the (signed) value in the AX, DX:AX, EDX:EAX, or RDX:RAX registers (dividend) by the source operand (divisor) and stores the result in the AX (AH:AL), DX:AX, EDX:EAX, or RDX:RAX registers. The source operand can be a general-purpose register or a memory location. The action of this instruction depends on the operand size (dividend/divisor).

Non-integral results are truncated (chopped) towards 0. The remainder is always less than the divisor in magnitude. Overflow is indicated with the #DE (divide error) exception rather than with the CF flag.

In 64-bit mode, the instruction’s default operation size is 32 bits. Use of the REX.R prefix permits access to additional registers (R8-R15). Use of the REX.W prefix promotes operation to 64 bits. In 64-bit mode when REX.W is applied, the instruction divides the signed value in RDX:RAX by the source operand. RAX contains a 64-bit quotient; RDX contains a 64-bit remainder.

See the summary chart at the beginning of this section for encoding data and limits. See the table below.

Operand Size	Dividend	Divisor	Quotient	Remainder	Quotient Range
Word/byte	`AX`	`r/m8`	`AL`	`AH`	-128 to +127
Doubleword/word	`DX:AX`	`r/m16`	`AX`	`DX`	-32,768 to +32,767
Quadword/doubleword	`EDX:EAX`	`r/m32`	`EAX`	`EDX`	-2³¹ to +2³¹ - 1
Doublequadword/quadword	`RDX:RAX`	`r/m64`	`RAX`	`RDX`	-2⁶³ to +2⁶³ - 1

Operation

SRC: operand

OperandSize = 8

if SRC == 0 {
    DE; // divide error
}

temp = AX / SRC; // signed division
if temp > 0x7F || temp < -0x80 {
    // if a positive result is greater than 0x7F
    // or a negative result is less than -0x80
    DE; // divide error
} else {
    AL = temp;
    AH = AX % SRC; // signed modulus
}

OperandSize = 16

if SRC == 0 {
    DE; // divide error
}

temp = DX:AX / SRC; // signed division
if temp > 0x7FFF || temp < -0x8000 {
    // if a positive result is greater than 0x7FFF
    // or a negative result is less than -0x8000
    DE; // divide error
} else {
    AX = temp;
    DX = DX:AX % SRC; // signed modulus
}

OperandSize = 32

if SRC == 0 {
    DE; // divide error
}

temp = EDX:EAX / SRC; // signed division
if temp > 0x7FFF_FFFF || temp < -0x8000_0000 {
    // if a positive result is greater than 0x7FFF_FFFF
    // or a negative result is less than -0x8000_0000
    DE; // divide error
} else {
    EAX = temp;
    EDX = EDX:EAX % SRC; // signed modulus
}

OperandSize = 64

if SRC == 0 {
    DE; // divide error
}

temp = RDX:RAX / SRC; // signed division
if temp > 0x7FFF_FFFF_FFFF_FFFF || temp < -0x8000_0000_0000_0000 {
    // if a positive result is greater than 0x7FFF_FFFF_FFFF_FFFF
    // or a negative result is less than -0x8000_0000_0000_0000
    DE; // divide error
} else {
    RAX = temp;
    RDX = RDX:RAX % SRC; // signed modulus
}
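The pseudocode above can be tried out with a small Python model. This is only a sketch: the function name `idiv`, the `DivideError` class (standing in for #DE), and the choice to pass the dividend as one signed integer are assumptions made for illustration, not part of the instruction set.

```python
class DivideError(Exception):
    """Stands in for the #DE (divide error) exception."""


def idiv(dividend: int, divisor: int, bits: int) -> tuple:
    """Model IDIV for a `bits`-wide divisor (8, 16, 32, or 64).

    `dividend` is the signed value held in AX, DX:AX, EDX:EAX, or RDX:RAX.
    Returns (quotient, remainder) as signed Python integers.
    """
    if divisor == 0:
        raise DivideError("division by zero")

    # Truncate toward zero; Python's // would round toward negative infinity.
    quotient = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient
    remainder = dividend - quotient * divisor

    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    if not lowest <= quotient <= highest:
        raise DivideError("quotient does not fit in the destination")
    return quotient, remainder


print(idiv(-9, 4, 32))  # (-2, -1)
```

Note that the remainder takes the sign of the dividend, and that a quotient outside the range in the table raises the error instead of being truncated.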

Flags Affected

The CF, OF, SF, ZF, AF, and PF flags are undefined.

`DIV`

Unsigned Divide

Instruction	Description
`DIV r/m8`	Unsigned divide `AX` by `r/m8`; `AL` = Quotient, `AH` = Remainder
`DIV r/m16`	Unsigned divide `DX:AX` by `r/m16`; `AX` = Quotient, `DX` = Remainder
`DIV r/m32`	Unsigned divide `EDX:EAX` by `r/m32`; `EAX` = Quotient, `EDX` = Remainder
`DIV r/m64`	Unsigned divide `RDX:RAX` by `r/m64`; `RAX` = Quotient, `RDX` = Remainder

$$
\begin{aligned}
AL &= \frac{AX}{r/m8} & AH &= AX \bmod r/m8 \\
AX &= \frac{DX:AX}{r/m16} & DX &= DX:AX \bmod r/m16 \\
EAX &= \frac{EDX:EAX}{r/m32} & EDX &= EDX:EAX \bmod r/m32 \\
RAX &= \frac{RDX:RAX}{r/m64} & RDX &= RDX:RAX \bmod r/m64
\end{aligned}
$$

Description

Divides the unsigned value in the AX, DX:AX, EDX:EAX, or RDX:RAX registers (dividend) by the source operand (divisor) and stores the result in the AX (AH:AL), DX:AX, EDX:EAX, or RDX:RAX registers. The source operand can be a general-purpose register or a memory location. The action of this instruction depends on the operand size (dividend/divisor). Division using a 64-bit operand is available only in 64-bit mode.

In 64-bit mode, the instruction’s default operation size is 32 bits. Use of the REX.R prefix permits access to additional registers (R8-R15). Use of the REX.W prefix promotes operation to 64 bits. In 64-bit mode when REX.W is applied, the instruction divides the unsigned value in RDX:RAX by the source operand and stores the quotient in RAX, the remainder in RDX.

See the summary chart at the beginning of this section for encoding data and limits. See the table below.

Operand Size	Dividend	Divisor	Quotient	Remainder	Maximum Quotient
Word/byte	`AX`	`r/m8`	`AL`	`AH`	255
Doubleword/word	`DX:AX`	`r/m16`	`AX`	`DX`	65,535
Quadword/doubleword	`EDX:EAX`	`r/m32`	`EAX`	`EDX`	2³² - 1
Doublequadword/quadword	`RDX:RAX`	`r/m64`	`RAX`	`RDX`	2⁶⁴ - 1

Operation

SRC: operand

OperandSize = 8

if SRC == 0 {
    DE; // divide error
}

temp = AX / SRC;
if temp > 0xFF {
    DE; // divide error
} else {
    AL = temp;
    AH = AX % SRC;
}

OperandSize = 16

if SRC == 0 {
    DE; // divide error
}

temp = DX:AX / SRC;
if temp > 0xFFFF {
    DE; // divide error
} else {
    AX = temp;
    DX = DX:AX % SRC;
}

OperandSize = 32

if SRC == 0 {
    DE; // divide error
}

temp = EDX:EAX / SRC;
if temp > 0xFFFF_FFFF {
    DE; // divide error
} else {
    EAX = temp;
    EDX = EDX:EAX % SRC;
}

OperandSize = 64

if SRC == 0 {
    DE; // divide error
}

temp = RDX:RAX / SRC;
if temp > 0xFFFF_FFFF_FFFF_FFFF {
    DE; // divide error
} else {
    RAX = temp;
    RDX = RDX:RAX % SRC;
}
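A rough Python model of the unsigned case is sketched below. The names `div` and `DivideError` are hypothetical, and the dividend is passed as its two register halves (for example EDX and EAX) to make it visible why a stale upper half can trigger a divide error.

```python
class DivideError(Exception):
    """Stands in for the #DE (divide error) exception."""


def div(high: int, low: int, divisor: int, bits: int) -> tuple:
    """Model DIV: divide the unsigned value high:low by `divisor`.

    `high` and `low` are the two `bits`-wide halves of the dividend
    (AH and AL for bits == 8, EDX and EAX for bits == 32, and so on).
    Returns (quotient, remainder).
    """
    if divisor == 0:
        raise DivideError("division by zero")
    dividend = (high << bits) | low
    quotient, remainder = divmod(dividend, divisor)
    if quotient > (1 << bits) - 1:
        raise DivideError("quotient does not fit in the destination")
    return quotient, remainder


print(div(0, 100, 7, 32))  # (14, 2)
```

With `high` set to 1 and a divisor of 1, the quotient would need 33 bits, so the model raises `DivideError`; this is why the upper half of the dividend is normally cleared before dividing a single-register value.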

Flags Affected

The CF, OF, SF, ZF, AF, and PF flags are undefined.

`INC`

Increment by 1

Instruction	Description
`INC r/m8`	Increment `r/m8` by 1
`INC r/m16`	Increment `r/m16` by 1
`INC r/m32`	Increment `r/m32` by 1
`INC r/m64`	Increment `r/m64` by 1

Description

Adds 1 to the destination operand, while preserving the state of the CF flag. The destination operand can be a register or a memory location. This instruction allows a loop counter to be updated without disturbing the CF flag. (To perform an increment operation that updates the CF flag, use an ADD instruction with an immediate operand of 1.)

This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically.

In 64-bit mode, the one-byte short forms of INC r16 and INC r32 (opcodes 0x40 through 0x47) are not encodable because those opcodes are used as REX prefixes; the INC r/m16 and INC r/m32 forms remain available. Otherwise, the instruction’s 64-bit mode default operation size is 32 bits. Use of the REX.R prefix permits access to additional registers (R8-R15). Use of the REX.W prefix promotes operation to 64 bits.

Operation

DEST: operand

DEST = DEST + 1;

Flags Affected

The CF flag is not affected. The OF, SF, ZF, AF, and PF flags are set according to the result.
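The difference between INC and ADD with an immediate of 1 can be illustrated with a small Python model. This is a simplified sketch: the function names and the `flags` dictionary are assumptions for illustration, and AF and PF are left out for brevity.

```python
def _result_flags(result: int, bits: int, flags: dict) -> None:
    """Set ZF and SF from a result already truncated to `bits` bits."""
    flags["ZF"] = int(result == 0)
    flags["SF"] = (result >> (bits - 1)) & 1


def inc(value: int, bits: int, flags: dict) -> int:
    """Model INC: add 1, update OF/SF/ZF, leave CF alone."""
    mask = (1 << bits) - 1
    result = (value + 1) & mask
    # Signed overflow happens only when the largest positive value wraps.
    flags["OF"] = int(result == 1 << (bits - 1))
    _result_flags(result, bits, flags)
    return result


def add_one(value: int, bits: int, flags: dict) -> int:
    """Model ADD dest, 1: same result as INC, but CF is updated too."""
    result = inc(value, bits, flags)
    flags["CF"] = int(result == 0)  # carry out only when the value wrapped to 0
    return result


flags = {"CF": 1}
print(inc(0xFF, 8, flags), flags["CF"])      # 0 1  (CF kept)
print(add_one(0x01, 8, flags), flags["CF"])  # 2 0  (CF recomputed)
```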

`DEC`

Decrement by 1

Instruction	Description
`DEC r/m8`	Decrement `r/m8` by 1
`DEC r/m16`	Decrement `r/m16` by 1
`DEC r/m32`	Decrement `r/m32` by 1
`DEC r/m64`	Decrement `r/m64` by 1

Description

Subtracts 1 from the destination operand, while preserving the state of the CF flag. The destination operand can be a register or a memory location. This instruction allows a loop counter to be updated without disturbing the CF flag. (To perform a decrement operation that updates the CF flag, use a SUB instruction with an immediate operand of 1.)

This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically.

In 64-bit mode, the one-byte short forms of DEC r16 and DEC r32 (opcodes 0x48 through 0x4F) are not encodable because those opcodes are used as REX prefixes; the DEC r/m16 and DEC r/m32 forms remain available. Otherwise, the instruction’s 64-bit mode default operation size is 32 bits. Use of the REX.R prefix permits access to additional registers (R8-R15). Use of the REX.W prefix promotes operation to 64 bits.

See the summary chart at the beginning of this section for encoding data and limits.

Operation

DEST: operand

DEST = DEST - 1;

Flags Affected

The CF flag is not affected. The OF, SF, ZF, AF, and PF flags are set according to the result.

`NEG`

Two’s Complement Negation

Instruction	Description
`NEG r/m8`	Two’s complement negate `r/m8`
`NEG r/m16`	Two’s complement negate `r/m16`
`NEG r/m32`	Two’s complement negate `r/m32`
`NEG r/m64`	Two’s complement negate `r/m64`

Description

Replaces the value of the operand (the destination operand) with its two’s complement. (This operation is equivalent to subtracting the operand from 0.) The destination operand is located in a general-purpose register or a memory location.

This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically.

Operation

DEST: operand

if DEST == 0 {
    CF = 0;
} else {
    CF = 1;
}
DEST = -DEST;

Flags Affected

The CF flag is set to 0 if the operand is 0; otherwise it is set to 1. The OF, SF, ZF, AF, and PF flags are set according to the result.
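As a quick illustration, the following Python sketch models NEG on a fixed-width value. The function name `neg` and the returned tuple layout are assumptions for this example; only CF and OF are modeled.

```python
def neg(value: int, bits: int) -> tuple:
    """Model NEG on a `bits`-wide value; returns (result, CF, OF)."""
    mask = (1 << bits) - 1
    value &= mask
    result = (-value) & mask        # same as (0 - value) truncated to `bits` bits
    cf = int(value != 0)            # CF is cleared only for an operand of 0
    # The most negative value has no positive counterpart, so it overflows.
    of = int(value == 1 << (bits - 1))
    return result, cf, of


print(neg(5, 8))     # (251, 1, 0), 251 is 0xFB, i.e. -5 as a signed byte
print(neg(0, 8))     # (0, 0, 0)
print(neg(0x80, 8))  # (128, 1, 1), -128 cannot be negated in 8 bits
```

The result always equals the bitwise NOT of the operand plus 1, which is the usual definition of the two’s complement.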

`CMP`

Compare Two Operands

Performs the same operation as SUB, but discards the result instead of saving it and only keeps the flags.

Instruction	Description
`CMP r/m8, imm8`	Compare `imm8` with `r/m8`
`CMP r/m16, imm16`	Compare `imm16` with `r/m16`
`CMP r/m32, imm32`	Compare `imm32` with `r/m32`
`CMP r/m64, imm32`	Compare sign-extended `imm32` with `r/m64`

`CMP r/m16, imm8`	Compare sign-extended `imm8` with `r/m16`
`CMP r/m32, imm8`	Compare sign-extended `imm8` with `r/m32`
`CMP r/m64, imm8`	Compare sign-extended `imm8` with `r/m64`

`CMP r/m8, r8`	Compare `r8` with `r/m8`
`CMP r/m16, r16`	Compare `r16` with `r/m16`
`CMP r/m32, r32`	Compare `r32` with `r/m32`
`CMP r/m64, r64`	Compare `r64` with `r/m64`

`CMP r8, r/m8`	Compare `r/m8` with `r8`
`CMP r16, r/m16`	Compare `r/m16` with `r16`
`CMP r32, r/m32`	Compare `r/m32` with `r32`
`CMP r64, r/m64`	Compare `r/m64` with `r64`

Intuition:
Condition codes are read in order of the operands. For example:

cmp     rax, 10
jg      .label

“jump if RAX is greater than 10”

Description

Compares the first source operand with the second source operand and sets the status flags in the EFLAGS register according to the results. The comparison is performed by subtracting the second operand from the first operand and then setting the status flags in the same manner as the SUB instruction. When an immediate value is used as an operand, it is sign-extended to the length of the first operand.

The condition codes used by the Jcc, CMOVcc, and SETcc instructions are based on the results of a CMP instruction. Appendix B, “EFLAGS Condition Codes,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, shows the relationship of the status flags and the condition codes.

Operation

SRC1: first operand
SRC2: second operand

temp = SRC1 - SignExtend(SRC2);
ModifyStatusFlags(); // modify status flags in the same manner as the SUB instruction

Flags Affected

The CF, OF, SF, ZF, AF, and PF flags are set according to the result.
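To see how the flags produced by CMP feed the condition codes, here is a hedged Python sketch. The helper names are hypothetical, AF and PF are omitted, and the two conditions shown use the standard definitions: "greater" (signed) is ZF = 0 and SF = OF, "above" (unsigned) is CF = 0 and ZF = 0.

```python
def cmp_flags(a: int, b: int, bits: int) -> dict:
    """Model the CF, ZF, SF, and OF flags after CMP a, b (computes a - b)."""
    mask = (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    a &= mask
    b &= mask
    result = (a - b) & mask
    return {
        "CF": int(a < b),  # unsigned borrow
        "ZF": int(result == 0),
        "SF": int(bool(result & sign_bit)),
        # Signed overflow: the operands differ in sign and the result's
        # sign differs from the first operand's sign.
        "OF": int(bool((a ^ b) & (a ^ result) & sign_bit)),
    }


def jg_taken(flags: dict) -> bool:
    """JG: jump if greater (signed comparison)."""
    return flags["ZF"] == 0 and flags["SF"] == flags["OF"]


def ja_taken(flags: dict) -> bool:
    """JA: jump if above (unsigned comparison)."""
    return flags["CF"] == 0 and flags["ZF"] == 0


flags = cmp_flags(-1, 10, 64)
print(jg_taken(flags))  # False: -1 is not greater than 10 as a signed value
print(ja_taken(flags))  # True: the same bits read as unsigned are far above 10
```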

Logical Instructions

Mnemonic	Summary
`AND`	Perform Bitwise Logical AND
`OR`	Perform Bitwise Logical OR
`XOR`	Perform Bitwise Logical Exclusive OR
`NOT`	Perform Bitwise Logical NOT

`AND`

Logical AND

Instruction	Description
`AND r/m8, imm8`	`r/m8` AND `imm8`
`AND r/m16, imm16`	`r/m16` AND `imm16`
`AND r/m32, imm32`	`r/m32` AND `imm32`
`AND r/m64, imm32`	`r/m64` AND `imm32` (sign-extended)

`AND r/m16, imm8`	`r/m16` AND `imm8` (sign-extended)
`AND r/m32, imm8`	`r/m32` AND `imm8` (sign-extended)
`AND r/m64, imm8`	`r/m64` AND `imm8` (sign-extended)

`AND r/m8, r8`	`r/m8` AND `r8`
`AND r/m16, r16`	`r/m16` AND `r16`
`AND r/m32, r32`	`r/m32` AND `r32`
`AND r/m64, r64`	`r/m64` AND `r64`

`AND r8, r/m8`	`r8` AND `r/m8`
`AND r16, r/m16`	`r16` AND `r/m16`
`AND r32, r/m32`	`r32` AND `r/m32`
`AND r64, r/m64`	`r64` AND `r/m64`

Description

Performs a bitwise AND operation on the destination (first) and source (second) operands and stores the result in the destination operand location. The source operand can be an immediate, a register, or a memory location; the destination operand can be a register or a memory location. (However, two memory operands cannot be used in one instruction.) Each bit of the result is set to 1 if both corresponding bits of the first and second operands are 1; otherwise, it is set to 0.

This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically.

Operation

DEST: first operand
SRC: second operand

DEST = DEST & SRC;

Flags Affected

The OF and CF flags are cleared; the SF, ZF, and PF flags are set according to the result. The state of the AF flag is undefined.
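The sign-extended immediate forms in the table above are easy to misread, so here is a small Python sketch of `AND r/m64, imm32`. The helper names are assumptions for illustration; the point is only that the 32-bit immediate is widened to 64 bits by copying its top bit before the AND takes place.

```python
def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Sign-extend a `from_bits`-wide value to `to_bits` bits (as unsigned)."""
    sign_bit = 1 << (from_bits - 1)
    value &= (1 << from_bits) - 1
    extended = (value ^ sign_bit) - sign_bit  # now a signed Python integer
    return extended & ((1 << to_bits) - 1)


def and_r64_imm32(dest: int, imm32: int) -> int:
    """Model AND r/m64, imm32: the immediate is sign-extended first."""
    return dest & sign_extend(imm32, 32, 64)


value = 0x1234_5678_9ABC_DEF0
# Top bit of the immediate is 0: the upper 32 bits of the mask are 0.
print(hex(and_r64_imm32(value, 0x7FFF_FFFF)))  # 0x1abcdef0
# Top bit of the immediate is 1: the upper 32 bits of the mask are all 1.
print(hex(and_r64_imm32(value, 0xFFFF_FFFF)))  # 0x123456789abcdef0
```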

`OR`

Logical Inclusive OR

Instruction	Description
`OR r/m8, imm8`	`r/m8` OR `imm8`
`OR r/m16, imm16`	`r/m16` OR `imm16`
`OR r/m32, imm32`	`r/m32` OR `imm32`
`OR r/m64, imm32`	`r/m64` OR `imm32` (sign-extended)

`OR r/m16, imm8`	`r/m16` OR `imm8` (sign-extended)
`OR r/m32, imm8`	`r/m32` OR `imm8` (sign-extended)
`OR r/m64, imm8`	`r/m64` OR `imm8` (sign-extended)

`OR r/m8, r8`	`r/m8` OR `r8`
`OR r/m16, r16`	`r/m16` OR `r16`
`OR r/m32, r32`	`r/m32` OR `r32`
`OR r/m64, r64`	`r/m64` OR `r64`

`OR r8, r/m8`	`r8` OR `r/m8`
`OR r16, r/m16`	`r16` OR `r/m16`
`OR r32, r/m32`	`r32` OR `r/m32`
`OR r64, r/m64`	`r64` OR `r/m64`

Description

Performs a bitwise inclusive OR operation between the destination (first) and source (second) operands and stores the result in the destination operand location. The source operand can be an immediate, a register, or a memory location; the destination operand can be a register or a memory location. (However, two memory operands cannot be used in one instruction.) Each bit of the result of the OR instruction is set to 0 if both corresponding bits of the first and second operands are 0; otherwise, each bit is set to 1.

This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically.

Operation

DEST: first operand
SRC: second operand

DEST = DEST | SRC;

Flags Affected

The OF and CF flags are cleared; the SF, ZF, and PF flags are set according to the result. The state of the AF flag is undefined.

`XOR`

Logical Exclusive OR

Instruction	Description
`XOR r/m8, imm8`	`r/m8` XOR `imm8`
`XOR r/m16, imm16`	`r/m16` XOR `imm16`
`XOR r/m32, imm32`	`r/m32` XOR `imm32`
`XOR r/m64, imm32`	`r/m64` XOR `imm32` (sign-extended)

`XOR r/m16, imm8`	`r/m16` XOR `imm8` (sign-extended)
`XOR r/m32, imm8`	`r/m32` XOR `imm8` (sign-extended)
`XOR r/m64, imm8`	`r/m64` XOR `imm8` (sign-extended)

`XOR r/m8, r8`	`r/m8` XOR `r8`
`XOR r/m16, r16`	`r/m16` XOR `r16`
`XOR r/m32, r32`	`r/m32` XOR `r32`
`XOR r/m64, r64`	`r/m64` XOR `r64`

`XOR r8, r/m8`	`r8` XOR `r/m8`
`XOR r16, r/m16`	`r16` XOR `r/m16`
`XOR r32, r/m32`	`r32` XOR `r/m32`
`XOR r64, r/m64`	`r64` XOR `r/m64`

Description

Performs a bitwise exclusive OR (XOR) operation on the destination (first) and source (second) operands and stores the result in the destination operand location. The source operand can be an immediate, a register, or a memory location; the destination operand can be a register or a memory location. (However, two memory operands cannot be used in one instruction.) Each bit of the result is 1 if the corresponding bits of the operands are different; each bit is 0 if the corresponding bits are the same.

This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically.

In 64-bit mode, using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.

Operation

DEST: first operand
SRC: second operand

DEST = DEST ^ SRC;

Flags Affected

The OF and CF flags are cleared; the SF, ZF, and PF flags are set according to the result. The state of the AF flag is undefined.

`NOT`

One’s Complement Negation

Instruction	Description
`NOT r/m8`	Reverse each bit of `r/m8`
`NOT r/m16`	Reverse each bit of `r/m16`
`NOT r/m32`	Reverse each bit of `r/m32`
`NOT r/m64`	Reverse each bit of `r/m64`

Description

Performs a bitwise NOT operation (each 1 is set to 0, and each 0 is set to 1) on the destination operand and stores the result in the destination operand location. The destination operand can be a register or a memory location.

This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically.

Operation

DEST: operand

DEST = ~DEST;

Flags Affected

None.

Shift and Rotate Instructions

Mnemonic	Summary
`SAR`	Shift Arithmetic Right
`SHR`	Shift Logical Right
`SAL`|`SHL`	Shift Arithmetic Left/Shift Logical Left

`SAL`|`SAR`|`SHL`|`SHR`

Shift

SAL: Shift Arithmetic Left
SAR: Shift Arithmetic Right

SHL: Shift Logical Left
SHR: Shift Logical Right

Right Arithmetic Shifts preserve the sign by filling with the same value as the sign bit.
Right Logical Shifts always fill with 0s.

Left Shifts always fill with 0s, thus SAL and SHL are synonymous.

`SAL`|`SHL`

SAL and SHL are synonymous.

Instruction	Description
`SAL/SHL r/m8, imm8`	Shift `r/m8` to the left, `imm8` times, filling with 0
`SAL/SHL r/m16, imm8`	Shift `r/m16` to the left, `imm8` times, filling with 0
`SAL/SHL r/m32, imm8`	Shift `r/m32` to the left, `imm8` times, filling with 0
`SAL/SHL r/m64, imm8`	Shift `r/m64` to the left, `imm8` times, filling with 0

`SAL/SHL r/m8, CL`	Shift `r/m8` to the left, `CL` times, filling with 0
`SAL/SHL r/m16, CL`	Shift `r/m16` to the left, `CL` times, filling with 0
`SAL/SHL r/m32, CL`	Shift `r/m32` to the left, `CL` times, filling with 0
`SAL/SHL r/m64, CL`	Shift `r/m64` to the left, `CL` times, filling with 0

`SAR`

Instruction	Description
`SAR r/m8, imm8`	Shift `r/m8` to the right, `imm8` times, preserving the sign
`SAR r/m16, imm8`	Shift `r/m16` to the right, `imm8` times, preserving the sign
`SAR r/m32, imm8`	Shift `r/m32` to the right, `imm8` times, preserving the sign
`SAR r/m64, imm8`	Shift `r/m64` to the right, `imm8` times, preserving the sign

`SAR r/m8, CL`	Shift `r/m8` to the right, `CL` times, preserving the sign
`SAR r/m16, CL`	Shift `r/m16` to the right, `CL` times, preserving the sign
`SAR r/m32, CL`	Shift `r/m32` to the right, `CL` times, preserving the sign
`SAR r/m64, CL`	Shift `r/m64` to the right, `CL` times, preserving the sign

`SHR`

Instruction	Description
`SHR r/m8, imm8`	Shift `r/m8` to the right, `imm8` times, filling with 0
`SHR r/m16, imm8`	Shift `r/m16` to the right, `imm8` times, filling with 0
`SHR r/m32, imm8`	Shift `r/m32` to the right, `imm8` times, filling with 0
`SHR r/m64, imm8`	Shift `r/m64` to the right, `imm8` times, filling with 0

`SHR r/m8, CL`	Shift `r/m8` to the right, `CL` times, filling with 0
`SHR r/m16, CL`	Shift `r/m16` to the right, `CL` times, filling with 0
`SHR r/m32, CL`	Shift `r/m32` to the right, `CL` times, filling with 0
`SHR r/m64, CL`	Shift `r/m64` to the right, `CL` times, filling with 0

Description

Shifts the bits in the first operand (destination operand) to the left or right by the number of bits specified in the second operand (count operand). Bits shifted beyond the destination operand boundary are first shifted into the CF flag, then discarded. At the end of the shift operation, the CF flag contains the last bit shifted out of the destination operand.

The destination operand can be a register or a memory location. The count operand can be an immediate value or the CL register. The count is masked to 5 bits (or 6 bits in 64-bit mode when REX.W is used), which limits the count range to 0 to 31 (or 0 to 63 with REX.W). A special opcode encoding is provided for a count of 1.

The shift arithmetic left (SAL) and shift logical left (SHL) instructions perform the same operation; they shift the bits in the destination operand to the left (toward more significant bit locations). For each shift count, the most significant bit of the destination operand is shifted into the CF flag, and the least significant bit is cleared (see Figure 7-7 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1).

The shift arithmetic right (SAR) and shift logical right (SHR) instructions shift the bits of the destination operand to the right (toward less significant bit locations). For each shift count, the least significant bit of the destination operand is shifted into the CF flag, and the most significant bit is either set or cleared depending on the instruction type. The SHR instruction clears the most significant bit (see Figure 7-8 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1); the SAR instruction sets or clears the most significant bit to correspond to the sign (most significant bit) of the original value in the destination operand. In effect, the SAR instruction fills the empty bit position’s shifted value with the sign of the unshifted value (see Figure 7-9 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1).

The SAR and SHR instructions can be used to perform signed or unsigned division, respectively, of the destination operand by powers of 2. For example, using the SAR instruction to shift a signed integer 1 bit to the right divides the value by 2.

Using the SAR instruction to perform a division operation does not produce the same result as the IDIV instruction. The quotient from the IDIV instruction is rounded toward zero, whereas the “quotient” of the SAR instruction is rounded toward negative infinity. This difference is apparent only for negative numbers. For example, when the IDIV instruction is used to divide -9 by 4, the result is -2 with a remainder of -1. If the SAR instruction is used to shift -9 right by two bits, the result is -3 and the “remainder” is +3; however, the SAR instruction stores only the most significant bit of the remainder (in the CF flag).
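The rounding difference can be reproduced in Python, whose `>>` operator on a negative integer behaves like an arithmetic shift. The helper names below are made up for this sketch.

```python
def sar(value: int, count: int) -> int:
    """Arithmetic right shift: rounds toward negative infinity."""
    return value >> count


def idiv_quotient(dividend: int, divisor: int) -> int:
    """Signed division as IDIV performs it: rounds toward zero."""
    quotient = abs(dividend) // abs(divisor)
    return -quotient if (dividend < 0) != (divisor < 0) else quotient


print(sar(-9, 2))            # -3
print(idiv_quotient(-9, 4))  # -2
print(sar(9, 2), idiv_quotient(9, 4))  # 2 2, no difference for positive values
```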

The OF flag is affected only on 1-bit shifts. For left shifts, the OF flag is set to 0 if the most-significant bit of the result is the same as the CF flag (that is, the top two bits of the original operand were the same); otherwise, it is set to 1. For the SAR instruction, the OF flag is cleared for all 1-bit shifts. For the SHR instruction, the OF flag is set to the most-significant bit of the original operand.

In 64-bit mode, the instruction’s default operation size is 32 bits and the mask width for CL is 5 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64-bits and sets the mask width for CL to 6 bits. See the summary chart at the beginning of this section for encoding data and limits.

Operation

DEST: first operand
COUNT: second operand
countMASK: 0b00111111 (= 63) for 64-bit operations, 0b00011111 (= 31) otherwise

`SAL`|`SHL`

tempCOUNT = COUNT & countMASK;
while tempCOUNT != 0 {
    CF = MSB(DEST);
    DEST = DEST * 2;
    tempCOUNT = tempCOUNT - 1;
}

// determine overflow
if COUNT & countMASK == 1 {
    OF = MSB(DEST) ^ CF;
} else if COUNT & countMASK == 0 {
    // all flags unchanged
} else {
    // COUNT not 1 or 0
    // OF undefined
}

`SAR`

tempCOUNT = COUNT & countMASK;
while tempCOUNT != 0 {
    CF = LSB(DEST);
    DEST = DEST / 2; // signed divide, rounding toward negative infinity
    tempCOUNT = tempCOUNT - 1;
}

// determine overflow
if COUNT & countMASK == 1 {
    OF = 0;
} else if COUNT & countMASK == 0 {
    // all flags unchanged
} else {
    // COUNT not 1 or 0
    // OF undefined
}

`SHR`

tempCOUNT = COUNT & countMASK;
tempDEST = DEST; // original value, needed to determine OF
while tempCOUNT != 0 {
    CF = LSB(DEST);
    DEST = DEST / 2; // unsigned divide
    tempCOUNT = tempCOUNT - 1;
}

// determine overflow
if COUNT & countMASK == 1 {
    OF = MSB(tempDEST);
} else if COUNT & countMASK == 0 {
    // all flags unchanged
} else {
    // COUNT not 1 or 0
    // OF undefined
}
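The three loops above can be combined into one small Python model. This is a sketch under stated assumptions: the function name `shift` and its string-based `op` parameter are hypothetical, only CF is modeled (returned as `None` when the masked count is 0 and the flags stay unchanged), and the model still returns a CF value in cases where the real flag is undefined.

```python
def shift(op: str, dest: int, count: int, bits: int) -> tuple:
    """Model SHL/SAL, SHR, or SAR on a `bits`-wide value; returns (result, CF)."""
    count_mask = 63 if bits == 64 else 31
    mask = (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    dest &= mask
    cf = None  # None means "flags unchanged" (masked count of 0)
    for _ in range(count & count_mask):
        if op in ("shl", "sal"):
            cf = int(bool(dest & sign_bit))  # MSB goes into CF
            dest = (dest << 1) & mask        # LSB is filled with 0
        elif op == "shr":
            cf = dest & 1                    # LSB goes into CF
            dest >>= 1                       # MSB is filled with 0
        elif op == "sar":
            cf = dest & 1                    # LSB goes into CF
            dest = (dest >> 1) | (dest & sign_bit)  # MSB keeps the sign
        else:
            raise ValueError("unknown shift: " + op)
    return dest, cf


print(shift("sar", 0xF7, 2, 8))  # (253, 1), 0xF7 is -9 and 253 (0xFD) is -3
print(shift("shl", 1, 33, 32))   # (2, 0), the count 33 is masked to 1
```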

Flags Affected

The CF flag contains the value of the last bit shifted out of the destination operand; it is undefined for SHL and SHR instructions where the count is greater than or equal to the size (in bits) of the destination operand. The OF flag is affected only for 1-bit shifts (see Description above); otherwise, it is undefined. The SF, ZF, and PF flags are set according to the result. If the count is 0, the flags are not affected. For a non-zero count, the AF flag is undefined.

Bit and Byte Instructions

Mnemonic	Summary
`TEST`	Logical Compare

`TEST`

Logical Compare

Performs the same operation as AND, but discards the result instead of saving it and only keeps the flags.

Instruction	Description
`TEST r/m8, imm8`	AND `imm8` with `r/m8`; only set Flags
`TEST r/m16, imm16`	AND `imm16` with `r/m16`; only set Flags
`TEST r/m32, imm32`	AND `imm32` with `r/m32`; only set Flags
`TEST r/m64, imm32`	AND sign-extended `imm32` with `r/m64`; only set Flags

`TEST r/m8, r8`	AND `r8` with `r/m8`; only set Flags
`TEST r/m16, r16`	AND `r16` with `r/m16`; only set Flags
`TEST r/m32, r32`	AND `r32` with `r/m32`; only set Flags
`TEST r/m64, r64`	AND `r64` with `r/m64`; only set Flags

Description

Computes the bit-wise logical AND of the first operand (source 1 operand) and the second operand (source 2 operand) and sets the SF, ZF, and PF status flags according to the result. The result is then discarded.

Operation

SRC1: first operand
SRC2: second operand

TEMP = SRC1 & SRC2;
SF = MSB(TEMP);

if TEMP == 0 {
    ZF = 1;
} else {
    ZF = 0;
}

PF = BitwiseXNOR(TEMP[0..=7]);
CF = 0;
OF = 0;
// AF is undefined

Flags Affected

The OF and CF flags are set to 0. The SF, ZF, and PF flags are set according to the result (see the Operation section above). The state of the AF flag is undefined.
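The flag computation can be sketched in Python as follows. The function name `flags_after_test` is an assumption for this example, and AF is left out because it is undefined.

```python
def flags_after_test(src1: int, src2: int, bits: int) -> dict:
    """Model the flags that TEST src1, src2 produces for `bits`-wide operands."""
    temp = src1 & src2 & ((1 << bits) - 1)
    low_byte = temp & 0xFF
    return {
        "SF": (temp >> (bits - 1)) & 1,
        "ZF": int(temp == 0),
        # PF is 1 when the low byte holds an even number of 1 bits.
        "PF": int(bin(low_byte).count("1") % 2 == 0),
        "CF": 0,
        "OF": 0,
    }


print(flags_after_test(0b1010, 0b0110, 8))
# {'SF': 0, 'ZF': 0, 'PF': 0, 'CF': 0, 'OF': 0}

# Testing a value against itself sets ZF exactly when the value is 0.
print(flags_after_test(0, 0, 64)["ZF"])  # 1
```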

Control Transfer Instructions

Mnemonic	Summary
`JMP`	Jump
`Jcc`	Jump if `cc`
`CALL`	Call Procedure
`RET`	Return

`JMP`

Jump

Instruction	Description
`JMP rel8`	Jump short, `RIP` = `RIP` + 8-bit displacement sign-extended to 64-bits
`JMP rel16`	Jump near, relative, displacement relative to next instruction; Not supported in 64-bit mode
`JMP rel32`	Jump near, relative, `RIP` = `RIP` + 32-bit displacement sign-extended to 64-bits

`JMP r/m16`	Jump near, absolute indirect, address = zero-extended `r/m16`; Not supported in 64-bit mode
`JMP r/m32`	Jump near, absolute indirect, address given in `r/m32`; Not supported in 64-bit mode
`JMP r/m64`	Jump near, absolute indirect, `RIP` = 64-bit offset from register or memory

`JMP ptr16:16`	Jump far, absolute, address given in operand
`JMP ptr16:32`	Jump far, absolute, address given in operand

`JMP m16:16`	Jump far, absolute indirect, address given in operand
`JMP m16:32`	Jump far, absolute indirect, address given in operand
`JMP m16:64`	Jump far, absolute indirect, address given in operand

Description

Transfers program control to a different point in the instruction stream without recording return information. The destination (target) operand specifies the address of the instruction being jumped to. This operand can be an immediate value, a general-purpose register, or a memory location.

This instruction can be used to execute four different types of jumps:

Near jump
A jump to an instruction within the current code segment (the segment currently pointed to by the CS register), sometimes referred to as an intrasegment jump.
Short jump
A near jump where the jump range is limited to –128 to +127 from the current EIP value.
Far jump
A jump to an instruction located in a different segment than the current code segment but at the same privilege level, sometimes referred to as an intersegment jump.
Task switch
A jump to an instruction located in a different task. A task switch can only be executed in protected mode (see Chapter 7, in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A, for information on performing task switches with the JMP instruction).

Near and Short Jumps

When executing a near jump, the processor jumps to the address (within the current code segment) that is specified with the target operand. The target operand specifies either an absolute offset (that is an offset from the base of the code segment) or a relative offset (a signed displacement relative to the current value of the instruction pointer in the EIP register). A near jump to a relative offset of 8-bits (rel8) is referred to as a short jump. The CS register is not changed on near and short jumps.

An absolute offset is specified indirectly in a general-purpose register or a memory location (r/m16 or r/m32). The operand-size attribute determines the size of the target operand (16 or 32 bits). Absolute offsets are loaded directly into the EIP register. If the operand-size attribute is 16, the upper two bytes of the EIP register are cleared, resulting in a maximum instruction pointer size of 16 bits.

A relative offset (rel8, rel16, or rel32) is generally specified as a label in assembly code, but at the machine code level, it is encoded as a signed 8-, 16-, or 32-bit immediate value. This value is added to the value in the EIP register. (Here, the EIP register contains the address of the instruction following the JMP instruction). When using relative offsets, the opcode (for short vs. near jumps) and the operand-size attribute (for near relative jumps) determines the size of the target operand (8, 16, or 32 bits).
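The arithmetic behind a relative jump can be sketched in Python. The function names are hypothetical, the 64-bit wrap-around is an assumption matching 64-bit mode, and the instruction lengths in the examples assume the common encodings without prefixes: 2 bytes for a short jump (opcode plus rel8) and 5 bytes for a near jump with rel32.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a signed integer."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def relative_target(jmp_address: int, length: int,
                    displacement: int, disp_bits: int) -> int:
    """Compute the target of a relative JMP.

    The displacement is added to the address of the instruction that
    follows the JMP, not to the address of the JMP itself.
    """
    next_instruction = jmp_address + length
    target = next_instruction + sign_extend(displacement, disp_bits)
    return target & 0xFFFF_FFFF_FFFF_FFFF


# A short jump with displacement 0xFE (-2) jumps back onto itself.
print(hex(relative_target(0x1000, 2, 0xFE, 8)))     # 0x1000
# A near jump with displacement 0x10 lands 0x10 bytes past its own end.
print(hex(relative_target(0x401000, 5, 0x10, 32)))  # 0x401015
```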

Far Jumps in Real-Address or Virtual-8086 Mode

When executing a far jump in real-address or virtual-8086 mode, the processor jumps to the code segment and offset specified with the target operand. Here the target operand specifies an absolute far address either directly with a pointer (ptr16:16 or ptr16:32) or indirectly with a memory location (m16:16 or m16:32). With the pointer method, the segment and offset of the jump target are encoded in the instruction, using a 4-byte (16-bit operand size) or 6-byte (32-bit operand size) far address immediate. With the indirect method, the target operand specifies a memory location that contains a 4-byte (16-bit operand size) or 6-byte (32-bit operand size) far address. The far address is loaded directly into the CS and EIP registers. If the operand-size attribute is 16, the upper two bytes of the EIP register are cleared.

Far Jumps in Protected Mode

When the processor is operating in protected mode, the JMP instruction can be used to perform the following three types of far jumps:

A far jump to a conforming or non-conforming code segment.
A far jump through a call gate.
A task switch. (The JMP instruction cannot be used to perform inter-privilege-level far jumps.)

In protected mode, the processor always uses the segment selector part of the far address to access the corresponding descriptor in the GDT or LDT. The descriptor type (code segment, call gate, task gate, or TSS) and access rights determine the type of jump to be performed.

If the selected descriptor is for a code segment, a far jump to a code segment at the same privilege level is performed. (If the selected code segment is at a different privilege level and the code segment is non-conforming, a general-protection exception is generated.) A far jump to the same privilege level in protected mode is very similar to one carried out in real-address or virtual-8086 mode. The target operand specifies an absolute far address either directly with a pointer (ptr16:16 or ptr16:32) or indirectly with a memory location (m16:16 or m16:32). The operand-size attribute determines the size of the offset (16 or 32 bits) in the far address. The new code segment selector and its descriptor are loaded into the CS register, and the offset from the instruction is loaded into the EIP register. Note that a call gate (described in the next paragraph) can also be used to perform a far jump to a code segment at the same privilege level. Using this mechanism provides an extra level of indirection and is the preferred method of making jumps between 16-bit and 32-bit code segments.

When executing a far jump through a call gate, the segment selector specified by the target operand identifies the call gate. (The offset part of the target operand is ignored.) The processor then jumps to the code segment specified in the call gate descriptor and begins executing the instruction at the offset specified in the call gate. No stack switch occurs. Here again, the target operand can specify the far address of the call gate either directly with a pointer (ptr16:16 or ptr16:32) or indirectly with a memory location (m16:16 or m16:32).

Executing a task switch with the JMP instruction is somewhat similar to executing a jump through a call gate. Here the target operand specifies the segment selector of the task gate for the task being switched to (and the offset part of the target operand is ignored). The task gate in turn points to the TSS for the task, which contains the segment selectors for the task’s code and stack segments. The TSS also contains the EIP value for the next instruction that was to be executed before the task was suspended. This instruction pointer value is loaded into the EIP register so that the task begins executing again at this next instruction.

The JMP instruction can also specify the segment selector of the TSS directly, which eliminates the indirection of the task gate. See Chapter 7 in Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A, for detailed information on the mechanics of a task switch.

Note that when you execute a task switch with a JMP instruction, the nested task flag (NT) is not set in the EFLAGS register and the new TSS’s previous task link field is not loaded with the old task’s TSS selector. A return to the previous task thus cannot be carried out by executing the IRET instruction. Switching tasks with the JMP instruction differs in this regard from the CALL instruction, which does set the NT flag and save the previous task link information, allowing a return to the calling task with an IRET instruction.

Refer to Chapter 6, “Procedure Calls, Interrupts, and Exceptions” and Chapter 18, “Control-Flow Enforcement Technology (CET)” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1 for CET details.

In 64-Bit Mode

The instruction’s operation size is fixed at 64 bits. If a selector points to a gate, then RIP equals the 64-bit displacement taken from the gate; otherwise, RIP equals the zero-extended offset from the far pointer referenced in the instruction.

See the summary chart at the beginning of this section for encoding data and limits.
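
The 64-bit rule above can be sketched roughly as follows. This is only an illustration: the function names are made up for this example, and the canonical check assumes 48-bit virtual addresses (4-level paging), which the text above does not state.

```python
MASK64 = 0xFFFF_FFFF_FFFF_FFFF

def is_canonical(addr, virt_bits=48):
    """True if bits 63..(virt_bits - 1) are all equal (assumes 48-bit addressing)."""
    top = addr >> (virt_bits - 1)                 # bits 63..47 when virt_bits == 48
    all_ones = (1 << (64 - virt_bits + 1)) - 1
    return top == 0 or top == all_ones

def far_jump_rip(selector_is_gate, gate_offset, pointer_offset, offset_bits=32):
    """Pick the new RIP: the gate's 64-bit offset, or the zero-extended pointer offset."""
    if selector_is_gate:
        rip = gate_offset & MASK64                # offset in the far pointer is ignored
    else:
        rip = pointer_offset & ((1 << offset_bits) - 1)   # zero-extension
    if not is_canonical(rip):
        raise ValueError("#GP(0): non-canonical RIP")
    return rip
```

A zero-extended 32-bit offset is always canonical under this assumption, so the check only matters for offsets taken from a gate or a 64-bit far pointer.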

Instruction ordering

Instructions following a far jump may be fetched from memory before earlier instructions complete execution, but they will not execute (even speculatively) until all instructions prior to the far jump have completed execution (the later instructions may execute before data stored by the earlier instructions have become globally visible).

Certain situations may lead to the next sequential instruction after a near indirect JMP being speculatively executed. If software needs to prevent this (e.g., in order to prevent a speculative execution side channel), then an INT3 or LFENCE instruction opcode can be placed after the near indirect JMP in order to block speculative execution.

Operation

if /* near jump */ {
    if /* 64-bit Mode */ {
        if /* near relative jump */ {
            // RIP is instruction following JMP instruction
            tempRIP = RIP + DEST;
        } else {
            // near absolute jump
            tempRIP = DEST;
        }
    } else {
        if /* near relative jump */ {
            // EIP is instruction following JMP instruction
            tempEIP = EIP + DEST;
        } else {
            // near absolute jump
            tempEIP = DEST;
        }
    }
    if (IA32_EFER.LMA == 0 || target mode == Compatibility mode)
            && /* tempEIP outside code segment limit */ {
        #GP(0);
    }
    if /* 64-bit mode and tempRIP is not canonical */ {
        #GP(0);
    }
    if OperandSize == 32 {
        EIP = tempEIP;
    } else {
        if OperandSize == 16 {
            EIP = tempEIP & 0x0000_FFFF;
        } else {
            // OperandSize == 64
            RIP = tempRIP;
        }
    }
    if /* JMP near indirect, absolute indirect */ {
        if EndbranchEnabledAndNotSuppressed(CPL) {
            if CPL == 3 {
                if /* no 3EH prefix */ || IA32_U_CET.NO_TRACK_EN == 0 {
                    IA32_U_CET.TRACKER = WAIT_FOR_ENDBRANCH;
                }
            } else {
                if /* no 3EH prefix */ || IA32_S_CET.NO_TRACK_EN == 0 {
                    IA32_S_CET.TRACKER = WAIT_FOR_ENDBRANCH;
                }
            }
        }
    }
}

if /* far jump */ && (PE == 0 || (PE == 1 && VM == 1)) {
    // real-address or virtual-8086 mode
    tempEIP = DEST(Offset); // DEST is ptr16:32 or [m16:32]
    if /* tempEIP is beyond code segment limit */ {
        #GP(0);
    }
    CS = DEST(segment selector); // DEST is ptr16:32 or [m16:32]
    if OperandSize == 32 {
        EIP = tempEIP; // DEST is ptr16:32 or [m16:32]
    } else {
        // OperandSize == 16
        EIP = tempEIP & 0x0000_FFFF; // clear upper 16 bits
    }
}

if /* far jump */ && PE == 1 && VM == 0 {
    // IA-32e mode or protected mode, not virtual-8086 mode
    if /* effective address in the CS, DS, ES, FS, GS, or SS segment is illegal
            or segment selector in target operand NULL */ {
        #GP(0);
    }
    if /* segment selector index not within descriptor table limits */ {
        #GP(new selector);
    }
    /* read type and access rights of segment descriptor; */
    if IA32_EFER.LMA == 0 {
        if /* segment type is not a conforming or nonconforming code
                segment, call gate, task gate, or TSS */ {
            #GP(segment selector);
        }
    } else {
        if /* segment type is not a conforming or nonconforming code segment
                or a call gate */ {
            #GP(segment selector);
        }
    }
    /* Depending on type and access rights: */
        goto 'CONFORMING_CODE_SEGMENT;
        goto 'NONCONFORMING_CODE_SEGMENT;
        goto 'CALL_GATE;
        goto 'TASK_GATE;
        goto 'TASK_STATE_SEGMENT;
} else {
    #GP(segment selector);
}

'CONFORMING_CODE_SEGMENT {
    if L-Bit == 1 && D-BIT == 1 && IA32_EFER.LMA == 1 {
        #GP(new code segment selector);
    }
    if DPL > CPL {
        #GP(segment selector);
    }
    if /* segment not present */ {
        #NP(segment selector);
    }
    tempEIP = DEST(Offset);
    if OperandSize == 16 {
        tempEIP = tempEIP & 0x0000_FFFF;
    }
    if (IA32_EFER.LMA == 0 || target mode == Compatibility mode)
            && /* tempEIP outside code segment limit */ {
        #GP(0);
    }
    if /* tempEIP is non-canonical */ {
        #GP(0);
    }
    if ShadowStackEnabled(CPL) {
        if (IA32_EFER.LMA & DEST(segment selector).L) == 0 {
            // if target is legacy or compatibility mode then the SSP must be in low 4GB
            if (SSP & 0xFFFF_FFFF_0000_0000 != 0) {
                #GP(0);
            }
        }
    }
    CS = DEST[segment selector]; // segment descriptor information also loaded
    CS(RPL) = CPL;
    EIP = tempEIP;
    if EndbranchEnabled(CPL) {
        if CPL == 3 {
            IA32_U_CET.TRACKER = WAIT_FOR_ENDBRANCH;
            IA32_U_CET.SUPPRESS = 0;
        } else {
            IA32_S_CET.TRACKER = WAIT_FOR_ENDBRANCH;
            IA32_S_CET.SUPPRESS = 0;
        }
    }
}

'NONCONFORMING_CODE_SEGMENT {
    if L-Bit == 1 && D-BIT == 1 && IA32_EFER.LMA == 1 {
        #GP(new code segment selector);
    }
    if (RPL > CPL) || (DPL != CPL) {
        #GP(code segment selector);
    }
    if /* segment not present */ {
        #NP(segment selector);
    }
    tempEIP = DEST(Offset);
    if OperandSize == 16 {
        tempEIP = tempEIP & 0x0000_FFFF;
    }
    if (IA32_EFER.LMA == 0 || target mode == Compatibility mode)
            && /* tempEIP outside code segment limit */ {
        #GP(0);
    }
    if /* tempEIP is non-canonical */ {
        #GP(0);
    }
    if ShadowStackEnabled(CPL) {
        if (IA32_EFER.LMA & DEST(segment selector).L) == 0 {
            // if target is legacy or compatibility mode then the SSP must be in low 4GB
            if (SSP & 0xFFFF_FFFF_0000_0000 != 0) {
                #GP(0);
            }
        }
    }
    CS = DEST[segment selector]; // segment descriptor information also loaded
    CS(RPL) = CPL;
    EIP = tempEIP;
    if EndbranchEnabled(CPL) {
        if CPL == 3 {
            IA32_U_CET.TRACKER = WAIT_FOR_ENDBRANCH;
            IA32_U_CET.SUPPRESS = 0;
        } else {
            IA32_S_CET.TRACKER = WAIT_FOR_ENDBRANCH;
            IA32_S_CET.SUPPRESS = 0;
        }
    }
}

'CALL_GATE {
    if call gate DPL < CPL
            || call gate DPL < call gate segment-selector RPL {
        #GP(call gate selector);
    }
    if /* call gate not present */ {
        #NP(call gate selector);
    }
    if /* call gate code-segment selector is NULL */ {
        #GP(0);
    }
    if /* call gate code-segment selector index outside descriptor table limits */ {
        #GP(code segment selector);
    }
    /* Read code segment descriptor; */
    if /* code-segment segment descriptor does not indicate a code segment */
            || (/* code-segment segment descriptor is conforming */ && DPL > CPL)
            || (/* code-segment segment descriptor is non-conforming */ && DPL != CPL) {
        #GP(code segment selector);
    }
    if IA32_EFER.LMA == 1 && (/* code-segment descriptor is not a 64-bit code segment */
            || /* code-segment segment descriptor has both L-Bit and D-bit set */) {
        #GP(code segment selector);
    }
    if /* code segment is not present */ {
        #NP(code-segment selector);
    }
    tempEIP = DEST(Offset);
    if GateSize == 16 {
        tempEIP = tempEIP & 0x0000_FFFF;
    }
    if (IA32_EFER.LMA == 0 || target mode == Compatibility mode)
            && /* tempEIP outside code segment limit */ {
        #GP(0);
    }
    CS = DEST[SegmentSelector]; // segment descriptor information also loaded
    CS(RPL) = CPL;
    EIP = tempEIP;
    if EndbranchEnabled(CPL) {
        if CPL == 3 {
            IA32_U_CET.TRACKER = WAIT_FOR_ENDBRANCH;
            IA32_U_CET.SUPPRESS = 0;
        } else {
            IA32_S_CET.TRACKER = WAIT_FOR_ENDBRANCH;
            IA32_S_CET.SUPPRESS = 0;
        }
    }
}

'TASK_GATE {
    if task gate DPL < CPL
            || task gate DPL < task gate segment-selector RPL {
        #GP(task gate selector);
    }
    if /* task gate not present */ {
        #NP(gate selector);
    }
    /* Read the TSS segment selector in the task-gate descriptor; */
    if /* TSS segment selector local/global bit is set to local */
            || /* index not within GDT limits */
            || /* descriptor is not a TSS segment */
            || /* TSS descriptor specifies that the TSS is busy */ {
        #GP(TSS selector);
    }
    if /* TSS not present */ {
        #NP(TSS selector);
    }
    SWITCH-TASKS to TSS;
    if /* EIP not within code segment limit */ {
        #GP(0);
    }
}

'TASK_STATE_SEGMENT {
    if TSS DPL < CPL
            || TSS DPL < TSS segment-selector RPL
            || /* TSS descriptor indicates TSS not available */ {
        #GP(TSS selector);
    }
    if /* TSS is not present */ {
        #NP(TSS selector);
    }
    SWITCH-TASKS to TSS;
    if /* EIP not within code segment limit */ {
        #GP(0);
    }
}

Flags Affected

All flags are affected if a task switch occurs; no flags are affected if a task switch does not occur.

`Jcc`

Jump if Condition Is Met

Instruction	Description
`Jcc rel8`	Jump short if `cc`
`Jcc rel16`	Jump near if `cc`; Not supported in 64-bit mode
`Jcc rel32`	Jump near if `cc`

Description

Checks the state of one or more of the status flags in the EFLAGS register (CF, OF, PF, SF, and ZF) and, if the flags are in the specified state (condition), performs a jump to the target instruction specified by the destination operand. A condition code (cc) is associated with each instruction to indicate the condition being tested for. If the condition is not satisfied, the jump is not performed and execution continues with the instruction following the Jcc instruction.

The target instruction is specified with a relative offset (a signed offset relative to the current value of the instruction pointer in the EIP register). A relative offset (rel8, rel16, or rel32) is generally specified as a label in assembly code, but at the machine code level, it is encoded as a signed, 8-bit or 32-bit immediate value, which is added to the instruction pointer. Instruction coding is most efficient for offsets of –128 to +127. If the operand-size attribute is 16, the upper two bytes of the EIP register are cleared, resulting in a maximum instruction pointer size of 16 bits.
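
As a rough sketch of how an assembler might turn a label into a relative offset, the function below picks the short form when the displacement fits in a signed byte. The function name and the assumed instruction lengths (2 bytes for the rel8 form, 6 bytes for the rel32 form, no extra prefixes) are illustrative assumptions, not part of this reference.

```python
def jcc_displacement(jcc_addr, target_addr, short_len=2, near_len=6):
    """Return (form, displacement) for a conditional jump located at jcc_addr.

    Assumed lengths: 2 bytes for the rel8 form and 6 bytes for the rel32 form.
    """
    # The displacement is relative to the instruction *after* the Jcc.
    rel = target_addr - (jcc_addr + short_len)
    if -128 <= rel <= 127:
        return "rel8", rel
    # The longer encoding moves the "next instruction" address further along.
    return "rel32", target_addr - (jcc_addr + near_len)
```

For example, a jump at `0x1000` to `0x1010` fits the short form with a displacement of 14, while a jump to `0x2000` needs the near form. The sketch does not check that the rel32 displacement itself is in range.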

The conditions for each Jcc mnemonic are given in the “Description” column of the table on the preceding page. The terms “less” and “greater” are used for comparisons of signed integers and the terms “above” and “below” are used for unsigned integers.

Because a particular state of the status flags can sometimes be interpreted in two ways, two mnemonics are defined for some opcodes. For example, the JA (jump if above) instruction and the JNBE (jump if not below or equal) instruction are alternate mnemonics for the opcode 0x77.
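
To make the signed/unsigned distinction concrete, here is a small sketch that computes the status flags of a comparison and evaluates a few conditions on them. The helper names `cmp_flags` and `CONDITIONS` are invented for this example, and only a handful of condition codes are modelled.

```python
def cmp_flags(a, b, bits=8):
    """Flags produced by comparing a with b (computing a - b) at the given width."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    result = (a - b) & mask
    return {
        "CF": a < b,                                  # unsigned borrow
        "ZF": result == 0,
        "SF": bool(result & sign),
        "OF": bool((a ^ b) & (a ^ result) & sign),    # signed overflow
    }

CONDITIONS = {
    "JA": lambda f: not f["CF"] and not f["ZF"],          # above (unsigned)
    "JB": lambda f: f["CF"],                              # below (unsigned)
    "JG": lambda f: not f["ZF"] and f["SF"] == f["OF"],   # greater (signed)
    "JL": lambda f: f["SF"] != f["OF"],                   # less (signed)
    "JE": lambda f: f["ZF"],                              # equal
}
CONDITIONS["JNBE"] = CONDITIONS["JA"]   # alternate mnemonic, same test
```

Comparing `0xFF` with `0x01` as bytes satisfies both `JA` (255 is above 1) and `JL` (-1 is less than 1), which is why the two families of mnemonics exist.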

The Jcc instruction does not support far jumps (jumps to other code segments). When the target for the conditional jump is in a different segment, use the opposite condition from the condition being tested for the Jcc instruction, and then access the target with an unconditional far jump (JMP instruction) to the other segment. For example, the following conditional far jump is illegal:

jz FARLABEL

To accomplish this far jump, use the following two instructions:

jnz BEYOND
jmp FARLABEL
BEYOND:

The JRCXZ, JECXZ and JCXZ instructions differ from other Jcc instructions because they do not check status flags. Instead, they check RCX, ECX or CX for 0. The register checked is determined by the address-size attribute. These instructions are useful when used at the beginning of a loop that terminates with a conditional loop instruction (such as LOOPNE). They can be used to prevent an instruction sequence from entering a loop when RCX, ECX or CX is 0. Entering the loop with a zero count would cause it to execute 2^64, 2^32 or 64K times (not zero times).
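
The wrap-around described above can be simulated with a short sketch. It models a plain LOOP (decrement the counter, then repeat while it is non-zero) on a 16-bit counter and ignores the extra ZF test of LOOPNE; the function name and its parameters are assumptions made for this illustration.

```python
def loop_iterations(count, bits=16, guard_with_jcxz=False):
    """Count how often a LOOP-terminated body runs for an initial counter value."""
    mask = (1 << bits) - 1
    count &= mask
    if guard_with_jcxz and count == 0:
        return 0                      # JCXZ jumps past the loop entirely
    iterations = 0
    while True:
        iterations += 1               # the loop body executes
        count = (count - 1) & mask    # LOOP decrements the counter first...
        if count == 0:                # ...and falls through once it reaches 0
            break
    return iterations
```

Without the guard, a zero counter wraps around to the largest value on the first decrement, so the body runs 65536 times for a 16-bit counter.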

All conditional jumps are converted to code fetches of one or two cache lines, regardless of jump address or cacheability.

In 64-bit mode, operand size is fixed at 64 bits. JMP Short is RIP = RIP + 8-bit offset sign extended to 64 bits. JMP Near is RIP = RIP + 32-bit offset sign extended to 64 bits.
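
A minimal sketch of the 64-bit target calculation, assuming `next_rip` holds the address of the instruction following the jump. Both function names are made up for this example.

```python
MASK64 = 0xFFFF_FFFF_FFFF_FFFF

def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value

def jcc_target_64(next_rip, offset, offset_bits):
    """RIP after a taken jump; offset is the raw 8-bit or 32-bit immediate."""
    return (next_rip + sign_extend(offset, offset_bits)) & MASK64
```

An 8-bit offset of `0xFE` is -2 after sign extension, so a two-byte short jump with that offset targets itself.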

Operation

if condition {
    tempEIP = EIP + SignExtend(DEST);
    if OperandSize == 16 {
        tempEIP = tempEIP & 0x0000_FFFF;
    }
    if /* tempEIP is not within code segment limit */ {
        #GP(0);
    } else {
        EIP = tempEIP;
    }
}

Flags Affected

None.

`CALL`

Call Procedure

Instruction	Description
`CALL rel16`	Call near, relative, displacement relative to next instruction
`CALL rel32`	Call near, relative, displacement relative to next instruction; 32-bit displacement sign extended to 64-bits in 64-bit mode

`CALL r/m16`	Call near, absolute indirect, address given in `r/m16`
`CALL r/m32`	Call near, absolute indirect, address given in `r/m32`
`CALL r/m64`	Call near, absolute indirect, address given in `r/m64`

`CALL ptr16:16`	Call far, absolute, address given in operand
`CALL ptr16:32`	Call far, absolute, address given in operand

`CALL m16:16`	Call far, absolute indirect, address given in `m16:16`; In 32-bit mode: If selector points to a gate, then `RIP` = 32-bit zero-extended displacement taken from gate; else `RIP` = zero-extended 16-bit offset from far pointer referenced in the instruction
`CALL m16:32`	In 64-bit mode: If selector points to a gate, then `RIP` = 64-bit displacement taken from gate; else `RIP` = zero-extended 32-bit offset from far pointer referenced in the instruction
`CALL m16:64`	In 64-bit mode: If selector points to a gate, then `RIP` = 64-bit displacement taken from gate; else `RIP` = 64-bit offset from far pointer referenced in the instruction

Description

Saves procedure linking information on the stack and branches to the called procedure specified using the target operand. The target operand specifies the address of the first instruction in the called procedure. The operand can be an immediate value, a general-purpose register, or a memory location.

This instruction can be used to execute four types of calls:

Near Call
A call to a procedure in the current code segment (the segment currently pointed to by the CS register), sometimes referred to as an intra-segment call.
Far Call
A call to a procedure located in a different segment than the current code segment, sometimes referred to as an inter-segment call.
Inter-privilege-level far call
A far call to a procedure in a segment at a different privilege level than that of the currently executing program or procedure.
Task switch
A call to a procedure located in a different task.

The latter two call types (inter-privilege-level call and task switch) can only be executed in protected mode. See “Calling Procedures Using CALL and RET” in Chapter 6 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for additional information on near, far, and inter-privilege-level calls. See Chapter 7, “Task Management,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A, for information on performing task switches with the CALL instruction.

Near Call

When executing a near call, the processor pushes the value of the EIP register (which contains the offset of the instruction following the CALL instruction) on the stack (for use later as a return-instruction pointer). The processor then branches to the address in the current code segment specified by the target operand. The target operand specifies either an absolute offset in the code segment (an offset from the base of the code segment) or a relative offset (a signed displacement relative to the current value of the instruction pointer in the EIP register; this value points to the instruction following the CALL instruction). The CS register is not changed on near calls.

For a near call absolute, an absolute offset is specified indirectly in a general-purpose register or a memory location (r/m16, r/m32, or r/m64). The operand-size attribute determines the size of the target operand (16, 32 or 64 bits). When in 64-bit mode, the operand size for near call (and all near branches) is forced to 64-bits. Absolute offsets are loaded directly into the EIP(RIP) register. If the operand size attribute is 16, the upper two bytes of the EIP register are cleared, resulting in a maximum instruction pointer size of 16 bits. When accessing an absolute offset indirectly using the stack pointer [ESP] as the base register, the base value used is the value of the ESP before the instruction executes.

A relative offset (rel16 or rel32) is generally specified as a label in assembly code. But at the machine code level, it is encoded as a signed, 16- or 32-bit immediate value. This value is added to the value in the EIP(RIP) register. In 64-bit mode the relative offset is always a 32-bit immediate value which is sign extended to 64-bits before it is added to the value in the RIP register for the target calculation. As with absolute offsets, the operand-size attribute determines the size of the target operand (16, 32, or 64 bits). In 64-bit mode the target operand will always be 64-bits because the operand size is forced to 64-bits for near branches.
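
The near-call behaviour described above can be sketched with a toy machine state. The `state` dictionary layout and the function name are hypothetical and exist only for this illustration; `state["ip"]` is assumed to already point at the instruction following the CALL.

```python
def near_call(state, target, relative, operand_size=32):
    """Simulate a near CALL on a dict with 'ip' and 'stack' keys."""
    mask = (1 << operand_size) - 1
    state["stack"].append(state["ip"] & mask)   # push the return-instruction pointer
    if relative:
        new_ip = state["ip"] + target           # signed displacement from the next instruction
    else:
        new_ip = target                         # absolute offset taken from the r/m operand
    state["ip"] = new_ip & mask                 # a 16-bit operand size clears the upper bits
    return state
```

Note that the sketch never touches a code segment register, matching the statement that CS is unchanged on near calls.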

Far Calls in Real-Address or Virtual-8086 Mode

When executing a far call in real-address or virtual-8086 mode, the processor pushes the current value of both the CS and EIP registers on the stack for use as a return-instruction pointer. The processor then performs a “far branch” to the code segment and offset specified with the target operand for the called procedure. The target operand specifies an absolute far address either directly with a pointer (ptr16:16 or ptr16:32) or indirectly with a memory location (m16:16 or m16:32). With the pointer method, the segment and offset of the called procedure are encoded in the instruction using a 4-byte (16-bit operand size) or 6-byte (32-bit operand size) far address immediate. With the indirect method, the target operand specifies a memory location that contains a 4-byte (16-bit operand size) or 6-byte (32-bit operand size) far address. The operand-size attribute determines the size of the offset (16 or 32 bits) in the far address. The far address is loaded directly into the CS and EIP registers. If the operand-size attribute is 16, the upper two bytes of the EIP register are cleared.
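
As an illustration of the indirect method, the sketch below decodes a far address from memory and performs the pushes described above. It relies on the far pointer being stored offset first, followed by the 16-bit selector, in little-endian order; the function names and the `state` dictionary are hypothetical.

```python
import struct

def read_far_pointer(memory, operand_size=16):
    """Decode an m16:16 (4-byte) or m16:32 (6-byte) far pointer from bytes."""
    if operand_size == 16:
        offset, selector = struct.unpack_from("<HH", memory)
    elif operand_size == 32:
        offset, selector = struct.unpack_from("<IH", memory)
    else:
        raise ValueError("operand size must be 16 or 32")
    return selector, offset

def far_call_real_mode(state, memory, operand_size=16):
    """Push CS and the return offset, then load CS:IP from the far pointer."""
    selector, offset = read_far_pointer(memory, operand_size)
    state["stack"].append(state["cs"])    # CS is pushed first
    state["stack"].append(state["ip"])    # then the return offset
    state["cs"], state["ip"] = selector, offset
    return state
```

The bytes `34 12 00 F0` therefore decode to selector `0xF000` and offset `0x1234`.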

Far Calls in Protected Mode

When the processor is operating in protected mode, the CALL instruction can be used to perform the following types of far calls:

Far call to the same privilege level
Far call to a different privilege level (inter-privilege level call)
Task switch (far call to another task)

In protected mode, the processor always uses the segment selector part of the far address to access the corresponding descriptor in the GDT or LDT. The descriptor type (code segment, call gate, task gate, or TSS) and access rights determine the type of call operation to be performed.

If the selected descriptor is for a code segment, a far call to a code segment at the same privilege level is performed. (If the selected code segment is at a different privilege level and the code segment is non-conforming, a general-protection exception is generated.) A far call to the same privilege level in protected mode is very similar to one carried out in real-address or virtual-8086 mode. The target operand specifies an absolute far address either directly with a pointer (ptr16:16 or ptr16:32) or indirectly with a memory location (m16:16 or m16:32). The operand-size attribute determines the size of the offset (16 or 32 bits) in the far address. The new code segment selector and its descriptor are loaded into the CS register; the offset from the instruction is loaded into the EIP register.
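
The privilege rule for a direct far transfer to a code segment can be sketched as a single check. The comparisons follow the conforming and non-conforming cases in the JMP operation pseudocode earlier in this chapter, on the assumption that CALL applies the same tests; the function name and its return convention are made up for this example.

```python
def far_code_segment_check(cpl, rpl, dpl, conforming):
    """Return None if the far transfer is allowed, else the name of the fault.

    A conforming segment needs DPL <= CPL; a non-conforming segment needs
    RPL <= CPL and DPL == CPL.
    """
    if conforming:
        return "#GP" if dpl > cpl else None
    return "#GP" if (rpl > cpl or dpl != cpl) else None
```

Under this rule, ring 3 code can reach a conforming segment with DPL 0 directly, but a non-conforming segment with DPL 0 faults unless the transfer goes through a call gate.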

A call gate (described in the next paragraph) can also be used to perform a far call to a code segment at the same privilege level. Using this mechanism provides an extra level of indirection and is the preferred method of making calls between 16-bit and 32-bit code segments.

When executing an inter-privilege-level far call, the code segment for the procedure being called must be accessed through a call gate. The segment selector specified by the target operand identifies the call gate. The target operand can specify the call gate segment selector either directly with a pointer (ptr16:16 or ptr16:32) or indirectly with a memory location (m16:16 or m16:32). The processor obtains the segment selector for the new code segment and the new instruction pointer (offset) from the call gate descriptor. (The offset from the target operand is ignored when a call gate is used.)

On inter-privilege-level calls, the processor switches to the stack for the privilege level of the called procedure. The segment selector for the new stack segment is specified in the TSS for the currently running task. The branch to the new code segment occurs after the stack switch. (Note that when using a call gate to perform a far call to a segment at the same privilege level, no stack switch occurs.) On the new stack, the processor pushes the segment selector and stack pointer for the calling procedure’s stack, an optional set of parameters from the calling procedures stack, and the segment selector and instruction pointer for the calling procedure’s code segment. (A value in the call gate descriptor determines how many parameters to copy to the new stack.) Finally, the processor branches to the address of the procedure being called within the new code segment.
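
The push order on the new stack can be sketched as a list built in push order (the last element ends up on top). The `caller` dictionary and the function name are hypothetical; the caller's own stack is modelled as a list whose last element is its top of stack.

```python
def inter_privilege_call_frame(caller, param_count):
    """Return the values pushed on the new stack, in push order."""
    frame = [caller["ss"], caller["esp"]]           # caller's stack segment and pointer
    if param_count:
        # Copy the top param_count entries of the caller's stack, keeping their order.
        frame.extend(caller["stack"][-param_count:])
    frame.extend([caller["cs"], caller["eip"]])     # return code segment and offset
    return frame
```

With `param_count` set to 0 the frame holds just the four saved register values, which is also the shape described later for IA-32e mode, where parameter copy is not supported.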

Executing a task switch with the CALL instruction is similar to executing a call through a call gate. The target operand specifies the segment selector of the task gate for the new task activated by the switch (the offset in the target operand is ignored). The task gate in turn points to the TSS for the new task, which contains the segment selectors for the task’s code and stack segments. Note that the TSS also contains the EIP value for the next instruction that was to be executed before the calling task was suspended. This instruction pointer value is loaded into the EIP register to re-start the calling task.

The CALL instruction can also specify the segment selector of the TSS directly, which eliminates the indirection of the task gate. See Chapter 7, “Task Management,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A, for information on the mechanics of a task switch.

When you execute a task switch with a CALL instruction, the nested task flag (NT) is set in the EFLAGS register and the new TSS’s previous task link field is loaded with the old task’s TSS selector. Code is expected to suspend this nested task by executing an IRET instruction which, because the NT flag is set, automatically uses the previous task link to return to the calling task. (See “Task Linking” in Chapter 7 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A, for information on nested tasks.) Switching tasks with the CALL instruction differs in this regard from the JMP instruction. JMP does not set the NT flag and therefore does not expect an IRET instruction to suspend the task.
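
The difference between the two kinds of task switch can be captured in a deliberately simplified model. The `Task` class, its fields, and both functions are assumptions made for this sketch; a real task switch saves and loads far more state.

```python
from dataclasses import dataclass

@dataclass
class Task:
    selector: int                  # TSS selector of this task
    previous_task_link: int = 0    # back link stored in the TSS
    saved_nt: bool = False         # NT bit in the task's saved EFLAGS image

def task_switch(old, new, via_call):
    """Return the NT flag value in effect after switching from old to new."""
    nt = new.saved_nt
    if via_call:
        new.previous_task_link = old.selector   # CALL records where to return to
        nt = True                               # ...and marks the new task as nested
    return nt

def iret_target(task, nt):
    """Selector that IRET would switch back to, or None when NT is clear."""
    return task.previous_task_link if nt else None
```

After a CALL-style switch, `iret_target` yields the caller's selector; after a JMP-style switch, the link field is untouched and there is nothing to return to.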

Mixing 16-Bit and 32-Bit Calls

When making far calls between 16-bit and 32-bit code segments, use a call gate. If the far call is from a 32-bit code segment to a 16-bit code segment, the call should be made from the first 64 KBytes of the 32-bit code segment. This is because the operand-size attribute of the instruction is set to 16, so only a 16-bit return address offset can be saved. Also, the call should be made using a 16-bit call gate so that 16-bit values can be pushed on the stack. See Chapter 20, “Mixing 16-Bit and 32-Bit Code,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3B, for more information.

Far Calls in Compatibility Mode

When the processor is operating in compatibility mode, the CALL instruction can be used to perform the following types of far calls:

Far call to the same privilege level, remaining in compatibility mode
Far call to the same privilege level, transitioning to 64-bit mode
Far call to a different privilege level (inter-privilege level call), transitioning to 64-bit mode

Note that a CALL instruction cannot be used to cause a task switch in compatibility mode, since task switches are not supported in IA-32e mode.

In compatibility mode, the processor always uses the segment selector part of the far address to access the corresponding descriptor in the GDT or LDT. The descriptor type (code segment, call gate) and access rights determine the type of call operation to be performed.

If the selected descriptor is for a code segment, a far call to a code segment at the same privilege level is performed. (If the selected code segment is at a different privilege level and the code segment is non-conforming, a general-protection exception is generated.) A far call to the same privilege level in compatibility mode is very similar to one carried out in protected mode. The target operand specifies an absolute far address either directly with a pointer (ptr16:16 or ptr16:32) or indirectly with a memory location (m16:16 or m16:32). The operand-size attribute determines the size of the offset (16 or 32 bits) in the far address. The new code segment selector and its descriptor are loaded into the CS register and the offset from the instruction is loaded into the EIP register. The difference is that 64-bit mode may be entered. This is specified by the L bit in the new code segment descriptor.

Note that a 64-bit call gate (described in the next paragraph) can also be used to perform a far call to a code segment at the same privilege level. However, using this mechanism requires that the target code segment descriptor have the L bit set, causing an entry to 64-bit mode.

When executing an inter-privilege-level far call, the code segment for the procedure being called must be accessed through a 64-bit call gate. The segment selector specified by the target operand identifies the call gate. The target operand can specify the call gate segment selector either directly with a pointer (ptr16:16 or ptr16:32) or indirectly with a memory location (m16:16 or m16:32). The processor obtains the segment selector for the new code segment and the new instruction pointer (offset) from the 16-byte call gate descriptor. (The offset from the target operand is ignored when a call gate is used.)

On inter-privilege-level calls, the processor switches to the stack for the privilege level of the called procedure. The segment selector for the new stack segment is set to NULL. The new stack pointer is specified in the TSS for the currently running task. The branch to the new code segment occurs after the stack switch. (Note that when using a call gate to perform a far call to a segment at the same privilege level, an implicit stack switch occurs as a result of entering 64-bit mode. The SS selector is unchanged, but stack segment accesses use a segment base of 0x0, the limit is ignored, and the default stack size is 64-bits. The full value of RSP is used for the offset, of which the upper 32-bits are undefined.) On the new stack, the processor pushes the segment selector and stack pointer for the calling procedure’s stack and the segment selector and instruction pointer for the calling procedure’s code segment. (Parameter copy is not supported in IA-32e mode.) Finally, the processor branches to the address of the procedure being called within the new code segment.

Near/(Far) Calls in 64-bit Mode

When the processor is operating in 64-bit mode, the CALL instruction can be used to perform the following types of far calls:

Far call to the same privilege level, transitioning to compatibility mode
Far call to the same privilege level, remaining in 64-bit mode
Far call to a different privilege level (inter-privilege level call), remaining in 64-bit mode

Note that the CALL instruction cannot be used to cause a task switch in 64-bit mode, since task switches are not supported in IA-32e mode.

In 64-bit mode, the processor always uses the segment selector part of the far address to access the corresponding descriptor in the GDT or LDT. The descriptor type (code segment, call gate) and access rights determine the type of call operation to be performed.

If the selected descriptor is for a code segment, a far call to a code segment at the same privilege level is performed. (If the selected code segment is at a different privilege level and the code segment is non-conforming, a general-protection exception is generated.) A far call to the same privilege level in 64-bit mode is very similar to one carried out in compatibility mode. The target operand specifies an absolute far address indirectly with a memory location (m16:16, m16:32 or m16:64). The form of CALL with a direct specification of absolute far address is not defined in 64-bit mode. The operand-size attribute determines the size of the offset (16, 32, or 64 bits) in the far address. The new code segment selector and its descriptor are loaded into the CS register; the offset from the instruction is loaded into the EIP register. The new code segment may specify entry either into compatibility or 64-bit mode, based on the L bit value.

A 64-bit call gate (described in the next paragraph) can also be used to perform a far call to a code segment at the same privilege level. However, using this mechanism requires that the target code segment descriptor have the L bit set.

When executing an inter-privilege-level far call, the code segment for the procedure being called must be accessed through a 64-bit call gate. The segment selector specified by the target operand identifies the call gate. The target operand can only specify the call gate segment selector indirectly with a memory location (m16:16, m16:32 or m16:64). The processor obtains the segment selector for the new code segment and the new instruction pointer (offset) from the 16-byte call gate descriptor. (The offset from the target operand is ignored when a call gate is used.)

Note that when using a call gate to perform a far call to a segment at the same privilege level, an implicit stack switch occurs as a result of entering 64-bit mode. The SS selector is unchanged, but stack segment accesses use a segment base of 0x0, the limit is ignored, and the default stack size is 64-bits. (The full value of RSP is used for the offset.) On the new stack, the processor pushes the segment selector and stack pointer for the calling procedure’s stack and the segment selector and instruction pointer for the calling procedure’s code segment. (Parameter copy is not supported in IA-32e mode.) Finally, the processor branches to the address of the procedure being called within the new code segment.

Instruction ordering

Instructions following a far call may be fetched from memory before earlier instructions complete execution, but they will not execute (even speculatively) until all instructions prior to the far call have completed execution (the later instructions may execute before data stored by the earlier instructions have become globally visible).

Certain situations may lead to the next sequential instruction after a near indirect CALL being speculatively executed. If software needs to prevent this (e.g., in order to prevent a speculative execution side channel), then an LFENCE instruction opcode can be placed after the near indirect CALL in order to block speculative execution.

Operation

if /* near call */ {
    if /* near relative call */ {
        if OperandSize == 64 {
            tempDEST = SignExtend(DEST); // DEST is rel32
            tempRIP = RIP + tempDEST;
            if /* stack not large enough for an 8-byte return address */ {
                #SS(0);
            }
            Push(RIP);
            if ShadowStackEnabled(CPL) && DEST != 0 {
                ShadowStackPush8B(RIP);
            }
            RIP = tempRIP;
        }
        if OperandSize == 32 {
            tempEIP = EIP + DEST; // DEST is rel32
            if /* tempEIP is not within code segment limit */ {
                #GP(0);
            }
            if /* stack not large enough for a 4-byte return address */ {
                #SS(0);
            }
            Push(EIP);
            if ShadowStackEnabled(CPL) && DEST != 0 {
                ShadowStackPush4B(EIP);
            }
            EIP = tempEIP;
        }
        if OperandSize == 16 {
            tempEIP = (EIP + DEST) & 0x0000_FFFF; // DEST is rel16
            if /* tempEIP is not within code segment limit */ {
                #GP(0);
            }
            if /* stack not large enough for a 2-byte return address */ {
                #SS(0);
            }
            Push(IP);
            if ShadowStackEnabled(CPL) && DEST != 0 {
                // IP is zero extended and pushed as a 32 bit value on shadow stack
                ShadowStackPush4B(IP);
            }
            EIP = tempEIP;
        }
    } else { // near absolute call
        if OperandSize == 64 {
            tempRIP = DEST; // DEST is r/m64
            if /* stack not large enough for a 8-byte return address */ {
                #SS(0);
            }
            Push(RIP);
            if ShadowStackEnabled(CPL) {
                ShadowStackPush8B(RIP);
            }
            RIP = tempRIP;
        }
        if OperandSize == 32 {
            tempEIP = DEST; // DEST is r/m32
            if /* tempEIP is not within code segment limit */ {
                #GP(0);
            }
            if /* stack not large enough for a 4-byte return address */ {
                #SS(0);
            }
            Push(EIP);
            if ShadowStackEnabled(CPL) {
                ShadowStackPush4B(EIP);
            }
            EIP = tempEIP;
        }
        if OperandSize == 16 {
            tempEIP = DEST & 0x0000_FFFF; // DEST is r/m16
            if /* tempEIP is not within code segment limit */ {
                #GP(0);
            }
            if /* stack not large enough for a 2-byte return address */ {
                #SS(0);
            }
            Push(IP);
            if ShadowStackEnabled(CPL) {
                // IP is zero extended and pushed as a 32 bit value on shadow stack
                ShadowStackPush4B(IP);
            }
            EIP = tempEIP;
        }
    } // rel/abs
    if /* Call near indirect, absolute indirect */ {
        if EndbranchEnabledAndNotSuppressed(CPL) {
            if CPL == 3 {
                if /* no 3EH prefix */ || IA32_U_CET.NO_TRACK_EN == 0 {
                    IA32_U_CET.TRACKER = WAIT_FOR_ENDBRANCH;
                }
            } else {
                if /* no 3EH prefix */ || IA32_S_CET.NO_TRACK_EN == 0 {
                    IA32_S_CET.TRACKER = WAIT_FOR_ENDBRANCH;
                }
            }
        }
    }
} // near

if /* far call */ && (PE == 0 || (PE == 1 && VM == 1)) { // real-address or virtual-8086 mode
    if OperandSize == 32 {
        if /* stack not large enough for a 6-byte return address */ {
            #SS(0);
        }
        if /* DEST[16..=31] is not zero */ {
            #GP(0);
        }
        Push(CS); // padded with 16 high-order bits
        Push(EIP);
        CS = DEST[32..=47]; // DEST is ptr16:32 or [m16:32]
        EIP = DEST[0..=31]; // DEST is ptr16:32 or [m16:32]
    } else { // OperandSize == 16
        if /* stack not large enough for a 4-byte return address */ {
            #SS(0);
        }
        Push(CS);
        Push(IP);
        CS = DEST[16..=31]; // DEST is ptr16:16 or [m16:16]
        EIP = DEST[0..=15]; // DEST is ptr16:16 or [m16:16]; clear upper 16 bits
    }
}

if /* far call */ && (PE == 1 && VM == 0) { // protected mode or IA-32e mode, not virtual-8086 mode
    if /* segment selector in target operand NULL */ {
        #GP(0);
    }
    if /* segment selector index not within descriptor table limits */ {
        #GP(new code segment selector);
    }
    /* Read type and access rights of selected segment descriptor; */
    if IA32_EFER.LMA == 0 {
        if /* segment type is not a conforming or nonconforming code segment, call
                gate, task gate, or TSS */ {
            #GP(segment selector);
        }
    } else {
        if /* segment type is not a conforming or nonconforming code segment or
                64-bit call gate */ {
            #GP(segment selector);
        }
    }
    /* Depending on type and access rights: */
        goto 'CONFORMING_CODE_SEGMENT;
        goto 'NONCONFORMING_CODE_SEGMENT;
        goto 'CALL_GATE;
        goto 'TASK_GATE;
        goto 'TASK_STATE_SEGMENT;
}

'CONFORMING_CODE_SEGMENT {
    if L-bit == 1 && D-bit == 1 && IA32_EFER.LMA == 1 {
        #GP(new code segment selector);
    }
    if DPL > CPL {
        #GP(new code segment selector);
    }
    if /* segment not present */ {
        #NP(new code segment selector);
    }
    if /* stack not large enough for return address */ {
        #SS(0);
    }
    tempEIP = DEST(Offset);
    if target mode == Compatibility mode {
        tempEIP = tempEIP & 0x0000_0000_FFFF_FFFF;
    }
    if OperandSize == 16 {
        tempEIP = tempEIP & 0x0000_FFFF; // clear upper 16 bits
    }
    if (IA32_EFER.LMA == 0 || target mode == Compatibility mode) && /* tempEIP outside new code segment limit */ {
        #GP(0);
    }
    if /* tempEIP is non-canonical */ {
        #GP(0);
    }
    if ShadowStackEnabled(CPL) {
        if OperandSize == 32 {
            tempPushLIP = CSBASE + EIP;
        } else {
            if OperandSize == 16 {
                tempPushLIP = CSBASE + IP;
            } else { // OperandSize == 64
                tempPushLIP = RIP;
            }
        }
        tempPushCS = CS;
    }
    if OperandSize == 32 {
        Push(CS); // padded with 16 high-order bits
        Push(EIP);
        CS = DEST(CodeSegmentSelector);
        // segment descriptor information also loaded
        CS(RPL) = CPL;
        EIP = tempEIP;
    } else {
        if OperandSize == 16 {
            Push(CS);
            Push(IP);
            CS = DEST(CodeSegmentSelector);
            // segment descriptor information also loaded
            CS(RPL) = CPL;
            EIP = tempEIP;
        } else { // OperandSize == 64
            Push(CS); // padded with 48 high-order bits
            Push(RIP);
            CS = DEST(CodeSegmentSelector);
            // segment descriptor information also loaded
            CS(RPL) = CPL;
            RIP = tempEIP;
        }
    }
    if ShadowStackEnabled(CPL) {
        if (IA32_EFER.LMA & DEST(CodeSegmentSelector).L) == 0 {
            // If target is legacy or compatibility mode then the SSP must be in low 4GB
            if SSP & 0xFFFF_FFFF_0000_0000 != 0 {
                #GP(0);
            }
        }
        // align to 8 byte boundary if not already aligned
        tempSSP = SSP;
        /* Shadow_stack_store 4 bytes of 0 to (SSP – 4) */
        SSP = SSP & 0xFFFF_FFFF_FFFF_FFF8;
        ShadowStackPush8B(tempPushCS); // padded with 48 high-order bits of 0
        ShadowStackPush8B(tempPushLIP); // padded with 32 high-order bits of 0 for 32 bit LIP
        ShadowStackPush8B(tempSSP);
    }
    if EndbranchEnabled(CPL) {
        if CPL == 3 {
            IA32_U_CET.TRACKER = WAIT_FOR_ENDBRANCH;
            IA32_U_CET.SUPPRESS = 0;
        } else {
            IA32_S_CET.TRACKER = WAIT_FOR_ENDBRANCH;
            IA32_S_CET.SUPPRESS = 0;
        }
    }
}

'NONCONFORMING_CODE_SEGMENT {
    if L-bit == 1 && D-bit == 1 && IA32_EFER.LMA == 1 {
        #GP(new code segment selector);
    }
    if (RPL > CPL) || (DPL != CPL) {
        #GP(new code segment selector);
    }
    if /* segment not present */ {
        #NP(new code segment selector);
    }
    if /* stack not large enough for return address */ {
        #SS(0);
    }
    tempEIP = DEST(Offset);
    if target mode == Compatibility mode {
        tempEIP = tempEIP & 0x0000_0000_FFFF_FFFF;
    }
    if OperandSize == 16 {
        tempEIP = tempEIP & 0x0000_FFFF; // clear upper 16 bits
    }
    if (IA32_EFER.LMA == 0 || target mode == Compatibility mode) && /* tempEIP outside new code segment limit */ {
        #GP(0);
    }
    if /* tempEIP is non-canonical */ {
        #GP(0);
    }
    if ShadowStackEnabled(CPL) {
        if IA32_EFER.LMA && CS.L {
            tempPushLIP = RIP;
        } else {
            tempPushLIP = CSBASE + EIP;
        }
        tempPushCS = CS;
    }
    if OperandSize == 32 {
        Push(CS); // padded with 16 high-order bits
        Push(EIP);
        CS = DEST(CodeSegmentSelector);
        // segment descriptor information also loaded
        CS(RPL) = CPL;
        EIP = tempEIP;
    } else {
        if OperandSize == 16 {
            Push(CS);
            Push(IP);
            CS = DEST(CodeSegmentSelector);
            // segment descriptor information also loaded
            CS(RPL) = CPL;
            EIP = tempEIP;
        } else { // OperandSize == 64
            Push(CS); // padded with 48 high-order bits
            Push(RIP);
            CS = DEST(CodeSegmentSelector);
            // segment descriptor information also loaded
            CS(RPL) = CPL;
            RIP = tempEIP;
        }
    }
    if ShadowStackEnabled(CPL) {
        if (IA32_EFER.LMA && DEST(CodeSegmentSelector).L) == 0 {
            // if target is legacy or compatibility mode then the SSP must be in low 4GB
            if SSP & 0xFFFF_FFFF_0000_0000 != 0 {
                #GP(0);
            }
        }
        // align to 8 byte boundary if not already aligned
        tempSSP = SSP;
        /* Shadow_stack_store 4 bytes of 0 to (SSP – 4) */
        SSP = SSP & 0xFFFF_FFFF_FFFF_FFF8;
        ShadowStackPush8B(tempPushCS); // padded with 48 high-order 0 bits
        ShadowStackPush8B(tempPushLIP); // padded 32 high-order bits of 0 for 32 bit LIP
        ShadowStackPush8B(tempSSP);
    }
    if EndbranchEnabled(CPL) {
        if CPL == 3 {
            IA32_U_CET.TRACKER = WAIT_FOR_ENDBRANCH;
            IA32_U_CET.SUPPRESS = 0;
        } else {
            IA32_S_CET.TRACKER = WAIT_FOR_ENDBRANCH;
            IA32_S_CET.SUPPRESS = 0;
        }
    }
}

'CALL_GATE {
    if call-gate DPL < CPL || call-gate DPL < RPL {
        #GP(call-gate selector);
    }
    if /* call gate not present */ {
        #NP(call-gate selector);
    }
    if /* call-gate code-segment selector is NULL */ {
        #GP(0);
    }
    if /* call-gate code-segment selector index is outside descriptor table limits */ {
        #GP(call-gate code-segment selector);
    }
    /* Read call-gate code-segment descriptor; */
    if /* call-gate code-segment descriptor does not indicate a code segment */
            || call-gate code-segment descriptor DPL > CPL {
        #GP(call-gate code-segment selector);
    }
    if IA32_EFER.LMA == 1 && (/* call-gate code-segment descriptor is
            not a 64-bit code segment */ || /* call-gate code-segment descriptor has both L-bit and D-bit set */) {
        #GP(call-gate code-segment selector);
    }
    if /* call-gate code segment not present */ {
        #NP(call-gate code-segment selector);
    }
    if /* call-gate code segment is non-conforming */ && DPL < CPL {
        goto 'MORE_PRIVILEGE;
    } else {
        goto 'SAME_PRIVILEGE;
    }
}

'MORE_PRIVILEGE {
    if /* current TSS is 32-bit */ {
        TSSstackAddress = (new code-segment DPL * 8) + 4;
        if (TSSstackAddress + 5) > current TSS limit {
            #TS(current TSS selector);
        }
        NewSS = 2 bytes loaded from (TSS base + TSSstackAddress + 4);
        NewESP = 4 bytes loaded from (TSS base + TSSstackAddress);
    } else {
        if /* current TSS is 16-bit */ {
            TSSstackAddress = (new code-segment DPL * 4) + 2;
            if (TSSstackAddress + 3) > current TSS limit {
                #TS(current TSS selector);
            }
            NewSS = 2 bytes loaded from (TSS base + TSSstackAddress + 2);
            NewESP = 2 bytes loaded from (TSS base + TSSstackAddress);
        } else { // current TSS is 64-bit
            TSSstackAddress = (new code-segment DPL * 8) + 4;
            if (TSSstackAddress + 7) > current TSS limit {
                #TS(current TSS selector);
            }
            NewSS = new code-segment DPL; // NULL selector with RPL = new CPL
            NewRSP = 8 bytes loaded from (current TSS base + TSSstackAddress);
        }
    }
    if IA32_EFER.LMA == 0 && /* NewSS is NULL */ {
        #TS(NewSS);
    }
    /* Read new stack-segment descriptor; */
    if IA32_EFER.LMA == 0 && (NewSS RPL != new code-segment DPL
            || new stack-segment DPL != new code-segment DPL || /* new stack segment is not a
            writable data segment */) {
        #TS(NewSS);
    }
    if IA32_EFER.LMA == 0 && /* new stack segment not present */ {
        #SS(NewSS);
    }
    if CallGateSize == 32 {
        if /* new stack does not have room for parameters plus 16 bytes */ {
            #SS(NewSS);
        }
        if /* CallGate(InstructionPointer) not within new code-segment limit */ {
            #GP(0);
        }
        SS = NewSS; // segment descriptor information also loaded
        ESP = NewESP;
        CS:EIP = CallGate(CS:InstructionPointer);
        // segment descriptor information also loaded
        Push(oldSS:oldESP); // from calling procedure
        temp = /* parameter count from call gate, masked to 5 bits */;
        Push(parameters from calling procedure's stack, temp);
        Push(oldCS:oldEIP); // return address to calling procedure
    } else {
        if CallGateSize == 16 {
            if /* new stack does not have room for parameters plus 8 bytes */ {
                #SS(NewSS);
            }
            if /* (CallGate(InstructionPointer) & 0xFFFF) not in new code-segment limit */ {
                #GP(0);
            }
            SS = NewSS; // segment descriptor information also loaded
            ESP = NewESP;
            CS:IP = CallGate(CS:InstructionPointer);
            // segment descriptor information also loaded
            Push(oldSS:oldESP); // From calling procedure
            temp = /* parameter count from call gate, masked to 5 bits */;
            Push(parameters from calling procedure's stack, temp);
            Push(oldCS:oldEIP); // Return address to calling procedure
        } else { // CallGateSize == 64
            if /* pushing 32 bytes on the stack would use a non-canonical address */ {
                #SS(NewSS);
            }
            if /* CallGate(InstructionPointer) is non-canonical */ {
                #GP(0);
            }
            SS = NewSS; // NewSS is NULL
            RSP = NewRSP;
            CS:IP = CallGate(CS:InstructionPointer);
            // segment descriptor information also loaded
            Push(oldSS:oldESP); // from calling procedure
            Push(oldCS:oldEIP); // return address to calling procedure
        }
    }
    if ShadowStackEnabled(CPL) && CPL == 3 {
        if IA32_EFER.LMA == 0 {
            IA32_PL3_SSP = SSP;
        } else { // adjust so bits 63:N get the value of bit N-1, where N is the CPU's maximum linear-address width
            IA32_PL3_SSP = LA_adjust(SSP);
        }
    }
    CPL = CodeSegment(DPL);
    CS(RPL) = CPL;
    if ShadowStackEnabled(CPL) {
        oldSSP = SSP;
        SSP = IA32_PLi_SSP; // where i is the CPL
        if SSP & 0x07 != 0 { // SSP not aligned to 8 bytes
            #GP(0);
        }
        // token and CS:LIP:oldSSP pushed on shadow stack must be contained in a naturally aligned 32-byte region
        if (SSP & ~0x1F) != ((SSP - 24) & ~0x1F) {
            #GP(0);
        }
        if (IA32_EFER.LMA && CS.L) == 0 && SSP[32..=63] != 0 {
            #GP(0);
        }
        expected_token_value = SSP;
        // busy bit - bit position 0 - must be clear
        new_token_value = SSP | BUSY_BIT;
        // set the busy bit
        if shadow_stack_lock_cmpxchg8b(SSP, new_token_value, expected_token_value) != expected_token_value {
            #GP(0);
        }
        if oldSS.DPL != 3 {
            // these stack pushes should not cause faults, VM exits, or data breakpoints
            // such events will apply to the earlier accesses to the token, which is in the same naturally aligned 32-byte region
            ShadowStackPush8B(oldCS); // padded with 48 high-order bits of 0
            ShadowStackPush8B(oldCSBASE+oldRIP); // padded with 32 high-order bits of 0 for 32 bit LIP
            ShadowStackPush8B(oldSSP);
        }
    }
    if EndbranchEnabled(CPL) {
        IA32_S_CET.TRACKER = WAIT_FOR_ENDBRANCH;
        IA32_S_CET.SUPPRESS = 0;
    }
}

'SAME_PRIVILEGE {
    if CallGateSize == 32 {
        if /* stack does not have room for 8 bytes */ {
            #SS(0);
        }
        if /* CallGate(InstructionPointer) not within code segment limit */ {
            #GP(0);
        }
        CS:EIP = CallGate(CS:EIP); // segment descriptor information also loaded
        Push(oldCS:oldEIP); // return address to calling procedure
    } else {
        if CallGateSize == 16 {
            if /* stack does not have room for 4 bytes */ {
                #SS(0);
            }
            if /* CallGate(InstructionPointer) not within code segment limit */ {
                #GP(0);
            }
            CS:IP = CallGate(CS:InstructionPointer);
            // segment descriptor information also loaded
            Push(oldCS:oldIP); // return address to calling procedure
        } else { // CallGateSize == 64
            if /* pushing 16 bytes on the stack touches non-canonical addresses */ {
                #SS(0);
            }
            if /* RIP non-canonical */ {
                #GP(0);
            }
            CS:IP = CallGate(CS:InstructionPointer);
            // segment descriptor information also loaded
            Push(oldCS:oldIP); // return address to calling procedure
        }
    }
    CS(RPL) = CPL;
    if ShadowStackEnabled(CPL) {
        // align to next 8 byte boundary
        tempSSP = SSP;
        /* Shadow_stack_store 4 bytes of 0 to (SSP – 4) */
        SSP = SSP & 0xFFFF_FFFF_FFFF_FFF8;
        // push cs:lip:ssp on shadow stack
        ShadowStackPush8B(oldCS); // padded with 48 high-order bits of 0
        ShadowStackPush8B(oldCSBASE + oldRIP); // padded with 32 high-order bits of 0 for 32 bit LIP
        ShadowStackPush8B(tempSSP);
    }
    if EndbranchEnabled(CPL) {
        if CPL == 3 {
            IA32_U_CET.TRACKER = WAIT_FOR_ENDBRANCH;
            IA32_U_CET.SUPPRESS = 0;
        } else {
            IA32_S_CET.TRACKER = WAIT_FOR_ENDBRANCH;
            IA32_S_CET.SUPPRESS = 0;
        }
    }
}

'TASK_GATE {
    if task gate DPL < CPL || task gate DPL < RPL {
        #GP(task gate selector);
    }
    if /* task gate not present */ {
        #NP(task gate selector);
    }
    /* Read the TSS segment selector in the task-gate descriptor; */
    if /* TSS segment selector local/global bit is set to local */
            || /* index not within GDT limits */ {
        #GP(TSS selector);
    }
    /* Access TSS descriptor in GDT; */
    if /* descriptor is not a TSS segment */ {
        #GP(TSS selector);
    }
    if /* TSS descriptor specifies that the TSS is busy */ {
        #GP(TSS selector);
    }
    if /* TSS not present */ {
        #NP(TSS selector);
    }
    /* SWITCH-TASKS (with nesting) to TSS; */
    if /* EIP not within code segment limit */ {
        #GP(0);
    }
}

'TASK_STATE_SEGMENT {
    if TSS DPL < CPL || TSS DPL < RPL
            || /* TSS descriptor indicates TSS not available */ {
        #GP(TSS selector);
    }
    if /* TSS is not present */ {
        #NP(TSS selector);
    }
    /* SWITCH-TASKS (with nesting) to TSS; */
    if /* EIP not within code segment limit */ {
        #GP(0);
    }
}

Flags Affected

All flags are affected if a task switch occurs; no flags are affected if a task switch does not occur.

`RET`

Return from Procedure

Instruction	Description
`RET`	Near return to calling procedure
`RET`	Far return to calling procedure

`RET imm16`	Near return to calling procedure and pop `imm16` bytes from stack
`RET imm16`	Far return to calling procedure and pop `imm16` bytes from stack

Description

Transfers program control to a return address located on the top of the stack. The address is usually placed on the stack by a CALL instruction, and the return is made to the instruction that follows the CALL instruction.

The optional source operand specifies the number of stack bytes to be released after the return address is popped; the default is none. This operand can be used to release parameters from the stack that were passed to the called procedure and are no longer needed. It must be used when the CALL instruction used to switch to a new procedure uses a call gate with a non-zero word count to access the new procedure. Here, the source operand for the RET instruction must specify the same number of bytes as is specified in the word count field of the call gate.
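As a rough illustration of how the `imm16` operand relates to a call gate's parameter count, the Python sketch below converts a gate's count into the number of bytes `RET` would have to release. The helper `ret_imm_for_gate` is hypothetical. It assumes the count is measured in words for a 16-bit gate and in doublewords for a 32-bit gate, and that only the low 5 bits of the count are used, as in the `CALL` operation pseudocode.

```python
def ret_imm_for_gate(param_count: int, gate_bits: int) -> int:
    """Bytes that RET imm16 must release for a call gate that copies parameters."""
    unit_bytes = {16: 2, 32: 4}        # parameter size per gate width (assumed)
    if gate_bits not in unit_bytes:
        raise ValueError("only 16-bit and 32-bit gates copy parameters")
    count = param_count & 0x1F         # the count field is masked to 5 bits
    return count * unit_bytes[gate_bits]

# A 32-bit gate copying 3 parameters pairs with `RET 12`;
# the same count on a 16-bit gate pairs with `RET 6`.
print(ret_imm_for_gate(3, 32), ret_imm_for_gate(3, 16))
```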

The RET instruction can be used to execute three different types of returns:

Near return
A return to a calling procedure within the current code segment (the segment currently pointed to by the CS register), sometimes referred to as an intrasegment return.
Far return
A return to a calling procedure located in a different segment than the current code segment, sometimes referred to as an intersegment return.
Inter-privilege-level far return
A far return to a different privilege level than that of the currently executing program or procedure. The inter-privilege-level return type can only be executed in protected mode. See the section titled “Calling Procedures Using CALL and RET” in Chapter 6 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for detailed information on near, far, and inter-privilege-level returns.

When executing a near return, the processor pops the return instruction pointer (offset) from the top of the stack into the EIP register and begins program execution at the new instruction pointer. The CS register is unchanged.
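A near return can be modelled in a few lines of Python. This is only a toy sketch of the 64-bit case: the stack is a plain `bytearray`, `near_ret` is a made-up helper, and faults and shadow stacks are ignored entirely.

```python
def near_ret(stack: bytearray, rsp: int, imm16: int = 0):
    """Model a 64-bit near RET: pop the return address, then release imm16 bytes."""
    rip = int.from_bytes(stack[rsp:rsp + 8], "little")  # pop 8-byte return address
    rsp += 8                                            # the pop moves RSP up
    rsp += imm16                                        # optional parameter release
    return rip, rsp

# Toy stack with a return address of 0x401000 stored at offset 8.
stack = bytearray(32)
stack[8:16] = (0x401000).to_bytes(8, "little")

print(near_ret(stack, 8))        # plain RET
print(near_ret(stack, 8, 16))    # RET 16
```

Note that CS never appears in the model, which is exactly the point of a near return.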

When executing a far return, the processor pops the return instruction pointer from the top of the stack into the EIP register, then pops the segment selector from the top of the stack into the CS register. The processor then begins program execution in the new code segment at the new instruction pointer.
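The pop order for a far return can be sketched the same way. The example below models the 32-bit operand-size case, where the CS slot on the stack is 4 bytes wide and only its low 16 bits are kept. `far_ret32` and the stack contents are invented for the illustration, and no privilege or limit checks are modelled.

```python
def far_ret32(stack: bytes, esp: int):
    """Model a far RET with 32-bit operand size: pop EIP first, then CS."""
    eip = int.from_bytes(stack[esp:esp + 4], "little")          # first pop: offset
    cs_slot = int.from_bytes(stack[esp + 4:esp + 8], "little")  # second pop: selector slot
    cs = cs_slot & 0xFFFF                                       # high 16 bits are discarded
    return cs, eip, esp + 8

# Toy stack: EIP = 0x00401234, CS slot = 0xDEAD0023 (junk in the padding bits).
stack = (0x00401234).to_bytes(4, "little") + (0xDEAD0023).to_bytes(4, "little")
cs, eip, esp = far_ret32(stack, 0)
print(hex(cs), hex(eip), esp)
```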

The mechanics of an inter-privilege-level far return are similar to an intersegment return, except that the processor examines the privilege levels and access rights of the code and stack segments being returned to in order to determine whether the control transfer is allowed. The DS, ES, FS, and GS segment registers are cleared by the RET instruction during an inter-privilege-level return if they refer to segments that are not allowed to be accessed at the new privilege level. Since a stack switch also occurs on an inter-privilege-level return, the ESP and SS registers are loaded from the stack.
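The segment-register clearing can be sketched as a small Python model. The `SegReg` class and `scrub_segment_registers` are hypothetical names. The rule applied (null the selector when the cached descriptor's DPL is lower than the new CPL and the segment is data or non-conforming code) is taken from the operation pseudocode in this section.

```python
from dataclasses import dataclass

@dataclass
class SegReg:
    selector: int                      # visible part of the segment register
    dpl: int                           # DPL from the cached descriptor
    data_or_nonconforming_code: bool   # segment type from the cached descriptor

def scrub_segment_registers(regs: dict, new_cpl: int) -> None:
    """Null every data segment register the less privileged code may not use."""
    for reg in regs.values():
        if (reg.selector & 0xFFFC) == 0:
            continue                   # already a null selector
        if reg.dpl < new_cpl and reg.data_or_nonconforming_code:
            reg.selector = 0           # segment register becomes null

# Returning from ring 0 to ring 3: DS still points at a ring-0 data segment.
regs = {
    "DS": SegReg(selector=0x10, dpl=0, data_or_nonconforming_code=True),
    "ES": SegReg(selector=0x2B, dpl=3, data_or_nonconforming_code=True),
}
scrub_segment_registers(regs, new_cpl=3)
print(hex(regs["DS"].selector), hex(regs["ES"].selector))
```

This keeps user code from inheriting a kernel data segment that the kernel forgot to reload before returning.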

If parameters are passed to the called procedure during an inter-privilege level call, the optional source operand must be used with the RET instruction to release the parameters on the return. Here, the parameters are released both from the called procedure’s stack and the calling procedure’s stack (that is, the stack being returned to).

In 64-bit mode, the default operation size of this instruction is the stack-address size, i.e. 64 bits. This applies to near returns, not far returns; the default operation size of far returns is 32 bits.

Instruction ordering

Instructions following a far return may be fetched from memory before earlier instructions complete execution, but they will not execute (even speculatively) until all instructions prior to the far return have completed execution (the later instructions may execute before data stored by the earlier instructions have become globally visible).

Unlike near indirect CALL and near indirect JMP, the processor will not speculatively execute the next sequential instruction after a near RET unless that instruction is also the target of a jump or is a target in a branch predictor.

Operation

// near return
if instruction == near return {
    if OperandSize == 32 {
        if /* top 4 bytes of stack not within stack limits */ {
            #SS(0);
        }
        EIP = Pop();
        if ShadowStackEnabled(CPL) {
            tempSsEIP = ShadowStackPop4B();
            if EIP != tempSsEIP {
                #CP(NEAR_RET);
            }
        }
    } else {
        if OperandSize == 64 {
            if /* top 8 bytes of stack not within stack limits */ {
                #SS(0);
            }
            RIP = Pop();
            if ShadowStackEnabled(CPL) {
                tempSsEIP = ShadowStackPop8B();
                if RIP != tempSsEIP {
                    #CP(NEAR_RET);
                }
            }
        } else { // OperandSize == 16
            if /* top 2 bytes of stack not within stack limits */ {
                #SS(0);
            }
            tempEIP = Pop();
            tempEIP = tempEIP & 0x0000_FFFF;
            if /* tempEIP not within code segment limits */ {
                #GP(0);
            }
            EIP = tempEIP;
            if ShadowStackEnabled(CPL) {
                tempSsEIP = ShadowStackPop4B();
                if EIP != tempSsEIP {
                    #CP(NEAR_RET);
                }
            }
        }
    }
    if /* instruction has immediate operand */ {
        // release parameters from stack
        if StackAddressSize == 32 {
            ESP = ESP + SRC;
        } else {
            if StackAddressSize == 64 {
                RSP = RSP + SRC;
            } else { // StackAddressSize == 16
                SP = SP + SRC;
            }
        }
    }
}

// real-address mode or virtual-8086 mode
if ((PE == 0) || (PE == 1 && VM == 1)) && instruction == far return {
    if OperandSize == 32 {
        if /* top 8 bytes of stack not within stack limits */ {
            #SS(0);
        }
        EIP = Pop();
        CS = Pop(); // 32-bit pop, high-order 16 bits discarded
    } else { // OperandSize == 16
        if /* top 4 bytes of stack not within stack limits */ {
            #SS(0);
        }
        tempEIP = Pop();
        tempEIP = tempEIP & 0x0000_FFFF;
        if /* tempEIP not within code segment limits */ {
            #GP(0);
        }
        EIP = tempEIP;
        CS = Pop(); // 16-bit pop
    }
    if /* instruction has immediate operand */ {
        // release parameters from stack
        SP = SP + (SRC & 0xFFFF);
    }
}

// protected mode, not virtual-8086 mode
if (PE == 1 && VM == 0 && IA32_EFER.LMA == 0) && instruction == far return {
    if OperandSize == 32 {
        if /* second doubleword on stack is not within stack limits */ {
            #SS(0);
        }
    } else { // OperandSize == 16
        if /* second word on stack is not within stack limits */ {
            #SS(0);
        }
    }
    if /* return code segment selector is NULL */ {
        #GP(0);
    }
    if /* return code segment selector addresses descriptor beyond descriptor table limit */ {
        #GP(selector);
    }
    /* Obtain descriptor to which return code segment selector points from descriptor table; */
    if /* return code segment descriptor is not a code segment */ {
        #GP(selector);
    }
    if return code segment selector RPL < CPL {
        #GP(selector);
    }
    if /* return code segment descriptor is conforming */
            && return code segment DPL > return code segment selector RPL {
        #GP(selector);
    }
    if /* return code segment descriptor is non-conforming */
            && return code segment DPL != return code segment selector RPL {
        #GP(selector);
    }
    if /* return code segment descriptor is not present */ {
        #NP(selector);
    }
    if return code segment selector RPL > CPL {
        goto 'RETURN_TO_OUTER_PRIVILEGE_LEVEL;
    } else {
        goto 'RETURN_TO_SAME_PRIVILEGE_LEVEL;
    }
}

'RETURN_TO_SAME_PRIVILEGE_LEVEL {
    if /* the return instruction pointer is not within the return code segment limit */ {
        #GP(0);
    }
    if OperandSize == 32 {
        EIP = Pop();
        CS = Pop(); // 32-bit pop, high-order 16 bits discarded
    } else { // OperandSize == 16
        EIP = Pop();
        EIP = EIP & 0x0000_FFFF;
        CS = Pop(); // 16-bit pop
    }
    if /* instruction has immediate operand */ {
        // release parameters from stack
        if StackAddressSize == 32 {
            ESP = ESP + SRC;
        } else { // StackAddressSize == 16
            SP = SP + SRC;
        }
    }
    if ShadowStackEnabled(CPL) {
        // SSP must be 8 byte aligned
        if SSP & 0x7 != 0 {
            #CP(FAR-RET/IRET);
        }
        tempSsCS = shadow_stack_load 8 bytes from SSP+16;
        tempSsLIP = shadow_stack_load 8 bytes from SSP+8;
        prevSSP = shadow_stack_load 8 bytes from SSP;
        SSP = SSP + 24;
        // do a 64 bit-compare to check if any bits beyond bit 15 are set
        tempCS = CS; // zero pad to 64 bit
        if tempCS != tempSsCS {
            #CP(FAR-RET/IRET);
        }
        // do a 64 bit-compare; pad CSBASE+RIP with 0 for 32 bit LIP
        if CSBASE + RIP != tempSsLIP {
            #CP(FAR-RET/IRET);
        }
        // prevSSP must be 4 byte aligned
        if prevSSP & 0x3 != 0 {
            #CP(FAR-RET/IRET);
        }
        // in legacy mode SSP must be in low 4GB
        if prevSSP[32..=63] != 0 {
            #GP(0);
        }
        SSP = prevSSP;
    }
}

'RETURN_TO_OUTER_PRIVILEGE_LEVEL {
    if /* top (16 + SRC) bytes of stack are not within stack limits (OperandSize == 32) */
            || /* top (8 + SRC) bytes of stack are not within stack limits (OperandSize == 16) */ {
        #SS(0);
    }
    /* Read return segment selector; */
    if /* stack segment selector is NULL */ {
        #GP(0);
    }
    if /* return stack segment selector index is not within its descriptor table limits */ {
        #GP(selector);
    }
    /* Read segment descriptor pointed to by return segment selector; */
    if stack segment selector RPL != RPL of the return code segment selector
            || /* stack segment is not a writable data segment */
            || stack segment descriptor DPL != RPL of the return code segment selector {
        #GP(selector);
    }
    if /* stack segment not present */ {
        #SS(StackSegmentSelector);
    }
    if /* the return instruction pointer is not within the return code segment limit */ {
        #GP(0);
    }
    if OperandSize == 32 {
        EIP = Pop();
        CS = Pop(); // 32-bit pop, high-order 16 bits discarded; segment descriptor loaded
        CS(RPL) = ReturnCodeSegmentSelector(RPL);
        if /* instruction has immediate operand */ {
            // release parameters from called procedure's stack
            if StackAddressSize == 32 {
                ESP = ESP + SRC;
            } else { // StackAddressSize == 16
                SP = SP + SRC;
            }
        }
        tempESP = Pop();
        tempSS = Pop(); // 32-bit pop, high-order 16 bits discarded; seg. descriptor loaded
    } else { // OperandSize == 16
        EIP = Pop();
        EIP = EIP & 0x0000_FFFF;
        CS = Pop(); // 16-bit pop; segment descriptor loaded
        CS(RPL) = ReturnCodeSegmentSelector(RPL);
        if /* instruction has immediate operand */ {
            // release parameters from called procedure's stack
            if StackAddressSize == 32 {
                ESP = ESP + SRC;
            } else { // StackAddressSize == 16
                SP = SP + SRC;
            }
        }
        tempESP = Pop();
        tempSS = Pop(); // 16-bit pop; segment descriptor loaded
    }
    if ShadowStackEnabled(CPL) {
        // check if 8 byte aligned
        if SSP & 0x7 != 0 {
            #CP(FAR-RET/IRET);
        }
        if ReturnCodeSegmentSelector(RPL) != 3 {
            tempSsCS = shadow_stack_load 8 bytes from SSP+16;
            tempSsLIP = shadow_stack_load 8 bytes from SSP+8;
            tempSSP = shadow_stack_load 8 bytes from SSP;
            SSP = SSP + 24;
            // do 64 bit compare to detect bits beyond 15 being set
            tempCS = CS; // zero extended to 64 bit
            if tempCS != tempSsCS {
                #CP(FAR-RET/IRET);
            }
            // do 64 bit compare; pad CSBASE+RIP with 0 for 32 bit LA
            if CSBASE + RIP != tempSsLIP {
                #CP(FAR-RET/IRET);
            }
            // check if 4 byte aligned
            if (tempSSP & 0x3) != 0 {
                #CP(FAR-RET/IRET);
            }
        }
    }
    tempOldCPL = CPL;

    CPL = ReturnCodeSegmentSelector(RPL);
    ESP = tempESP;
    SS = tempSS;
    tempOldSSP = SSP;
    if ShadowStackEnabled(CPL) {
        if CPL == 3 {
            tempSSP = IA32_PL3_SSP;
        }
        if tempSSP[32..=63] != 0 {
            #GP(0);
        }
        SSP = tempSSP;
    }
    // Now past all faulting points; safe to free the token. The token free is done using the old SSP
    // and using a supervisor override as old CPL was a supervisor privilege level
    if ShadowStackEnabled(tempOldCPL) {
        expected_token_value = tempOldSSP | BUSY_BIT; // busy bit - bit position 0 - must be set
        new_token_value = tempOldSSP; // clear the busy bit
        shadow_stack_lock_cmpxchg8b(tempOldSSP, new_token_value, expected_token_value);
    }
    for SegReg in (ES, FS, GS, and DS) {
        tempDesc = /* descriptor cache for SegReg */; // hidden part of segment register
        if SegmentSelector == NULL || (tempDesc(DPL) < CPL && tempDesc(Type) is (/* data or non-conforming code */)) {
            // segment register invalid
            SegmentSelector = 0; // segment selector becomes null
        }
    }
    if /* instruction has immediate operand */ {
        // release parameters from calling procedure's stack
        if StackAddressSize == 32 {
            ESP = ESP + SRC;
        } else { // StackAddressSize == 16
            SP = SP + SRC;
        }
    }

    // IA-32e Mode
    if (PE == 1 && VM == 0 && IA32_EFER.LMA == 1) && instruction == far return {
        if OperandSize == 32 {
            if /* second doubleword on stack is not within stack limits */ {
                #SS(0);
            }
            if /* first or second doubleword on stack is not in canonical space */ {
                #SS(0);
            }
        } else {
            if OperandSize == 16 {
                if /* second word on stack is not within stack limits */ {
                    #SS(0);
                }
                if /* first or second word on stack is not in canonical space */ {
                    #SS(0);
                }
            } else { // OperandSize == 64
                if /* first or second quadword on stack is not in canonical space */ {
                    #SS(0);
                }
            }
        }
        if /* return code segment selector is NULL */ {
            #GP(0);
        }
        if /* return code segment selector addresses descriptor beyond descriptor table limit */ {
            #GP(selector);
        }
        if /* return code segment selector addresses descriptor in non-canonical space */ {
            #GP(selector);
        }
        /* obtain descriptor to which return code segment selector points from descriptor table; */
        if /* return code segment descriptor is not a code segment */ {
            #GP(selector);
        }
        if /* return code segment descriptor has L-bit == 1 and D-bit == 1 */ {
            #GP(selector);
        }
        if return code segment selector RPL < CPL {
            #GP(selector);
        }
        if /* return code segment descriptor is conforming */
                && return code segment DPL > return code segment selector RPL {
            #GP(selector);
        }
        if /* return code segment descriptor is non-conforming */
                && return code segment DPL != return code segment selector RPL {
            #GP(selector);
        }
        if /* return code segment descriptor is not present */ {
            #NP(selector);
        }
        if return code segment selector RPL > CPL {
            goto 'IA_32E_MODE_RETURN_TO_OUTER_PRIVILEGE_LEVEL;
        } else {
            goto 'IA_32E_MODE_RETURN_TO_SAME_PRIVILEGE_LEVEL;
        }
    }
}

'IA_32E_MODE_RETURN_TO_SAME_PRIVILEGE_LEVEL {
    if /* the return instruction pointer is not within the return code segment limit */ {
        #GP(0);
    }
    if /* the return instruction pointer is not within canonical address space */ {
        #GP(0);
    }
    if OperandSize == 32 {
        EIP = Pop();
        CS = Pop(); // 32-bit pop, high-order 16 bits discarded
    } else {
        if OperandSize == 16 {
            EIP = Pop();
            EIP = EIP & 0x0000_FFFF; // bitwise AND: keep only the low 16 bits
            CS = Pop(); // 16-bit pop
        } else { // OperandSize == 64
            RIP = Pop();
            CS = Pop(); // 64-bit pop, high-order 48 bits discarded
        }
    }
    if /* instruction has immediate operand */ {
        // release parameters from stack
        if StackAddressSize == 32 {
            ESP = ESP + SRC;
        } else {
            if StackAddressSize == 16 {
                SP = SP + SRC;
            } else { // StackAddressSize == 64
                RSP = RSP + SRC;
            }
        }
    }
    if ShadowStackEnabled(CPL) {
        if (SSP & 0x7) != 0 {
            // check if aligned to 8 bytes
            #CP(FAR-RET/IRET);
        }
        tempSsCS = shadow_stack_load 8 bytes from SSP+16;
        tempSsLIP = shadow_stack_load 8 bytes from SSP+8;
        tempSSP = shadow_stack_load 8 bytes from SSP;
        SSP = SSP + 24;
        tempCS = CS; // zero padded to 64 bit
        if tempCS != tempSsCS {
            // 64 bit compare; CS zero padded to 64 bits
            #CP(FAR-RET/IRET);
        }
        if CSBASE + RIP != tempSsLIP {
            // 64 bit compare
            #CP(FAR-RET/IRET);
        }
        if (tempSSP & 0x3) != 0 {
            // check if aligned to 4 bytes
            #CP(FAR-RET/IRET);
        }
        if (CS.L == 0 && tempSSP[32..=63] != 0)
                || (CS.L == 1 && /* tempSSP is not canonical relative to the current paging mode */) {
            #GP(0);
        }
        SSP = tempSSP;
    }
}

'IA_32E_MODE_RETURN_TO_OUTER_PRIVILEGE_LEVEL {
    if /* top (16 + SRC) bytes of stack are not within stack limits (OperandSize == 32) */
            || /* top (8 + SRC) bytes of stack are not within stack limits (OperandSize == 16) */ {
        #SS(0);
    }
    if /* top (16 + SRC) bytes of stack are not in canonical address space (OperandSize == 32) */
            || /* top (8 + SRC) bytes of stack are not in canonical address space (OperandSize == 16) */
            || /* top (32 + SRC) bytes of stack are not in canonical address space (OperandSize == 64) */ {
        #SS(0);
    }
    /* Read return stack segment selector; */
    if /* stack segment selector is NULL */ {
        if new CS descriptor L-bit == 0 {
            #GP(selector);
        }
        if stack segment selector RPL == 3 {
            #GP(selector);
        }
    }
    if /* return stack segment descriptor is not within descriptor table limits */ {
        #GP(selector);
    }
    if /* return stack segment descriptor is in non-canonical address space */ {
        #GP(selector);
    }
    /* Read segment descriptor pointed to by return segment selector; */
    if stack segment selector RPL != RPL of the return code segment selector
            || /* stack segment is not a writable data segment */
            || stack segment descriptor DPL != RPL of the return code segment selector {
        #GP(selector);
    }
    if /* stack segment not present */ {
        #SS(StackSegmentSelector);
    }
    if /* the return instruction pointer is not within the return code segment limit */ {
        #GP(0);
    }
    if /* the return instruction pointer is not within canonical address space */ {
        #GP(0);
    }
    if OperandSize == 32 {
        EIP = Pop();
        CS = Pop(); // 32-bit pop, high-order 16 bits discarded, segment descriptor loaded
        CS(RPL) = ReturnCodeSegmentSelector(RPL);
        if /* instruction has immediate operand */ {
            // release parameters from called procedure's stack
            if StackAddressSize == 32 {
                ESP = ESP + SRC;
            } else {
                if StackAddressSize == 16 {
                    SP = SP + SRC;
                } else { // StackAddressSize == 64
                    RSP = RSP + SRC;
                }
            }
        }
        tempESP = Pop();
        tempSS = Pop(); // 32-bit pop, high-order 16 bits discarded, segment descriptor loaded
    } else {
        if OperandSize == 16 {
            EIP = Pop();
            EIP = EIP & 0x0000_FFFF; // bitwise AND: keep only the low 16 bits
            CS = Pop(); // 16-bit pop; segment descriptor loaded
            CS(RPL) = ReturnCodeSegmentSelector(RPL);
            if /* instruction has immediate operand */ {
                // release parameters from called procedure's stack
                if StackAddressSize == 32 {
                    ESP = ESP + SRC;
                } else {
                    if StackAddressSize == 16 {
                        SP = SP + SRC;
                    } else { // StackAddressSize == 64
                        RSP = RSP + SRC;
                    }
                }
            }
            tempESP = Pop();
            tempSS = Pop(); // 16-bit pop; segment descriptor loaded
        } else { // OperandSize == 64
            RIP = Pop();
            CS = Pop(); // 64-bit pop; high-order 48 bits discarded; seg. descriptor loaded
            CS(RPL) = ReturnCodeSegmentSelector(RPL);
            if /* instruction has immediate operand */ {
                // Release parameters from called procedure's stack
                RSP = RSP + SRC;
            }
            tempESP = Pop();
            tempSS = Pop(); // 64-bit pop; high-order 48 bits discarded; seg. desc. loaded
        }
    }

    if ShadowStackEnabled(CPL) {
        // check if 8 byte aligned
        if (SSP & 0x7) != 0 {
            #CP(FAR-RET/IRET);
        }
        if ReturnCodeSegmentSelector(RPL) != 3 {
            tempSsCS = shadow_stack_load 8 bytes from SSP+16;
            tempSsLIP = shadow_stack_load 8 bytes from SSP+8;
            tempSSP = shadow_stack_load 8 bytes from SSP;
            SSP = SSP + 24;
            // do 64 bit compare to detect bits beyond 15 being set
            tempCS = CS; // zero padded to 64 bit
            if tempCS != tempSsCS {
                #CP(FAR-RET/IRET);
            }
            // do 64 bit compare; pad CSBASE+RIP with 0 for 32 bit LIP
            if CSBASE + RIP != tempSsLIP {
                #CP(FAR-RET/IRET);
            }
            // check if 4 byte aligned
            if (tempSSP & 0x3) != 0 {
                #CP(FAR-RET/IRET);
            }
        }
    }
    tempOldCPL = CPL;
    CPL = ReturnCodeSegmentSelector(RPL);
    ESP = tempESP;
    SS = tempSS;
    tempOldSSP = SSP;
    if ShadowStackEnabled(CPL) {
        if CPL == 3 {
            tempSSP = IA32_PL3_SSP;
        }
        if (CS.L == 0 && tempSSP[32..=63] != 0)
                || (CS.L == 1 && /* tempSSP is not canonical relative to the current paging mode */) {
            #GP(0);
        }
        SSP = tempSSP;
    }
    // Now past all faulting points; safe to free the token. The token free is done using the old SSP
    // and using a supervisor override as old CPL was a supervisor privilege level
    if ShadowStackEnabled(tempOldCPL) {
        expected_token_value = tempOldSSP | BUSY_BIT; // busy bit - bit position 0 - must be set
        new_token_value = tempOldSSP; // clear the busy bit
        shadow_stack_lock_cmpxchg8b(tempOldSSP, new_token_value, expected_token_value);
    }
    for SegReg in (ES, FS, GS, and DS) {
        if /* segment register points to data or non-conforming code segment */
                && CPL > segment descriptor DPL {
            // DPL in hidden part of segment register
            SegmentSelector = 0; // SegmentSelector invalid
        }
    }
    if /* instruction has immediate operand */ {
        // release parameters from calling procedure's stack
        if StackAddressSize == 32 {
            ESP = ESP + SRC;
        } else {
            if StackAddressSize == 16 {
                SP = SP + SRC;
            } else { // StackAddressSize == 64
                RSP = RSP + SRC;
            }
        }
    }
}
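
The shadow-stack comparison that appears in the return paths above can be hard to follow in pseudocode form. The following Python sketch models only that comparison step, under a few assumptions: the shadow stack is represented as a plain dictionary from address to 8-byte value, `ControlProtectionFault` is a made-up exception standing in for `#CP(FAR-RET/IRET)`, and the function name and parameters are illustrative rather than taken from any real API.

```python
class ControlProtectionFault(Exception):
    """Stands in for #CP(FAR-RET/IRET) in this model."""


def check_far_return_shadow_stack(shadow_stack, ssp, cs, cs_base, rip):
    """Model the shadow-stack comparison performed on a far return.

    shadow_stack is a hypothetical dict mapping an address to the
    8-byte value stored there.
    Returns (SSP after the three loads, tempSSP read from the stack).
    """
    # SSP must be 8-byte aligned before anything is loaded.
    if (ssp & 0x7) != 0:
        raise ControlProtectionFault("SSP is not 8-byte aligned")

    temp_ss_cs = shadow_stack[ssp + 16]
    temp_ss_lip = shadow_stack[ssp + 8]
    temp_ssp = shadow_stack[ssp]
    ssp += 24

    # 64-bit compare: CS is a 16-bit selector, zero-extended.
    if (cs & 0xFFFF) != temp_ss_cs:
        raise ControlProtectionFault("CS does not match the shadow stack")

    # The linear instruction pointer is the CS base plus RIP.
    if ((cs_base + rip) & 0xFFFF_FFFF_FFFF_FFFF) != temp_ss_lip:
        raise ControlProtectionFault("LIP does not match the shadow stack")

    # The saved SSP must be 4-byte aligned.
    if (temp_ssp & 0x3) != 0:
        raise ControlProtectionFault("saved SSP is not 4-byte aligned")

    return ssp, temp_ssp
```

The sketch leaves out everything else the pseudocode does (privilege checks, the token free, the `#GP(0)` check on the upper bits of `tempSSP`); it is only meant to show which three values are read and what each one is compared against.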

Flags Affected

None.

Miscellaneous Instructions

Mnemonic	Summary
`LEA`	Load Effective Address
`NOP`	No Operation

`LEA`

Load Effective Address

Instruction	Description
`LEA r16, m`	Store effective address for `m` in `r16`
`LEA r32, m`	Store effective address for `m` in `r32`
`LEA r64, m`	Store effective address for `m` in `r64`

Description

Computes the effective address of the second operand (the source operand) and stores it in the first operand (the destination operand). No memory is accessed; only the address arithmetic is performed. The source operand is a memory address (offset part) specified with one of the processor’s addressing modes; the destination operand is a general-purpose register. The address-size and operand-size attributes affect the action performed by this instruction, as shown in the following table. The operand-size attribute of the instruction is determined by the chosen register; the address-size attribute is determined by the attribute of the code segment.

Operand Size	Address Size	Action Performed
16	16	16-bit effective address is calculated and stored in the requested 16-bit register destination.
16	32	32-bit effective address is calculated. The lower 16 bits of the address are stored in the requested 16-bit register destination.
32	16	16-bit effective address is calculated. The 16-bit address is zero-extended and stored in the requested 32-bit register destination.
32	32	32-bit effective address is calculated and stored in the requested 32-bit register destination.

Different assemblers may use different algorithms based on the size attribute and symbolic reference of the source operand.

In 64-bit mode, the instruction’s destination operand is governed by the operand-size attribute, which defaults to 32 bits. Address calculation is governed by the address-size attribute, which defaults to 64 bits. An address size of 16 bits is not encodable in 64-bit mode. See the table below.

Operand Size	Address Size	Action Performed
16	32	32-bit effective address is calculated (using `0x67` prefix). The lower 16 bits of the address are stored in the requested 16-bit register destination (using `0x66` prefix).
16	64	64-bit effective address is calculated (default address size). The lower 16 bits of the address are stored in the requested 16-bit register destination (using `0x66` prefix).
32	32	32-bit effective address is calculated (using `0x67` prefix) and stored in the requested 32-bit register destination.
32	64	64-bit effective address is calculated (default address size) and the lower 32 bits of the address are stored in the requested 32-bit register destination.
64	32	32-bit effective address is calculated (using `0x67` prefix), zero-extended to 64 bits, and stored in the requested 64-bit register destination (using `REX.W`).
64	64	64-bit effective address is calculated (default address size) and all 64 bits of the address are stored in the requested 64-bit register destination (using `REX.W`).

Operation

DEST: first operand
SRC: second operand

OperandSize = 16

if AddressSize == 16 {
    DEST = EffectiveAddress(SRC); // 16-bit address
} else if AddressSize == 32 {
    temp = EffectiveAddress(SRC); // 32-bit address
    DEST = temp[0..=15]; // 16-bit address
} else if AddressSize == 64 {
    temp = EffectiveAddress(SRC); // 64-bit address
    DEST = temp[0..=15]; // 16-bit address
}

OperandSize = 32

if AddressSize == 16 {
    temp = EffectiveAddress(SRC); // 16-bit address
    DEST = ZeroExtend(temp); // 32-bit address
} else if AddressSize == 32 {
    DEST = EffectiveAddress(SRC); // 32-bit address
} else if AddressSize == 64 {
    temp = EffectiveAddress(SRC); // 64-bit address
    DEST = temp[0..=31]; // 32-bit address
}

OperandSize = 64

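if AddressSize == 32 {
    temp = EffectiveAddress(SRC); // 32-bit address
    DEST = ZeroExtend(temp); // 64-bit address
} else if AddressSize == 64 {
    DEST = EffectiveAddress(SRC); // 64-bit address
}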
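
To make the size rules above more concrete, here is a small Python model of what ends up in the destination register. It is a sketch under stated assumptions, not a description of how any assembler or CPU is implemented: the function names `effective_address` and `lea` and their parameters are invented for illustration, and the address is assumed to have the usual `base + index*scale + displacement` form.

```python
def effective_address(base=0, index=0, scale=1, disp=0, address_size=64):
    """Compute base + index*scale + disp, wrapped to the address size."""
    mask = (1 << address_size) - 1
    return (base + index * scale + disp) & mask


def lea(operand_size, address_size, **operands):
    """Return the value LEA would store in the destination register."""
    addr = effective_address(address_size=address_size, **operands)
    # A narrower operand keeps only the low bits of the address.
    # For a wider operand this mask changes nothing, which matches
    # zero extension of the shorter address.
    return addr & ((1 << operand_size) - 1)


# Models `lea rax, [rbx + rcx*8 + 0x10]` with rbx = 0x1000 and rcx = 3.
print(hex(lea(64, 64, base=0x1000, index=3, scale=8, disp=0x10)))  # 0x1028

# Models a 16-bit destination with a 64-bit address: low 16 bits only.
print(hex(lea(16, 64, base=0x1234_5678)))  # 0x5678
```

Because the model only does arithmetic, it also shows why `LEA` is often used as a cheap way to add and scale values without touching memory or the flags.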

Flags Affected

None.

`NOP`

No Operation

Instruction	Description
`NOP`	One byte no-operation instruction
`NOP r/m16`	Multi-byte no-operation instruction
`NOP r/m32`	Multi-byte no-operation instruction

Description

This instruction performs no operation. It is a one-byte or multi-byte NOP that takes up space in the instruction stream but does not impact machine context, except for the instruction pointer (EIP, or RIP in 64-bit mode).

The multi-byte form of NOP is available on processors with model encoding:

CPUID.01H.EAX[Bits 11:8] = 0b0110 or 0b1111

The multi-byte NOP instruction does not alter the content of a register and will not issue a memory operation. The instruction’s operation is the same in non-64-bit modes and 64-bit mode.

Operation

The one-byte NOP instruction is an alias mnemonic for the XCHG (E)AX, (E)AX instruction.

The multi-byte NOP instruction performs no operation on supported processors and generates an undefined opcode exception (#UD) on processors that do not support it.

The memory operand form of the instruction allows software to create a byte sequence of “no operation” as one instruction. For situations where multiple-byte NOPs are needed, the recommended operations (32-bit mode and 64-bit mode) are:

Length	Assembly	Byte Sequence
2 bytes	`66 nop`	`66 90`
3 bytes	`nop DWORD ptr [eax]`	`0F 1F 00`
4 bytes	`nop DWORD ptr [eax + 0x00]`	`0F 1F 40 00`
5 bytes	`nop DWORD ptr [eax + eax*1 + 0x00]`	`0F 1F 44 00 00`
6 bytes	`66 nop DWORD ptr [eax + eax*1 + 0x00]`	`66 0F 1F 44 00 00`
7 bytes	`nop DWORD ptr [eax + 0x0000_0000]`	`0F 1F 80 00 00 00 00`
8 bytes	`nop DWORD ptr [eax + eax*1 + 0x0000_0000]`	`0F 1F 84 00 00 00 00 00`
9 bytes	`66 nop DWORD ptr [eax + eax*1 + 0x0000_0000]`	`66 0F 1F 84 00 00 00 00 00`
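
As an illustration of how the table above might be used, the following Python sketch builds a padding sequence of a requested length out of the recommended encodings. The byte sequences are copied from the table; the one-byte entry `90` is the plain `nop`. The name `nop_padding` and the strategy of always taking the longest form that still fits are assumptions made for this example, not something the table prescribes.

```python
# Recommended NOP encodings, keyed by their length in bytes.
RECOMMENDED_NOPS = {
    1: bytes.fromhex("90"),
    2: bytes.fromhex("66 90"),
    3: bytes.fromhex("0F 1F 00"),
    4: bytes.fromhex("0F 1F 40 00"),
    5: bytes.fromhex("0F 1F 44 00 00"),
    6: bytes.fromhex("66 0F 1F 44 00 00"),
    7: bytes.fromhex("0F 1F 80 00 00 00 00"),
    8: bytes.fromhex("0F 1F 84 00 00 00 00 00"),
    9: bytes.fromhex("66 0F 1F 84 00 00 00 00 00"),
}


def nop_padding(length):
    """Return `length` bytes of NOPs, using the longest forms first."""
    if length < 0:
        raise ValueError("length must not be negative")
    out = bytearray()
    while length > 0:
        chunk = min(length, 9)  # 9 bytes is the longest form in the table
        out += RECOMMENDED_NOPS[chunk]
        length -= chunk
    return bytes(out)


# 13 bytes of padding become one 9-byte NOP followed by one 4-byte NOP.
print(nop_padding(13).hex(" "))
```

Using a few long NOPs instead of many one-byte NOPs keeps the number of instructions the processor has to decode small, which is the usual reason for preferring the multi-byte forms when aligning code.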

Flags Affected

None.
