AMx64 is a simulated 64-bit environment that can interpret NASM-like asm code. It supports a set of 64-bit registers and 64-bit addressable memory, and it has a built-in debugger called amdb.

AleksaMCode/AMx64

AMx64

AMx64 is a simplified 64-bit processor simulator implemented in C#. It comes with a built-in assembly language loosely based around NASM. The processor acts as a 64-bit machine code interpreter with its own instruction set that includes integer computations. The motivation behind this project was a better understanding of NASM and assembly language.

Usage

To start the interpreter, all you need to do is run the following command:

.\AMx64.exe program.asm

You can use --help (abbreviated -h) with no arguments to display a short list of commands available in AMx64, or you can run your program in debug mode by using the --debug (or -d) option.

.\AMx64.exe program.asm -d

CPU details

Registers are small storage cells built directly into a processor that are vastly faster than main memory (RAM) but are also more expensive per byte. Because of this price factor, there is not typically much room in a processor for storing data. The execution of a typical program is: move data from memory to registers, perform computations, move processed data from registers to memory and repeat.

General-purpose registers are used for processing integral instructions (the most common type) and are under the complete control of the programmer.

NOTE:

If you modify a subdivision of a register, the other subdivisions of that register will see the change.
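
The aliasing described in this note can be modelled with a few bit masks. The Python sketch below is only an illustration, not AMx64's C# implementation; the Register class, its view names and its methods are hypothetical. It treats one register as a 64-bit integer and exposes the 64-, 32-, 16- and 8-bit partitions as views of the same bits.

```python
class Register:
    """Hypothetical model of one 64-bit register and its partitions."""

    # view name -> (bit offset, bit width) inside the 64-bit value
    # "r" = RAX, "e" = EAX, "x" = AX, "h" = AH, "l" = AL
    VIEWS = {"r": (0, 64), "e": (0, 32), "x": (0, 16), "h": (8, 8), "l": (0, 8)}

    def __init__(self):
        self.value = 0

    def read(self, view):
        shift, width = self.VIEWS[view]
        return (self.value >> shift) & ((1 << width) - 1)

    def write(self, view, data):
        # Replace only the bits covered by the view; all other bits are kept.
        # (Real x86-64 zero-extends 32-bit writes; that detail is not modelled.)
        shift, width = self.VIEWS[view]
        mask = ((1 << width) - 1) << shift
        self.value = (self.value & ~mask) | ((data << shift) & mask)


rax = Register()
rax.write("r", 0x1122334455667788)
rax.write("l", 0xFF)          # like: mov al, 0xFF
print(hex(rax.read("x")))     # AX now reads 0x77ff
print(hex(rax.read("r")))     # RAX now reads 0x11223344556677ff
```

Writing through the low 8-bit view changes what the 16-bit and 64-bit views return, because all views share the same storage.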

The AMASM Language (AMx64 Assembly Language)

AMx64 comes with a built-in assembly language loosely based around NASM or Intel syntax. Before we describe the syntax of operations and other utilities, we need to go over some of the basics of AMx64 assembly language.

Sections

In a typical assembly language, your program is broken up into several sections.

NOTE:

  • .data and .bss sections must come before .text section in asm code.
  • .rodata section isn't supported.

Data Section (.data)

The data section holds all variables that are initialized to specific values. This will typically be used only for global variables. Initialized data must be declared in the section introduced by "section .data"; there must be a space after the word section. All initialized variables and constants are placed in this section. Variable names must start with a letter, followed by letters, digits, or underscores. Variable definitions must include the name, the data type, and the initial value for the variable.

BSS Section (.bss)

The BSS section (Block Started by Symbol) is a section that has no contents in terms of data or instructions. It consists only of a number that represents its length, which the operating system expands upon program initialization to a zero-filled, contiguous block of memory of that length (hence the name). This is used when you would ordinarily put something in the data section, but you don't care about initializing it to a specific value. Uninitialized data must be declared in the section introduced by "section .bss"; there must be a space after the word section. All uninitialized variables are declared in this section. Variable names must start with a letter, followed by letters, digits, or underscores. Variable definitions must include the name, the data type, and the count.

Text Section (.text)

The text section holds all of your executable code and will typically dwarf the other sections in terms of size. The code is placed in the section introduced by "section .text"; there must be a space after the word section. The instructions are specified one per line, and each must be a valid instruction with the appropriate required operands. The text section will include some headers or labels that define the initial program entry point. The following declarations must be included.

global main
main:

NOTE:

AMx64 requires the asm file to define the program entry point, where execution will begin when the program is run. In NASM you specify the entry point by declaring the special symbol ..start at the point where you wish execution to begin. In AMASM you can use a user-defined entry point.

Layout of an AMASM Source Line

Like most assemblers, each AMASM source line contains some combination of the four fields:

label: instruction operands ; comment

As usual, most of these fields are optional; the presence or absence of any combination of a label, an instruction and a comment is allowed. Of course, the operand field is either required or forbidden by the presence and nature of the instruction field. AMASM doesn't support the multiline commands that NASM provides through the backslash character (\) as a line continuation character.

AMASM places no restrictions on white space within a line: labels may have white space before them, or instructions may have no space before them, or anything. The colon after a label is also optional.

Pseudo-Instructions

Pseudo-instructions are things which, though not real x86 machine instructions, are used in the instruction field anyway because that's the most convenient place to put them. The current pseudo-instructions are DB, DW, DD and DQ, and their uninitialized counterparts RESB, RESW, RESD and RESQ.

NOTE:

  • The INCBIN command, the EQU command, and the TIMES prefix are not currently available.
  • Pseudo-instructions DT, DO, DY, REST, RESO and RESY are also not available.

DB and Friends: Declaring Initialized Data

DB, DW, DD and DQ are used, much as in MASM, to declare initialized data in the output file. They can be invoked in a wide range of ways:

  db 0x55 ; just the byte 0x55
  db 0x55,0x56,0x57 ; three bytes in succession
  db 'a',0x55 ; character constants are OK
  db 'hello',13,10,'$' ; so are string constants
  dw 0x1234 ; 0x34 0x12
  dw 'a' ; 0x61 0x00 (it’s just a number)
  dw 'ab' ; 0x61 0x62 (character constant)
  dw 'abc' ; 0x61 0x62 0x63 0x00 (string)
  dd 0x12345678 ; 0x78 0x56 0x34 0x12
  dq 0x123456789abcdef0 ; eight byte constant
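
The byte sequences in the comments above follow two rules: numbers are stored little-endian at the declared width, and strings are stored byte by byte and padded with zeros to a multiple of that width. The Python sketch below reproduces those layouts; the declare helper is a hypothetical name used only for illustration and is not part of AMx64.

```python
def declare(width, *values):
    """Lay out values as the DB family does: numbers little-endian,
    strings padded with zero bytes to a multiple of `width` bytes."""
    out = bytearray()
    for value in values:
        if isinstance(value, str):
            raw = value.encode("ascii")
            raw += b"\x00" * (-len(raw) % width)   # pad to the element size
        else:
            raw = value.to_bytes(width, "little")
        out += raw
    return bytes(out)


print(declare(2, 0x1234).hex(" "))        # 34 12
print(declare(2, "abc").hex(" "))         # 61 62 63 00
print(declare(4, 0x12345678).hex(" "))    # 78 56 34 12
print(declare(1, "hello", 13, 10, "$"))   # b'hello\r\n$'
```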

RESB and Friends: Declaring Uninitialized Data

RESB, RESW, RESD and RESQ are designed to be used in the BSS section of a module: they declare uninitialized storage space. Each takes a single operand, which is the number of bytes, words, doublewords or whatever to reserve. For example:

  buffer resb 64 ; reserve 64 bytes
  wordvar resw 1 ; reserve a word

Numeric Constants

A numeric constant is simply a number. Numeric values may be specified in decimal, hexadecimal, octal, or binary. AMASM allows you to specify numbers in a variety of number bases, in a variety of ways: you can suffix H or X, D or T, Q or O, and B or Y for hexadecimal, decimal, octal and binary respectively, or you can prefix 0x for hexadecimal in the style of C. In addition, AMASM accepts the prefix 0h for hexadecimal, 0d or 0t for decimal, 0o or 0q for octal, and 0b or 0y for binary. Please note that unlike C, a 0 prefix by itself does not imply an octal constant!

Some examples (all producing exactly the same code):

 mov ax,200 ; decimal
 mov ax,0200 ; still decimal
 mov ax,0200d ; explicitly decimal
 mov ax,0d200 ; also decimal
 mov ax,0c8h ; hex
 mov ax,0xc8 ; hex yet again
 mov ax,0hc8 ; still hex
 mov ax,310q ; octal
 mov ax,310o ; octal again
 mov ax,0o310 ; octal yet again
 mov ax,0q310 ; octal yet again
 mov ax,11001000b ; binary
 mov ax,1100_1000b ; same binary constant
 mov ax,0b1100_1000 ; same binary constant yet again

NOTE:

Numeric constants can have underscores ('_') interspersed to break up long strings.
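
The prefix and suffix rules above can be sketched as a small parser. This is an illustrative Python model, not the parser AMx64 actually uses; parse_numeric and the lookup tables are hypothetical names, and trying the prefix before the suffix is an assumption made here to resolve tokens such as 0x1d.

```python
PREFIXES = {"0x": 16, "0h": 16, "0d": 10, "0t": 10,
            "0o": 8, "0q": 8, "0b": 2, "0y": 2}
SUFFIXES = {"h": 16, "x": 16, "d": 10, "t": 10,
            "q": 8, "o": 8, "b": 2, "y": 2}


def parse_numeric(token):
    """Parse a NASM-style numeric constant into an int."""
    text = token.replace("_", "").lower()   # underscores are only separators
    candidates = []
    if text[:2] in PREFIXES and len(text) > 2:
        candidates.append((text[2:], PREFIXES[text[:2]]))
    if text[-1:] in SUFFIXES and len(text) > 1:
        candidates.append((text[:-1], SUFFIXES[text[-1]]))
    candidates.append((text, 10))           # no marker: plain decimal
    for digits, base in candidates:
        try:
            return int(digits, base)
        except ValueError:
            continue                        # try the next interpretation
    raise ValueError(f"not a numeric constant: {token!r}")


for token in ("200", "0200", "0c8h", "0xc8", "310q", "0o310", "0b1100_1000"):
    print(token, "->", parse_numeric(token))   # every line prints 200
```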

Character Strings

In addition to numeric data, symbolic (non-numeric) data is often required. Symbols are represented by assigning a numeric value to each symbol or character. A character is typically stored in a byte (8 bits) of space, which works well since memory is byte addressable. Examples of characters include letters, numerical digits, common punctuation marks (such as '.' or '!'), and whitespace. The general concept also includes control characters, which do not correspond to symbols in a particular language, but to other information used to process text. Examples of control characters include carriage return and tab.
A character string consists of up to eight characters enclosed in either single quotes ('...'), double quotes ("...") or backquotes (`...`). As in NASM, single and double quotes are equivalent (except of course that surrounding the constant with single quotes allows double quotes to appear within it and vice versa); the contents of those are represented verbatim.

Strings enclosed in backquotes support C-style \-escapes for special characters. The following escape sequences are recognized by backquoted strings:

\' single quote (')
\" double quote (")
\` backquote (`)
\\ backslash (\)
\? question mark (?)
\a BEL (ASCII 7)
\b BS (ASCII 8)
\t TAB (ASCII 9)
\n LF (ASCII 10)
\v VT (ASCII 11)
\f FF (ASCII 12)
\r CR (ASCII 13)
\e ESC (ASCII 27)
\377 Up to 3 octal digits - literal byte
\xFF Up to 2 hexadecimal digits - literal byte

NOTE:

  • Character literals don't currently support quotes '"'.

  • Unicode character escapes are not yet supported.

  • Characters can be displayed to the console, but cannot be used for calculations. Integers can be used for calculations, but cannot be displayed to the console (without changing the representation).

Character Constants

A character constant consists of a string up to eight bytes long, used in an expression context. It is treated as if it was an integer. A character constant with more than one byte will be arranged with little-endian order in mind: if you code

mov eax,'abcd'

then the constant generated is not 0x61626364, but 0x64636261, so that if you were then to store the value into memory, it would read abcd rather than dcba.
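
The byte ordering can be checked with a short Python sketch. It is only an illustration of the little-endian rule described here; char_constant is a hypothetical helper, not an AMx64 function.

```python
def char_constant(text):
    """Integer value of a character constant of up to eight bytes."""
    raw = text.encode("ascii")
    if len(raw) > 8:
        raise ValueError("character constants hold at most eight bytes")
    # The first character becomes the least significant byte.
    return int.from_bytes(raw, "little")


value = char_constant("abcd")
print(hex(value))                      # 0x64636261
print(value.to_bytes(4, "little"))     # b'abcd', the order seen in memory
```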

String Constants

String constants are character strings used in the context of some pseudo-instructions, namely the DB family. A string constant looks like a character constant, only longer. It is treated as a concatenation of maximum-size character constants for the conditions. So the following are equivalent:

db 'hello' ; string constant
db 'h','e','l','l','o'   ; equivalent character constants

NOTE:

When used in a string-supporting context, quoted strings are treated as string constants even if they are short enough to be a character constant, because otherwise db 'ab' would have the same effect as db 'a', which would be silly.

Comments

The semicolon (';') is used to note program comments. Comments (using the ';') may be placed anywhere, including after an instruction. Any characters after the ';' are ignored by the interpreter. This can be used to explain steps taken in the code or to comment out sections of code.

Labels

A program label is the target, or a location to jump to, for control statements. For example, the start of a loop might be marked with a label such as “loopStart”. The code may be re-executed by jumping to the label. Generally, a label starts with a letter, followed by letters, numbers, or symbols (limited to '_'), terminated with a colon (':').

NOTE:

  • Local labels aren't available.
  • Program labels may be defined only once.

Operand/Address Size (Data Storage Sizes)

The x86-64 architecture supports a specific set of data storage size elements, all based on powers of two. To specify the size of an operand, simply preface the operand with a mnemonic for the size you want. When the size is already implied by a register, as in add qword rax, rbx, the size mnemonic is perfectly valid but redundant. These mnemonics are not case-sensitive. Because a memory address alone does not say how much data it refers to, almost any instruction that references memory must use one of the prefixes BYTE, WORD, DWORD or QWORD to indicate what size of memory operand it refers to (e.g. add byte rax, [rbx]). The supported storage sizes are as follows:

Storage Size (bits) Size (bytes)
BYTE 8 bits 1 byte
WORD 16 bits 2 bytes
DWORD (Double-word) 32 bits 4 bytes
QWORD (Quadword) 64 bits 8 bytes

Supported instructions

This chapter provides a basic overview of a simple subset of the x86-64 instruction set, focusing on integer operations. The notation used in this section is fairly common in the technical literature. In general, an instruction will consist of the instruction or operation itself (e.g., add, sub, etc.) and the operands. The operands refer to where the data (to be operated on) is coming from and/or where the result is to be placed.

NOTE:

Instructions, register and variable names are case-insensitive.

ADD - Add

Adds the destination operand (first operand) and the source operand (second operand) and then stores the result in the destination operand. The destination operand can be a register or a memory location; the source operand can be an immediate, a register, or a memory location. (However, two memory operands cannot be used in one instruction.) When an immediate value is used as an operand, it is sign-extended to the length of the destination operand format.

Usage:

ADD r, imm/r/m
ADD m, imm/r

Format:

DEST ← DEST + SRC;

Flags affected:

  1. ZF is set if the result is zero; it's cleared otherwise.
  2. SF is set if the result is negative; it's cleared otherwise.
  3. PF is set if the result has even parity in the low 8 bits; it's cleared otherwise.
  4. CF is set if the addition caused a carry-out from the high bit; it's cleared otherwise.
  5. OF is set if the addition resulted in arithmetic under/overflow; it's cleared otherwise.

NOTE:

The ADD instruction performs integer addition.
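
The flag rules listed above can be made concrete with a small model. The following Python sketch assumes the standard x86 definitions of the five flags and is not taken from the AMx64 source; add_with_flags and its bits parameter are hypothetical names.

```python
def add_with_flags(dest, src, bits=64):
    """Return (result, flags) for DEST <- DEST + SRC at the given operand width."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    dest &= mask
    src &= mask
    total = dest + src
    result = total & mask
    flags = {
        "ZF": result == 0,
        "SF": bool(result & sign),
        "PF": bin(result & 0xFF).count("1") % 2 == 0,   # even parity, low byte
        "CF": total > mask,                              # carry out of the high bit
        # Signed overflow: both operands share a sign that the result lacks.
        "OF": bool(~(dest ^ src) & (dest ^ result) & sign),
    }
    return result, flags


# 0x7F + 1 overflows a signed byte: SF and OF are set, CF stays clear.
result, flags = add_with_flags(0x7F, 0x01, bits=8)
print(hex(result), flags)
```

CF and OF answer different questions: CF reports an unsigned carry, while OF reports that the signed result does not fit in the operand width.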

SUB - Subtract

Subtracts the second operand (source operand) from the first operand (destination operand) and stores the result in the destination operand. The destination operand can be a register or a memory location; the source operand can be an immediate, register, or memory location. (However, two memory operands cannot be used in one instruction.) When an immediate value is used as an operand, it is sign-extended to the length of the destination operand format.

Usage:

SUB r, imm/r/m
SUB m, imm/r

Format:

DEST ← (DEST – SRC);

Flags affected:

  1. ZF is set if the result is zero; it's cleared otherwise.
  2. SF is set if the result is negative; it's cleared otherwise.
  3. PF is set if the result has even parity in the low 8 bits; it's cleared otherwise.
  4. CF is set if the subtraction caused a borrow into the high bit (an unsigned underflow); it's cleared otherwise.
  5. OF is set if the subtraction resulted in arithmetic under/overflow; it's cleared otherwise.

NOTE:

The SUB instruction performs integer subtraction.

AND - Bitwise AND

Performs a bitwise AND operation on the destination (first) and source (second) operands and stores the result in the destination operand location. The source operand can be an immediate, a register, or a memory location; the destination operand can be a register or a memory location. (However, two memory operands cannot be used in one instruction.) Each bit of the result is set to 1 if both corresponding bits of the first and second operands are 1; otherwise, it is set to 0.

Usage:

AND r, imm/r/m
AND m, imm/r

Format:

DEST ← DEST AND SRC;

Flags affected:

  1. ZF is set if the result is zero; it's cleared otherwise.
  2. SF is set if the result is negative; it's cleared otherwise.
  3. PF is set if the result has even parity in the low 8 bits; it's cleared otherwise.
  4. CF and OF are cleared.

OR - Bitwise OR

Performs a bitwise inclusive OR operation between the destination (first) and source (second) operands and stores the result in the destination operand location. The source operand can be an immediate, a register, or a memory location; the destination operand can be a register or a memory location. (However, two memory operands cannot be used in one instruction.) Each bit of the result of the OR instruction is set to 0 if both corresponding bits of the first and second operands are 0; otherwise, each bit is set to 1.

Usage:

OR r, imm/r/m
OR m, imm/r

Format:

DEST ← DEST OR SRC;

Flags affected:

  1. ZF is set if the result is zero; it's cleared otherwise.
  2. SF is set if the result is negative; it's cleared otherwise.
  3. PF is set if the result has even parity in the low 8 bits; it's cleared otherwise.
  4. CF and OF are cleared.

NOT - Bitwise NOT

Performs a bitwise NOT operation (each 1 is set to 0, and each 0 is set to 1) on the destination operand and stores the result in the destination operand location. The destination operand can be a register or a memory location.

Usage:

NOT r/m

Format:

DEST ← NOT DEST;

NOTE:

It doesn't affect flags.

MOV - Move

Copies the second operand (source operand) to the first operand (destination operand). The source operand can be an immediate value, general-purpose register or memory location; the destination operand can be a general-purpose register or memory location. Both operands must be the same size, which can be a byte, a word, a doubleword, or a quadword.

Usage:

MOV r, imm/r/m
MOV m, imm/r

NOTE:

It doesn't affect flags.

CMP - Compare

Compares the first source operand with the second source operand and sets the status flags in the EFLAGS register according to the results. The comparison is performed by subtracting the second operand from the first operand and then setting the status flags in the same manner as the SUB instruction. When an immediate value is used as an operand, it is sign-extended to the length of the first operand. SUB should be used in place of CMP when the result is needed. The condition codes used by the Jcc instructions are based on the results of a CMP instruction.

Usage:

CMP r, imm/r/m
CMP m, imm/r

Format:

temp ← SRC1 − SignExtend(SRC2);
ModifyStatusFlags;

Flags affected:

  1. ZF is set if the result is zero; it's cleared otherwise.
  2. SF is set if the result is negative; it's cleared otherwise.
  3. PF is set if the result has even parity in the low 8 bits; it's cleared otherwise.
  4. CF is set if the subtraction caused a borrow into the high bit (an unsigned underflow); it's cleared otherwise.
  5. OF is set if the subtraction resulted in arithmetic under/overflow; it's cleared otherwise.

JMP - Unconditional Jump

Jumps execution to the provided location in a program, denoted with a program label. This instruction does not depend on the current conditions of the flag bits in the EFLAGS register. Transfer of control may be forward, to execute a new set of instructions, or backward, to re-execute the same steps.

Usage:
JMP label

NOTE:

It doesn't affect flags.

Jcc - Jump if Condition Is Met (Conditional Jump)

Jcc is not a single instruction; it describes the family of jump mnemonics that check a condition code before jumping. In a conditional jump, control flow is transferred to the target instruction only if the specified condition is satisfied. These instructions form the basis for all conditional branching. There are numerous conditional jump instructions depending upon the condition and data.

Two steps are required for a Jcc: the compare instruction and the conditional jump instruction. The compare instruction will compare two operands and store the results of the comparison in the EFLAGS register. The conditional jump instruction will then jump or not jump to the provided label based on the result of that comparison. For this reason, the compare instruction should be immediately followed by the conditional jump instruction. If other flag-modifying instructions (such as ADD or AND) are placed between the compare and the conditional jump, the EFLAGS register will be altered, and the conditional jump may not reflect the correct condition.

Usage:

Jcc label
Instruction Description Flags tested Condition
JE Jump Equal ZF ZF == 1
JNE Jump not Equal ZF ZF == 0
JGE Jump Greater/Equal OF, SF SF == OF
JL Jump Less OF, SF SF != OF

NOTE:

It doesn't affect flags.
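
The two-step compare-then-jump pattern can be modelled as follows. This Python sketch assumes the usual x86 definitions (JGE is taken when SF equals OF, JL when they differ) rather than quoting the AMx64 source; cmp_flags, CONDITIONS and jump_taken are hypothetical names.

```python
def cmp_flags(a, b, bits=64):
    """Flags produced by CMP a, b: a subtraction whose result is discarded."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    result = (a - b) & mask
    return {
        "ZF": result == 0,
        "SF": bool(result & sign),
        # Signed overflow: operands differ in sign and the result's sign differs from a.
        "OF": bool((a ^ b) & (a ^ result) & sign),
    }


CONDITIONS = {
    "je": lambda f: f["ZF"],
    "jne": lambda f: not f["ZF"],
    "jge": lambda f: f["SF"] == f["OF"],
    "jl": lambda f: f["SF"] != f["OF"],
}


def jump_taken(mnemonic, a, b, bits=64):
    """Would `cmp a, b` followed by the given Jcc transfer control?"""
    return CONDITIONS[mnemonic](cmp_flags(a, b, bits))


print(jump_taken("jl", -1, 1))     # True: -1 < 1 as signed values
print(jump_taken("jge", 5, 5))     # True
```

Testing SF alone would give the wrong answer whenever the subtraction overflows, for example when comparing -128 with 1 at 8 bits; comparing SF with OF corrects for that case.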

PUSH

Decrements the stack pointer and then stores the source operand on the top of the stack. The size parameter determines the size of the value that is pushed.

Usage:
PUSH imm/r/m

NOTE:

  • It doesn't affect flags.
  • The operand size (16, 32, or 64 bits) determines the amount by which the stack pointer is decremented (2, 4 or 8).
  • If the source operand is an immediate of size less than the operand size, a sign-extended value is pushed on the stack.

POP

Loads the value from the top of the stack to the location specified with the destination operand (or explicit opcode) and then increments the stack pointer. The destination operand can be a general-purpose register, memory location, or segment register.

Usage:
POP r/m

Format:

pop value;
dest ← value;

NOTE:

  • It doesn't affect flags.
  • The operand size (16, 32, or 64 bits) determines the amount by which the stack pointer is incremented (2, 4 or 8).

Memory

A memory value is an expression that evaluates to the address of some value in memory. In AMx64 assembly language, addresses are enclosed in brackets “[…]” with the address expression inside.

NOTE:

Although you can use 64-bit addresses, only 2 GB of memory is available due to internal limits of C# in Visual Studio.

Stack

In a computer, a stack is a type of data structure where items are added and then removed from the stack in reverse order. That is, the most recently added item is the very first one that is removed. This is often referred to as Last-In, First-Out (LIFO). A stack is heavily used in programming for the storage of information during procedure or function calls.

In most languages (even low-level ones) the stack is completely hidden from the programmer. In these languages we can only indirectly impact it. We already know that declaring a variable sets aside space on the stack, and that calling a function uses the stack as well. The difference now is that in assembly language, the programmer is responsible for managing the stack.

The stack is managed by RBP and RSP (base pointer and stack pointer). Upon program initialization, RBP and RSP are set to the address of the top of the stack, which begins at the high side of the program's available memory and grows downward. Because of this, RSP will always point to the most-recently-added item on the stack. To add an item to the stack you can use the PUSH instruction. To remove an item, you can use the POP instruction.

NOTE:

  • You can't push an 8-bit value onto the stack.
  • RSP can be modified directly without damaging the stack structure, but care should be taken when doing so.
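
The PUSH and POP behaviour described above (decrement RSP and store, then load and increment RSP) can be sketched in a few lines. The Stack class below is a hypothetical Python model over a flat byte array, not AMx64's implementation; the memory size and method names are chosen only for illustration.

```python
class Stack:
    """Hypothetical model of a descending stack inside a flat byte array."""

    def __init__(self, memory_size=64):
        self.memory = bytearray(memory_size)
        self.rsp = memory_size          # starts at the high end of memory
        self.rbp = memory_size

    def push(self, value, size=8):
        if size not in (2, 4, 8):
            raise ValueError("only 16-, 32- and 64-bit pushes are modelled")
        if self.rsp < size:
            raise OverflowError("stack overflow")
        self.rsp -= size                # decrement first, then store
        mask = (1 << (8 * size)) - 1
        self.memory[self.rsp:self.rsp + size] = (value & mask).to_bytes(size, "little")

    def pop(self, size=8):
        if self.rsp + size > len(self.memory):
            raise IndexError("stack underflow")
        value = int.from_bytes(self.memory[self.rsp:self.rsp + size], "little")
        self.rsp += size                # load first, then increment
        return value


stack = Stack()
stack.push(0x1111)
stack.push(0x2222)
print(hex(stack.pop()), hex(stack.pop()))   # 0x2222 0x1111
print(stack.rsp == stack.rbp)               # True
```

After matching pushes and pops, RSP returns to its starting value, which is why unbalanced stack use is a common source of bugs in assembly programs.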

Registers

A register operand refers to the contents of a register. AMx64 has a total of 16 registers, but not all of them are currently in use. To refer to one of the available registers, you simply need to designate the name of the partition you want to use (e.g. RAX, RBX, etc.). The register name you use indicates the size of the operand (i.e. how much data is moved, processed, etc.). For instance, using EAX to load a value from memory (e.g. mov eax, [var]) loads a 32-bit value from memory into the 32-bit partition of RAX.

AMx64 uses the following names for general-purpose registers in 64-bit mode. This is consistent with the AMD/Intel documentation and most other assemblers.

General-Purpose Registers (GPRs)

There are sixteen 64-bit General Purpose Registers (GPRs). The currently available GPRs are described in the following table. A GPR can be accessed with all 64 bits, or only some portion or subset of it can be accessed.

Naming conventions 64 bits 32 bits 16 bits High 8 bits Low 8 bits
Accumulator RAX EAX AX AH AL
Base RBX EBX BX BH BL
Counter RCX ECX CX CH CL
Data RDX EDX DX DH DL
Stack pointer RSP ESP SP
Stack base pointer RBP EBP BP
Source index RSI ESI SI
Destination index RDI EDI DI

When using data element sizes less than 64 bits (e.g. 32-bit, 16-bit, or 8-bit), the lower portion of the register can be accessed by using a different register name as shown in the table.

NOTE:

Some of the GPRs are used for dedicated purposes as described in the later sections.

FLAGS register

The status register contains the current state of the processor. It stores status information about the instruction that was just executed. The FLAGS register is 16 bits wide. Its successors, the EFLAGS and RFLAGS registers, are 32 bits and 64 bits wide, respectively. The wider registers retain compatibility with their smaller predecessors, as is the case with the other registers. The AMx64 flags register conforms to the Intel x86_64 standard; not all bits are used in the current version.

Bit | Mask | Abbreviation | Name | Description | =1 | =0
0 | 0x0001 | CF | Carry flag | Set if the last arithmetic operation carried (addition) or borrowed (subtraction) a bit beyond the size of the register. This is then checked when the operation is followed with an add-with-carry or subtract-with-borrow to deal with values too large for just one register to contain. | CY (Carry) | NC (No Carry)
2 | 0x0004 | PF | Parity flag | Set if the number of set bits in the least significant byte is a multiple of 2. | PE (Parity Even) | PO (Parity Odd)
4 | 0x0010 | AF | Adjust flag | Carry of Binary Coded Decimal (BCD) arithmetic operations. | AC (Auxiliary Carry) | NA (No Auxiliary Carry)
6 | 0x0040 | ZF | Zero flag | Set if the result of an operation is zero (0). | ZR (Zero) | NZ (Not Zero)
7 | 0x0080 | SF | Sign flag | Set if the result of an operation is negative. | NG (Negative) | PL (Positive)
8 | 0x0100 | TF | Trap flag | Set when debugging step by step. | |
9 | 0x0200 | IF | Interrupt enable flag | Set if interrupts are enabled. | EI (Enable Interrupt) | DI (Disable Interrupt)
10 | 0x0400 | DF | Direction flag | Stream direction. If set, string operations will decrement their pointer rather than incrementing it, reading memory backwards. | DN (Down) | UP (Up)
11 | 0x0800 | OF | Overflow flag | Set if signed arithmetic operations result in a value too large for the register to contain. | OV (Overflow) | NV (Not Overflow)
12-13 | 0x3000 | IOPL | I/O privilege level | I/O Privilege Level of the current process. | |
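
The masks in the table can be used to decode a raw FLAGS value. The Python sketch below is an illustration only; FLAG_MASKS, set_flags and iopl are hypothetical names, and the mask values are the ones listed in the table.

```python
FLAG_MASKS = {
    "CF": 0x0001, "PF": 0x0004, "AF": 0x0010, "ZF": 0x0040, "SF": 0x0080,
    "TF": 0x0100, "IF": 0x0200, "DF": 0x0400, "OF": 0x0800,
}


def set_flags(flags_value):
    """Return the names of the single-bit flags set in a FLAGS value."""
    return [name for name, mask in FLAG_MASKS.items() if flags_value & mask]


def iopl(flags_value):
    """Extract the two-bit I/O privilege level (bits 12-13)."""
    return (flags_value & 0x3000) >> 12


print(set_flags(0x0045))   # ['CF', 'PF', 'ZF']
print(iopl(0x3000))        # 3
```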

Addressing modes for data

The addressing mode indicates the manner in which an operand is presented. Put differently, the addressing modes are the supported methods for accessing a value in memory using the address of the data item being read or written. This might include the name of a variable or a location in an array.

NOTE:

The only way to access memory is with the brackets ("[]"). Omitting the brackets will not access memory and instead obtain the address of the item.

Register (direct) Addressing

+------+-----+-----+
| mov  | reg1| reg2| reg1:=reg2
+------+-----+-----+

This "addressing mode" does not have an effective address and is not considered to be an addressing mode on some computers. In this example, all the operands are in registers, and the result is placed in a register. E.g.

mov ax, bx  ; moves contents of register bx into ax

Immediate (literal) Addressing

+------+-----+----------------+
| add  | reg1|    constant    |    reg1 := reg1 + constant;
+------+-----+----------------+

This "addressing mode" does not have an effective address, and is not considered to be an addressing mode on some computers. E.g.

mov ax, 1 ; moves value of 1 into register ax

Instead of using an operand from memory, the value of the operand is held within the instruction itself.

Direct Memory Addressing

Direct memory mode addressing means that the operand is a location in memory (accessed via an address). This is also referred to as indirection or dereferencing.

mov qword rax, [var] ; copy var content into rax

This instruction will access the memory location of the variable var and retrieve the value stored there. This requires that the CPU wait until the value is retrieved before completing the operation, and thus might take slightly longer to complete than a similar operation using an immediate value.

NOTE:

Direct offset addressing is not currently supported.

Register Indirect Addressing

When accessing arrays, for example, a more general method is usually required. Specifically, an address can be placed in a register and indirection performed through that register (instead of the variable name). E.g.

mov rbx, var
mov dword eax, [rbx]

Calling System Services

When calling system services, arguments are placed in the standard argument registers. System services do not typically use stack-based arguments. This limits the arguments of system services to six. To call a system service, the first step is to determine which system service is desired. The general process is that the system service call code is placed in the RAX register. The call code is a number that has been assigned for the specific system service being requested. These are assigned as part of the operating system and cannot be changed by application programs. AMx64 uses a very small subset of system service call codes as a set of constants. If any are needed, the arguments for system services are placed in the RDI, RSI, RDX, RCX, R8 and R9 registers (in that order). The following table shows the argument locations which are consistent with the standard calling convention.

Register Usage
RAX Call code
RDI 1st argument
RSI 2nd argument
RDX 3rd argument
RCX 4th argument
R8 5th argument
R9 6th argument

Each system call will use a different number of arguments (from none up to 6). However, the system service call code is always required. After the call code and any arguments are set, the syscall instruction is executed. The syscall instruction will pause the interpretation process and attempt to perform the service specified in the RAX register. When the system service returns, the interpretation process resumes.

NOTE:

R8 and R9 registers are not currently available for usage.

Return Codes

The system call will return a code in the RAX register. If the value returned is less than 0, that is an indication that an error has occurred. If the operation is successful, the value returned will depend on the specific system service.

Call Code (RAX) | System Service | Description
0 | sys_read | Read characters. If unsuccessful, returns a negative value. If successful, returns a count of characters actually read. RDI - file descriptor; RSI - address of where to store characters; RDX - number of characters to read.
1 | sys_write | Write characters. If unsuccessful, returns a negative value. If successful, returns a count of characters actually written. RDI - file descriptor; RSI - address of the characters to write; RDX - number of characters to write.
60 | sys_exit | Terminate executing process. RDI - exit status.

Console Output

The system service to output characters to the console is the system write (sys_write). Like a high-level language, characters are written to standard out (stdout) which is the console. The stdout is the default file descriptor for the console. The file descriptor is already opened and available for use in programs (assembly and high-level languages). The arguments for the write system service are as follows:

Register sys_write
RAX Call code = sys_write (1)
RDI Output location, stdout (1)
RSI Address of characters to output
RDX Number of characters to output
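
How the four registers drive the write service can be modelled with a small dispatcher. This Python sketch is an assumption-laden illustration, not the AMx64 implementation: the syscall function, the regs dictionary and the in-memory console stream are all hypothetical, and only sys_write to stdout is handled.

```python
import io

SYS_WRITE = 1   # call code from the table above
STDOUT = 1      # file descriptor of the console


def syscall(regs, memory, stdout):
    """Hypothetical dispatcher handling only sys_write to stdout."""
    if regs["rax"] == SYS_WRITE and regs["rdi"] == STDOUT:
        start, count = regs["rsi"], regs["rdx"]
        data = bytes(memory[start:start + count])
        stdout.write(data)
        regs["rax"] = len(data)      # count of characters actually written
    else:
        regs["rax"] = -1             # a negative value signals an error
    return regs["rax"]


# The message lives at address 16 of a toy memory image.
memory = bytearray(b"\x00" * 16 + b"Hello, AMx64!\n")
regs = {"rax": SYS_WRITE, "rdi": STDOUT, "rsi": 16, "rdx": 14}
console = io.BytesIO()
syscall(regs, memory, console)
print(console.getvalue())   # b'Hello, AMx64!\n'
print(regs["rax"])          # 14
```

The return value replaces the call code in RAX, so a program that needs the call code again must reload it before the next syscall.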

Console Input

The system service to read characters from the console is the system read (sys_read). Like a high-level language, for the console, characters are read from standard input (stdin). The stdin is the default file descriptor for reading characters from the keyboard. The file descriptor is already opened and available for use in programs (assembly and high-level languages).

When using the system service to read from the keyboard, much like the write system service, the number of characters to read is required. Of course, we will need to declare an appropriate amount of space to store the characters being read. If we request 10 characters to read and the user types more than 10, the additional characters will be lost.

| Register | sys_read |
|---|---|
| RAX | Call code = sys_read (0) |
| RDI | Input location, stdin (0) |
| RSI | Address of where to store characters read |
| RDX | Number of characters to read |
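
Reading from the keyboard follows the same pattern. The sketch below assumes NASM-style `section .bss` and `resb` are available for reserving space; `buffer` and `main` are illustrative names, not identifiers taken from AMx64.

```nasm
section .bss
    buffer resb 10               ; reserve 10 bytes for the characters read

section .text
global main                      ; assumed NASM-style entry point
main:
    mov rax, 0                   ; call code: sys_read
    mov rdi, 0                   ; input location: stdin
    mov rsi, buffer              ; address of where to store the characters
    mov rdx, 10                  ; number of characters to read
    syscall                      ; RAX now holds the count read, or a negative value

    mov rax, 60                  ; call code: sys_exit
    mov rdi, 0                   ; exit status 0
    syscall
```

Because the request is for 10 characters and the buffer is exactly 10 bytes, any extra characters typed by the user are not stored.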

Proper way to end asm code

No special label or directives are required to terminate the program. However, to terminate asm code properly you should do the following:

mov rax, 60
mov rdi, 0
syscall

These instructions indicate that the program ended successfully. If the program terminates unsuccessfully, it should store the value 1 in the RDI register instead of 0.
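
A program can combine the return code in RAX with the exit status in RDI. The following is a sketch only: it assumes that `cmp` and the signed conditional jump `jl` are available in AMx64 and that NASM-style sections and `global main` are accepted; the labels `msg`, `main`, and `failed` are hypothetical.

```nasm
section .data
    msg db "ok", 10              ; 2 characters plus a newline = 3 bytes

section .text
global main                      ; assumed NASM-style entry point
main:
    mov rax, 1                   ; call code: sys_write
    mov rdi, 1                   ; output location: stdout
    mov rsi, msg                 ; address of the characters to output
    mov rdx, 3                   ; number of characters to output
    syscall

    cmp rax, 0                   ; a negative return code signals an error
    jl failed                    ; signed comparison: jump if RAX < 0

    mov rax, 60                  ; call code: sys_exit
    mov rdi, 0                   ; exit status 0: success
    syscall

failed:
    mov rax, 60                  ; call code: sys_exit
    mov rdi, 1                   ; exit status 1: failure
    syscall
```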

Debug - AMDB

A debugger allows the user to control the execution of a program and to examine variables and other memory. AMDB is loosely based on GDB. To use the debugger effectively once it has started, an initial breakpoint must be set. Once a breakpoint is set, the run (or r) command can be issued. A breakpoint is indicated with a red line number on the left, and the current location is indicated with a green asm line (see example below). The green line is the next instruction to be executed; it has not been executed yet.

Getting Help

You can always ask amdb itself for information on its commands, using the command help. You can use help (abbreviated h) with no arguments to display a short list of commands.

Setting Breakpoints

Breakpoints are set with the break command (abbreviated b). This command tells amdb to pause interpretation of your program at some point to allow you to inspect the value of variables and other memory locations. It will pause interpretation just before the specified line number is interpreted.

break [breakpoints]

Sets one or more breakpoints at the given locations (line numbers). A breakpoint stops your program just before the code at the specified location is executed. E.g. break 2 3 4.

As needed, additional breakpoints can be set. However, the run command will re-start execution from the beginning and stop at the initial breakpoint.

Deleting Breakpoints

It is often necessary to eliminate a breakpoint once it has done its job and you no longer want your interpretation to stop there. This is called deleting the breakpoint. A breakpoint that has been deleted no longer exists. You can delete breakpoints using the d (or delete) command.

delete
Deletes all available breakpoints.
delete [breakpoints]
Deletes the specified breakpoints. E.g. delete 2 3 4.

Display Source Code

You can display your source code inside amdb using the l (or list) command. amdb prints 7 lines of source code at a time, with a line number at the start of each line. The current line is always highlighted in green.

Examine Memory (Display Memory/Register Contents)

Once you have paused interpretation, you can use the p (or print) command to print the values of variables, specified memory locations, or registers.

print
Shows internal state of all available registers as well as the values of flags inside of the FLAGS register.

print register
Shows the value stored in the specified register. E.g. print RAX.
print size variable
Shows the value stored in memory starting at the location referenced by a variable. E.g. print word hello_msg.
print size memory_location
Shows the value stored in memory starting at the given location. E.g. print word 0x000000000000000A. The memory location can be given in hex format (e.g. 0x000000000000007B) or in decimal format (e.g. 123).

Continuing and Stepping

Continuing means resuming interpretation until the program completes normally. In contrast, stepping means executing just one more “step”, where a “step” is one line of source code. When continuing, the interpreter may stop sooner because of a breakpoint.

continue or c
Continues interpretation until the end of the file or until it reaches the next breakpoint.
step or s
Interprets the current line and stops interpretation on the next line.
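
Put together, a short debugging session might use the commands described in this section in the following order. Only the commands typed by the user are shown; the line numbers and the variable name hello_msg are illustrative, and amdb's own output is omitted because it depends on the program being debugged.

```text
break 2 3 4
run
print RAX
step
print word hello_msg
continue
delete 2 3 4
quit
```

Here break sets three breakpoints, run starts interpretation and stops at the first of them, print and step inspect state one line at a time, continue resumes until the next breakpoint or the end of the file, and delete removes the breakpoints before quitting.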

Quitting

To quit an amdb session, type q (short for quit) or quit. amdb will ask if you really want to quit. If you do, type y followed by the Enter key. This check may seem a little unnecessary, but it helps prevent people from quitting accidentally at a crucial moment in a lengthy debugging session.


To-Do List

  • Add Direct memory addressing.
  • Add Direct offset addressing.
  • Add Register indirect addressing.
  • Implement Stack memory structure.
    • Implement push and pop instructions.
  • Implement 64-bit addressable memory.
  • Implement assembler sections (.data, .bss, .text).
  • Implement C-style character escapes.
  • Implement character constants.
  • Add pseudo-instruction EQU.
  • Build an amdbui.
  • Implement paging memory management.
  • Implement Debugging with step-back.
