QVM

QVM is the first and main target of the qbee compiler. It is a stack machine whose memory is divided into multiple separate sections. Each section contains a number of memory cells. Memory cells are typed and can hold any primitive QB type, or a reference to another cell (in the same section or in another section).

Memory Cells

All memory cells are initialized to a special null value. When a null cell is read for the first time, it yields the zero value of the type it is read as. For example, if a null cell is read by the READL& instruction, it returns 0 as a 32-bit signed integer; if it is read with READIDXG$, it returns an empty string. After the first read, the cell takes on the type it was first read as, and any later attempt to read it as a different type causes the machine to panic.

This auto-initialization mechanism is why the read instructions in QVM are typed (READL%, READL&, etc.), while the store instructions are not.

Also, notice that auto-initialization only works for non-reference reads. Attempting to read a reference from a null cell will cause a panic.
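The read rules above can be modeled in a few lines. The following Python sketch is illustrative only: the Cell class, the VMPanic exception and the use of QB suffix characters (%, &, !, #, $, and @ for references) as type tags are assumptions made for this example, not part of QVM. The document does not say what a store does to a cell that already holds a different type, so this sketch simply overwrites it.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Zero values for the auto-initializing reads; "@" (reference) has none.
ZERO_VALUES = {"%": 0, "&": 0, "!": 0.0, "#": 0.0, "$": ""}


class VMPanic(Exception):
    """Raised wherever the document says the machine panics."""


@dataclass
class Cell:
    type_char: Optional[str] = None  # None models the special null value
    value: Any = None

    def read(self, type_char: str) -> Any:
        if self.type_char is None:
            if type_char == "@":
                raise VMPanic("cannot read a reference from a null cell")
            # First read: the cell adopts the requested type and its zero value.
            self.type_char = type_char
            self.value = ZERO_VALUES[type_char]
        elif self.type_char != type_char:
            raise VMPanic(f"cell holds {self.type_char!r}, read as {type_char!r}")
        return self.value

    def store(self, type_char: str, value: Any) -> None:
        # Stores are untyped: the value brings its own type along.
        self.type_char = type_char
        self.value = value


cell = Cell()
print(cell.read("&"))  # prints 0: first read as LONG yields zero
```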

Primitive Types

The primitive types that each cell can hold are:

INTEGER: 16-bit signed integer
LONG: 32-bit signed integer
SINGLE: 32-bit floating point
DOUBLE: 64-bit floating point
STRING: A variable length string. Current string length is stored inside the cell.
FIXED_STRING: A fixed-length string. Max length of the string is stored inside the cell.
REFERENCE: A reference to another cell in the same section or another section.

References

A reference contains the following fields:

Section ID: The ID of the memory section the reference points to
Index: An index to a cell in the specified memory section.

User-defined Types (Structs)

Values of user-defined types are simply contiguous ranges of cells, each of which may hold a value of a different type. If a user-defined type contains other user-defined types, their cells are expanded inline. For example, consider the following types:

TYPE Point
   x AS INTEGER
   y AS INTEGER
END TYPE

TYPE Rect
   topleft AS Point
   bottomright AS Point
END TYPE

A value of type Rect thus consists of four INTEGER cells. To read the bottomright.x field of such a value, we simply read the third cell of the memory region where the value is laid out.
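To make the flattening concrete, here is a hypothetical Python sketch that computes the size of a type and the cell offset of a dotted field path. The TYPES table and the helper names are made up for illustration, and every primitive field (including FIXED_STRING) is assumed to take exactly one cell.

```python
# Hypothetical type table: field name -> primitive type char or user type name.
TYPES = {
    "Point": [("x", "%"), ("y", "%")],
    "Rect": [("topleft", "Point"), ("bottomright", "Point")],
}


def size_of(type_name: str) -> int:
    """Number of cells a value of this type occupies."""
    if type_name not in TYPES:
        return 1  # any primitive value fits in a single cell
    return sum(size_of(field_type) for _, field_type in TYPES[type_name])


def field_offset(type_name: str, path: str) -> int:
    """Zero-based cell offset of a dotted field path such as 'bottomright.x'."""
    offset = 0
    current = type_name
    for name in path.split("."):
        for field_name, field_type in TYPES[current]:
            if field_name == name:
                current = field_type
                break
            offset += size_of(field_type)  # skip over earlier fields
        else:
            raise KeyError(f"{current} has no field {name!r}")
    return offset


print(size_of("Rect"), field_offset("Rect", "bottomright.x"))  # prints 4 2
```

Offset 2 is the third cell, matching the description above.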

Arrays

Arrays are also contiguous ranges of memory, similar to user-defined types. The cells belonging to an array, however, all have the same type.

There are two types of arrays:

$STATIC arrays are simply contiguous ranges of similarly-typed cells, preceded by a header that describes the array's dimensions.
$DYNAMIC arrays are references to $STATIC arrays.

The header of an array contains the following cells (all have a type of LONG):

Cell 0: Reserved. Always a value of zero.
Cell 1: The number of dimensions of the array
Cell 2: The size of each element, in cells
Cell 3: The lower bound of the first dimension of the array
Cell 4: The upper bound of the first dimension of the array
Cell 5: The lower bound of the second dimension of the array
Cell 6: The upper bound of the second dimension of the array
...

The header size therefore depends on the number of dimensions: it occupies 3 + 2 * n_dims cells, so a one-dimensional array, for example, has a header of five cells.
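As a sketch, the header layout can be built and sized as follows in Python. The function names are hypothetical, and the total-size calculation assumes that the element cells immediately follow the header, which the description of $STATIC arrays implies but does not spell out.

```python
from math import prod


def header_size(n_dims: int) -> int:
    """Reserved cell, n_dims, element size, then an (lbound, ubound) pair per dimension."""
    return 3 + 2 * n_dims


def make_header(element_size: int, bounds: list[tuple[int, int]]) -> list[int]:
    header = [0, len(bounds), element_size]  # cells 0, 1 and 2
    for lbound, ubound in bounds:
        header += [lbound, ubound]           # cells 3 and 4, 5 and 6, ...
    return header


def array_cells(header: list[int]) -> int:
    """Header cells plus element cells, assuming elements follow the header."""
    n_dims, element_size = header[1], header[2]
    extents = [header[4 + 2 * d] - header[3 + 2 * d] + 1 for d in range(n_dims)]
    return header_size(n_dims) + element_size * prod(extents)


# A one-dimensional array with bounds 0..10 and one cell per element:
h = make_header(1, [(0, 10)])
print(h, array_cells(h))  # prints [0, 1, 1, 0, 10] 16
```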

Memory Sections

The VM contains the following memory sections:

Code (section id=0): This is not an actual memory section, since no reference can point into it; attempting to read from or write to it results in a fatal error. Instead of normal cells, the code section holds a byte stream of VM instructions.
Const (section id=1): Contains string literals. Can only be read from.
Globals (section id=2): Contains global variables.
Stack (section id=3): The machine stack, from which most instructions read their operands and to which they write their results. The stack is special in that it is the only section from which data can be copied to other sections, or into which data can be written from other sections. No instruction moves data directly between two other sections.
Heap (section id=4): This section contains dynamically allocated arrays.
Frames (section id=100..): Each SUB/FUNCTION call gets its own frame section, which holds its arguments and local variables. The frame is destroyed when the routine returns, and any dynamic arrays referenced from it are freed.

Instruction Set

QVM supports the following instructions. Each instruction consists of a one-byte opcode followed by zero or more operands.

ABS

Pops a numeric value from the stack, calculates its absolute value and pushes it back onto the stack as the same type as input.

ADD

Pops two values from the stack, adds them together and pushes the result back onto the stack. The two values should be of the same type and the result will also be of this type.

ALLOCARR n_dims, element_size

Allocates a dynamic array with n_dims dimensions and pushes a reference to it onto the stack. The instruction pops n_dims pairs of (lbound, ubound) values from the stack (in reverse order), as well as the element size (all LONGs).

Stack before: ..., dim1_lbound, dim1_ubound, ..., dimn_lbound, dimn_ubound, element_size

Stack after: ..., array_ref

ARRIDX n_indices

Computes a reference to an element of an array. The operand (an unsigned 8-bit integer) is the number of indices. The reference to the array is expected to be on top of the stack, with the indices below it in reverse order (the last index closest to the top):

Stack before: ..., idx1, idx2, ..., idxn, array_ref

Stack after: ..., element_ref

Before calculating the element reference and pushing it onto the stack, this instruction first performs array bounds checking.

The indices are expected to be of INTEGER or LONG types.
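The sketch below illustrates ARRIDX on a toy machine in which references are (section_id, index) tuples and memory is a dict of cell lists; both representations, and the function and exception names, are hypothetical. The document does not state the order in which elements are stored, so row-major order (last index varying fastest) is assumed here.

```python
class IndexOutOfRange(Exception):
    """Stand-in for QVM's array bounds trap."""


def element_offset(header: list[int], indices: list[int]) -> int:
    """Offset of an element from the start of the array (header included)."""
    n_dims, element_size = header[1], header[2]
    if len(indices) != n_dims:
        raise IndexOutOfRange(f"expected {n_dims} indices, got {len(indices)}")
    linear = 0
    for d, idx in enumerate(indices):
        lbound, ubound = header[3 + 2 * d], header[4 + 2 * d]
        if not lbound <= idx <= ubound:  # bounds check before computing the reference
            raise IndexOutOfRange(f"index {idx} outside {lbound}..{ubound}")
        linear = linear * (ubound - lbound + 1) + (idx - lbound)  # row-major (assumed)
    return len(header) + linear * element_size


def arridx(stack: list, memory: dict, n_indices: int) -> None:
    section, base = stack.pop()                              # array_ref is on top
    indices = [stack.pop() for _ in range(n_indices)][::-1]  # back to idx1..idxn
    cells = memory[section]
    n_dims = cells[base + 1]
    header = cells[base:base + 3 + 2 * n_dims]
    stack.append((section, base + element_offset(header, indices)))


# Array 0..10 with two cells per element, stored at the start of section 4:
memory = {4: [0, 1, 2, 0, 10] + [None] * 22}
stack = [3, (4, 0)]
arridx(stack, memory, 1)
print(stack)  # prints [(4, 11)]
```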

AND

Pops two values from the stack, performs bitwise AND on them, and pushes the result back onto the stack. The two values should both be of either type INTEGER or LONG and the result will be of the same type as these.

ASC

Pops a STRING from the stack and pushes the ASCII code of its first character back onto the stack as an INTEGER. If the string is empty, a trap occurs.

CALL address&

CALL has a LONG operand, denoting where the routine to call is in the code. It pushes the current value of the IP register (pointing to the instruction after CALL) onto the stack as a LONG (32-bit integer) and then sets IP to the "address" operand.

CHR

Pops an INTEGER from the stack, interprets it as an ASCII code, and pushes a STRING of length one containing the character with that code.

CINT

Pops a numeric value from the stack, rounds it and pushes it back onto the stack as an INTEGER.

CLNG

Pops a numeric value from the stack, rounds it and pushes it back onto the stack as a LONG.

CMP

Pops two values from the stack, compares them, and pushes the result back onto the stack. The two values should be of the same type. Numeric operands are compared numerically: if the first value is smaller than the second, -1% (-1 as a 16-bit integer) is pushed; if they are equal, 0% is pushed; and if the first is greater than the second, 1% is pushed. If the two values are STRINGs, a lexicographic comparison is performed and -1%, 0%, or 1% is pushed in the same way.
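Judging by their descriptions, EQ, NE, LT, LE, GT, and GE appear designed to consume the -1%/0%/1% result of CMP and turn it into a QB boolean (-1 true, 0 false). The Python sketch below models that pairing on a plain list used as the stack. It assumes "the first value" is the one pushed first (deeper in the stack) and that NE is true for any nonzero CMP result; the function names are illustrative.

```python
def exec_cmp(stack: list) -> None:
    """CMP: pop two values and push -1, 0 or 1."""
    second = stack.pop()  # top of stack
    first = stack.pop()   # pushed earlier (assumed to be "the first value")
    stack.append((first > second) - (first < second))


# Each predicate tests the CMP result; QB booleans are -1 (true) and 0 (false).
PREDICATES = {
    "EQ": lambda r: r == 0,
    "NE": lambda r: r != 0,
    "LT": lambda r: r == -1,
    "LE": lambda r: r in (-1, 0),
    "GT": lambda r: r == 1,
    "GE": lambda r: r in (0, 1),
}


def exec_predicate(stack: list, op: str) -> None:
    result = stack.pop()
    stack.append(-1 if PREDICATES[op](result) else 0)


# 3 < 5: CMP pushes -1, then LT turns it into true (-1).
stack = [3, 5]
exec_cmp(stack)
exec_predicate(stack, "LT")
print(stack)  # prints [-1]
```

Python compares str values by code point, which matches a lexicographic comparison for ASCII strings.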

CONV

Performs a primitive type conversion. CONV is not a single instruction. It can take any of the following forms:

conv%&
conv%!
conv%#
conv&%
conv&!
conv&#
conv#%
conv#&
conv#!

The two characters after conv determine the source and destination types. For example, conv%# converts a 16-bit INTEGER value to a 64-bit float (DOUBLE) value.

The instruction pops its argument from the stack, performs conversion, and pushes the result back onto the stack.

DEREF{type_char}

Pops a reference from the stack, reads the value of the cell the reference points to, and pushes that value onto the stack. This instruction has five variants, one for each primitive type: DEREF%, DEREF&, DEREF!, DEREF#, and DEREF$. As with the typed READ instructions, the variants exist so that variables can be auto-initialized when they are first read.

DIV

Pops two values from the stack, divides the first by the second, and pushes the result back onto the stack. The two values should be of the same type. If the values are of INTEGER, LONG, or SINGLE types, the result is a SINGLE. If the values are DOUBLEs, the result is a DOUBLE.

DUPL

Duplicates the value on top of the stack.

EQ

Pops one value from the stack. This value MUST be an INTEGER. If it's 0%, -1% is pushed onto the stack, otherwise 0% is pushed.

EQV

Pops two values from the stack, performs bitwise EQV on them, and pushes the result back onto the stack. The two values should both be of either type INTEGER or LONG and the result will be of the same type as these.

EXP

Pops two values from the stack, raises the first to the power of the second, and pushes the result back onto the stack. The two values should be of the same type.

FRAME params_size, vars_size

Creates and enters a new call frame. params_size is an unsigned 16-bit integer giving the number of cells taken up by the routine's parameters; these cells are popped from the stack and copied into the frame. vars_size is another unsigned 16-bit integer giving the size of the local variables in cells. The call frame has params_size + vars_size cells and is linked to the caller's frame.

Note that FRAME assumes the return address (pushed by the CALL instruction) is on top of the stack when it executes. It pops the return address, reads the arguments, and then pushes the return address back onto the stack.

Stack before: ..., arg1, arg2, ..., argn, return_address

Stack after: ..., return_address
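The calling sequence of CALL, FRAME, and RET/RETV can be traced with a toy Python model. Machine and its methods are hypothetical; the sketch assumes arguments are copied into frame cells 0..params_size-1 in push order, uses None for null cells, and leaves out the freeing of heap arrays that RET and RETV perform.

```python
class Machine:
    """Toy model of the IP register, the stack and the chain of call frames."""

    def __init__(self):
        self.ip = 0       # points at the next instruction to execute
        self.stack = []
        self.frames = []  # frames[-1] is the current frame; earlier ones are callers

    def call(self, address):
        # CALL: push the return address (IP already points past CALL), then jump.
        self.stack.append(self.ip)
        self.ip = address

    def frame(self, params_size, vars_size):
        # FRAME: the return address sits on top of the arguments.
        return_address = self.stack.pop()
        split = len(self.stack) - params_size
        args = self.stack[split:]  # arg1 .. argn, in push order
        del self.stack[split:]
        self.frames.append(args + [None] * vars_size)  # locals start out null
        self.stack.append(return_address)

    def ret(self):
        # RET: destroy the frame, then jump to the popped return address.
        self.frames.pop()
        self.ip = self.stack.pop()

    def retv(self):
        # RETV: the return value sits above the return address.
        self.frames.pop()
        value = self.stack.pop()
        self.ip = self.stack.pop()
        self.stack.append(value)


m = Machine()
m.ip = 105            # pretend the instruction after CALL is at 105
m.stack.extend([7, 8])
m.call(200)           # stack: [7, 8, 105]
m.frame(2, 1)         # frame: [7, 8, None], stack: [105]
m.stack.append(42)    # the function computes its return value
m.retv()
print(m.ip, m.stack)  # prints 105 [42]
```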

GE

Pops one value from the stack. This value MUST be an INTEGER. If it's 0% or 1%, -1% is pushed onto the stack, otherwise 0% is pushed.

GT

Pops one value from the stack. This value MUST be an INTEGER. If it's 1%, -1% is pushed onto the stack, otherwise 0% is pushed.

HALT

Halts machine execution.

IDIV

Pops two values from the stack, divides the first by the second, and pushes the result back onto the stack. The two values should be of the same type, and can only be of INTEGER or LONG types. The result is of the same type as the two values.

INITARRG var_idx, n_dims, element_size

Initializes a global static array. var_idx is an unsigned 16-bit integer containing the variable index in the globals section. n_dims (an unsigned 8-bit integer) is the number of dimensions in the array, and element_size is a signed 32-bit integer containing the size of each element in the array in cells.

INITARRL var_idx, n_dims, element_size

Initializes a local static array. var_idx is an unsigned 16-bit integer containing the variable index in the current call frame. n_dims (an unsigned 8-bit integer) is the number of dimensions in the array, and element_size is a signed 32-bit integer containing the size of each element in the array in cells.

IJMP

Pops a LONG value from the stack and sets the IP register to it, performing an indirect jump. This is similar to what the RET instruction does, except that IJMP does not touch the current call frame.

IMP

Pops two values from the stack, performs bitwise IMP on them, and pushes the result back onto the stack. The two values should both be of either type INTEGER or LONG and the result will be of the same type as these.
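For reference, the bitwise operators can be expressed with Python's integer operators, using QB truth values -1 (all bits set) and 0. EQV is bitwise equivalence, NOT (a XOR b), and IMP is bitwise implication, (NOT a) OR b. Python integers behave like sign-extended two's complement numbers under these operators, so in-range INTEGER or LONG inputs give in-range results without extra masking. The BITWISE table and bit_not are illustrative names.

```python
# QB-style bitwise operators on INTEGER/LONG values (true = -1, false = 0).
BITWISE = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "EQV": lambda a, b: ~(a ^ b),  # bit set where a and b agree
    "IMP": lambda a, b: ~a | b,    # bit clear only where a is 1 and b is 0
}


def bit_not(a: int) -> int:
    return ~a


print(BITWISE["IMP"](-1, 0), BITWISE["EQV"](-1, -1))  # prints 0 -1
```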

INT

Pops a numeric value from the stack and pushes back a LONG holding the largest integer less than or equal to that value (its floor).

IO subsystem%, operation%

Performs an IO operation. "subsystem%" and "operation%" values are 16-bit integers.

JMP address&

Performs an unconditional jump to the given address, that is, sets the IP register to the value of the "address&" operand.

JZ address&

Pops an INTEGER value from the stack. If it's 0%, it jumps to the address given by the operand, otherwise does nothing.

LBOUND

Pops an integer dim_idx and a reference from the stack, interprets the reference as an array, and reads the LBOUND of the dimension dim_idx of the array, and pushes it on the stack (as a LONG).

Stack before: ..., array_ref, dim_idx

Stack after: ..., lbound

dim_idx is one based. If it's not in the valid range for the array, an INDEX_OUT_OF_RANGE trap is raised.

LCASE

Pops a STRING from the stack, converts it to lower case, and pushes the result back onto the stack.

LE

Pops one value from the stack. This value MUST be an INTEGER. If it's -1% or 0%, -1% is pushed onto the stack, otherwise 0% is pushed.

LT

Pops one value from the stack. This value MUST be an INTEGER. If it's -1%, -1% is pushed onto the stack, otherwise 0% is pushed.

MOD

Pops two values from the stack, divides the first by the second, and pushes the remainder back onto the stack. The two values should both be either INTEGER or LONG values. The result is of the same type as these values.

MUL

Pops two values from the stack, multiplies them by each other and pushes the result back onto the stack. The two values should be of the same type and the result will also be of this type.

NE

Pops one value from the stack. This value MUST be an INTEGER. If it's not 0%, -1% is pushed onto the stack, otherwise 0% is pushed.

NEG

Pops one value from the stack, negates it and pushes the result back onto the stack. The value must be numeric. The result will be of the same type as that value.

NOP

Does nothing.

NOT

Pops one value from the stack, which should be either an INTEGER or a LONG, performs bitwise NOT on it, and pushes the result back onto the stack. The result has the same type as the value.

NTOS

Reads a numeric value from the stack and pushes its STRING representation back onto the stack.

OR

Pops two values from the stack, performs bitwise OR on them, and pushes the result back onto the stack. The two values should both be of either type INTEGER or LONG and the result will be of the same type as these.

POP

Pops a value from the stack and throws it away.

PUSH value

This instruction can be any of the following variants:

PUSH% value%
PUSH& value&
PUSH! value!
PUSH# value#
PUSH$ value$

It pushes the value of the operand onto the stack. For PUSH$, the operand is a 32-bit integer denoting the index of a string literal in the CONST section.

PUSHc

This instruction can be any of the following variants:

PUSHM2%
PUSHM2&
PUSHM2!
PUSHM2#
PUSHM1%
PUSHM1&
PUSHM1!
PUSHM1#
PUSH0%
PUSH0&
PUSH0!
PUSH0#
PUSH1%
PUSH1&
PUSH1!
PUSH1#
PUSH2%
PUSH2&
PUSH2!
PUSH2#

Each instruction has an intrinsic value and type, and causes the intrinsic value to be pushed onto the stack. The M2 and M1 variants denote negative values, so for example PUSHM1! pushes a value of -1! (-1 as a 32-bit float) onto the stack.
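The 20 constant-push opcodes follow a regular naming scheme, which the following sketch generates as a lookup table. The table names are hypothetical, and since Python has no 32-bit float type, ! and # values are both modeled as plain floats.

```python
# Intrinsic value of each constant-push opcode prefix.
PREFIX_VALUES = {"PUSHM2": -2, "PUSHM1": -1, "PUSH0": 0, "PUSH1": 1, "PUSH2": 2}
# Type suffix -> Python type used to model it (no float32 in Python).
SUFFIX_TYPES = {"%": int, "&": int, "!": float, "#": float}

CONSTANT_OPS = {
    prefix + suffix: py_type(value)
    for prefix, value in PREFIX_VALUES.items()
    for suffix, py_type in SUFFIX_TYPES.items()
}

print(len(CONSTANT_OPS), CONSTANT_OPS["PUSHM1!"])  # prints 20 -1.0
```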

PUSHREF{scope} var

This instruction has two variants, PUSHREFL and PUSHREFG, which push a reference to the given local or global variable, respectively, onto the stack.

READG{t} var

This instruction has six variants: READG%, READG&, READG!, READG#, READG$, and READG@.

It reads the value of the given variable from the globals section and pushes it onto the stack. "var" is a 16-bit unsigned integer denoting the index of the cell in the globals section to read from.

If the memory cell is null, the first five variants initialize the cell with the zero/empty value corresponding to the instruction type, and push that value. READG@ requires that the cell already contain a reference; attempting to read a reference from a null cell causes a panic.

READIDXG{t} var, idx

This instruction has six variants: READIDXG%, READIDXG&, READIDXG!, READIDXG#, READIDXG$, and READIDXG@.

It reads the value at index "idx" of the given global variable and pushes it onto the stack. This is useful for reading fields inside a struct.

If the memory cell is null, the first five variants initialize the cell with the zero/empty value corresponding to the instruction type, and push that value. READIDXG@ requires that the cell already contain a reference; attempting to read a reference from a null cell causes a panic.

READIDXL{t} var, idx

This instruction has six variants: READIDXL%, READIDXL&, READIDXL!, READIDXL#, READIDXL$, and READIDXL@.

It reads the value at index "idx" of the given local variable and pushes it onto the stack. This is useful for reading fields inside a struct.

If the memory cell is null, the first five variants initialize the cell with the zero/empty value corresponding to the instruction type, and push that value. READIDXL@ requires that the cell already contain a reference; attempting to read a reference from a null cell causes a panic.

READL{t} var

This instruction has six variants: READL%, READL&, READL!, READL#, READL$, and READL@.

It reads the value of the given variable from the current call frame and pushes it onto the stack. "var" is a 16-bit unsigned integer, denoting the index of the cell in the call frame to read from.

If the memory cell is null, the first five variants initialize the cell with the zero/empty value corresponding to the instruction type, and push that value. READL@ requires that the cell already contain a reference; attempting to read a reference from a null cell causes a panic.

REFIDX

Pops an index from the stack and then a reference. The index must be an INTEGER or a LONG. It adds the index to the reference and pushes the resulting reference back onto the stack.

RET

Performs a return from a SUB (so there is no return value). In particular, it performs the following operations:

Frees all dynamic arrays (in the heap section) pointed to by references in the current call frame.
Destroys the current frame section.
Pops a LONG value from the stack.
Sets the value of the IP register to the popped value.

RETV

Performs a return from a FUNCTION (so there is a return value). In particular, it performs the following operations:

Frees all dynamic arrays (in the heap section) pointed to by references in the current call frame.
Destroys the current frame section.
Pops a value from the stack, which will be the return value of the function.
Pops a LONG value from the stack.
Sets the value of the IP register to the popped address.
Pushes the return value onto the stack.

SDBL

Pops a STRING from the stack, converts it to a number, and pushes the result back onto the stack as a DOUBLE. Leading white space is skipped, and conversion stops at the first non-numeric character (or at the end of the string).
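A rough Python approximation of SDBL follows. Exactly which characters count as numeric is not specified, so the sketch assumes an optional sign, digits with an optional decimal point, and an optional exponent introduced by E or D; it also assumes a string with no numeric prefix converts to 0. The sdbl function name is illustrative.

```python
import re

# Optional sign, digits with an optional fraction, optional E/D exponent (assumed).
_NUMERIC_PREFIX = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eEdD][+-]?\d+)?")


def sdbl(s: str) -> float:
    match = _NUMERIC_PREFIX.match(s.lstrip())  # skip leading white space
    if match is None:
        return 0.0  # assumption: no numeric prefix converts to zero
    # Python's float() only understands E exponents, so map D to E first.
    return float(match.group(0).replace("d", "e").replace("D", "e"))


print(sdbl("  3.5abc"), sdbl("1.2D2"))  # prints 3.5 120.0
```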

SIGN

Pops a numeric value from the stack and pushes 1, 0, or -1 back onto the stack depending on its sign. The pushed value has the same type as the popped value.

SPACE

Reads an INTEGER n from the stack and pushes a string consisting of n spaces back onto the stack.

SUB

Pops two values from the stack, subtracts the second from the first and pushes the result back onto the stack. The two values should be of the same type and the result will also be of this type.

STOREG var%

Pops a value from the stack and stores it in the given global variable. var% is a 16-bit integer, denoting an index to a cell in the globals section to write to.

STOREIDXG var, idx

Pops a value from the stack and writes it at index "idx" of the given global variable. This is useful for writing to fields inside a struct.

STOREIDXL var, idx

Pops a value from the stack and writes it at index "idx" of the given local variable. This is useful for writing to fields inside a struct.

STOREL var%

Pops a value from the stack and stores it in the given local variable. var% is a 16-bit integer, denoting an index to a cell in the current call frame to write to.

STOREREF

Pops a reference and then a value from the stack, and writes the value to the cell the reference points to.

Stack before: ..., value, reference

Stack after: ...

STRFIND

Searches for the first occurrence of string str2 in string str1, starting at index "start" in str1, and pushes the index where it was found:

Stack before: ..., start, str1, str2

Stack after: ..., result_index

The "start" index and the returned index are both LONG and one-based. If the string is not found, 0 is returned.
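A minimal sketch of STRFIND, mapping Python's zero-based str.find onto the one-based convention above. Raising ValueError for start < 1 is an assumption, since the document does not cover that case; the function names are illustrative.

```python
def strfind(start: int, str1: str, str2: str) -> int:
    """One-based index of str2 in str1, searching from one-based start; 0 if absent."""
    if start < 1:
        raise ValueError("start must be at least 1")  # assumed behavior
    return str1.find(str2, start - 1) + 1  # find() returns -1 when absent, giving 0


def exec_strfind(stack: list) -> None:
    str2 = stack.pop()  # top of stack
    str1 = stack.pop()
    start = stack.pop()
    stack.append(strfind(start, str1, str2))


print(strfind(1, "hello world", "o"), strfind(6, "hello world", "o"))  # prints 5 8
```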

STRLEFT

Pops a STRING and an INTEGER n from the stack. Pushes back a new string containing the left n characters of the string.

STRLEN

Reads a STRING value from the stack, and pushes its length back onto the stack as a LONG.

STRMID

Pops a STRING, an INTEGER ("start"), and a "length" value that is either an INTEGER or a LONG from the stack, and pushes back the substring that begins at "start" and has the given "length". If "length" is a LONG, its value is ignored and the substring extends to the end of the string.

Note: The start index is assumed to be 1-based.

Note: Non-positive values for start or length cause an error.
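The following sketch models STRMID's arguments directly, using None to stand for "length was pushed as a LONG", since Python cannot tell INTEGER and LONG values apart. The IllegalFunctionCall exception is a hypothetical stand-in for whatever error QVM actually raises.

```python
from typing import Optional


class IllegalFunctionCall(Exception):
    """Hypothetical stand-in for the error STRMID raises."""


def strmid(s: str, start: int, length: Optional[int] = None) -> str:
    """Substring of s from one-based start; length=None means 'to the end'."""
    if start < 1 or (length is not None and length < 1):
        raise IllegalFunctionCall("start and length must be positive")
    end = len(s) if length is None else start - 1 + length
    return s[start - 1:end]  # slicing past the end simply truncates


print(strmid("abcdef", 2, 3), strmid("abcdef", 3))  # prints bcd cdef
```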

STRREP

Pops a character, given either as a STRING or as an ASCII code, and an integer n from the stack, and pushes back a string of n copies of that character. If a STRING is given, its first character is used.

Stack before: ..., n, char

Stack after: ..., result

With char=65 and n=3, result will be "AAA". With char="XYZ" and n=3, result will be "XXX".
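A short sketch of STRREP on a plain Python list used as the stack, where an INTEGER char is modeled as a Python int and a STRING as a str. The function names are hypothetical, and neither an empty string nor a negative n is validated here.

```python
def strrep(n: int, char) -> str:
    """n copies of a character given as an ASCII code (int) or a string."""
    code = char if isinstance(char, int) else ord(char[0])  # empty string -> IndexError
    return chr(code) * n


def exec_strrep(stack: list) -> None:
    char = stack.pop()  # top of stack
    n = stack.pop()
    stack.append(strrep(n, char))


print(strrep(3, 65), strrep(3, "XYZ"))  # prints AAA XXX
```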

STRRIGHT

Pops a STRING and an INTEGER n from the stack. Pushes back a new string containing the right n characters of the string.

SWAP

Swaps the two top values on the stack.

Stack before: ..., value1, value2

Stack after: ..., value2, value1

SWAPPREV

Swaps the second and third items from the top of the stack.

Stack before: ..., value1, value2, value3

Stack after: ..., value2, value1, value3
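The stack-shuffling instructions (DUPL, POP, SWAP, SWAPPREV) map directly onto Python list operations, with the end of the list as the top of the stack. The function names are illustrative.

```python
def dupl(stack: list) -> None:
    stack.append(stack[-1])


def pop(stack: list) -> None:
    stack.pop()


def swap(stack: list) -> None:
    stack[-1], stack[-2] = stack[-2], stack[-1]


def swapprev(stack: list) -> None:
    # Exchange the second and third items from the top; the top stays put.
    stack[-2], stack[-3] = stack[-3], stack[-2]


s = [1, 2, 3]
swapprev(s)
print(s)  # prints [2, 1, 3]
```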

UBOUND

Pops an integer dim_idx and a reference from the stack, interprets the reference as an array, and reads the UBOUND of the dimension dim_idx of the array, and pushes it on the stack (as a LONG).

Stack before: ..., array_ref, dim_idx

Stack after: ..., ubound

dim_idx is one based. If it's not in the valid range for the array, an INDEX_OUT_OF_RANGE trap is raised.

UCASE

Pops a STRING from the stack, converts it to upper case, and pushes the result back onto the stack.

XOR

Pops two values from the stack, performs bitwise XOR on them, and pushes the result back onto the stack. The two values should both be of either type INTEGER or LONG and the result will be of the same type as these.