QVM
QVM is the first and main target of the qbee compiler. It's a stack machine whose memory is divided into multiple separate sections. Each section contains a number of memory cells. Memory cells are typed and can contain any primitive QB type, or a reference to another cell (in the same section or in another section).
All memory cells are initialized to a special null value. When a null cell is read for the first time, the read returns the zero value of the type it's read as. For example, if a null cell is read by the readl& instruction, it returns the 32-bit signed integer 0, but if it's read with readidxg$, it's read as an empty string. After the first read, the cell takes on the type it was first read as. Any later attempt to read it as another type causes the machine to panic.
This auto-initialization mechanism is the reason why the read instructions in QVM are typed ("readl%", "readl&", etc), but the store instructions are not.
Also, notice that auto-initialization only works for non-reference reads. Attempting to read a reference from a null cell will cause a panic.
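The read rules above can be sketched as a tiny Python model. This is not qbee's implementation: the `Cell` and `VMPanic` names are hypothetical, type tags reuse the QB suffixes (`%`, `&`, `!`, `#`, `$`, plus `@` for references), and since the text does not say what a store does to a cell that already has a type, this sketch assumes the store simply replaces it.

```python
class VMPanic(Exception):
    """Hypothetical stand-in for a machine panic."""


# Zero/empty value returned when a null cell is first read as each type.
ZERO_VALUES = {"%": 0, "&": 0, "!": 0.0, "#": 0.0, "$": ""}


class Cell:
    def __init__(self):
        self.type = None   # None marks a null cell
        self.value = None

    def store(self, type_tag, value):
        # Store instructions are untyped; the stored value carries its own type.
        # ASSUMPTION: a store always replaces the cell's current type.
        self.type, self.value = type_tag, value

    def read(self, type_tag):
        if self.type is None:
            if type_tag == "@":
                # References are never auto-initialized.
                raise VMPanic("reference read from a null cell")
            # First read: the cell adopts the requested type and its zero value.
            self.type, self.value = type_tag, ZERO_VALUES[type_tag]
        elif self.type != type_tag:
            raise VMPanic(f"cell of type {self.type} read as {type_tag}")
        return self.value
```

For example, `Cell().read("&")` returns `0`, and reading that same cell with `"$"` afterwards raises `VMPanic`.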
The primitive types that each cell can hold are:
- INTEGER: 16-bit integer
- LONG: 32-bit integer
- SINGLE: 32-bit float
- DOUBLE: 64-bit float
- STRING: A variable length string. Current string length is stored inside the cell.
- FIXED_STRING: A fixed-length string. Max length of the string is stored inside the cell.
- REFERENCE: A reference to another cell in the same section or another section.
A reference contains the following fields:
- Section ID: The ID of the memory section the reference points to
- Index: An index to a cell in the specified memory section.
Values of user-defined types are simply contiguous ranges of cells, each potentially holding a value of a different type. If a user-defined type contains other user-defined types, these are simply expanded inside of it. For example, consider the following type:
TYPE Point
x AS INTEGER
y AS INTEGER
END TYPE
TYPE Rect
topleft AS Point
bottomright AS Point
END TYPE
A value of type Rect simply contains four cells of type INTEGER. To read the bottomright.x field of such a value, we simply read the third cell in the region of memory where the value is laid out.
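The offset arithmetic can be sketched in Python. Types are modeled as lists of (field name, field type) pairs in declaration order, and each primitive field takes one cell; `size_in_cells` and `field_offset` are illustrative names, not part of qbee. Offsets are zero-based, so "the third cell" is offset 2.

```python
# Each user-defined type is a list of (field name, field type) pairs in
# declaration order; a primitive field type is just its name and takes one cell.
POINT = [("x", "INTEGER"), ("y", "INTEGER")]
RECT = [("topleft", POINT), ("bottomright", POINT)]


def size_in_cells(t):
    """Number of cells a value of type t occupies once nested types are expanded."""
    if isinstance(t, str):
        return 1
    return sum(size_in_cells(field_type) for _, field_type in t)


def field_offset(t, path):
    """Zero-based cell offset of a dotted field path such as "bottomright.x"."""
    offset = 0
    for name in path.split("."):
        for field_name, field_type in t:
            if field_name == name:
                t = field_type                      # descend into the field's type
                break
            offset += size_in_cells(field_type)     # skip over earlier fields
        else:
            raise KeyError(name)
    return offset
```

`field_offset(RECT, "bottomright.x")` skips the two cells of `topleft` and lands on offset 2.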
Arrays are also contiguous ranges of memory, similar to user-defined types. The cells belonging to an array, however, all have the same type.
There are two types of arrays:
- $STATIC arrays are simply contiguous ranges of similarly-typed cells. They also have a header at the beginning denoting dimensions of the array.
- $DYNAMIC arrays are references to $STATIC arrays.
The header of an array contains the following cells (all have a type of LONG):
- Cell 0: Reserved. Always a value of zero.
- Cell 1: The number of dimensions of the array.
- Cell 2: Element size for the array (in cells)
- Cell 3: The lower bound of the first dimension of the array
- Cell 4: The upper bound of the first dimension of the array
- Cell 5: The lower bound of the second dimension of the array
- Cell 6: The upper bound of the second dimension of the array
- ...
So the header size depends on the number of dimensions: an array with n dimensions has a header of 3 + 2n cells. A one-dimensional array, for example, has a header of five cells.
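The header layout can be expressed as a few helper functions. This is a sketch, not qbee code: the function names are hypothetical, and `total_cells` assumes the element cells follow the header directly, as the description of $STATIC arrays as one contiguous range suggests.

```python
def header_size(n_dims):
    """Reserved cell + dimension count + element size + one bound pair per dimension."""
    return 3 + 2 * n_dims


def build_header(bounds, element_size):
    """bounds is a list of (lbound, ubound) pairs, first dimension first."""
    header = [0, len(bounds), element_size]   # cell 0 is reserved and always zero
    for lbound, ubound in bounds:
        header += [lbound, ubound]
    return header


def total_cells(bounds, element_size):
    """Header cells plus element_size cells for every element of the array."""
    n_elements = 1
    for lbound, ubound in bounds:
        n_elements *= ubound - lbound + 1
    return header_size(len(bounds)) + n_elements * element_size
```

For an array of ten Point values with bounds 1 TO 10, this gives a 5-cell header plus 20 element cells, 25 cells in total.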
The VM contains the following memory sections:
- Code (section id=0): This is not an actual memory section, since no reference can point to it. Attempting to read from or write to this section results in a fatal error. The code section does not contain normal cells; instead, it's a byte stream of VM instructions.
- Const (section id=1): Contains string literals. Can only be read from.
- Globals (section id=2): Contains global variables.
- Stack (section id=3): The machine stack, which most instructions read their operands from and write their results to. The stack is special: all data moving between sections passes through it. It's the only section whose data can be copied to other sections, and the only one that can receive data copied from other sections; there are no instructions that move data directly between two other sections.
- Heap (section id=4): This section contains dynamically allocated arrays.
- Frames (section id=100..): Each SUB/FUNCTION call has a frame section created for it, where its arguments and local variables reside. When the routine returns, the frame is destroyed and all dynamic arrays referenced from it are freed.
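The section ids and the reference fields described earlier can be captured in a small Python sketch. The `Section`, `Reference`, and helper names are hypothetical; the checks only encode the rules stated above (code is not addressable, const is read-only, frames use ids from 100 up, and cross-section copies always involve the stack).

```python
from dataclasses import dataclass
from enum import IntEnum


class Section(IntEnum):
    CODE = 0
    CONST = 1
    GLOBALS = 2
    STACK = 3
    HEAP = 4
    FIRST_FRAME = 100   # each call frame gets its own id from 100 upward


@dataclass(frozen=True)
class Reference:
    section_id: int     # which memory section the reference points into
    index: int          # cell index inside that section


def check_access(ref, write):
    """Reject the accesses the section list above forbids."""
    if ref.section_id == Section.CODE:
        raise RuntimeError("the code section cannot be referenced")
    if ref.section_id == Section.CONST and write:
        raise RuntimeError("the const section is read-only")


def is_frame(section_id):
    return section_id >= Section.FIRST_FRAME


def can_copy_directly(src, dst):
    # Data only moves between sections by way of the stack.
    return Section.STACK in (src, dst)
```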
QVM supports the following instructions. Each instruction consists of a one byte op-code, and zero, one, or more operands.
Pops a numeric value from the stack, calculates its absolute value, and pushes it back onto the stack with the same type as the input.
Pops two values from the stack, adds them together and pushes the result back onto the stack. The two values should be of the same type and the result will also be of this type.
Allocates a dynamic array with n_dims dimensions and pushes a reference to it onto the stack. The instruction pops the element size and n_dims (lbound, ubound) pairs from the stack, all as LONGs. The pairs are popped in reverse order, so the bounds of the first dimension are deepest on the stack:
stack before: ..., dim1_lbound, dim1_ubound, ..., dimn_lbound, dimn_ubound, element_size
stack after: ..., array_ref
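A Python sketch of the operand handling, using a list as the stack (top at the end) and the heap as a flat list of cells. The bump allocation, the `(section_id, index)` tuple used for references, `None` for null cells, and the function names are all assumptions made for illustration; freeing is not modeled.

```python
HEAP_SECTION_ID = 4


def pop_alloc_operands(stack, n_dims):
    """Pop element_size, then the (lbound, ubound) pairs from last dimension to first."""
    element_size = stack.pop()
    bounds = []
    for _ in range(n_dims):
        ubound = stack.pop()
        lbound = stack.pop()
        bounds.append((lbound, ubound))
    bounds.reverse()   # restore first-dimension-first order
    return bounds, element_size


def allocarr(stack, heap, n_dims):
    bounds, element_size = pop_alloc_operands(stack, n_dims)
    header = [0, n_dims, element_size]
    n_elements = 1
    for lbound, ubound in bounds:
        header += [lbound, ubound]
        n_elements *= ubound - lbound + 1
    start = len(heap)                                  # ASSUMPTION: bump allocation
    heap.extend(header)
    heap.extend([None] * (n_elements * element_size))  # None marks a null cell
    stack.append((HEAP_SECTION_ID, start))             # reference to the array header
```

With `stack = [1, 3, 0, 4, 1]` and `n_dims=2`, the new array has bounds 1..3 and 0..4 with one cell per element, and a single reference is left on the stack.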
Computes a reference to an element of an array. The operand (an unsigned 8-bit integer) is the number of indices. The reference to the array is expected to be on top of the stack, with the indices below it in reverse order:
stack before: ..., idx1, idx2, ..., idxn, array_ref
stack after: element_ref
Before calculating the element reference and pushing it onto the stack, this instruction first performs array bounds checking.
The indices are expected to be of INTEGER or LONG types.
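The bounds check and the element reference computation might look like the following Python sketch. Memory is a hypothetical dict from section id to a list of cells, and references are `(section_id, index)` tuples. The text does not say how multi-dimensional elements are ordered, so this sketch assumes row-major order (last index varies fastest); the exception name and the index-count check are also assumptions.

```python
class IndexOutOfRange(Exception):
    """Hypothetical stand-in for the bounds-check trap."""


def arridx(stack, memory, n_indices):
    """memory maps section ids to lists of cells; references are (section_id, index)."""
    section_id, base = stack.pop()          # the array reference is on top
    indices = [stack.pop() for _ in range(n_indices)]
    indices.reverse()                       # idx1 was pushed first
    cells = memory[section_id]
    n_dims, element_size = cells[base + 1], cells[base + 2]
    if n_indices != n_dims:
        raise IndexOutOfRange("wrong number of indices")
    offset = 0
    for dim, idx in enumerate(indices):
        lbound = cells[base + 3 + 2 * dim]
        ubound = cells[base + 4 + 2 * dim]
        if not lbound <= idx <= ubound:
            raise IndexOutOfRange(f"index {idx} not in {lbound}..{ubound}")
        # ASSUMPTION: row-major layout (the last index varies fastest).
        offset = offset * (ubound - lbound + 1) + (idx - lbound)
    first_element = base + 3 + 2 * n_dims   # elements start right after the header
    stack.append((section_id, first_element + offset * element_size))
```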
Pops two values from the stack, performs bitwise AND on them, and pushes the result back onto the stack. The two values should both be of either type INTEGER or LONG and the result will be of the same type as these.
Reads a STRING from the stack and pushes the ASCII code of its first character back onto the stack as an INTEGER. If the string is empty, a trap occurs.
CALL has a LONG operand ("address") giving the location in the code of the routine to call. It pushes the current value of the IP register (pointing to the instruction after CALL) onto the stack as a LONG (32-bit integer) and then sets IP to the "address" operand.
Pops an INTEGER from the stack, interprets it as an ASCII code, and pushes a STRING of length one containing the character with that code.
Pops a numeric value from the stack, rounds it and pushes it back onto the stack as an INTEGER.
Pops a numeric value from the stack, rounds it and pushes it back onto the stack as a LONG.
Pops two values from the stack, compares them, and pushes the result back onto the stack. The two values should be of the same type. For numeric operands, a numeric comparison is performed: if the first value is smaller than the second, -1% (-1 as a 16-bit integer) is pushed; if they are equal, 0% is pushed; and if the first is greater than the second, 1% is pushed. If the two values are STRINGs, a lexicographic comparison is performed and -1%, 0%, or 1% is pushed in the same way.
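A Python sketch of the comparison on a list-based stack. It assumes that "the first value" is the one pushed first (deeper on the stack), matching the convention used for subtraction; `op_cmp` is a hypothetical name. Python compares strings by code point, which agrees with an ASCII lexicographic order.

```python
def op_cmp(stack):
    """Pop two same-typed values and push -1, 0 or 1."""
    second = stack.pop()   # top of the stack
    first = stack.pop()    # pushed before `second`
    if isinstance(first, str) != isinstance(second, str):
        raise TypeError("operands must be of the same type")
    if first < second:
        stack.append(-1)
    elif first == second:
        stack.append(0)
    else:
        stack.append(1)
```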
Performs a primitive type conversion. CONV is not a single instruction. It can take any of the following forms:
- conv%&
- conv%!
- conv%#
- conv&%
- conv&!
- conv&#
- conv#%
- conv#&
- conv#!
The two characters after conv determine the source and destination types. For example, conv%# converts a 16-bit INTEGER value to a 64-bit float (DOUBLE) value.
The instruction pops its argument from the stack, performs conversion, and pushes the result back onto the stack.
Reads a reference from the stack, then reads the value of the cell the reference points to, and pushes the value back onto the stack. This instruction has five variants, one for each primitive type: DEREF%, DEREF&, DEREF!, DEREF#, and DEREF$. The purpose of having these variants is the same as for the READ instructions: to auto-initialize variables when they are first read.
Pops two values from the stack, divides the first by the second, and pushes the result back onto the stack. The two values should be of the same type. If the values are of INTEGER, LONG, or SINGLE types, the result is a SINGLE. If the values are DOUBLEs, the result is a DOUBLE.
Duplicates the value on top of the stack.
Pops one value from the stack. This value MUST be an INTEGER. If it's 0%, -1% is pushed onto the stack, otherwise 0% is pushed.
Pops two values from the stack, performs bitwise EQV on them, and pushes the result back onto the stack. The two values should both be of either type INTEGER or LONG and the result will be of the same type as these.
Pops two values from the stack, raises the first to the power of the second, and pushes the result back onto the stack. The two values should be of the same type.
Creates and enters a new call frame. params_size is an unsigned 16-bit integer giving the number of cells the subroutine's parameters take up; these cells are popped from the stack and copied to the frame. vars_size is another unsigned 16-bit integer giving the size of the local variables in cells. The call frame has params_size + vars_size cells and is linked to the caller's frame.
Notice that when FRAME executes, the return address (pushed by the CALL instruction) is assumed to be on top of the stack. FRAME pushes this value back onto the stack after reading the arguments.
Stack before: ..., arg1, arg2, ..., argn, return_address
Stack after: ..., return_address
Pops one value from the stack. This value MUST be an INTEGER. If it's 0% or 1%, -1% is pushed onto the stack, otherwise 0% is pushed.
Pops one value from the stack. This value MUST be an INTEGER. If it's 1%, -1% is pushed onto the stack, otherwise 0% is pushed.
Halts machine execution.
Pops two values from the stack, divides the first by the second, and pushes the result back onto the stack. The two values should be of the same type, and can only be of INTEGER or LONG types. The result is of the same type as the two values.
Initializes a global static array. var_idx is an unsigned short containing the variable index in the globals section. n_dims (an unsigned 8-bit integer) is the number of dimensions in the array, and element_size is a signed 32-bit integer containing the size of each element in the array in cells.
Initializes a local static array. var_idx is an unsigned short containing the variable index in the current call frame. n_dims (an unsigned 8-bit integer) is the number of dimensions in the array, and element_size is a signed 32-bit integer containing the size of each element in the array in cells.
Pops a LONG value from the stack and sets the IP register to it, performing an indirect jump. This is similar to what the RET instruction does, except that IJMP does not do anything with the current call frame.
Pops two values from the stack, performs bitwise IMP on them, and pushes the result back onto the stack. The two values should both be of either type INTEGER or LONG and the result will be of the same type as these.
Pops a numeric value from the stack and pushes back a LONG that is the largest integer less than or equal to that value.
Performs an IO operation. "subsystem%" and "operation%" values are 16-bit integers.
Performs an unconditional jump to the given address, that is, sets the IP register to the value of the "address&" operand.
Pops an INTEGER value from the stack. If it's 0%, it jumps to the address given by the operand, otherwise does nothing.
Pops an integer dim_idx and a reference from the stack, interprets the reference as an array, and reads the LBOUND of the dimension dim_idx of the array, and pushes it on the stack (as a LONG).
Stack before: ..., array_ref, dim_idx
Stack after: ..., lbound
dim_idx is one-based. If it's not in the valid range for the array, an INDEX_OUT_OF_RANGE trap is raised.
Pops a STRING from the stack, converts it to lower case, and pushes the result back onto the stack.
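Reading a bound is a direct lookup in the array header. In this Python sketch, memory is a hypothetical dict from section id to a list of cells, references are `(section_id, index)` tuples pointing at the header, and `IndexOutOfRange` stands in for the INDEX_OUT_OF_RANGE trap. Passing `upper=True` gives the UBOUND lookup described further on.

```python
class IndexOutOfRange(Exception):
    """Stands in for the INDEX_OUT_OF_RANGE trap."""


def array_bound(stack, memory, upper):
    dim_idx = stack.pop()                  # on top of the stack
    section_id, base = stack.pop()         # array reference below it
    cells = memory[section_id]
    n_dims = cells[base + 1]
    if not 1 <= dim_idx <= n_dims:         # dim_idx is one-based
        raise IndexOutOfRange(dim_idx)
    # Dimension d has its lbound at cell 3 + 2*(d-1) and its ubound right after it.
    cell = base + 3 + 2 * (dim_idx - 1) + (1 if upper else 0)
    stack.append(cells[cell])
```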
Pops one value from the stack. This value MUST be an INTEGER. If it's -1% or 0%, -1% is pushed onto the stack, otherwise 0% is pushed.
Pops one value from the stack. This value MUST be an INTEGER. If it's -1%, -1% is pushed onto the stack, otherwise 0% is pushed.
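Five of the single-INTEGER instructions above (those that test for 0%; 0% or 1%; 1%; -1% or 0%; and -1%) read naturally as equal, greater-or-equal, greater, less-or-equal, and less tests applied to the -1%/0%/1% result of the comparison instruction, with -1% as QB's true value and 0% as false. That interpretation is inferred from the descriptions, and the dictionary keys below are descriptive labels for this sketch, not QVM mnemonics.

```python
# Comparison results for which each predicate pushes -1% (true); otherwise 0% is pushed.
TRUE_FOR = {
    "equal": {0},
    "greater-or-equal": {0, 1},
    "greater": {1},
    "less-or-equal": {-1, 0},
    "less": {-1},
}


def apply_predicate(stack, predicate):
    result = stack.pop()   # expected to be -1, 0 or 1
    stack.append(-1 if result in TRUE_FOR[predicate] else 0)
```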
Pops two values from the stack, divides the first by the second, and pushes the remainder back onto the stack. The two values should both be either INTEGER or LONG values. The result is of the same type as these values.
Pops two values from the stack, multiplies them by each other and pushes the result back onto the stack. The two values should be of the same type and the result will also be of this type.
Pops one value from the stack. This value MUST be an INTEGER. If it's 0%, 0% is pushed onto the stack, otherwise -1% is pushed.
Pops one value from the stack, negates it and pushes the result back onto the stack. The value must be numeric. The result will be of the same type as that value.
Does nothing.
Pops one value from the stack, which should be either an INTEGER or a LONG. It performs bitwise NOT on this value and pushes the result back onto the stack. The result will be of the same type as the value.
Reads a numeric value from the stack and pushes its STRING representation back onto the stack.
Pops two values from the stack, performs bitwise OR on them, and pushes the result back onto the stack. The two values should both be of either type INTEGER or LONG and the result will be of the same type as these.
Pops a value from the stack and throws it away.
This instruction can be any of the following variants:
- PUSH% value%
- PUSH& value&
- PUSH! value!
- PUSH# value#
- PUSH$ value$
It pushes the value of the operand onto the stack. For PUSH$, the operand is a 32-bit integer denoting the index of a string literal in the CONST section.
This instruction can be any of the following variants:
- PUSHM2%
- PUSHM2&
- PUSHM2!
- PUSHM2#
- PUSHM1%
- PUSHM1&
- PUSHM1!
- PUSHM1#
- PUSH0%
- PUSH0&
- PUSH0!
- PUSH0#
- PUSH1%
- PUSH1&
- PUSH1!
- PUSH1#
- PUSH2%
- PUSH2&
- PUSH2!
- PUSH2#
Each instruction has an intrinsic value and type, and causes the intrinsic value to be pushed onto the stack. The M2 and M1 variants denote negative values, so for example PUSHM1! pushes a value of -1! (-1 as a 32-bit float) onto the stack.
This instruction has two variants: PUSHREFL and PUSHREFG. These instructions push a reference to a given variable onto the stack.
This instruction has six variants: READG%, READG&, READG!, READG#, READG$, and READG@.
It reads the value of the given variable from the globals section and pushes it onto the stack. "var" is a 16-bit unsigned integer denoting the index of the cell in the globals section to read from.
If the memory cell is null, the first five variants initialize it with the zero/empty value corresponding to the instruction's type and push that value. READG@ requires that the cell already contain a reference; attempting to read a reference from a null cell causes a panic.
This instruction has six variants: READIDXG%, READIDXG&, READIDXG!, READIDXG#, READIDXG$, and READIDXG@.
It reads the value at index "idx" of the given global variable and pushes it onto the stack. This is useful for reading fields inside a struct.
If the memory cell is null, the first five variants initialize it with the zero/empty value corresponding to the instruction's type and push that value. READIDXG@ requires that the cell already contain a reference; attempting to read a reference from a null cell causes a panic.
This instruction has six variants: READIDXL%, READIDXL&, READIDXL!, READIDXL#, READIDXL$, and READIDXL@.
It reads the value at index "idx" of the given local variable and pushes it onto the stack. This is useful for reading fields inside a struct.
If the memory cell is null, the first five variants initialize it with the zero/empty value corresponding to the instruction's type and push that value. READIDXL@ requires that the cell already contain a reference; attempting to read a reference from a null cell causes a panic.
This instruction has six variants: READL%, READL&, READL!, READL#, READL$, and READL@.
It reads the value of the given variable from the current call frame and pushes it onto the stack. "var" is a 16-bit unsigned integer denoting the index of the cell in the call frame to read from.
If the memory cell is null, the first five variants initialize it with the zero/empty value corresponding to the instruction's type and push that value. READL@ requires that the cell already contain a reference; attempting to read a reference from a null cell causes a panic.
Pops an index from the stack and then a reference. The index must be an INTEGER or a LONG. The instruction adds the index to the reference's cell index and pushes the resulting reference back onto the stack.
Performs a return from a SUB (so there is no return value). In particular, it performs the following operations:
- Frees all dynamic values (in the heap section) pointed to by references in the current call frame.
- Destroys the current frame section.
- Pops a LONG value from the stack.
- Sets the value of the IP register to the popped value.
Performs a return from a FUNCTION (so there is a return value). In particular, it performs the following operations:
- Frees all dynamic values (in the heap section) pointed to by references in the current call frame.
- Destroys the current frame section.
- Pops a value from the stack, which will be the return value of the function.
- Pops a LONG value from the stack.
- Sets the value of the IP register to the popped address.
- Pushes the return value onto the stack.
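The stack traffic of a call can be traced with a toy Python machine. `MiniVM` and its methods are hypothetical: the `next_ip` parameter stands in for instruction decoding, arguments are assumed to land in frame cells 0 to params_size - 1 in order (the text only says they are copied to the frame), and freeing heap arrays is omitted.

```python
class MiniVM:
    """Just enough state to trace CALL, FRAME, RET and RETV."""

    def __init__(self):
        self.ip = 0
        self.stack = []
        self.frames = []   # innermost frame last; each frame is a list of cells

    def call(self, address, next_ip):
        # next_ip is the address of the instruction after CALL.
        self.stack.append(next_ip)
        self.ip = address

    def frame(self, params_size, vars_size):
        return_address = self.stack.pop()          # pushed by CALL
        args = [self.stack.pop() for _ in range(params_size)]
        args.reverse()                             # ASSUMPTION: arg1 lands in cell 0
        self.frames.append(args + [None] * vars_size)
        self.stack.append(return_address)          # back on top for RET/RETV

    def ret(self):
        self.frames.pop()                          # destroy the current frame
        self.ip = self.stack.pop()

    def retv(self):
        self.frames.pop()                          # destroy the current frame
        value = self.stack.pop()                   # the function's return value
        self.ip = self.stack.pop()
        self.stack.append(value)
```

Calling a two-argument FUNCTION with `stack == [10, 20]` leaves `[7]` on the stack after FRAME (7 being the return address); once the body pushes a result and RETV runs, only that result remains and IP is back at 7.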
Pops a STRING from the stack, converts it to a number, and pushes the result back onto the stack as a DOUBLE. Leading white space is skipped, and conversion stops at the first non-numeric character (or at the end of the string).
Reads a numeric value from the stack, and pushes 1, 0, or -1 back onto the stack depending on its sign. The type of the pushed value will be the same as the type of the read value.
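A Python sketch of the prefix conversion. What counts as "numeric" is not spelled out above, so the grammar here (optional sign, digits with an optional decimal point, optional exponent) is an assumption, as is returning 0.0 when there is no numeric prefix at all, which is how QBasic's VAL behaves.

```python
import re

# ASSUMPTION: the accepted numeric syntax; QVM's exact grammar is not documented here.
NUMERIC_PREFIX = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")


def string_to_double(s):
    """Convert the longest numeric prefix of s (after leading white space) to a float."""
    m = NUMERIC_PREFIX.match(s.lstrip())
    return float(m.group(0)) if m else 0.0
```

For example, `"  12.5abc"` converts to 12.5 and `"3e2x"` to 300.0.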
Reads an INTEGER n from the stack and pushes a string consisting of n spaces back onto the stack.
Pops two values from the stack, subtracts the second from the first and pushes the result back onto the stack. The two values should be of the same type and the result will also be of this type.
Pops a value from the stack and stores it in the given global variable. var% is a 16-bit integer, denoting an index to a cell in the globals section to write to.
Pops a value from the stack and writes it at index "idx" of the given global variable. This is useful for writing to fields inside a struct.
Pops a value from the stack and writes it at index "idx" of the given local variable. This is useful for writing to fields inside a struct.
Pops a value from the stack and stores it in the given local variable. var% is a 16-bit integer, denoting an index to a cell in the current call frame to write to.
Reads a reference and a value from the stack and writes the value to the cell the reference points to.
Stack before: ..., value, reference
Stack after: ...
Looks for the occurrence of the string str2 in string str1, starting from index "start" in str1:
Stack before: ..., start, str1, str2
Stack after: ..., result_index
The "start" index and the returned index are both LONG and one-based. If the string is not found, 0 is returned.
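The one-based convention maps neatly onto Python's zero-based `str.find`, which returns -1 when the substring is absent, so adding 1 yields the documented 0. This is a sketch on a list-based stack; `stridx` is a hypothetical name, and it assumes `start` is at least 1.

```python
def stridx(stack):
    """Stack before: ..., start, str1, str2  ->  stack after: ..., result_index"""
    str2 = stack.pop()
    str1 = stack.pop()
    start = stack.pop()                           # one-based; assumed >= 1
    # find() is zero-based and returns -1 when absent, so +1 gives 0 for "not found".
    stack.append(str1.find(str2, start - 1) + 1)
```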
Pops a STRING and an INTEGER n from the stack. Pushes back a new string containing the left n characters of the string.
Reads a STRING value from the stack, and pushes its length back onto the stack as a LONG.
Pops a STRING, an integer ("start"), and either an INTEGER or a LONG ("length") from the stack, then extracts the sub-string that begins at "start" and has the given "length". If "length" is a LONG, its value is ignored and the sub-string extends to the end of the string. The result is pushed back onto the stack.
Note: The start index is assumed to be 1-based.
Note: Non-positive values for start and length would cause an error.
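The sub-string semantics can be sketched as a plain Python function, leaving out stack handling since the exact operand order is not given. Here `length=None` plays the role of a LONG "length" (meaning "to the end of the string"); `str_mid` is a hypothetical name, and raising `ValueError` stands in for the error on non-positive values.

```python
def str_mid(s, start, length=None):
    """One-based sub-string; length=None means "until the end of the string"."""
    if start <= 0 or (length is not None and length <= 0):
        raise ValueError("start and length must be positive")
    begin = start - 1                              # convert to a zero-based index
    end = len(s) if length is None else begin + length
    return s[begin:end]
```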
Pops an ASCII code or a STRING, as well as an integer n, from the stack, and pushes back a string consisting of n copies of either the character with that ASCII code or the first character of the given string.
Stack before: ..., n, char
Stack after: ..., result
With char=65 and n=3, result will be "AAA". With char="XYZ" and n=3, result will be "XXX".
Pops a STRING and an INTEGER n from the stack. Pushes back a new string containing the right n characters of the string.
Swaps the two top values on the stack.
Stack before: ..., value1, value2
Stack after: ..., value2, value1
Swaps the third item from the top of the stack with the second.
Stack before: ..., value1, value2, value3
Stack after: ..., value2, value1, value3
Pops an integer dim_idx and a reference from the stack, interprets the reference as an array, and reads the UBOUND of the dimension dim_idx of the array, and pushes it on the stack (as a LONG).
Stack before: ..., array_ref, dim_idx
Stack after: ..., ubound
dim_idx is one-based. If it's not in the valid range for the array, an INDEX_OUT_OF_RANGE trap is raised.
Pops a STRING from the stack, converts it to upper case, and pushes the result back onto the stack.
Pops two values from the stack, performs bitwise XOR on them, and pushes the result back onto the stack. The two values should both be of either type INTEGER or LONG and the result will be of the same type as these.
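For INTEGER and LONG operands, the bitwise instructions map directly onto Python's integer operators. Python treats negative ints as infinitely sign-extended two's-complement values, so results stay in range whenever the operands are. EQV and IMP have no Python operator; the sketch uses the standard QBasic definitions, EQV = NOT (a XOR b) and IMP = (NOT a) OR b. The function names are illustrative only.

```python
def b_not(a):
    return ~a


def b_and(a, b):
    return a & b


def b_or(a, b):
    return a | b


def b_xor(a, b):
    return a ^ b


def b_eqv(a, b):
    # A result bit is 1 where the operand bits agree.
    return ~(a ^ b)


def b_imp(a, b):
    # A result bit is 0 only where a has 1 and b has 0.
    return ~a | b
```

With QB truth values (-1 for true, 0 for false), `b_imp(-1, 0)` is 0 and `b_eqv(-1, -1)` is -1, matching the logical meanings of IMP and EQV.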