A simple compiler for a simple imperative language targeting x86-64 assembly (for systems that follow the System V ABI, so most operating systems except Windows).
The compiler relies on OCaml and the ocamlfind library. It is tested with OCaml
versions 4.02.3 and 4.04.0, and ocamlfind 1.5.6 and 1.7.1. Ocamlfind can be
installed via opam. If you have installed anything via opam (for example,
Merlin or ocp-indent), ocamlfind is probably installed already. Opam installs
everything into .opam in your home directory. To install ocamlfind, run:
opam install ocamlfind
To compile the compiler, run make in the src directory. This should produce the
compile.byte and interp.byte executables. Both take a single command-line
argument: a source file name with the .expl extension. interp.byte runs the
file; compile.byte compiles it, generating an x86-64 assembly .s file in nasm
syntax.
First run make in the runtime directory to compile the very simple runtime
library (using gcc). On Linux, use nasm -f elf64 FILENAME.s to assemble the
compiler's output for FILENAME. On Mac, use nasm -f macho64 --prefix _ FILENAME.s.
Then use gcc COMPILER_DIR/runtime/io.o FILENAME.o -o FILENAME to link the
program with the runtime library.
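The assemble and link steps differ only in the nasm flags, which can be summarised in a few lines of OCaml. This is a minimal sketch, not part of the compiler: the os type and the helper names nasm_command and link_command are hypothetical, and the strings are copied from the commands above without any shell quoting.

```ocaml
(* Hypothetical helpers that build the command lines described above. *)
type os = Linux | Mac

(* The nasm invocation for FILENAME.s on each platform. *)
let nasm_command os filename =
  match os with
  | Linux -> Printf.sprintf "nasm -f elf64 %s.s" filename
  | Mac -> Printf.sprintf "nasm -f macho64 --prefix _ %s.s" filename

(* The gcc invocation that links FILENAME.o with the runtime library. *)
let link_command compiler_dir filename =
  Printf.sprintf "gcc %s/runtime/io.o %s.o -o %s" compiler_dir filename filename

let () =
  print_endline (nasm_command Linux "FILENAME");
  print_endline (link_command "COMPILER_DIR" "FILENAME")
```

The two printed lines are the Linux commands from the paragraph above; passing Mac instead selects the macho64 flags.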
See the tests directory for some example programs.
Keywords are + - * / | & << < > = || && ! := do while if then else input output true false array return function let bool int
Identifiers are strings of letters, underscores, and digits (not starting with a digit) that are not keywords.
Numbers are sequences of digits that fit into a 64-bit signed integer.
Comments start with '//' and last until the end of the line.
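The identifier rule can be illustrated with a small checker. This is a sketch only, not the compiler's lexer: the function name is_identifier is hypothetical, and it assumes that "letters" means ASCII letters. Only the word-like keywords need checking, since symbols such as + or := can never have the shape of an identifier.

```ocaml
(* Word-like keywords, copied from the keyword list above. *)
let keywords =
  [ "do"; "while"; "if"; "then"; "else"; "input"; "output"; "true"; "false";
    "array"; "return"; "function"; "let"; "bool"; "int" ]

(* Assumption: letters are the ASCII letters a-z and A-Z. *)
let is_letter c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
let is_digit c = c >= '0' && c <= '9'

(* True when s is non-empty, consists of letters, digits and underscores,
   does not start with a digit, and is not a keyword. *)
let is_identifier s =
  let n = String.length s in
  let rec all_ok i =
    i >= n
    || ((is_letter s.[i] || is_digit s.[i] || s.[i] = '_') && all_ok (i + 1))
  in
  n > 0 && not (is_digit s.[0]) && all_ok 0 && not (List.mem s keywords)
```

Under these assumptions, is_identifier accepts x1 and _tmp, and rejects 1x (leading digit) and while (keyword).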
All numerical operations are on signed, two's complement 64-bit integers.
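OCaml's Int64 module uses the same representation, so it gives a feel for what 64-bit two's complement arithmetic implies at the boundaries. The sketch below only demonstrates Int64 itself; that interp.byte and compiled programs behave identically in every overflow case is an assumption, not something stated here.

```ocaml
(* Adding 1 to the largest 64-bit signed value wraps to the smallest. *)
let wrapped_sum = Int64.add Int64.max_int 1L

(* Negating the smallest value overflows back to the same value. *)
let negated_min = Int64.neg Int64.min_int

let () =
  Printf.printf "%Ld\n" wrapped_sum;   (* prints -9223372036854775808 *)
  Printf.printf "%Ld\n" negated_min    (* prints -9223372036854775808 *)
```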
Multi-dimensional arrays are supported. Array elements are 64-bit integers.
op ::=
| +  --- Addition
| -  --- Subtraction
| *  --- Multiplication
| /  --- Division
| |  --- Bitwise or
| &  --- Bitwise and
| << --- Left shift
| <  --- Less than
| >  --- Greater than
| =  --- Equality
| || --- Logical or
| && --- Logical and
uop ::=
| !  --- Logical negation
| -  --- Unary minus
indices ::=
| [ exp ] indices
| epsilon
args ::=
| exp
| exp , args
atomic_exp ::=
| identifier indices  --- Variable use and array indexing
| identifier ( args ) --- Function call
| number              --- Integer constant
| true                --- Boolean constant
| false               --- Boolean constant
| uop atomic_exp      --- Unary operation
| array indices       --- Array allocation
| ( exp )             --- Parenthesised expression
exp ::=
| atomic_exp op atomic_exp --- Binary operation
| atomic_exp
stmt ::=
| identifier indices := exp
| while exp stmt
| do stmt while exp
| if exp then stmt else stmt
| { stmts }
| input identifier
| output identifier
| return identifier
stmts ::=
| epsilon
| stmt stmts
typ ::=
| int          --- a 64-bit signed integer
| bool         --- a boolean
| array number --- an n dimensional array of 64-bit signed integers
params ::=
| ( identifier : typ )
| ( identifier : typ ) params
var_decs ::=
| epsilon
| let identifier : typ = exp var_decs
functions ::=
| epsilon
| function identifier params : typ { var_decs stmts } functions
program ::=
| var_decs functions
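One way to read the grammar is as a set of OCaml data types, one per non-terminal. The following is a sketch for illustration only: every type, constructor, and field name is hypothetical and need not match the definitions in the compiler's own source. Parenthesised expressions need no constructor, since the tree shape already records the grouping.

```ocaml
type id = string

type op =
  | Plus | Minus | Times | Div | BitOr | BitAnd | Lshift
  | Lt | Gt | Eq | Or | And

type uop = Not | Neg

type exp =
  | Ident of id * exp list   (* identifier indices *)
  | Call of id * exp list    (* identifier ( args ) *)
  | Num of int64             (* number *)
  | Bool of bool             (* true, false *)
  | Uop of uop * exp         (* uop atomic_exp *)
  | ArrayAlloc of exp list   (* array indices *)
  | Op of exp * op * exp     (* atomic_exp op atomic_exp *)

type stmt =
  | Assign of id * exp list * exp   (* identifier indices := exp *)
  | While of exp * stmt
  | DoWhile of stmt * exp
  | Ite of exp * stmt * stmt
  | Stmts of stmt list              (* { stmts } *)
  | Input of id
  | Output of id
  | Return of id

type typ = TInt | TBool | TArray of int   (* TArray n: n dimensions *)

type var_dec = { var_name : id; var_typ : typ; init : exp }

type func = {
  fun_name : id;
  params : (id * typ) list;
  ret : typ;
  locals : var_dec list;
  body : stmt list;
}

type program = { globals : var_dec list; funcs : func list }

(* The statement  x := y + 1  as a value of these types. *)
let example_stmt = Assign ("x", [], Op (Ident ("y", []), Plus, Num 1L))
```

Representing indices as a list lets a plain variable (no indices) and an array element share one constructor, which mirrors the way the grammar allows indices to be epsilon.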
First build the compiler by running make from the src directory. In utop, load
the packages that the compiler uses with the #require command:
#require "str";;
You can add this line to the .ocamlinit file in your home directory, so that
you don't have to enter it manually each time you start a new utop session.
The contents of .ocamlinit are run each time you start a new utop.
The OCaml compilation manager (ocamlbuild) stores all of the compiled OCaml
sources in the _build directory, with the extension .cmo. The following tells
utop to look there for compiled files.
#directory "_build";;
To load a particular module, for example, LineariseCfg, use the #load_rec
command.
#load_rec "lineariseCfg.cmo";;
You can then open the module if you want (open LineariseCfg), or call
functions directly (LineariseCfg.cfg_to_linear). Loading a compiled module in
this way only gives you access to the values and functions that are exported in
the corresponding .mli file. Often a good way to work on a file is to #load_rec
all of the modules that it depends on, and then #use the file.