Merge pull request #4 from gabrielzezze/code-generation
Code generation
gabrielzezze authored Jun 3, 2021
2 parents eee7d50 + 639f903 commit f2ef23f
Showing 30 changed files with 579 additions and 245 deletions.
Empty file added Makefile
Empty file.
85 changes: 83 additions & 2 deletions README.md
@@ -1,5 +1,86 @@
# z-lang
## Gabriel Zezze
## [email protected]
<br></br>

### How to run:
## Context
z-lang is named after the surname of its creator (Zezze), which is why many of its symbols begin or end with the letter “z”.
<br></br>

## Inspirations and distinguishing features
The language is inspired mainly by C but borrows elements from Python such as “and” and “or”. Its own peculiarities are the relational operators, which are written out in words (for example, the equality operator "is_equal_to"), and its variable declaration syntax.
<br></br>

## EBNF
```
MAIN = FUNC;
STRING = "'", { ALPHACHAR | NUMERIC | " " }, "'";
ALPHACHAR = (a | b | ... | z | A | B | ... | Z);
NUMBER = NUMERIC, { NUMERIC };
NUMERIC = (0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9);
IDENTIFIER = ALPHACHAR, { ALPHACHAR | NUMERIC | "_" };
EOL = ".z";
TYPE = "numz" | "bool" | "charz";
FUNC = TYPE, "func", IDENTIFIER, "(", { ARG }, ")", BLOCK, EOL;
ARG = TYPE, IDENTIFIER;
BLOCK = "{", { ACTION }, "}";
ACTION = ( PRINT | ASSIGNMENT | WHILE | FUNC_CALL | CONDITION | RETURN );
PRINT = "zPrint", "(", (EXPRESSION | COMPARISON), ")", EOL;
ASSIGNMENT = "var", IDENTIFIER, "->", TYPE, "=", (EXPRESSION | COMPARISON | NUMBER | STRING), EOL;
WHILE = "While", "(", COMPARISON, ")", BLOCK, EOL;
CONDITION = "If", ( ("(", COMPARISON, ")") | COMPARISON ), BLOCK, ( ELIF | ELSE | EOL );
ELIF = "Elif", "(", COMPARISON, ")", BLOCK, ( ELIF | ELSE | EOL );
ELSE = "Else", BLOCK, EOL;
FUNC_CALL = IDENTIFIER, "(", (EXPRESSION), {",", (EXPRESSION)}, ")", EOL;
RETURN = "return", (EXPRESSION | COMPARISON), EOL;
EXPRESSION = TERM, { ("+" | "-"), TERM } ;
TERM = FACTOR, { ("*" | "/"), FACTOR } ;
FACTOR = (("+" | "-"), FACTOR) | NUMBER | "(", EXPRESSION, ")" | IDENTIFIER | "true" | "false" ;
COMPARISON = EXPRESSION, ("is_equal_to" | "is_not_equal_to" | "is_greater_than" | "is_greater_equal_than" | "is_lesser_than" | "is_lesser_equal_than" | "or" | "and"), EXPRESSION;
```
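
As an illustration of how the `EXPRESSION`, `TERM` and `FACTOR` rules encode operator precedence, here is a minimal recursive-descent sketch in plain Python. It is not the project's sly-based parser: the function names (`tokenize`, `parse_expression`, `parse_term`, `parse_factor`, `evaluate`) are hypothetical, identifiers and `true`/`false` are left out, and `/` is assumed to be integer division (Python's `//`, which floors).

```python
import re

# Hypothetical sketch: numbers, parentheses and the four arithmetic operators only.
TOKEN_RE = re.compile(r"\s*(\d+|[-+*/()])")


def tokenize(text):
    """Split an arithmetic expression into number and operator tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at index {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_expression(tokens, pos):
    """EXPRESSION = TERM, { ("+" | "-"), TERM } ;"""
    value, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        op = tokens[pos]
        rhs, pos = parse_term(tokens, pos + 1)
        value = value + rhs if op == "+" else value - rhs
    return value, pos


def parse_term(tokens, pos):
    """TERM = FACTOR, { ("*" | "/"), FACTOR } ;"""
    value, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("*", "/"):
        op = tokens[pos]
        rhs, pos = parse_factor(tokens, pos + 1)
        # Assumption: "/" is integer division.
        value = value * rhs if op == "*" else value // rhs
    return value, pos


def parse_factor(tokens, pos):
    """FACTOR = (("+" | "-"), FACTOR) | NUMBER | "(", EXPRESSION, ")" ;"""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok in ("+", "-"):
        value, pos = parse_factor(tokens, pos + 1)
        return (value if tok == "+" else -value), pos
    if tok == "(":
        value, pos = parse_expression(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return value, pos + 1
    if tok.isdigit():
        return int(tok), pos + 1
    raise SyntaxError(f"unexpected token {tok!r}")


def evaluate(text):
    """Parse and evaluate a whole expression, rejecting trailing tokens."""
    tokens = tokenize(text)
    value, pos = parse_expression(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return value


print(evaluate("2 + 3 * 4"))  # 14
```

Because `parse_term` is called from inside `parse_expression`, multiplication and division bind tighter than addition and subtraction without any explicit precedence table.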
<br></br>

## Compilation process
The z-lang compiler is written in Python using the [sly](https://sly.readthedocs.io/en/latest/sly.html) and [llvmlite](https://pypi.org/project/llvmlite/) packages.
sly provides the lexer and parser modules used to build `ZTokenizer` and `ZParser`, which respectively tokenize the source and parse it into an AST.

Once parsing has produced the AST, the AST is evaluated. During evaluation, llvmlite is used to add instructions to an LLVM module, which is written to “output.ll” at the end. That file can be compiled to a “.o” object file with the `llc` command, and the object file can then be linked into an executable with any C compiler ([clang](https://clang.llvm.org/) is recommended).
<br></br>

## Running .z code

To run “.z” code you must first compile it with the z-lang compiler. Running the compiler requires the Pipenv environment provided in this repository and the following commands:

### Compiling .z code
```
pipenv install
pipenv run python main.py <path_to_.z_file>
```

### Generating the executable from the LLVM module
```
llc -filetype=obj out/output.ll
gcc out/output.o -o out/output
```

### Running the executable
```
./out/output
```
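
The three steps above could be driven from one small Python script. This is only a sketch: `build_commands` and `run_pipeline` are hypothetical helpers that are not part of the repository, `examples/test.z` is a made-up path, and the output paths assume the `out/output.ll` location used in the commands above.

```python
import shlex
import subprocess
from pathlib import Path


def build_commands(source, out_dir="out", cc="gcc"):
    """Return the compile, llc, link and run commands as argument lists."""
    out = Path(out_dir)
    ll_file = out / "output.ll"            # written by the z-lang compiler
    obj_file = ll_file.with_suffix(".o")   # llc writes the object next to its input
    exe_file = out / "output"
    return [
        ["pipenv", "run", "python", "main.py", str(source)],
        ["llc", "-filetype=obj", ll_file.as_posix()],
        [cc, obj_file.as_posix(), "-o", exe_file.as_posix()],
        [exe_file.as_posix()],
    ]


def run_pipeline(source):
    """Run every step in order and stop at the first failure."""
    for command in build_commands(source):
        subprocess.run(command, check=True)


# Print the commands without executing them.
for command in build_commands("examples/test.z"):
    print(shlex.join(command))
```

Passing `cc="clang"` to `build_commands` swaps the linker driver for the compiler the README recommends.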

pipenv run python main "__path_to_.z_file_"
File renamed without changes.
151 changes: 151 additions & 0 deletions debug/test_tokens.out
@@ -0,0 +1,151 @@
Token(type='INT', value='numz', lineno=1, index=0)
Token(type='FUNCTION', value='func', lineno=1, index=5)
Token(type='IDENTIFIER', value='add_2', lineno=1, index=10)
Token(type='PARENTHESIS_OPEN', value='(', lineno=1, index=15)
Token(type='INT', value='numz', lineno=1, index=16)
Token(type='IDENTIFIER', value='y', lineno=1, index=21)
Token(type='PARENTHESIS_CLOSE', value=')', lineno=1, index=22)
Token(type='BLOCK_START', value='{', lineno=1, index=24)
Token(type='PRINT', value='zPrint', lineno=2, index=30)
Token(type='PARENTHESIS_OPEN', value='(', lineno=2, index=36)
Token(type='IDENTIFIER', value='y', lineno=2, index=37)
Token(type='PARENTHESIS_CLOSE', value=')', lineno=2, index=38)
Token(type='EOL', value='.z', lineno=2, index=40)
Token(type='RETURN', value='return', lineno=3, index=47)
Token(type='IDENTIFIER', value='y', lineno=3, index=54)
Token(type='PLUS', value='+', lineno=3, index=55)
Token(type='NUMBER', value='2', lineno=3, index=56)
Token(type='EOL', value='.z', lineno=3, index=57)
Token(type='BLOCK_END', value='}', lineno=4, index=60)
Token(type='EOL', value='.z', lineno=4, index=62)
Token(type='INT', value='numz', lineno=6, index=66)
Token(type='FUNCTION', value='func', lineno=6, index=71)
Token(type='IDENTIFIER', value='sub', lineno=6, index=76)
Token(type='PARENTHESIS_OPEN', value='(', lineno=6, index=79)
Token(type='INT', value='numz', lineno=6, index=80)
Token(type='IDENTIFIER', value='h', lineno=6, index=85)
Token(type='INT', value='numz', lineno=6, index=88)
Token(type='IDENTIFIER', value='e', lineno=6, index=93)
Token(type='PARENTHESIS_CLOSE', value=')', lineno=6, index=94)
Token(type='BLOCK_START', value='{', lineno=6, index=96)
Token(type='RETURN', value='return', lineno=7, index=102)
Token(type='IDENTIFIER', value='h', lineno=7, index=109)
Token(type='MINUS', value='-', lineno=7, index=110)
Token(type='IDENTIFIER', value='e', lineno=7, index=111)
Token(type='EOL', value='.z', lineno=7, index=113)
Token(type='BLOCK_END', value='}', lineno=8, index=116)
Token(type='EOL', value='.z', lineno=8, index=118)
Token(type='INT', value='numz', lineno=10, index=122)
Token(type='FUNCTION', value='func', lineno=10, index=127)
Token(type='IDENTIFIER', value='main', lineno=10, index=132)
Token(type='PARENTHESIS_OPEN', value='(', lineno=10, index=136)
Token(type='PARENTHESIS_CLOSE', value=')', lineno=10, index=137)
Token(type='BLOCK_START', value='{', lineno=10, index=139)
Token(type='VAR', value='var', lineno=12, index=150)
Token(type='IDENTIFIER', value='x', lineno=12, index=154)
Token(type='ARROW', value='->', lineno=12, index=156)
Token(type='INT', value='numz', lineno=12, index=159)
Token(type='ASSIGNMENT', value='=', lineno=12, index=164)
Token(type='NUMBER', value='2', lineno=12, index=166)
Token(type='EOL', value='.z', lineno=12, index=168)
Token(type='PRINT', value='zPrint', lineno=14, index=176)
Token(type='PARENTHESIS_OPEN', value='(', lineno=14, index=182)
Token(type='STRING', value="'Pre While'", lineno=14, index=183)
Token(type='PARENTHESIS_CLOSE', value=')', lineno=14, index=194)
Token(type='EOL', value='.z', lineno=14, index=196)
Token(type='WHILE', value='While', lineno=15, index=203)
Token(type='PARENTHESIS_OPEN', value='(', lineno=15, index=208)
Token(type='IDENTIFIER', value='x', lineno=15, index=209)
Token(type='LESSER', value='is_lesser_than', lineno=15, index=211)
Token(type='NUMBER', value='6', lineno=15, index=226)
Token(type='PARENTHESIS_CLOSE', value=')', lineno=15, index=227)
Token(type='BLOCK_START', value='{', lineno=15, index=229)
Token(type='IDENTIFIER', value='x', lineno=16, index=239)
Token(type='ASSIGNMENT', value='=', lineno=16, index=241)
Token(type='IDENTIFIER', value='add_2', lineno=16, index=243)
Token(type='PARENTHESIS_OPEN', value='(', lineno=16, index=248)
Token(type='IDENTIFIER', value='x', lineno=16, index=249)
Token(type='PARENTHESIS_CLOSE', value=')', lineno=16, index=250)
Token(type='EOL', value='.z', lineno=16, index=252)
Token(type='BLOCK_END', value='}', lineno=17, index=259)
Token(type='EOL', value='.z', lineno=17, index=261)
Token(type='PRINT', value='zPrint', lineno=19, index=269)
Token(type='PARENTHESIS_OPEN', value='(', lineno=19, index=275)
Token(type='IDENTIFIER', value='add_2', lineno=19, index=276)
Token(type='PARENTHESIS_OPEN', value='(', lineno=19, index=281)
Token(type='IDENTIFIER', value='x', lineno=19, index=282)
Token(type='PARENTHESIS_CLOSE', value=')', lineno=19, index=283)
Token(type='MINUS', value='-', lineno=19, index=284)
Token(type='NUMBER', value='2', lineno=19, index=285)
Token(type='PARENTHESIS_CLOSE', value=')', lineno=19, index=286)
Token(type='EOL', value='.z', lineno=19, index=288)
Token(type='IF', value='If', lineno=21, index=300)
Token(type='PARENTHESIS_OPEN', value='(', lineno=21, index=303)
Token(type='IDENTIFIER', value='x', lineno=21, index=304)
Token(type='EQUAL', value='is_equal_to', lineno=21, index=306)
Token(type='NUMBER', value='5', lineno=21, index=318)
Token(type='PARENTHESIS_CLOSE', value=')', lineno=21, index=319)
Token(type='BLOCK_START', value='{', lineno=21, index=321)
Token(type='PRINT', value='zPrint', lineno=22, index=331)
Token(type='PARENTHESIS_OPEN', value='(', lineno=22, index=337)
Token(type='NUMBER', value='10', lineno=22, index=338)
Token(type='PARENTHESIS_CLOSE', value=')', lineno=22, index=340)
Token(type='EOL', value='.z', lineno=22, index=342)
Token(type='BLOCK_END', value='}', lineno=23, index=349)
Token(type='ELIF', value='Elif', lineno=24, index=355)
Token(type='PARENTHESIS_OPEN', value='(', lineno=24, index=360)
Token(type='IDENTIFIER', value='x', lineno=24, index=361)
Token(type='GREATER', value='is_greater_than', lineno=24, index=363)
Token(type='IDENTIFIER', value='add_2', lineno=24, index=379)
Token(type='PARENTHESIS_OPEN', value='(', lineno=24, index=384)
Token(type='IDENTIFIER', value='x', lineno=24, index=385)
Token(type='PARENTHESIS_CLOSE', value=')', lineno=24, index=386)
Token(type='MINUS', value='-', lineno=24, index=387)
Token(type='NUMBER', value='2', lineno=24, index=388)
Token(type='PARENTHESIS_CLOSE', value=')', lineno=24, index=389)
Token(type='BLOCK_START', value='{', lineno=24, index=391)
Token(type='PRINT', value='zPrint', lineno=25, index=401)
Token(type='PARENTHESIS_OPEN', value='(', lineno=25, index=407)
Token(type='NUMBER', value='12', lineno=25, index=408)
Token(type='PARENTHESIS_CLOSE', value=')', lineno=25, index=410)
Token(type='EOL', value='.z', lineno=25, index=412)
Token(type='BLOCK_END', value='}', lineno=26, index=419)
Token(type='ELSE', value='Else', lineno=27, index=425)
Token(type='BLOCK_START', value='{', lineno=27, index=430)
Token(type='PRINT', value='zPrint', lineno=28, index=440)
Token(type='PARENTHESIS_OPEN', value='(', lineno=28, index=446)
Token(type='NUMBER', value='11', lineno=28, index=447)
Token(type='PARENTHESIS_CLOSE', value=')', lineno=28, index=449)
Token(type='EOL', value='.z', lineno=28, index=451)
Token(type='BLOCK_END', value='}', lineno=29, index=458)
Token(type='EOL', value='.z', lineno=29, index=460)
Token(type='VAR', value='var', lineno=31, index=468)
Token(type='IDENTIFIER', value='y', lineno=31, index=472)
Token(type='ARROW', value='->', lineno=31, index=474)
Token(type='STRING_TYPE', value='charz', lineno=31, index=477)
Token(type='ASSIGNMENT', value='=', lineno=31, index=483)
Token(type='STRING', value="'AQUI'", lineno=31, index=485)
Token(type='EOL', value='.z', lineno=31, index=492)
Token(type='IDENTIFIER', value='y', lineno=33, index=500)
Token(type='ASSIGNMENT', value='=', lineno=33, index=502)
Token(type='STRING', value="'AQUI 2'", lineno=33, index=504)
Token(type='EOL', value='.z', lineno=33, index=513)
Token(type='PRINT', value='zPrint', lineno=35, index=521)
Token(type='PARENTHESIS_OPEN', value='(', lineno=35, index=527)
Token(type='IDENTIFIER', value='x', lineno=35, index=528)
Token(type='PARENTHESIS_CLOSE', value=')', lineno=35, index=529)
Token(type='EOL', value='.z', lineno=35, index=531)
Token(type='PRINT', value='zPrint', lineno=37, index=539)
Token(type='PARENTHESIS_OPEN', value='(', lineno=37, index=545)
Token(type='IDENTIFIER', value='sub', lineno=37, index=546)
Token(type='PARENTHESIS_OPEN', value='(', lineno=37, index=549)
Token(type='IDENTIFIER', value='x', lineno=37, index=550)
Token(type='IDENTIFIER', value='x', lineno=37, index=553)
Token(type='PARENTHESIS_CLOSE', value=')', lineno=37, index=554)
Token(type='PARENTHESIS_CLOSE', value=')', lineno=37, index=555)
Token(type='EOL', value='.z', lineno=37, index=557)
Token(type='RETURN', value='return', lineno=39, index=565)
Token(type='NUMBER', value='0', lineno=39, index=572)
Token(type='EOL', value='.z', lineno=39, index=574)
Token(type='BLOCK_END', value='}', lineno=40, index=577)
Token(type='EOL', value='.z', lineno=40, index=579)
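
Each record in the token dump above has the shape `(type, value, lineno, index)`. A minimal sketch of a lexer that yields records of the same shape, written with the standard `re` module rather than sly, might look like this. The token type names are taken from the dump; the regular expressions, the `KEYWORDS` table and the choice to skip commas are assumptions, and only token types that appear in the dump are covered.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    type: str
    value: str
    lineno: int
    index: int


# Reserved words are matched as identifiers first and then remapped.
KEYWORDS = {
    "numz": "INT", "charz": "STRING_TYPE", "func": "FUNCTION",
    "zPrint": "PRINT", "return": "RETURN", "var": "VAR",
    "While": "WHILE", "If": "IF", "Elif": "ELIF", "Else": "ELSE",
    "is_equal_to": "EQUAL", "is_lesser_than": "LESSER",
    "is_greater_than": "GREATER",
}

# Order matters: ".z" and "->" must be tried before shorter alternatives.
TOKEN_SPEC = [
    ("EOL", r"\.z"),
    ("ARROW", r"->"),
    ("STRING", r"'[^']*'"),
    ("NUMBER", r"\d+"),
    ("IDENTIFIER", r"[A-Za-z][A-Za-z0-9_]*"),
    ("PARENTHESIS_OPEN", r"\("),
    ("PARENTHESIS_CLOSE", r"\)"),
    ("BLOCK_START", r"\{"),
    ("BLOCK_END", r"\}"),
    ("PLUS", r"\+"),
    ("MINUS", r"-"),
    ("ASSIGNMENT", r"="),
    ("NEWLINE", r"\n"),
    ("SKIP", r"[ \t\r,]+"),   # assumption: commas are ignored like whitespace
    ("MISMATCH", r"."),
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source: str) -> Iterator[Token]:
    """Yield tokens with a 1-based line number and a 0-based character index."""
    lineno = 1
    for match in MASTER_RE.finditer(source):
        kind, value = match.lastgroup, match.group()
        if kind == "NEWLINE":
            lineno += 1
        elif kind == "SKIP":
            continue
        elif kind == "MISMATCH":
            raise SyntaxError(f"illegal character {value!r} on line {lineno}")
        else:
            if kind == "IDENTIFIER":
                kind = KEYWORDS.get(value, kind)
            yield Token(kind, value, lineno, match.start())


for token in tokenize("numz func add_2(numz y) {\n    zPrint(y) .z\n"):
    print(token)
```

Since `Token` is a `NamedTuple`, printing it produces lines in the same `Token(type=..., value=..., lineno=..., index=...)` format as the dump.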
35 changes: 26 additions & 9 deletions main.py
@@ -1,9 +1,12 @@
import sys
from src.Node import Node
from src.Types.TokenTypes import TokenTypes
from src.SymbolTable import SymbolTable
from src.Parser import ZParser
from src.Tokenizer import ZTokenizer
from src.Codegen.CodeGen import CodeGen
from llvmlite import ir


if __name__ == '__main__':

@@ -22,16 +25,33 @@
    parser = ZParser()
    codegen = CodeGen()


    # Get tokens from file
    tokens = lexer.tokenize(file_content)
    with open('test_tokens.out', 'w') as tmp:
    with open('debug/test_tokens.out', 'w') as tmp:
        for token in tokens:
            tmp.write(str(token) + '\n')

    # Tokenize and parse language
    tokens = lexer.tokenize(file_content)
    root = parser.parse(tokens)

    # Codegen vars
    module = codegen.module
    builder = codegen.builder
    printf = codegen.printf

    Node.module = module
    Node.builder = builder
    Node.printf = printf

    fmt = "%i \n\0"
    c_fmt = ir.Constant(ir.ArrayType(ir.IntType(8), len(fmt)), bytearray(fmt.encode("utf8")))
    Node.global_fmt = ir.GlobalVariable(module, c_fmt.type, name="fstr")
    Node.global_fmt.linkage = 'internal'
    Node.global_fmt.global_constant = True
    Node.global_fmt.initializer = c_fmt

    # Set functions declarations to symbol table
    symbol_table = SymbolTable()
    root.Evaluate(symbol_table)
@@ -41,12 +61,9 @@
    main_func_node = main_func.get("value", None)
    if main_func_node is None:
        raise ValueError('main function was not defined')
    returned_data = main_func_node.statements.Evaluate(symbol_table)

    # Check main function returned data type
    if returned_data is None:
        raise ValueError('main function did not return an int')

    returned_type = returned_data[0]
    if returned_type != TokenTypes.INT:
        raise ValueError('main did not return an int')
    symbol_table.functions['main']['pointer'] = codegen.base_func
    main_func_node.statements.Evaluate(symbol_table)

    codegen.create_ir()
    codegen.save_ir('out/output.ll')
Binary file added out/output
Binary file not shown.