Jared's C Compiler : Build a mini C compiler(x86-64) from scratch.
This is a term-project for NCKU compiler-system course.
All the toolchain you need is GCC on Linux based system. Tested Platform: Ubuntu 16.04(x86-64)(gcc version 5.3.1 20160413 (Ubuntu 5.3.1-14ubuntu2) )
git clone https://github.com/JaredCJR/JCC
cd JCC
Then,jump to the specific Phase
to see the guide.
- This is a Token Colorer,it will color the tokens in the target source file,and print in the terminal colorfully.
- Using the lexical analysis to identify tokens for coloring.
- How to demo:
git checkout DEMO_LEX
make clean
make demo
- This is a Syntax Indenter,it will indent the tokens with basic functionalities and Token Colorer.
- Using syntax analysis to check basic grammars , based on
Token Colorer(Phase 1)
.- Indent the code after the end of grammar for demonstrating the
Syntax Analysis
- Indent the code after the end of grammar for demonstrating the
- How to demo:
git checkout DEMO_SYNTAX
make clean
make demo
- This will scan the source files and record the corresponding information into the
symbol table
.Symbol table
contains two stack ->global_sym_stack
andlocal_sym_stack
- The correspoind behavior of the semantics can see the comments inside the source files in the syntax dir and the demo_result.txt,which explains elaborately!
- You can see the explaination in demo_result.txt after
<=================
- JCC's
semantic analysis
is much powerful than the JCC'scode generation
.This is because the time limitted during the semester.- If you would like to know more details,you should trace the stack with
gdb
,the pre-defined gdb tracing srcipt is ready.
- If you would like to know more details,you should trace the stack with
- You can see the explaination in demo_result.txt after
- How to demo:
git checkout DEMO_SEMANTIC
make clean
make demo
- This picture shows part of the demo_result.txt.
- JCC use GDB as the tool for this auto-demo.
- GDB help us to check the
symbol table
without messing up the code! - GDB do the automatic test based on the GDB script
- GDB help us to check the
IMPORTANT
- If you would like to see the stack change on your own source code,the local stack will be destroied after the
index
leavelocal scope
.- We only have one
local stack
forsymbol table
,so thelocal stack
will no longer exist after theindex
move outside the corresponding scope.
- We only have one
JCC does not support the fully function as the lexical and semantic phases
in code generation due to time limitted.
What JCC is able to do,please refer to the next chapter.
- How to demo:
git checkout DEMO_CodeGen
make clean
make demo
./demo/demo_code
- JCC uses a fibonacci sequence with iterative method as the demo code to show what JCC is able to do!
- This picture only shows part of the results.
- Tokens:
- Arithmetic Operators
+
-
*
/
%
- Relation Oerators
==
!=
<
<=
>
>=
- Assignment Operators
=
- Pointer Operators
->
.
&
- Others
(
and)
[
and]
{
and}
;
,
...
- Arithmetic Operators
- Keywords:
- Constants
tk_cINT
(number, ex:999)tk_cCHAR
(single character, ex:'X')tk_cSTR
(characters, ex:This is JCC\n
)
- keywords
int
char
void
struct
if
else
for
continue
break
sizeof
- user defined
- Ex:
int user_ident;
(user_ident
will be added.)
- Ex:
- Constants
IMPORTANT:Not all the tokens will be dealed in the Phase 2 to Phase 4
- JCC divides
Syntax Analysis into 3 sub-phase
Declaration
Expression
Statement
- All the implementing details can be saw in the source code.
Declaration
- In JCC,
- All of the program of JCC must start with declaration.
- Ex:
int x = 10;
- Ex:
int add(int x,int y);
- Ex:
add(1,2);
- Ex:
void main(){ }
- The
{ }
in above example,meaningcompound statement
,and you could usestatement
inside.
- The
- Ex:
- All of the program of JCC must start with declaration.
* <translation_unit>::={<external_declaration>}<tk_EOF>
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
* <external_declaration>::=<function_definition> | <declarator>
* <function_definition>::=<type_specifier><declarator><funcbody>
* <declarator>::=<type_specifier><tk_SEMICOLON> | <type_specifier><init_declarator_list><tk_SEMICOLON>
* <init_declarator_list>::=<init_declarator>{<tk_COMMA><init_declarator>}
* <init_declarator>::=<declarator>{<tk_ASSIGN><initializer>}
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
* <type_specifier>::=<kw_INT> |
* <kw_CHAR> |
* <kw_VOID> |
* <struct_specifier>
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
* <struct_specifier>::=<kw_STRUCT><IDENTIFIER><tk_BEGIN><struct_declaration_list><tk_END>
* | <kw_STRUCT><IDENTIFIER>
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
* <struct_declaration_list>::=<struct_declaration>{<struct_declaration>}
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
* <struct_declaration>::=<type_specifier><struct_declaration_list><tk_SEMICOLON>
* <struct_declaration_list>::=<declarator>{<tk_COMMA><declarator>}
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
* <declarator>::={<pointer>}<direct_declarator>
* <pointer>::=<tk_STAR>
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
* <direct_declarator>::=<IDENTIFIER><direct_declarator_postfix>
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
* <direct_declarator_postfix>::={<tk_openBR><tk_cINT><tk_closeBR>
* | <tk_openBR><tk_closeBR>
* | <tk_openPA><parameter_type_list><tk_closePA>
* | <tk_openPA><tk_closePA>
* }
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
* <parameter_type_list>::=<parameter_list> | <parameter_list><tk_COMMA><tk_ELLIPSIS>
* <parameter_list>::<parameter_declaration>{<tk_COMMA><parameter_declaration>}
* <parameter_declaration>::=<type_specifier>{<declarator>}
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
* <funcbody>::=<compound_statement>
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
* <initializer>::=<assignment_expression>
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
Expression
- In JCC,
Expression
help us estabilish the relationship betweentokens
.- Ex:
a >= b
- Ex:
X.member = 2;
- Ex:
- Help us support the arthimetic rules:
multiplication and division first,then addition and substration
- Ex:
int x = 0-6*6;
x is equal to -36
- Ex:
- Pass arguments
- Ex:
add(x,y,z);
(x,y,z)
is dealed here.
- Ex:
- We do not support bitwise operation.
* <expression>::=<assignment_expression>{<tk_COMMA><assignment_expression>}
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
* <assignment_expression>::=<equality_expression>
* | <unary_expression><tk_ASSIGN><assignment_expression>
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
* <equality_expression>::=<relational_expression>
* { <tk_EQ><relational_expression>
* | <tk_NEQ><relational_expression>
* }
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
* <relational_expression>::=<additive_expression>
* {
* <tk_LT><additive_expression>
* | <tk_GT><additive_expression>
* | <tk_LEQ><additive_expression>
* | <tk_GEQ><additive_expression>
* }
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
* <additive_expression>::=<multiplicative_expression>
* {
* <tk_PLUS><multiplicative_expression>
* | <tk_MINUS><multiplicative_expression>
* }
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
* <multiplicative_expression>::=<unary_expression>
* {
* <tk_STAR><unary_expression>
* | <tk_DIVIDE><unary_expression>
* | <tk_MOD><unary_expression>
* }
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
* <unary_expression>::=<postfix_expression>
* | <tk_AND><unary_expression>
* | <tk_STAR><unary_expression>
* | <tk_PLUS><unary_expression>
* | <tk_MINUS><unary_expression>
* | <sizeof_expression>
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
* <sizeof_expression>::=<kw_SIZEOF><tk_openPA><type_specifier><tk_closePA>
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
* <postfix_expression>::=<primary_expression>
* {
* <tk_openBR><expression><tk_closeBR>
* | <tk_openPA><tk_closePA>
* | <tk_openPA><argument_expression_list><tk_closePA>
* | <tk_DOT><IDENTIFIER>
* | <tk_POINTTO><IDENTIFIER>
* }
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
* <primary_expression>::=<IDENTIFIER>
* | <tk_cINT>
* | <tk_cSTR>
* | <tk_cCHAR>
* | <tk_openPA><expression><tk_closePA>
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
* <argument_expression_list>::=<assignment_expression>{<tk_COMMA><assignment_expression>}
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
Statement
- In JCC,
statement
help thekeywords
to be known.- Ex:
if(a == c){ }
if
is identified here.
- Ex:
return add(a,b);
return
is identified here.
- Ex:
* <statement>::=<compound_statement>
* | <if_statement>
* | <return_statement>
* | <break_statement>
* | <continue_statement>
* | <for_statement>
* | <expression_statement>
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
* <compound_statement>::=<tk_BEGIN>{<declaration>}{<statement>}<tk_END>
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
* <expression_statement>::=<tk_SEMICOLON>
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
* <if_statement>::=<kw_IF><tk_openPA><expression><tk_closePA><statement>[<kw_ELSE><statement>]
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
* <for_statement>::=
* <kw_FOR><tk_openPA><expression_statement><expression_statement><expression><tk_closePA><statement>
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
* <continue_statement>::=<kw_CONTINUE><tk_SEMICOLON>
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
* <break_statement>::=<kw_BREAK><tk_SEMICOLON>
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
* <return_statement>::=<kw_RETURN><tk_SEMICOLON>
* | <kw_RETURN><expression><tk_SEMICOLON>
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
- The correspoind behavior of the semantics can see the comments inside the source files in the syntax dir and the demo_result.txt,which explains elaborately!
- You can see the explaination in demo_result.txt after
<=================
- You can see the explaination in demo_result.txt after
-
As the above says,JCC's
code generation
does not support the full functionalities as the previous phases. Supported semantic code generation: -
Variable
supporting:global variable
local variable
- including
int
char
- special
char *
- Ex:
char* string = "TEST string"
- YOU SHOULD NOT MODIFY THE
char *
VARIABLE!- Ex: The behavior of
string = "abc"
is unpredictable.(string
declaration is as the previous example)
- Ex: The behavior of
- Ex:
- including
-
if(expression)
-
if(expression){ statement }else{ statement }
- Supporting recursive
if
orif else
.
- Supporting recursive
-
for(expression;expression;expression)
- Supporting recursive
for
- Supporting recursive
-
declaration
always beforestatement
- declaration
- JCC does not check type for variable assignment,if and only if there is LValue at the left side.
- declaration
-
I/O:
scanf
andprintf
- use
printf
as the C89 <stdio.h> usage. - use
scanf
as the C99 <stdio.h> usage.- Only support at most
4 arguments
. - Need to do:
- passing
return value
from either I/O function.
- passing
- Only support at most
- use
-
Arthimatic
+
and-
can be use togetherEx: int a = 5+100-10000;
Ex: a = b+c-100;
- if
b
andc
are already declared.
- if
*
,/
and%
can be use togetherEx: int a = 1000*5%79
Ex: a = b*c%100;
- if
b
andc
are already declared.
- if
-
Function Call
- Only support at most
4 arguments
. - if function return type is
int
orchar
- return value is accepted in limitted sutuation.
Ex:a = test();
- However,JCC does not support mutiple function call arthimatic.
Ex:a = test() + test(); will cause error!
- However,JCC does not support mutiple function call arthimatic.
return
inif
orfor
will not work.- Therefore,
recursive function call
is imposible in JCC.
- Therefore,
- return value is accepted in limitted sutuation.
- JCC does not support
expression
as parameter.Ex:add(1+2,3); will cause error!
- Only support at most
-
See demo_code.c to know the real situation!
- JCC currently does not support mixing
+
or-
with*
or/
or%
.- If you would like to support this functionality,all you need is to provide temporary registers or creat a space in local stack to store the temporary value.
- JCC currently does not support mixing
- If you would like to print somee information in you terminal
- If you would like to use GNU debugger with JCC
make demo_gdb
- It will call
gdb
orgdbtui
with JCC to load the default demo source file. - You can write a
gdb script
forgdb
family,such as the branchDEMO_SEMANTIC
does.
- It will call
- Jia-Rung Chang(張家榮/Jared)
- email: [email protected]
- Education
- 2016/09 ~ future
- Master in Department of Computer Science,National Chiao Tung University,Taiwan
- 2012/09 ~ 2016/06
- Bachelor in Department of Engineering Science,National Cheng Kung University,Taiwan
- 2016/09 ~ future