Your program should accept C source code on
stdin
and write a heirarchical representation on stdout.
The input format and constraints are the same as for the tokeniser, with the following restrictions:
-
the only data-type that will occur is
int
(so no arrays, structs, floats, void) -
the only control-flow statements are
if
,while
andfor
-
all functions are definitions (i.e. there are no declarations)
These restrictions do not apply to the eventual compiler, they are simply to make this stage more manageable as a milestone. You are encouraged to support these things as well, but the intent here is to get a core set of functionality fully working.
The input format is still C90, and it is worth noting a difference between the C90 ("classic) and C99 ("modern") languages, which is that declarations can only appear at the top of a scope. So all variables declarations will appear before any statements. See the later section for more discussion
The output format will be a heirarchical format, which identifies the functions, scopes, and variables.
For each function declaration the program should print:
FUNCTION : function_name
then for each parameter it should print (with an indent of four spaces):
PARAMETER : parameter_name
At the start of each scope the program should print:
SCOPE
and increase the indentation by four spaces.
For each declared variable in a scope the program should print:
VARIABLE : variable_name
As a simple example, the following function:
int f( int x)
{
int y=x;
return y;
}
should result in:
FUNCTION : f
PARAMETER : x
SCOPE
VARIABLE : y
This code:
int x()
{}
int g;
int zz(int a, int b, int c)
{
if(a==b)
return z;
else{
int fsdfsdfs;
return c;
}
}
would result in:
FUNCTION : x
SCOPE
VARIABLE : g
FUNCTION : zz
PARAMETER : a
PARAMETER : b
PARAMETER : c
SCOPE
SCOPE
VARIABLE : fsdfsdfs
From your top-level directory, doing:
make bin/c_parser
should create a programme called... bin/c_parser
.
In C99 (like C++) you are allowed to interleave declarations and statements:
int f(int i)
{
int x=i+5;
x=x*x;
int y=x+4;
return y;
}
In C90, you must have all declarations before any statements, so that would need to be re-written as:
int f(int i)
{
int x=i+5;
int y;
x=x*x;
y=x+4;
return y;
}
Note that declarations can happen at the start of any scope, so things like:
int f(int i)
{
int x=i+5;
x=x*x;
if(1){
int y=x+4;
return y;
}
}
Thinking about it with your knowledge of parsers and compilers, you might be able to guess why the older language is more restricted, while the newer one relaxes it. You also might guess why I deliberately chose the older syntax for you to implement.
It's worth noting that beyond compiler implementation, there are many arguments for and against both styles. An argument for requiring C90 style declarations is that you have immediate visibility of the total number of variables in a scope. This gives you a sense of the complexity of a function, as you can see how many moving parts there are. For example, Linus Torvalds likes that aspect, as the kernel is an extremely large and complex code-base. There are also rules in the MISRA-C guidelines for automotive and embedded software that state that variables should appear at function scope.
One argument for the newer style is convenience, but also the idea that that it is better to declare a variable at the point that it is initialised. A common error in C is to have a variable which is not initialised on all code paths:
{
int i;
// Much code
if(getc()){
i=1;
}
// So codey
putc(i+1);
}
Modern compilers will put out a warning about i
potentially
being used before being initialised, but people may just
ignore that. The move towards more functional and declarative
styles has also encouraged this process - for example, the
auto
keyword in C++11.