-
Notifications
You must be signed in to change notification settings - Fork 35
Implementing a BNF parser
Per the grammar production rules, csly will generate a concrete syntax tree. We have already defined nodes in csly using the SyntaxNode
type. Nodes can be terminal or non-terminal. Terminal nodes are referred to as leaves.
Grammar rules and visitor methods are defined in a single C# class, the expression parser. Each method that is defined in the clas is associated with a production rule. See grammar definition below for further details.
You can retrieve the whole CST in a number of ways. Once the CST is built, csly provides a way to visit each node programmatically using LINQ syntax. One way to examine the whole tree is by calling the SyntaxTree
property from the ParseResult
object and examining the object with your debugger. SyntaxTree
is implemented as a single root SyntaxNode
which can be traversed by iterating the Children
property. It is also possible to generate a graphical tree representation with a number of tools. See View syntax tree for an example using VizGraph.
The grammar defining the parser is defined using C# attribute [Production("some grammar rule")]
mapped to methods
The rules follow the classical BNF notation : `
<rule_name> : <terminal or non-terminal> <some other terminal or non-terminal> ...
A terminal notation must exactly matches (case sensitive) an enum value. Once the wytaxic tree build, the methods of each rule will be used as a syntaxic tree visitor. Each methods takes as many parameters as rule clauses. Each parameter can be typed according to the return value of the clause :
- for a terminal : the
Token<T>
corresponding to the token - for a non terminal : the result of the evaluation of the non terminal, i.e the value returned by the matching static method. As the parser output is typed (TOut as seen before) , the result of an evaluation for a non terminal is necessarily of type TOut.
When visiting a syntax tree, we sometimes need some context. As an example, imagine an expression parser allowing the use of variables, which can be initialized before parsing.
In this case an additional parameter is passed at the end of each visitor method.
And a dedicated ParseWithContext(source,context)
method is used to start a parse with a context (see below : example of parsing using a parsing context )
This context is a way to ensure that the parser is thread-safe as it avoids use any state in the visitor methods.
a mathematical parser calculate a mathematical expression. It takes a string as input and return a numeric value. So each method of the parser will return a numeric value (an int for simplicity concern)
[Production("primary: INT")]
public int Primary(Token<ExpressionToken> intToken)
{
return intToken.IntValue;
}
[Production("primary: LPAREN expression RPAREN")]
public int Group(object discaredLParen, int groupValue ,object discardedRParen)
{
return groupValue;
}
[Production("expression : term PLUS expression")]
[Production("expression : term MINUS expression")]
public int Expression(int left, Token<ExpressionToken> operatorToken, int right)
{
object result = 0;
switch (operatorToken.TokenID)
{
case ExpressionToken.PLUS:
{
result = left + right;
break;
}
case ExpressionToken.MINUS:
{
result = left - right;
break;
}
default:
{
break;
}
}
return result;
}
as we 've seen above a parser is declared on a single class with methods that address (the lexer is defined in an enum):
- grammar rules (with the
[Production("")]
attribute )
Once the class with all its methods has been written, it can be used to build the effective parser instance calling ParserBuilder.BuildParser. the builder methods takes 3 parameters :
- an instance of the class containing the lexer and parser definition
- the kind of parser. Currently only a recursive descent parsers are available. this implementation is limited to LL grammar by construction (no left recursion).There are 2 possible types :
- the root rule for the parser.
The parser is typed according to the token type and parser return type.
When calling ParserBuilder.BuildParser() you get :
- a flag stating if parser has correctly been built.
- a parser if everything went fine.
- a list of errors if something went bad. Errors contain
- an error code : seed error code.
- an error message.
Parser<ExpressionToken,int> parser = null;
ExpressionParser expressionParserDefinition = new ExpressionParser()
// here see the typing :
// ExpressionToken is the token enum type
// int is the type of a parse evaluation
BuildResult<Parser<WhileToken, WhileAST>> ParserResult = ParserBuilder.BuildParser<ExpressionToken,int>(expressionParserDefinition,
ParserType.LL_RECURSIVE_DESCENT,
"expression");
if (parserResult.IsOk) {
// everythin'fine : we have a configured parser
parser = parserResult.Result;
}
else {
// something's wrong
foreach(var error in parserResult.Errors) {
Console.WriteLine($"{error.Code} : {error.Message}");
}
}
then calling
```C#parser.Parse("some source code")```
will return the evaluation of the syntax tree.
the parser returns a ParseResult instance containing the evaluation value or a list of errors.
```c#
string expression = "1 + 1";
ParseResult<ExpressionToken> r = Parser.Parse(expression);
if (!r.IsError && r.Result != null && r.Result is int)
{
Console.WriteLine($"result of {expression} is {(int)r.Result}");
}
else
{
if (r.Errors !=null && r.Errors.Any())
{
// display errors
r.Errors.ForEach(error => Console.WriteLine(error.ErrorMessage));
}
}
A parse using a context is started using the ParseWithContext
of the Parser
. the context can be of any type. the example below extends the expression parser described above to allow the use of variable valued in a dictionary context.
// --------------------------------------------------
// the additional production rule for a variable
// --------------------------------------------------
[Production("primary_value : IDENTIFIER")]
public int OperandVariable(Token<ExpressionToken> identifier,Dictionary<string,int> context)
{
if (context.ContainsKey(identifier.Value))
{
return context[identifier.Value];
}
else
{
return 0;
}
}
// --------------------------------------------------
// calling a parse
// --------------------------------------------------
var res = parser.ParseWithContext("2 + a", new Dictionary<string, int> {{"a", 2}}); // will return 2
One build a parser expose :
-
a main API through the
Parse(string content)
method (chain lexical analysis, syntax parsing and finally call your parsing methods) -
the lexer through the Lexer property
-
the syntax parser through the SyntaxParser property (which type is a
ISyntaxParser
)
Defining your parser ⬅️ Implementing a BNF Parser ➡️ EBNF-parser