-
Notifications
You must be signed in to change notification settings - Fork 35
Implementing a BNF parser
Per the grammar production rules, csly will generate a concrete syntax tree. In the previous section, we explained how syntax trees are defined in csly using the SyntaxNode
type. Nodes can be terminal or non-terminal. Terminal nodes are referred to as leaves.
Grammar rules and visitor methods are defined in a single C# class that you provide, the expression parser. Each method that is defined in the class is associated with a production rule. See grammar definition below for further details.
You can retrieve the whole CST in a number of ways. Once the CST is built, csly provides a way to visit each node programmatically using LINQ syntax. One way to examine the whole tree is by calling the SyntaxTree
property from the ParseResult
object and examining the object with your debugger. SyntaxTree
is implemented as a single root SyntaxNode
which can be traversed by iterating the Children
property. It is also possible to generate a graphical tree representation with a number of tools. See View syntax tree for an example using VizGraph.
The grammar defining the parser is defined using the C# attribute [Production("some grammar rule")]
mapped to methods on your expression parser.
The rules follow the classical BNF notation:
<rule_name> : <terminal or non-terminal> <some other terminal or non-terminal> ...
A terminal rule name on the attribute string must exactly match the enum value of your expression tokens. Following C# conventions, the rules are case-sensitive. If the tokens don't match those on the attribute string, a NullReferenceException
is thrown on the call to parser.Parse(...)
.
Each method takes as many parameters as rule clauses. Each parameter can be typed according to the return value of the clause:
- for a terminal: the
Token<T>
corresponding to the token, - for a non-terminal: the result of the evaluation of the non-terminal, i.e., the value returned by the matching static method. As the parser output is typed
<TOut>
, as seen in the previous section, the result of an evaluation for a non-terminal is necessarily of typeTOut
.
When traversing a syntax tree, the parser sometimes uses a symbolic context. As an example, imagine an expression parser allowing the use of variables, which can be declared and initialized before parsing. In the case of a visitor with context, an additional parameter is passed at the end of each visitor method. The Parser.ParseWithContext(source, context)
method is thread-safe and ensures that the correct context will be parsed.
The parse-with-context provision ensures the parser will only allow one thread to access the state-dependent implementation in the visitor method at any one time. See below for example of parsing using a parsing context. For more information on thread-safety and concurrency see Joe Albahari or Jon Skeet's excellent expositions on thread-safe C# implementations.
A mathematical parser evaluates and solves mathematical expressions. It takes a string as input and return a numeric value. Each method of the parser will return a numeric value. For simplicity, in this example we'll continue using whole numbers.
[Production("primary: INT")]
public int Primary(Token<ExpressionToken> intToken)
{
return intToken.IntValue;
}
[Production("primary: LPAREN expression RPAREN")]
public int Group(object discaredLParen, int groupValue ,object discardedRParen)
{
return groupValue;
}
[Production("expression : term PLUS expression")]
[Production("expression : term MINUS expression")]
public int Expression(int left, Token<ExpressionToken> operatorToken, int right)
{
object result = 0;
switch (operatorToken.TokenID)
{
case ExpressionToken.PLUS:
{
result = left + right;
break;
}
case ExpressionToken.MINUS:
{
result = left - right;
break;
}
default:
{
break;
}
}
return result;
}
as we 've seen above a parser is declared on a single class with methods that address (the lexer is defined in an enum):
- grammar rules (with the
[Production("")]
attribute )
Once the class with all its methods has been written, it can be used to build the effective parser instance calling ParserBuilder.BuildParser. the builder methods takes 3 parameters :
- an instance of the class containing the lexer and parser definition
- the kind of parser. Currently only a recursive descent parsers are available. this implementation is limited to LL grammar by construction (no left recursion).There are 2 possible types :
- the root rule for the parser.
The parser is typed according to the token type and parser return type.
When calling ParserBuilder.BuildParser() you get :
- a flag stating if parser has correctly been built.
- a parser if everything went fine.
- a list of errors if something went bad. Errors contain
- an error code : seed error code.
- an error message.
Parser<ExpressionToken,int> parser = null;
ExpressionParser expressionParserDefinition = new ExpressionParser()
// here see the typing :
// ExpressionToken is the token enum type
// int is the type of a parse evaluation
BuildResult<Parser<WhileToken, WhileAST>> ParserResult = ParserBuilder.BuildParser<ExpressionToken,int>(expressionParserDefinition,
ParserType.LL_RECURSIVE_DESCENT,
"expression");
if (parserResult.IsOk) {
// everythin'fine : we have a configured parser
parser = parserResult.Result;
}
else {
// something's wrong
foreach(var error in parserResult.Errors) {
Console.WriteLine($"{error.Code} : {error.Message}");
}
}
then calling
```C#parser.Parse("some source code")```
will return the evaluation of the syntax tree.
the parser returns a ParseResult instance containing the evaluation value or a list of errors.
```c#
string expression = "1 + 1";
ParseResult<ExpressionToken> r = Parser.Parse(expression);
if (!r.IsError && r.Result != null && r.Result is int)
{
Console.WriteLine($"result of {expression} is {(int)r.Result}");
}
else
{
if (r.Errors !=null && r.Errors.Any())
{
// display errors
r.Errors.ForEach(error => Console.WriteLine(error.ErrorMessage));
}
}
A parse using a context is started using the ParseWithContext
of the Parser
. the context can be of any type. the example below extends the expression parser described above to allow the use of variable valued in a dictionary context.
// --------------------------------------------------
// the additional production rule for a variable
// --------------------------------------------------
[Production("primary_value : IDENTIFIER")]
public int OperandVariable(Token<ExpressionToken> identifier,Dictionary<string,int> context)
{
if (context.ContainsKey(identifier.Value))
{
return context[identifier.Value];
}
else
{
return 0;
}
}
// --------------------------------------------------
// calling a parse
// --------------------------------------------------
var res = parser.ParseWithContext("2 + a", new Dictionary<string, int> {{"a", 2}}); // will return 2
One build a parser expose :
-
a main API through the
Parse(string content)
method (chain lexical analysis, syntax parsing and finally call your parsing methods) -
the lexer through the Lexer property
-
the syntax parser through the SyntaxParser property (which type is a
ISyntaxParser
)
Defining your parser ⬅️ Implementing a BNF Parser ➡️ EBNF-parser