Skip to content

Implementing a BNF parser

Aldo Salzberg edited this page Oct 29, 2021 · 16 revisions

General approach

Concrete Syntax Tree (CST) generation

Per the grammar production rules, csly will generate a concrete syntax tree. In the previous section, we explained how syntax trees are defined in csly using the SyntaxNode type. Nodes can be terminal or non-terminal. Terminal nodes are referred to as leaves.

Grammar definition and CST traversal

Grammar rules and visitor methods are defined in a single C# class that you provide, the expression parser. Each method that is defined in the class is associated with a production rule. See grammar definition below for further details.

Inspecting the CST

You can retrieve the whole CST in a number of ways. Once the CST is built, csly provides a way to visit each node programmatically using LINQ syntax. One way to examine the whole tree is by calling the SyntaxTree property from the ParseResult object and examining the object with your debugger. SyntaxTree is implemented as a single root SyntaxNode which can be traversed by iterating the Children property. It is also possible to generate a graphical tree representation with a number of tools. See View syntax tree for an example using VizGraph.

Grammar definition

The grammar defining the parser is defined using C# attribute [Production("some grammar rule")] mapped to methods on your expression parser.

The rules follow the classical BNF notation:

<rule_name> : <terminal or non-terminal> <some other terminal or non-terminal> ...

A terminal rule name on the attribute string must exactly match the enum value of your expression tokens. Following C# conventions, the rules are case-sensitive. If the tokens don't match those on the attribute string, a NullReferenceException is thrown on the call to parser.Parse(...).

Each method takes as many parameters as rule clauses. Each parameter can be typed according to the return value of the clause:

  • for a terminal: the Token<T> corresponding to the token,
  • for a non-terminal: the result of the evaluation of the non-terminal, i.e., the value returned by the matching static method. As the parser output is typed (<TOut> as seen in the previous section) , the result of an evaluation for a non terminal is necessarily of type .

Using a context during syntax tree traversal

When visiting a syntax tree, we sometimes need symbolic context. As an example, imagine an expression parser allowing the use of variables, which can be declared and initialized before parsing. In this case an additional parameter is passed at the end of each visitor method. The ParseWithContext(source,context) method is thread-safe and ensures that the correct context will be parsed. The parse-with-context provision ensures thread safety, since the parser will prevent multiple threads from accessing the state-dependent implementation in the visitor methods at the same time. See below for example of parsing using a parsing context. For more information on thread-safety and concurrency see Joe Albahari or Jon Skeet's excellent expositions on thread-safe C# implementations.

partial example for a mathematical expression evaluator

a mathematical parser calculate a mathematical expression. It takes a string as input and return a numeric value. So each method of the parser will return a numeric value (an int for simplicity concern)

         [Production("primary: INT")]
        public int Primary(Token<ExpressionToken> intToken)
        {
            return intToken.IntValue;
        }

        [Production("primary: LPAREN expression RPAREN")]
        public int Group(object discaredLParen, int groupValue ,object discardedRParen)
        {
            return groupValue;
        }



        [Production("expression : term PLUS expression")]
        [Production("expression : term MINUS expression")]

        public int Expression(int left, Token<ExpressionToken> operatorToken, int  right)
        {
            object result = 0;
            

            switch (operatorToken.TokenID)
            {
                case ExpressionToken.PLUS:
                    {
                        result = left + right;
                        break;
                    }
                case ExpressionToken.MINUS:
                    {
                        result = left - right;
                        break;
                    }
                default:
                    {
                        break;
                    }
            }
            return result;
        }

       

Building a parser and using it

as we 've seen above a parser is declared on a single class with methods that address (the lexer is defined in an enum):

  • grammar rules (with the [Production("")] attribute )

Once the class with all its methods has been written, it can be used to build the effective parser instance calling ParserBuilder.BuildParser. the builder methods takes 3 parameters :

  1. an instance of the class containing the lexer and parser definition
  2. the kind of parser. Currently only a recursive descent parsers are available. this implementation is limited to LL grammar by construction (no left recursion).There are 2 possible types :
    • ParserType.LL_RECURSIVE_DESCENT : a BNF notation grammar parser
    • ParserType.EBNF_LL_RECURSIVE_DESCENT : a EBNF notation grammar parser. EBNF notation provides additional multiplier notation (* and + for now)
  3. the root rule for the parser.

The parser is typed according to the token type and parser return type.

When calling ParserBuilder.BuildParser() you get :

  • a flag stating if parser has correctly been built.
  • a parser if everything went fine.
  • a list of errors if something went bad. Errors contain
    • an error code : seed error code.
    • an error message.
Parser<ExpressionToken,int> parser = null;

ExpressionParser expressionParserDefinition = new ExpressionParser()
// here see the typing :
//  ExpressionToken is the token enum type
//  int is the type of a parse evaluation
BuildResult<Parser<WhileToken, WhileAST>> ParserResult = ParserBuilder.BuildParser<ExpressionToken,int>(expressionParserDefinition,
                                                                            ParserType.LL_RECURSIVE_DESCENT,
                                                                            "expression");
if (parserResult.IsOk) {
    // everythin'fine : we have a configured parser
    parser = parserResult.Result;
}
else {
    // something's wrong
    foreach(var error in parserResult.Errors) {
        Console.WriteLine($"{error.Code} : {error.Message}");
    }    
}



then calling 
```C#parser.Parse("some source code")``` 
will return the evaluation of the syntax tree.
the parser returns a ParseResult instance containing the evaluation value or a list of errors.

```c#

	string expression = "1 + 1";

    ParseResult<ExpressionToken> r = Parser.Parse(expression);


    if (!r.IsError && r.Result != null && r.Result is int)
    {
        Console.WriteLine($"result of {expression}  is {(int)r.Result}");
    }
    else
    {
        if (r.Errors !=null && r.Errors.Any())
        {
        	// display errors
            r.Errors.ForEach(error => Console.WriteLine(error.ErrorMessage));
        }
    }

example of parsing using a parsing context

A parse using a context is started using the ParseWithContext of the Parser. the context can be of any type. the example below extends the expression parser described above to allow the use of variable valued in a dictionary context.

// --------------------------------------------------
//       the additional production rule for  a variable
// --------------------------------------------------

  [Production("primary_value : IDENTIFIER")]
        public int OperandVariable(Token<ExpressionToken> identifier,Dictionary<string,int> context)
        {
            if (context.ContainsKey(identifier.Value))
            {
                return context[identifier.Value];
            }
            else
            {
                return 0;
            }
        }

// --------------------------------------------------
//       calling a parse
// --------------------------------------------------

var res = parser.ParseWithContext("2 + a", new Dictionary<string, int> {{"a", 2}}); // will return 2

access lexer and parsers

One build a parser expose :

  • a main API through the Parse(string content) method (chain lexical analysis, syntax parsing and finally call your parsing methods)

  • the lexer through the Lexer property

  • the syntax parser through the SyntaxParser property (which type is a ISyntaxParser)

Defining your parser ⬅️ Implementing a BNF Parser ➡️ EBNF-parser