Skip to content

Implementing a BNF parser

Aldo Salzberg edited this page Oct 24, 2021 · 16 revisions

General approach

Concrete Syntax Tree (CST) generation

Per the grammar production rules, csly will generate a concrete syntax tree. We have already defined nodes in csly using the SyntaxNode type. Nodes can be terminal or non-terminal. Terminal nodes are referred to as leaves.

Grammar definition and CST traversal

Grammar rules and visitor methods are defined in a single C# class, the expression parser. Each method that is defined in the clas is associated with a production rule. See grammar definition below for further details.

Inspecting the CST

You can retrieve the whole CST in a number of ways. Once the CST is built, csly provides a way to visit each node programmatically using LINQ syntax. One way to examine the whole tree is by calling the SyntaxTree property from the ParseResult object and examining the object with your debugger. SyntaxTree is implemented as a single root SyntaxNode which can be traversed by iterating the Children property. It is also possible to generate a graphical tree representation with a number of tools. See View syntax tree for an example using VizGraph.

Grammar definition

The grammar defining the parser is defined using C# attribute [Production("some grammar rule")] mapped to methods

The rules follow the classical BNF notation : `

<rule_name> : <terminal or non-terminal> <some other terminal or non-terminal> ...

A terminal notation must exactly matches (case sensitive) an enum value. Once the wytaxic tree build, the methods of each rule will be used as a syntaxic tree visitor. Each methods takes as many parameters as rule clauses. Each parameter can be typed according to the return value of the clause :

  • for a terminal : the Token<T> corresponding to the token
  • for a non terminal : the result of the evaluation of the non terminal, i.e the value returned by the matching static method. As the parser output is typed (TOut as seen before) , the result of an evaluation for a non terminal is necessarily of type TOut.

Using a context during syntax tree visiting

When visiting a syntax tree, we sometimes need some context. As an example, imagine an expression parser allowing the use of variables, which can be initialized before parsing. In this case an additional parameter is passed at the end of each visitor method. And a dedicated ParseWithContext(source,context) method is used to start a parse with a context (see below : example of parsing using a parsing context ) This context is a way to ensure that the parser is thread-safe as it avoids use any state in the visitor methods.

partial example for a mathematical expression evaluator

a mathematical parser calculate a mathematical expression. It takes a string as input and return a numeric value. So each method of the parser will return a numeric value (an int for simplicity concern)

         [Production("primary: INT")]
        public int Primary(Token<ExpressionToken> intToken)
        {
            return intToken.IntValue;
        }

        [Production("primary: LPAREN expression RPAREN")]
        public int Group(object discaredLParen, int groupValue ,object discardedRParen)
        {
            return groupValue;
        }



        [Production("expression : term PLUS expression")]
        [Production("expression : term MINUS expression")]

        public int Expression(int left, Token<ExpressionToken> operatorToken, int  right)
        {
            object result = 0;
            

            switch (operatorToken.TokenID)
            {
                case ExpressionToken.PLUS:
                    {
                        result = left + right;
                        break;
                    }
                case ExpressionToken.MINUS:
                    {
                        result = left - right;
                        break;
                    }
                default:
                    {
                        break;
                    }
            }
            return result;
        }

       

Building a parser and using it

as we 've seen above a parser is declared on a single class with methods that address (the lexer is defined in an enum):

  • grammar rules (with the [Production("")] attribute )

Once the class with all its methods has been written, it can be used to build the effective parser instance calling ParserBuilder.BuildParser. the builder methods takes 3 parameters :

  1. an instance of the class containing the lexer and parser definition
  2. the kind of parser. Currently only a recursive descent parsers are available. this implementation is limited to LL grammar by construction (no left recursion).There are 2 possible types :
    • ParserType.LL_RECURSIVE_DESCENT : a BNF notation grammar parser
    • ParserType.EBNF_LL_RECURSIVE_DESCENT : a EBNF notation grammar parser. EBNF notation provides additional multiplier notation (* and + for now)
  3. the root rule for the parser.

The parser is typed according to the token type and parser return type.

When calling ParserBuilder.BuildParser() you get :

  • a flag stating if parser has correctly been built.
  • a parser if everything went fine.
  • a list of errors if something went bad. Errors contain
    • an error code : seed error code.
    • an error message.
Parser<ExpressionToken,int> parser = null;

ExpressionParser expressionParserDefinition = new ExpressionParser()
// here see the typing :
//  ExpressionToken is the token enum type
//  int is the type of a parse evaluation
BuildResult<Parser<WhileToken, WhileAST>> ParserResult = ParserBuilder.BuildParser<ExpressionToken,int>(expressionParserDefinition,
                                                                            ParserType.LL_RECURSIVE_DESCENT,
                                                                            "expression");
if (parserResult.IsOk) {
    // everythin'fine : we have a configured parser
    parser = parserResult.Result;
}
else {
    // something's wrong
    foreach(var error in parserResult.Errors) {
        Console.WriteLine($"{error.Code} : {error.Message}");
    }    
}



then calling 
```C#parser.Parse("some source code")``` 
will return the evaluation of the syntax tree.
the parser returns a ParseResult instance containing the evaluation value or a list of errors.

```c#

	string expression = "1 + 1";

    ParseResult<ExpressionToken> r = Parser.Parse(expression);


    if (!r.IsError && r.Result != null && r.Result is int)
    {
        Console.WriteLine($"result of {expression}  is {(int)r.Result}");
    }
    else
    {
        if (r.Errors !=null && r.Errors.Any())
        {
        	// display errors
            r.Errors.ForEach(error => Console.WriteLine(error.ErrorMessage));
        }
    }

example of parsing using a parsing context

A parse using a context is started using the ParseWithContext of the Parser. the context can be of any type. the example below extends the expression parser described above to allow the use of variable valued in a dictionary context.

// --------------------------------------------------
//       the additional production rule for  a variable
// --------------------------------------------------

  [Production("primary_value : IDENTIFIER")]
        public int OperandVariable(Token<ExpressionToken> identifier,Dictionary<string,int> context)
        {
            if (context.ContainsKey(identifier.Value))
            {
                return context[identifier.Value];
            }
            else
            {
                return 0;
            }
        }

// --------------------------------------------------
//       calling a parse
// --------------------------------------------------

var res = parser.ParseWithContext("2 + a", new Dictionary<string, int> {{"a", 2}}); // will return 2

access lexer and parsers

One build a parser expose :

  • a main API through the Parse(string content) method (chain lexical analysis, syntax parsing and finally call your parsing methods)

  • the lexer through the Lexer property

  • the syntax parser through the SyntaxParser property (which type is a ISyntaxParser)

Defining your parser ⬅️ Implementing a BNF Parser ➡️ EBNF-parser