-
Notifications
You must be signed in to change notification settings - Fork 35
expression parsing
Writing an expression grammar is a tedious task. A recursive descent parser is hard to maintain when parsing expressions with multiple precedence levels. So CSLY offers a way to express expression parsing using only operator tokens and precedence level. CSLY will then generates production rules to parse expressions.
EBNF_LL_RECURSIVE_DESCENT
type as :
Parser = builder.BuildParser(parserInstance, ParserType.EBNF_LL_RECURSIVE_DESCENT, StartingRule);
An expression operation is noted using the [Operation(token, affix, associativity, precedence)]
attribute where
-
token
is- the int value of a token ( C# does not support generics on attributes. May be coming with C# 8)
- or the string representation of the token.
-
affix
is the affix of the expression :-
Affix.PreFix
: for prefix operators such as :- 3
-
Affix.PostFix
: for postfix operators such as :i ++
-
Affix.InFix
: for infix operators such as :2 + 2
-
-
associativity
is an Associativity enum member-
Associativity.Right
for right associativity -
Associativity.Left
for left associativity
-
-
precedence
is an int stating the precedence level : the higher the int is the higher the precedence is. Each attribute is associated to a method that will act as the syntax tree visitor for the matching operation.
There exist shorter attributes for infix, prefix and postfix operations :
-
Infix(token, associativity, precedence)
: infix operation -
Prefix(token, associativity, precedence)
: prefix operation -
Postfix(token, associativity, precedence)
: postfix operation
for example here is a basic example for a numeric expression parser :
public class SimpleExpressionParser
{
// using token int value
[Operation((int)ExpressionToken.PLUS, Affix.InFix, Associativity.Right, 10)]
// using token string representation and Infix attribute
[Infix("MINUS", Associativity.Left, 10)]
public int binaryTermExpression(int left, Token<ExpressionToken> operation, int right)
{
int result = 0;
switch (operation.TokenID)
{
case ExpressionToken.PLUS:
{
result = left + right;
break;
}
case ExpressionToken.MINUS:
{
result = left - right;
break;
}
}
return result;
}
// using token int value
[Operation((int)ExpressionToken.TIMES, Affix.InFix,, Associativity.Right, 50)]
// using token string representation
[Operation("DIVIDE", Affix.InFix, Associativity.Left, 50)]
public int binaryFactorExpression(int left, Token<ExpressionToken> operation, int right)
{
int result = 0;
switch (operation.TokenID)
{
case ExpressionToken.TIMES:
{
result = left * right;
break;
}
case ExpressionToken.DIVIDE:
{
result = left / right;
break;
}
}
return result;
}
[Prefix((int)ExpressionToken.MINUS, Associativity.Right, 100)]
public int unaryExpression(Token<ExpressionToken> operation, int value)
{
return -value;
}
The expression parser generator need to identify a production rule for operands (int bool or string scalar values for example). the [Operand] attribute must indicate the Production rule producing basic operands
[Operand]
[Production("operand : INT")]
public int operand(Token<ExpressionToken> value)
{
return value.IntValue;
}
You can define many [Operand]
visitor methods. As in :
[Operand]
[Production("operand_int : INT")]
public int operand(Token<ExpressionToken> value)
{
return value.IntValue;
}
[Operand]
[Production("operand_double : double")]
public int operand(Token<ExpressionToken> value)
{
return value.IntValue;
}
CSLY generates a non terminal named <ParserClassName>_expressions
where <ParserClassName>
is the C# parser class name (case sensitive). SimpleExpressionParser_expressions
for the above example.
Under the hood, for each precedence level CSLY generates dedicated production rules. Let say we have 2 precedence level P1 (operator "+") and P2(operator "*") with P2 > P1 then CSLY generates the following rules :
> > ** P1 ***
> > P1 : P2 "+" P1
> > P1 : P2
> > ** P2 **
> > P2 : operand "*" operand
> > P2: operand
This scheme can be reproduced with as many precedence level as needed.
As stated above LL grammar (the ones handled by CSLY) cannot managed left associative operator. Hence for a naive expression parser 1 - 2 - 3 will be understood as 1 - (2 - 3) = 0 and not (1 - 2) - 3 = -4 as expected.
To cope with this CSLY acts like this :
- produce a syntaxic tree (right associative)
- if needed (i.e. a left associative operation is found in the syntaxic tree) : rearrange the tree to produce a left associative tree.