Keywords `token` and `layout` are apparently not compatible #372

MatthewDaggitt · 2021-06-28T08:28:10Z

When using the parser generated by the grammar fragment

position token TokLet      {"let"};
position token TokIn       {"in"};

ELet.       Expr  ::= TokLet "{" [Expr] "}" TokIn Expr;

layout "let";
layout stop "in";

to parse:

test =
  let y : Real
      y = f x ! 0
  in y

I get syntax error at line 2, column 8 before 'y'.

If instead I use the grammar:

ELet.       Expr  ::= "let" "{" [Expr] "}" "in" Expr;

layout "let";
layout stop "in";

then everything parses fine. The main problem with this is that I think(?) it means that I'm unable to get position information for the "let" and "in" tokens.

The text was updated successfully, but these errors were encountered:

andreasabel · 2021-06-28T09:45:57Z

The BNFC philosophy is that tokens have some content you are interested in, whereas keywords like "let" have none. So the latter are not represented in the AST, and consequently have no position information.

One could boost the layout mechanism by allowing also declared tokens as layout keywords, i.e.,

layout TokLet;
layout stop TokIn;

Beyond that, one could detect choice-less tokens such as TokLet and represent them as unit types in the AST...

In general, it would be nice to have BNFC produce a concrete syntax tree (CST) and a procedure to extract the AST from the CST. But this is a quite radical redesign.

MatthewDaggitt · 2021-06-29T06:50:04Z

Thanks for the reply @andreasabel. So, without making keywords such as let a token, what is the procedure to extract the location of the let expression in the source code for error reporting purposes whilst adhering to the BNFC philosophy? At the moment my main motivating example is trying to work around #371, i.e. throw a meaningful error with source code locations if the list of let bindings is empty.

However, I think I must be misunderstanding about how to go about extracting positions generally. For example, if one were implementing a language server one would need to get positions for parse errors to highlight the location where the parsing fell over. But the pX functions generated in the Par module return Either String a with the positions already converted to the String. Am I misunderstanding something or is it something that BNFC does not support?

andreasabel · 2021-07-01T09:03:03Z

But the pX functions generated in the Par module return Either String a with the positions already converted to the String.

I think the error reporting API also needs a overhaul. The error location shouldn't be mangled into the error message.

In general, BNFC in its present state should be thought of as a wizard generating a parser and printer for prototyping rather than professional use. In the worst case, you can get a happy parser from BNFC that sort of does the thing and then hand-edit it to your satisfaction.

That said, in general I am looking to make BNFC more professional without breaking backwards-compatibility if possible, so a better API to the parser and better position information in the syntax tree are definitely a way to go forward.

andreasabel · 2021-07-13T09:20:42Z

One could boost the layout mechanism by allowing also declared tokens as layout keywords, i.e.,
layout TokLet;
layout stop TokIn;

I am getting doubts whether this would be a common use case or just feature bloat. It seems that the OP wants something like that only for getting position information, but providing more position information should rather be addressed more systematically.

wenkokke · 2021-10-19T11:19:45Z

Perhaps a BNFC flag which adds the full range of input tokens parsed to create this node to each CST node?

andreasabel · 2021-10-19T16:00:11Z

Perhaps a BNFC flag which adds the full range of input tokens parsed to create this node to each CST node?

Atm, we do not produce CSTs. We currently only attach the starting position to each node (with --functor).

I understand your proposal in this way: instead of just attaching a single position to each node, attach a list of start positions of each production item (be it terminal or nonterminal). E.g., a node for

ELet.  Expr ::= "let" Bind "in" Expr;

would look like

ELet :: [Position] -> Bind -> Expr -> Expr

example = ELet [posLet, posBind, posIn, posExpr] bind expr

This could be happening under a flag that addresses the parser.

wenkokke · 2021-10-22T08:42:05Z

More or less, though instead of only providing the first position of each token it'd be better to provide the entire range, either as a pair of Position or using Data.Range.

The reason is that you don't know the length of every token, so this information is difficult to reconstruct.

MatthewDaggitt · 2021-10-22T10:16:49Z

That would certainly address our current problems. I can't help but wonder though if it's a bit hacky though, for example you would have to pattern match on the list and there's no static guarantee that it would have the right number of elements. I guess it could be made a tuple of the correct arity rather than a list... I don't know, offering the full CST seems like a much more principled way to go, although I realise it's a lot more effort.

MatthewDaggitt mentioned this issue Jun 28, 2021

Tokens let and in have no provenance information in the frontend language vehicle-lang/vehicle#6

Closed

andreasabel added layout Concerning layout (keywords) token Concerning token categories. labels Jun 28, 2021

andreasabel added AST Concerning the generated abstract syntax position Concerning position information in parsed AST labels Oct 19, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Keywords `token` and `layout` are apparently not compatible #372

Keywords `token` and `layout` are apparently not compatible #372

MatthewDaggitt commented Jun 28, 2021

andreasabel commented Jun 28, 2021

MatthewDaggitt commented Jun 29, 2021 •

edited

Loading

andreasabel commented Jul 1, 2021

andreasabel commented Jul 13, 2021

wenkokke commented Oct 19, 2021

andreasabel commented Oct 19, 2021

wenkokke commented Oct 22, 2021

MatthewDaggitt commented Oct 22, 2021

Keywords token and layout are apparently not compatible #372

Keywords token and layout are apparently not compatible #372

Comments

MatthewDaggitt commented Jun 28, 2021

andreasabel commented Jun 28, 2021

MatthewDaggitt commented Jun 29, 2021 • edited Loading

andreasabel commented Jul 1, 2021

andreasabel commented Jul 13, 2021

wenkokke commented Oct 19, 2021

andreasabel commented Oct 19, 2021

wenkokke commented Oct 22, 2021

MatthewDaggitt commented Oct 22, 2021

Keywords `token` and `layout` are apparently not compatible #372

Keywords `token` and `layout` are apparently not compatible #372

MatthewDaggitt commented Jun 29, 2021 •

edited

Loading