Get rid of (negative) lookahead (somehow) #14

eddyb · 2018-08-21T19:16:26Z

- negative lookahead is used right now for lexing, e.g. ensuring that an identifier ([a-zA-Z][a-zA-Z0-9]* in regex) is not followed by more identifier characters ((?![a-zA-Z0-9]) in some regex dialects), which forces the rule to match the longest valid identifier (just like most regex semantics out there)
- it makes the grammar context-sensitive (when it maybe doesn't need to be)
- incremental reparsing (#12) would need to track larger input ranges than the SPPF uses
- bidirectional parsing (#13) would prefer symmetrical lookbehind, which worsens the problem
  - either both directions check both lookbehind and lookahead, or we do an analysis to determine that one or both are unnecessary given their contexts
- we could transform what the user writes as look{ahead,behind} into "static context-sensitivity", propagating it up and specializing callers, generating a (much?) larger but context-free grammar, which only accesses input within the success range of each rule invocation
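
To make the lexing point concrete, here is a minimal Rust sketch (not code from this project) of what the negative lookahead buys. It assumes a generalized parser that would otherwise accept every prefix matching [a-zA-Z][a-zA-Z0-9]*; the function names ident_ends and ident_ends_with_lookahead are hypothetical. Note that the lookahead check reads input[end..], i.e. one character past the range the rule itself matched, which is the out-of-range access the points above complain about.

```rust
/// First character of an identifier: [a-zA-Z]
fn is_ident_start(c: char) -> bool {
    c.is_ascii_alphabetic()
}

/// Any later character of an identifier: [a-zA-Z0-9]
fn is_ident_continue(c: char) -> bool {
    c.is_ascii_alphanumeric()
}

/// Every end offset at which [a-zA-Z][a-zA-Z0-9]* can match,
/// starting from the beginning of `input` (no lookahead applied).
fn ident_ends(input: &str) -> Vec<usize> {
    let mut ends = Vec::new();
    let mut chars = input.char_indices();
    match chars.next() {
        Some((_, c)) if is_ident_start(c) => ends.push(c.len_utf8()),
        _ => return ends,
    }
    for (i, c) in chars {
        if !is_ident_continue(c) {
            break;
        }
        ends.push(i + c.len_utf8());
    }
    ends
}

/// Same matches, filtered by the negative lookahead (?![a-zA-Z0-9]).
/// The filter inspects the character *after* the matched range.
fn ident_ends_with_lookahead(input: &str) -> Vec<usize> {
    ident_ends(input)
        .into_iter()
        .filter(|&end| !input[end..].starts_with(is_ident_continue))
        .collect()
}
```

For the input "format x", ident_ends yields every prefix length from 1 to 6, while ident_ends_with_lookahead keeps only 6, the longest identifier.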

eddyb · 2018-08-25T06:02:49Z

Found a solution while talking to @eternaleye, and it involves #27:

// Existing Rust grammar:
ForLoop = "for" Pat "in" Expr Block;
// Scannerless CFG:
ForLoop = "for" {
    Pat(/*allow_ident_left=*/"0", /*allow_ident_right=*/"0") |
    // NB: WS is *mandatory* whitespace here:
    WS Pat(/*allow_ident_left=*/"1", /*allow_ident_right=*/"0") |
    Pat(/*allow_ident_left=*/"0", /*allow_ident_right=*/"1") WS |
    WS Pat(/*allow_ident_left=*/"1", /*allow_ident_right=*/"1") WS
} "in" {
    Expr(/*allow_ident_left=*/"0", /*allow_ident_right=*/"1") |
    WS Expr(/*allow_ident_left=*/"1", /*allow_ident_right=*/"1")
} WS? Block;

Note that this is a first approximation and we could find some sugar for it, perhaps, especially to avoid having to write WS? everywhere else in the grammar.
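
As a rough illustration of what the allow_ident_left / allow_ident_right flags encode, the following Rust sketch decides whether mandatory whitespace is needed between two adjacent pieces of text. The helper needs_ws is hypothetical and is not part of the grammar machinery; it uses the same [a-zA-Z0-9] character class as the identifier regex in the issue description, which is narrower than real Rust identifiers.

```rust
/// Identifier characters, following the regex used in this issue.
fn is_ident_char(c: char) -> bool {
    c.is_ascii_alphanumeric()
}

/// True when writing `left` directly before `right` would let the two
/// merge into one longer identifier-like run, so WS must separate them.
/// This corresponds to only allowing an identifier at that edge
/// (allow_ident_*="1") when WS is present.
fn needs_ws(left: &str, right: &str) -> bool {
    match (left.chars().last(), right.chars().next()) {
        (Some(l), Some(r)) => is_ident_char(l) && is_ident_char(r),
        _ => false,
    }
}
```

Under these assumptions, needs_ws("for", "x") is true (otherwise the input would read as the identifier forx), while needs_ws("for", "(a, b)") is false, matching the alternative without WS in the grammar above.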

One possibility could be "for" Pat(WS?, WS?) "in" Expr(WS?, "1") Block. That is, writing WS? explicitly removes the implicit WS? and expands the parametric/macro rule invocation to pass "0" or "1", depending on whether there is whitespace to the left (or to the right, for the second WS?).

EDIT: We probably don't need that WS? trick (it doesn't work for "as" anyway), and instead we could use e.g. 4 rules, named ExprI{L,}{R,} (or, alternatively, Expr{{After,Before,Between}KW,}). Not sure how to control whitespace insertion though.
Maybe any rule that can start/end with explicit whitespace doesn't get whitespace also inserted before/after it at its use sites?

eddyb · 2018-09-11T08:50:36Z

While we might be able to implement the machinery to allow writing the explicit version sooner, it's becoming increasingly clear how unergonomic it would be to write such a grammar.

A good middle ground could be taking lookaround and propagating it up the grammar, requiring that it cleanly "intersects" with existing terminals and "dissolves" away, leaving a CFG behind.
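
One possible reading of "intersects" and "dissolves", sketched in Rust under loud assumptions: the only lookaround considered is the negative lookahead (?![a-zA-Z0-9]), and it is resolved against the literal terminal that follows it in the grammar, if one is known. The Resolved enum and resolve_not_ident function are hypothetical names for illustration, not an implemented design.

```rust
/// Outcome of checking a "not followed by an identifier character"
/// lookahead against whatever comes next in the grammar.
#[derive(Debug, PartialEq)]
enum Resolved {
    /// The next terminal can never start with an identifier character,
    /// so the lookahead always holds and can be dropped.
    Dissolved,
    /// The next terminal always starts with an identifier character,
    /// so this alternative can never match and can be removed.
    Impossible,
    /// Nothing known follows here (e.g. end of the rule), so the
    /// lookahead has to be propagated up to the callers.
    Propagate,
}

/// `following` is the literal terminal right after the lookahead, if any.
fn resolve_not_ident(following: Option<&str>) -> Resolved {
    match following.and_then(|t| t.chars().next()) {
        Some(c) if c.is_ascii_alphanumeric() => Resolved::Impossible,
        Some(_) => Resolved::Dissolved,
        None => Resolved::Propagate,
    }
}
```

With this sketch, an identifier followed by "(" resolves to Dissolved, one followed directly by "in" resolves to Impossible (whitespace would be required in between), and an identifier at the end of a rule resolves to Propagate. A grammar would count as context-free once no Propagate remains at the top level.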

eddyb added the "help wanted" label Aug 21, 2018

eddyb mentioned this issue Aug 25, 2018: Parametric ("macro"?) grammar rules. #27 (Open)

eddyb removed the "help wanted" label Aug 25, 2018

eddyb mentioned this issue Oct 31, 2018: Lexical specification? rust-lang/wg-grammar#3 (Open)

eddyb mentioned this issue May 14, 2019: Parse grammars with proc_macro tokens and remove "negative lookahead". #113 (Merged)

eddyb closed this as completed in #113 May 14, 2019
