Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error reporting and recovery #16

Open
eddyb opened this issue Aug 21, 2018 · 2 comments
Open

Error reporting and recovery #16

eddyb opened this issue Aug 21, 2018 · 2 comments

Comments

@eddyb
Copy link
Member

eddyb commented Aug 21, 2018

One of the simplest things we could do is keep a buffer of "attempted input matches" at the "most advanced input location", as we parse, keeping only the entries with the largest starting point.
Then they could be reported as an "expected one of ..." error. rustc itself does something similar:

use x.
error: expected one of `::`, `;`, or `as`, found `.`
 --> src/lib.rs:1:6
  |
1 | use x.
  |      ^ expected one of `::`, `;`, or `as` here

An optimization over this would be to not keep that buffer until there's an error, and then only redo the bit of the parse that errored, this time buffering input matches.


Another technique that would help localize and constrain an error, is to use backward parsing (from #13) after an error in forward parsing, to find the "longest valid prefix and suffix" of the input, and if the most advanced failures in both directions get close together, then the syntax error can be localized to even one token/character.

Error recovery could be done for a localized error by either:

  • splicing in one of the expected inputs and using an incremental reparse (Incremental reparsing #12) to redo the parse, in the hope that it will get farther (this can be repeated as long as it keeps shrinking the error)
  • faking success on the most advanced failures, without actual input being used up by the parse, e.g. f(x.) could recover as Call(Var("f"), Field(Var("x"), "")), instead of showing up as f(x.a) or similar (from picking a character that'd work) - this would be useful to IDEs

All approaches to recovery for a GLL parser can involve some amount of non-determinism, allowing multiple recovery possibilities to continue through, and picking the best outcome through heuristics.

@eddyb
Copy link
Member Author

eddyb commented Sep 18, 2018

For custom diagnostics, @eternaleye is suggesting using "spines", made up of vertebrae, each standing in for an incomplete rule/SPPF node, and providing access to its completed children (on the left of the error) and child vertebra (overlapping the error).
(this has interactions with #17 - for efficiency, we'd want to build these spines while parsing but only keep the rightmost ones, and discard the rest, including the orphan SPPF nodes they kept alive)
That way, hand-written recursive descent diagnostic logic can be ported over, with the main distinction (other than exact API in use) being that the choice of diagnostic itself can be ambiguous.

We can even provide some typed APIs, although the level of detail (i.e. type safety) has to be balanced with ergonomics. We could have, for each Foo rule, a partial::Foo version, using, for fields:

enum /*partial::*/Child<T: ?Sized> {
    Complete(super::Handle<T>),
    Partial(/*partial::*/Handle<T> /* aka Verterbra<T> */),
    NotStarted,
}

This way, the custom diagnostics could pattern-match on e.g.:

Expr::Cast { expr: Child::Complete(expr), ty: Child::Partial(ty) }

@eddyb
Copy link
Member Author

eddyb commented Oct 18, 2018

Potentially useful paper to look into: https://arxiv.org/abs/1804.07133

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant