Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use AstGen in analysis #551

Open
SuperAuguste opened this issue Jul 19, 2022 · 5 comments
Open

Use AstGen in analysis #551

SuperAuguste opened this issue Jul 19, 2022 · 5 comments
Assignees
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@SuperAuguste
Copy link
Member

AstGen generates the fancy ZIR untyped stage2 intermediate representation format that is a nice bit of padding between AST and analysis. We can't use AIR (the typed stage2 intermediate representation format that comes after ZIR) if the AST is invalid but we can use ZIR, so it's a really nice middle ground. AstGen is fast, accurate, and officially supported and it would simplify our analysis logic a bunch!

@SuperAuguste
Copy link
Member Author

Note: AstGen is slightly less flexible than using AST, for better or for worse. Essentially, AST can help us analyze all sorts of nightmare scenarios, albeit rather inaccurately, whereas AstGen can complete only valid AST (per-block) but a lot more accurately.

For example, AST analysis can complete the following but AstGen will need to wait for you to resolve the uninitialized variable error first:

var amogus: BasedType;
amogus.

On the other hand, using AstGen means reducing zls' code footprint, properly following the (eventual) Zig spec, and matching the Zig compiler's feedback one-to-one. We get actual ast-check diagnostics, we can analyze all sorts of scenarios without a million massive switch statements, and more!

  1. Thoughts on this?
  2. Anybody want to explore attempting to "flexibilize" AstGen to be able to parse invalid Ast?

@matu3ba
Copy link

matu3ba commented Jul 27, 2022

See https://rdambrosio016.github.io/rust/2020/09/18/pure-ast-based-linting-sucks.html#so-how-does-a-linter-work
for another approach.

I do see 2 potential main problems in the design space of Zig parser:

    1. Makes error recovery for a parser very constrained
    1. Incremental reparsing is impossible

Recovery code and tests may be sufficient, but handling multiple faults requires more complex state handling.

Next steps

    1. To decide if it is worth the complexity, the perf cost (mostly latency from different steps) and expected complexity for worst case supported use case need to be analyzed and compared to expected gain
    1. How feasible is it to "parse 1 step and undo 1 step" ie via invalidation of AST fields (of the slice) and copying them over to the reduced form (like the steps to fill a mtree via offset array)?
    1. What is the expected frequency and cost of source location access?
    1. How fast is lookup via source location into the AST to modify it (user puts cursor somewhere to add/remove things)?
    1. How do we prevent feature creep? Limit analysis, when {} and () are not matching?
    1. Find good use case for analysis.

@SuperAuguste
Copy link
Member Author

SuperAuguste commented Jul 27, 2022

Really interesting article @matu3ba! Not sure if this is the right place to discuss this though! Perhaps opening a new issue would make sense?

@matu3ba
Copy link

matu3ba commented Jul 27, 2022

Perhaps opening a new issue would make sense?

I'll make one once I have perf numbers up to AST (from my zig-reduce stuff), such that we are not only theory crafting. I hope to have them in 7 days (busy on job etc).

@Techatrix Techatrix mentioned this issue Sep 29, 2022
24 tasks
@matklad
Copy link
Contributor

matklad commented Dec 20, 2022

whereas AstGen can complete only valid AST (per-block) but a lot more accurately.

My gut feeling here is that the first-best solution is to teach AstGen to gracefully handle invalid AST. Handling partial input in the parser/AST is the actually hard bit, where you need some explicit code. Above that, it's straightforward to fix up all missing names/expressions/types etc with a special "unknown" value, and the code there "writes itself", as you just propagate the unknowns (eg, amogus. AST is translated to amogus.<unknown> zir which resolves to <unknown> field which has <unknown> type).

@Techatrix Techatrix self-assigned this Jan 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Development

No branches or pull requests

4 participants