Use `AstGen` in analysis #551

SuperAuguste · 2022-07-19T14:23:17Z

AstGen generates the fancy ZIR untyped stage2 intermediate representation format that is a nice bit of padding between AST and analysis. We can't use AIR (the typed stage2 intermediate representation format that comes after ZIR) if the AST is invalid but we can use ZIR, so it's a really nice middle ground. AstGen is fast, accurate, and officially supported and it would simplify our analysis logic a bunch!

SuperAuguste · 2022-07-20T22:06:45Z

Note: AstGen is slightly less flexible than using AST, for better or for worse. Essentially, AST can help us analyze all sorts of nightmare scenarios, albeit rather inaccurately, whereas AstGen can complete only valid AST (per-block) but a lot more accurately.

For example, AST analysis can complete the following but AstGen will need to wait for you to resolve the uninitialized variable error first:

var amogus: BasedType;
amogus.

On the other hand, using AstGen means reducing zls' code footprint, properly following the (eventual) Zig spec, and matching the Zig compiler's feedback one-to-one. We get actual ast-check diagnostics, we can analyze all sorts of scenarios without a million massive switch statements, and more!

Thoughts on this?
Anybody want to explore attempting to "flexibilize" AstGen to be able to parse invalid Ast?

matu3ba · 2022-07-27T18:29:24Z

See https://rdambrosio016.github.io/rust/2020/09/18/pure-ast-based-linting-sucks.html#so-how-does-a-linter-work
for another approach.

I do see 2 potential main problems in the design space of Zig parser:

1. Makes error recovery for a parser very constrained
1. Incremental reparsing is impossible

Recovery code and tests may be sufficient, but handling multiple faults requires more complex state handling.

Next steps

1. To decide if it is worth the complexity, the perf cost (mostly latency from different steps) and expected complexity for worst case supported use case need to be analyzed and compared to expected gain
1. How feasible is it to "parse 1 step and undo 1 step" ie via invalidation of AST fields (of the slice) and copying them over to the reduced form (like the steps to fill a mtree via offset array)?
1. What is the expected frequency and cost of source location access?
1. How fast is lookup via source location into the AST to modify it (user puts cursor somewhere to add/remove things)?
1. How do we prevent feature creep? Limit analysis, when {} and () are not matching?
1. Find good use case for analysis.

SuperAuguste · 2022-07-27T18:55:09Z

Really interesting article @matu3ba! Not sure if this is the right place to discuss this though! Perhaps opening a new issue would make sense?

matu3ba · 2022-07-27T20:06:12Z

Perhaps opening a new issue would make sense?

I'll make one once I have perf numbers up to AST (from my zig-reduce stuff), such that we are not only theory crafting. I hope to have them in 7 days (busy on job etc).

matklad · 2022-12-20T17:09:36Z

whereas AstGen can complete only valid AST (per-block) but a lot more accurately.

My gut feeling here is that the first-best solution is to teach AstGen to gracefully handle invalid AST. Handling partial input in the parser/AST is the actually hard bit, where you need some explicit code. Above that, it's straightforward to fix up all missing names/expressions/types etc with a special "unknown" value, and the code there "writes itself", as you just propagate the unknowns (eg, amogus. AST is translated to amogus.<unknown> zir which resolves to <unknown> field which has <unknown> type).

SuperAuguste added enhancement New feature or request help wanted Extra attention is needed labels Jul 19, 2022

SuperAuguste self-assigned this Jul 19, 2022

SuperAuguste mentioned this issue Jul 19, 2022

Use Sema when possible #552

Open

SuperAuguste added this to Stage2ification of the Zig Language Server Jul 19, 2022

SuperAuguste moved this to In Progress in Stage2ification of the Zig Language Server Jul 19, 2022

Techatrix mentioned this issue Sep 29, 2022

Code Actions & Autofix #689

Open

24 tasks

Techatrix self-assigned this Jan 25, 2023

Techatrix mentioned this issue Dec 10, 2023

Missing Diagnostics #1659

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use `AstGen` in analysis #551

Use `AstGen` in analysis #551

SuperAuguste commented Jul 19, 2022

SuperAuguste commented Jul 20, 2022

matu3ba commented Jul 27, 2022

SuperAuguste commented Jul 27, 2022 •

edited

Loading

matu3ba commented Jul 27, 2022

matklad commented Dec 20, 2022

Use AstGen in analysis #551

Use AstGen in analysis #551

Comments

SuperAuguste commented Jul 19, 2022

SuperAuguste commented Jul 20, 2022

matu3ba commented Jul 27, 2022

SuperAuguste commented Jul 27, 2022 • edited Loading

matu3ba commented Jul 27, 2022

matklad commented Dec 20, 2022

Use `AstGen` in analysis #551

Use `AstGen` in analysis #551

SuperAuguste commented Jul 27, 2022 •

edited

Loading