Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[red-knot] support invalid syntax without panics #13778

Closed
carljm opened this issue Oct 16, 2024 · 11 comments
Closed

[red-knot] support invalid syntax without panics #13778

carljm opened this issue Oct 16, 2024 · 11 comments
Assignees
Labels
red-knot Multi-file analysis & type inference
Milestone

Comments

@carljm
Copy link
Contributor

carljm commented Oct 16, 2024

The ruff parser is error-resilient and will generate a best-effort AST for any input, but red-knot currently assumes a valid AST in some places, and this can cause panics if it's run over an invalid AST.

We should do best-effort type-checking for invalid ASTs, and never panic.

The trick is to do this while, as much as feasible, still maintaining some useful internal invariants, and failing fast (panic is good in this case!) if those invariants are violated, rather than allowing programming errors to pass silently and result in type-checking bugs.

@carljm
Copy link
Contributor Author

carljm commented Oct 16, 2024

#13701 is a useful reference on what changes were needed as of a few days ago to avoid panics on the Python files present in the ruff repo. At least some of those are due to invalid syntax.

@carljm carljm added this to the Red Knot 2024 milestone Nov 7, 2024
@carljm
Copy link
Contributor Author

carljm commented Nov 7, 2024

In addition to "not panic", we also want to avoid useless or confusing diagnostics.

Inferring useful types in the face of syntax errors is less of a priority.

@sharkdp
Copy link
Contributor

sharkdp commented Nov 14, 2024

We can now run red knot on all invalid-syntax examples in ruffs repository without panicking. We can also run it on a fuzzing corpus with ~4000 invalid-syntax files (#13448). The next step could be to look into direct fuzzing (#14157), or to look for other corpuses (corpi? corpora?) with invalid syntax examples.

@MichaReiser
Copy link
Member

@sharkdp could we extend our corpus tests to run over all ruff tests to ensure we catch regressions early?

@sharkdp
Copy link
Contributor

sharkdp commented Nov 14, 2024

@sharkdp could we extend our corpus tests to run over all ruff tests to ensure we catch regressions early?

Yes — I'll do that.

@MichaReiser
Copy link
Member

One specific case that we should handle is that we don't put names synthesized by the parser during recovery into the symbol table (or try to look them up). There's an AST method that allows testing for whether the name is valid.

@AlexWaygood
Copy link
Member

We can now run red knot on all invalid-syntax examples in ruffs repository without panicking. We can also run it on a fuzzing corpus with ~4000 invalid-syntax files (#13448). The next step could be to look into direct fuzzing (#14157), or to look for other corpuses (corpi? corpora?) with invalid syntax examples.

@sharkdp just to note -- the unique thing about the fuzzer in #14157 is that the randomly generated source-code files it produces are guaranteed to always be valid Python syntax (at least, assuming there isn't a bug in the fuzzer). That's an extremely useful property of the fuzzer in some ways, but it does mean that the fuzzer is somewhat useless for testing how good red-knot is at handling invalid syntax. We'll have to look into other fuzzing libraries if we want to have some automated fuzzing for that as well.

@dhruvmanila
Copy link
Member

I've been exploring ways to create panics on source code with syntax errors but I haven't found any that requires any source code that's raising syntax errors which isn't already documented which is #14672 and

// related to circular references in type aliases (salsa cycle panic):
("crates/ruff_python_parser/resources/inline/err/type_alias_invalid_value_expr.py", true, true),
// related to circular references in f-string annotations (invalid syntax)
("crates/ruff_linter/resources/test/fixtures/pyflakes/F821_15.py", true, true),
("crates/ruff_linter/resources/test/fixtures/pyflakes/F821_14.py", false, true),
.

I've also ran the fuzzer which produces source code with invalid syntax (#14678) multiple times and it always finds the one as linked above. The corpus is initiated with most of the Python files in the Ruff code base and all Python files in the CPython code base. The fuzzer runs on them and then starts shuffling the bytes around producing new source code which is then filtered to avoid the ones that doesn't produce syntax errors as per our parser. This has an obvious limitation that the syntax errors that's raised by the compiler

I looked at #13778 (comment) as well but I've been unable to create a source code that could panic. I think that might be because of lack of type inference for positions where that could occur like type parameters, function parameters in combination with the AST key that includes the node range. So, technically even though that behavior might be incorrect, it doesn't create a panic in red knot, at least not today.

Additionally, I also extracted all the examples from the syntax error test file in CPython code base (https://github.com/python/cpython/blob/bf21e2160d1dc6869fb230b90a23ab030835395b/Lib/test/test_syntax.py) as it's a doctest and ran red knot on top of it and got no panics.

There are certain changes that can be done but I've been unable to create a source code related to those changes that would make red knot panic. I'm going to list them down here for posterity:

  • Currently, for an invalid Identifier, the ExprAttribute doesn't set an Invalid expression context. We use this information to avoid recording a use of a symbol and / or creating the definition. Carl did mention previously that an attribute expression will be handled in a different manner and wouldn't create a definition.
  • The parser recovers from a syntax error where a keyword is used instead of a name by creating a name expression with the keyword: https://play.ruff.rs/cb616a58-855a-431e-8ee3-3ecd6320e45d. The downstream tools have no way of knowing whether this identifier was recovered from this syntax error. Regardless, this shouldn't cause any panic.
  • Avoid adding the symbol for an ExprName which has an invalid expression context as that indicates a syntax error from which the parser recovered

That said, I think we should periodically run the fuzzer and iteratively fix any errors that the model encounters. The model is still a work in progress so there are missing pieces which might prove necessary to create these panics if any.

@MichaReiser
Copy link
Member

Thanks for writing this up.

Do you think it would be possible and useful to have mdtests for invalid identifiers and keywords used as identifiers?

sharkdp added a commit that referenced this issue Dec 2, 2024
## Summary

Seeing the fuzzing results from @dhruvmanila in #13778, I think we can
re-enable these tests. We also had one regression that would have been
caught by these tests, so there is some value in having them enabled.
@dhruvmanila
Copy link
Member

Do you think it would be possible and useful to have mdtests for invalid identifiers and keywords used as identifiers?

I think they should be covered by the invalid parser tests:

Although, I think they don't have complete coverage especially around "keywords used as identifiers", I'll add them.

@MichaReiser
Copy link
Member

That makes sense, thanks.

We could also assert that red knot doesn't generate useless diagnostics for missing identifiers. For example. Red Knot should not emit a undefined-variable diagnostic for

5 + 

dhruvmanila added a commit that referenced this issue Dec 5, 2024
## Summary

This is related to #13778, more specifically
#13778 (comment).

This PR adds various test cases where a keyword is being where an
identifier is expected. The tests are to make sure that red knot doesn't
panic, raises the syntax error and the identifier is added to the symbol
table. The final part allows editor related features like renaming the
symbol.
@carljm carljm closed this as completed Dec 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
red-knot Multi-file analysis & type inference
Projects
None yet
Development

No branches or pull requests

5 participants