[red-knot] support invalid syntax without panics #13778

carljm · 2024-10-16T16:03:52Z

The ruff parser is error-resilient and will generate a best-effort AST for any input, but red-knot currently assumes a valid AST in some places, and this can cause panics if it's run over an invalid AST.

We should do best-effort type-checking for invalid ASTs, and never panic.

The trick is to do this while, as much as feasible, still maintaining some useful internal invariants, and failing fast (panic is good in this case!) if those invariants are violated, rather than allowing programming errors to pass silently and result in type-checking bugs.

carljm · 2024-10-16T16:11:44Z

#13701 is a useful reference on what changes were needed as of a few days ago to avoid panics on the Python files present in the ruff repo. At least some of those are due to invalid syntax.

carljm · 2024-11-07T16:23:19Z

In addition to "not panic", we also want to avoid useless or confusing diagnostics.

Inferring useful types in the face of syntax errors is less of a priority.

sharkdp · 2024-11-14T14:51:49Z

We can now run red knot on all invalid-syntax examples in ruffs repository without panicking. We can also run it on a fuzzing corpus with ~4000 invalid-syntax files (#13448). The next step could be to look into direct fuzzing (#14157), or to look for other corpuses (corpi? corpora?) with invalid syntax examples.

MichaReiser · 2024-11-14T14:55:20Z

@sharkdp could we extend our corpus tests to run over all ruff tests to ensure we catch regressions early?

sharkdp · 2024-11-14T15:28:18Z

@sharkdp could we extend our corpus tests to run over all ruff tests to ensure we catch regressions early?

Yes — I'll do that.

MichaReiser · 2024-11-15T16:32:29Z

One specific case that we should handle is that we don't put names synthesized by the parser during recovery into the symbol table (or try to look them up). There's an AST method that allows testing for whether the name is valid.

AlexWaygood · 2024-11-25T12:00:42Z

We can now run red knot on all invalid-syntax examples in ruffs repository without panicking. We can also run it on a fuzzing corpus with ~4000 invalid-syntax files (#13448). The next step could be to look into direct fuzzing (#14157), or to look for other corpuses (corpi? corpora?) with invalid syntax examples.

@sharkdp just to note -- the unique thing about the fuzzer in #14157 is that the randomly generated source-code files it produces are guaranteed to always be valid Python syntax (at least, assuming there isn't a bug in the fuzzer). That's an extremely useful property of the fuzzer in some ways, but it does mean that the fuzzer is somewhat useless for testing how good red-knot is at handling invalid syntax. We'll have to look into other fuzzing libraries if we want to have some automated fuzzing for that as well.

dhruvmanila · 2024-12-02T12:40:27Z

I've been exploring ways to create panics on source code with syntax errors but I haven't found any that requires any source code that's raising syntax errors which isn't already documented which is #14672 and

ruff/crates/red_knot_workspace/tests/check.rs

Lines 273 to 277 in b63c2e1

    
           // related to circular references in type aliases (salsa cycle panic): 
        
           ("crates/ruff_python_parser/resources/inline/err/type_alias_invalid_value_expr.py", true, true), 
        
           // related to circular references in f-string annotations (invalid syntax) 
        
           ("crates/ruff_linter/resources/test/fixtures/pyflakes/F821_15.py", true, true), 
        
           ("crates/ruff_linter/resources/test/fixtures/pyflakes/F821_14.py", false, true),

.

I've also ran the fuzzer which produces source code with invalid syntax (#14678) multiple times and it always finds the one as linked above. The corpus is initiated with most of the Python files in the Ruff code base and all Python files in the CPython code base. The fuzzer runs on them and then starts shuffling the bytes around producing new source code which is then filtered to avoid the ones that doesn't produce syntax errors as per our parser. This has an obvious limitation that the syntax errors that's raised by the compiler

I looked at #13778 (comment) as well but I've been unable to create a source code that could panic. I think that might be because of lack of type inference for positions where that could occur like type parameters, function parameters in combination with the AST key that includes the node range. So, technically even though that behavior might be incorrect, it doesn't create a panic in red knot, at least not today.

Additionally, I also extracted all the examples from the syntax error test file in CPython code base (https://github.com/python/cpython/blob/bf21e2160d1dc6869fb230b90a23ab030835395b/Lib/test/test_syntax.py) as it's a doctest and ran red knot on top of it and got no panics.

There are certain changes that can be done but I've been unable to create a source code related to those changes that would make red knot panic. I'm going to list them down here for posterity:

Currently, for an invalid Identifier, the ExprAttribute doesn't set an Invalid expression context. We use this information to avoid recording a use of a symbol and / or creating the definition. Carl did mention previously that an attribute expression will be handled in a different manner and wouldn't create a definition.
The parser recovers from a syntax error where a keyword is used instead of a name by creating a name expression with the keyword: https://play.ruff.rs/cb616a58-855a-431e-8ee3-3ecd6320e45d. The downstream tools have no way of knowing whether this identifier was recovered from this syntax error. Regardless, this shouldn't cause any panic.
Avoid adding the symbol for an ExprName which has an invalid expression context as that indicates a syntax error from which the parser recovered

That said, I think we should periodically run the fuzzer and iteratively fix any errors that the model encounters. The model is still a work in progress so there are missing pieces which might prove necessary to create these panics if any.

MichaReiser · 2024-12-02T16:39:11Z

Thanks for writing this up.

Do you think it would be possible and useful to have mdtests for invalid identifiers and keywords used as identifiers?

@dhruvmanila

## Summary Seeing the fuzzing results from @dhruvmanila in #13778, I think we can re-enable these tests. We also had one regression that would have been caught by these tests, so there is some value in having them enabled.

dhruvmanila · 2024-12-03T04:59:32Z

Do you think it would be possible and useful to have mdtests for invalid identifiers and keywords used as identifiers?

I think they should be covered by the invalid parser tests:

Keyword assignment target: https://github.com/astral-sh/ruff/blob/5137fcc9c81610f687b6cb45413ef83c2c5eea73/crates/ruff_python_parser/resources/inline/err/assign_stmt_keyword_target.py
Invalid assignment target: https://github.com/astral-sh/ruff/blob/5137fcc9c81610f687b6cb45413ef83c2c5eea73/crates/ruff_python_parser/resources/invalid/statements/invalid_assignment_targets.py
Invalid augmented assignment target: https://github.com/astral-sh/ruff/blob/5137fcc9c81610f687b6cb45413ef83c2c5eea73/crates/ruff_python_parser/resources/invalid/statements/invalid_augmented_assignment_target.py

Although, I think they don't have complete coverage especially around "keywords used as identifiers", I'll add them.

MichaReiser · 2024-12-03T07:20:52Z

That makes sense, thanks.

We could also assert that red knot doesn't generate useless diagnostics for missing identifiers. For example. Red Knot should not emit a undefined-variable diagnostic for

5 +

## Summary This is related to #13778, more specifically #13778 (comment). This PR adds various test cases where a keyword is being where an identifier is expected. The tests are to make sure that red knot doesn't panic, raises the syntax error and the identifier is added to the symbol table. The final part allows editor related features like renaming the symbol.

carljm added the red-knot Multi-file analysis & type inference label Oct 16, 2024

This was referenced Oct 16, 2024

[red-knot] red-knot panics by looking at certain Python files within ruff's repo #13710

Closed

[red-knot] Avoid panicking when hitting failures looking up AST information #13701

Closed

carljm added this to the Red Knot 2024 milestone Nov 7, 2024

carljm assigned dhruvmanila Nov 7, 2024

dhruvmanila mentioned this issue Nov 13, 2024

Red Knot panics when analyzing black #13478

Closed

4 tasks

sharkdp mentioned this issue Nov 22, 2024

[red-knot] Panics when 'break' is used in nested definitions inside loops #14342

Closed

sharkdp mentioned this issue Dec 2, 2024

[red-knot] Re-enable linter corpus tests #14736

Merged

This was referenced Dec 3, 2024

Use Invalid context when keyword used as identifier #14753

Closed

Add tests for "keyword as identifier" syntax errors #14754

Merged

carljm closed this as completed Dec 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[red-knot] support invalid syntax without panics #13778

[red-knot] support invalid syntax without panics #13778

carljm commented Oct 16, 2024 •

edited

Loading

carljm commented Oct 16, 2024

carljm commented Nov 7, 2024

sharkdp commented Nov 14, 2024

MichaReiser commented Nov 14, 2024

sharkdp commented Nov 14, 2024

MichaReiser commented Nov 15, 2024

AlexWaygood commented Nov 25, 2024

dhruvmanila commented Dec 2, 2024

MichaReiser commented Dec 2, 2024

dhruvmanila commented Dec 3, 2024

MichaReiser commented Dec 3, 2024

[red-knot] support invalid syntax without panics #13778

[red-knot] support invalid syntax without panics #13778

Comments

carljm commented Oct 16, 2024 • edited Loading

carljm commented Oct 16, 2024

carljm commented Nov 7, 2024

sharkdp commented Nov 14, 2024

MichaReiser commented Nov 14, 2024

sharkdp commented Nov 14, 2024

MichaReiser commented Nov 15, 2024

AlexWaygood commented Nov 25, 2024

dhruvmanila commented Dec 2, 2024

MichaReiser commented Dec 2, 2024

dhruvmanila commented Dec 3, 2024

MichaReiser commented Dec 3, 2024

carljm commented Oct 16, 2024 •

edited

Loading