Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[red-knot] Avoid panicking when hitting failures looking up AST information #13701

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

rtpg
Copy link
Contributor

@rtpg rtpg commented Oct 10, 2024

This is a set of changes that allow for cargo --bin red_knot to not panic (though report many failures across the repo).

The main idea here is to avoid panicking if, instead, type inference can choose to have "less information". For example, if some entry is not found, we often can return Unknown. In a stable system this can lead to Unknown spreading in a nasty way, but here this could allow for easier post-mortem debugging (for example, querying the salsa DB ad-hoc to figure out why certain expectations didn't hold).

I tried adding tracing logs in places that felt like indications of real failures upstream (in particular partial AST traversal). I'm not well versed in tracing, but in Python land I would probably group all of these into a specific logger so that they can be treated as "road to 1.0" stuff.

In this process I changed some ID lookup methods to return Option. I believe that the goal here is that this wouldn't be needed (as the inference DB should cover "everything"), but the changeset is so small that it feels worthwhile until AST failures are inbound.

Part of the failures I found were "invalid AST but still constructed by Ruff"-style errors.

An example is x, y: int = 1, 2.

red_knot sees an annotated assignment, and so figures out the types of 1 and 2, but then trying to assign 1 and 2 to the left side (as a tuple), but an awkward interaction between annotated assignments and tuple assignments (like x, y = 1, 2) leads to x, y getting defined by 1 and 2. This hits the "shouldn't have more than one definition for an expression" problem.

Ultimately this is a syntax error to begin with, so then we probably shouldn't traverse this. But if we don't traverse it and "try our best" with the above, suddenly x and y don't have anything associated.

I think the "right thing" to do above is to try harder (for example, erase the signature in the type inference treatment). But I think this branch shows that the cost of trying to keep the system running instead doesn't have too high of a cost.

In any case, this branch is hopefully an indicator of what kinds of places in the code are triggering panics when running red_knot against the ruff repo (as artificial of an example that might be).

if self.current_assignment.is_some() {
// This can happen if we have something like x = y := 2
// which is invalid syntax but still is provided in the AST
tracing::warn!("Current assignment is unexpectedly set");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We limit the use of tracing::warn for messages that need the user's attention and they need to address (e.g. when setting up file watching for a directory failed). Here, there's not much a user can do about this and invalid syntax is very common in an LSP. Logging a warning would be very noisy for users. We should use either the debug or possibly even trace level instead.

@MichaReiser
Copy link
Member

In this process I changed some ID lookup methods to return Option. I believe that the goal here is that this wouldn't be needed (as the inference DB should cover "everything"), but the changeset is so small that it feels worthwhile until AST failures are inbound.

Could you talk me through the benefit of returning an Option in more detail? To me it's not clear how panicking at the call side is much different than panicking in the e.g. expression method directly because the call site is visible in the stack frame.

@@ -212,7 +213,9 @@ impl<'db> SemanticIndexBuilder<'db> {
let existing_definition = self
.definitions_by_node
.insert(definition_node.key(), definition);
debug_assert_eq!(existing_definition, None);
if existing_definition.is_some() {
tracing::warn!("Existing definition was unexpectedly evicted");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a bunch of warnings I added just to try and get things working.

In total there are 228 warnings (of which this one is only hit 5 times), so it's pretty surmountable all things considered

*node.target,
Expr::Attribute(_) | Expr::Subscript(_) | Expr::Name(_)
);
if valid_target {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we visit with an invalid target we get downstream errors on re-defining things. There's likely a way to visit here with an invalid target that doesn't generate issues

lookup.unwrap_or(Type::Unknown)
} else {
tracing::warn!("Couldn't find expression ID");
Type::Unknown
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like it would make a lot of sense for there to be something like Type::Unknown(SourcingInformation), where each callsite could embed where that specific one came from.

That way, if reveal_type gives you an Any, you could probably trace back why you got that.

@@ -96,6 +96,27 @@ fn infer_definition_types_cycle_recovery<'db>(
inference
}

/// Cycle recovery for [`infer_deferred_types()`]: for now, just [`Type::Unknown`]
/// TODO fixpoint iteration
fn infer_deferred_types_cycle_recovery<'db>(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

copied what infer_definition_types did here.

}
// pub(crate) fn expression_ty(&self, expression: ScopedExpressionId) -> Type<'db> {
// self.expressions[&expression]
// }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here I went with leaving only try_expression_ty, other places I Option'd the existing methods. Not sure what pattern makes sense (if any)

tracing::warn!(
"name in a type expression is always 'load' but got: '{:?}'",
name.ctx
);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another place where the failure is around an invalid AST

Copy link
Contributor

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

@rtpg
Copy link
Contributor Author

rtpg commented Oct 10, 2024

@MichaReiser the main point of using Option is that almost all of the call sites end up having a good answer for what to do if they couldn’t find the ID (often, in inference, say “well I guess the type is unknown”). And in cases where you don’t have a good answer, there’s still the fallback option of panicking.

The core thing with panicking is you lose all context except the backtrace, right? And stuff gets torn down etc. Meanwhile if you power through as much as possible you can have a partially filled database, and then do something like dump everything that seems missing during a debug session.

I think that pairing this with things like Type::Todo feels like a comfortable strategy up until there’s stability. And panicking due to inference failures means that big reports are “this crashes on my codebase” rather than “this code reports the type as Any”.

But maybe everything I’m seeing is all solvable by a handful of changes (in which case “fail hard” will make sure that the DB doesn’t get filled up with spurious inference failures in the future).

@carljm
Copy link
Contributor

carljm commented Oct 16, 2024

This is very useful, thanks for putting it up! I've created #13778 to track the overall issue of error resilience.

I don't think I want to move forward with landing this exact approach. I'd like to take this project in a more incremental fashion, fixing one cause of panics at a time (so we can consider the tradeoffs in each case, and look at relevant code samples), and ensuring we add a test case (not just "this Python file exists in the ruff repo", but an actual test case that runs in CI) for each case that we fix.

But regardless this PR is a great resource and reference point to have on hand for that work.

@carljm
Copy link
Contributor

carljm commented Oct 16, 2024

I'm also aware that some issues addressed in this PR might be bugs on valid syntax, not panics on invalid syntax; those should also be addressed in separate PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
red-knot Multi-file analysis & type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants