Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[red-knot] Use the right scope when considering class bases #13766

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

rtpg
Copy link
Contributor

@rtpg rtpg commented Oct 16, 2024

Summary

PEP 695 Generics introduce a scope inside a class statement's arguments and keywords.

class C[T](A[T]):  # the T in A[T] is not from the global scope but from a type-param-specfic scope
   ...

When doing inference on the class bases, we currently have been doing base class expression lookups in the global scope. Not an issue without generics (since a scope is only created when generics are present).

This change instead makes sure to stop the global scope inference from going into expressions within this sub-scope. Since there is a separate scope, check_file and friends will trigger inference on these expressions still.

Another change as a part of this is making sure that ClassType looks up its bases in the right scope. I do not believe the way I do the lookup in this change is the most precise way, and would appreciate comments on that fragment.

Test Plan

cargo test --package red_knot_python_semantic generics will run the markdown test that previously would panic due to scope lookup issues

 In particular, introducing PEP 695 generics introduces a new scope
@rtpg rtpg changed the title Use the right scope when considering class bases [red-knot] Use the right scope when considering class bases Oct 16, 2024
match self.expressions_map.get(key) {
Some(result) => *result,
None => {
panic!("Could not find expression ID for {key:?}");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very helpful because debug output on ExpressionNodeKey includes the range, which is usually enough in context to identify what text fragment is causing issues.

I put this in under the idea that this doesn't cost much but I might be wrong

// we need to use the type param'd scope
let type_param_scope = index
.node_scope(NodeWithScopeRef::ClassTypeParameters(class_stmt_node))
.to_scope_id(db, file);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unclear to me what the cost of the semantic_index lookup would be in practice. I just want to know what scope I need to be looking at. While ClassType does have body_scope, that is a separate scope from the type parameter scope.

Would it make sense to add bases_scope to ClassType here to avoid needing this index build up?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect that we could reuse the self.file and self.index because all the expressions in the class's base should be in the same file and from the same index.

However, we probably don't want to call infer_scope_types because inferring the class's bases then suddenly becomes dependent of all types in that other scope, resulting in poor incremental caching.

I'm not quiet sure but maybe definition_expression_ty is a better fit?

Copy link
Contributor

@carljm carljm Oct 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can use definition_expression_ty because, in a type params scope, the class bases aren't "part of a definition" (the class definition is in the outer scope.)

We could mark the class bases as a "standalone expression" (or set of them) so we can query for their types directly, but I don't think this is worth the extra Salsa tracked structs. The "type params scope" is automatically generated and implicit and contains nothing but definitions for type parameters, and the class bases/keywords. There isn't "all types in that scope" to be worried about, because there are no other types in that scope. So I think infer_scope_types is the best option here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect that we could reuse the self.file and self.index

We aren't in TypeInferenceBuilder here, so those aren't available.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think perhaps the simpler way to get the type params scope is that it must always be the parent scope of the body scope (for a class with type params.) We should be able to add a ScopeId::parent method (it would need to use the symbol table, but not the semantic index.)

I'm also not opposed to adding bases_scope (or optional type_params_scope) to ClassType.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also use index.scopes_by_node when the query already depends on the index anyway (which parent would as well?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ScopeId::parent would rely on just the symbol table, which is part of the index, but in principle could be backdated independently if it didn't change? Unlikely though; I don't think depending on the index here is a problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ended up storing the bases scope in ClassType (which gets built up in the inference builder so we have access to the index right there). Definitely feels right after having written it up.

} else {
definition_expression_ty(db, definition, base_expr)
}
};
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the mapper is pulled out here because if we build up separate closures in different if branches, we hit the "no two closures are alike" issue. Hence the type_scope_info Option song and dance.

Copy link
Contributor

github-actions bot commented Oct 16, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

ℹ️ ecosystem check detected linter changes. (+1 -0 violations, +0 -0 fixes in 1 projects; 53 projects unchanged)

pandas-dev/pandas (+1 -0 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview

+ pandas/core/internals/blocks.py:1664:9: F841 Local variable `icond` is assigned to but never used

Changes by rule (1 rules affected)

code total + violation - violation + fix - fix
F841 1 1 0 0 0

@MichaReiser MichaReiser added the red-knot Multi-file analysis & type inference label Oct 16, 2024
Comment on lines 37 to 43
let key = &key.into();
match self.expressions_map.get(key) {
Some(result) => *result,
None => {
panic!("Could not find expression ID for {key:?}");
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this. You could do

Suggested change
let key = &key.into();
match self.expressions_map.get(key) {
Some(result) => *result,
None => {
panic!("Could not find expression ID for {key:?}");
}
}
let key = ;
match self.expressions_map.get(&key.into()).unwrap_or_else(|| {
panic!("Could not find expression ID for {key:?}");
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went with this, thanks for the suggestion!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, went with that

// we need to use the type param'd scope
let type_param_scope = index
.node_scope(NodeWithScopeRef::ClassTypeParameters(class_stmt_node))
.to_scope_id(db, file);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect that we could reuse the self.file and self.index because all the expressions in the class's base should be in the same file and from the same index.

However, we probably don't want to call infer_scope_types because inferring the class's bases then suddenly becomes dependent of all types in that other scope, resulting in poor incremental caching.

I'm not quiet sure but maybe definition_expression_ty is a better fit?

Comment on lines 1435 to 1437
if has_type_params {
// Safety: we calculated the inference if has_type_params
let (type_param_scope, inferences) = type_scope_info.unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to use has_type_params and then type_scope_info.unwrap() rather than just if let Some(...) = type_scope_info { ? They rely on the same invariants, but the latter seems simpler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... your suggestion is cleaner and this code isn't big, but I dislike the hiding of the "the branching is due to type parameter-ness".

I believe my concerns can be resolved with a more well thought out variable name for the scope info. Will ruminate on it overnight

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed this up to bases_scope_info. It's still an Option (instead of doing something like checking if bases_scope is equal to some other scope to determine specialized-ness), but good enough for me to feel comfortable with the cleaner if let solution.

Copy link
Contributor

@carljm carljm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good, thank you!! Comments aren't anything major, but there are a few things to address, so I'll request changes for now.

reveal_type(MyBox.box_model_number) # revealed: Literal[695]
```

## Subclassing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to also add a test that in a stub file (pyi) we can have a generic class that refers to itself in its own bases. We have a similar test already for non-generic classes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

if let Some((bases_scope, inferences)) = bases_scope_info {
// when we have a specialized scope, we'll look up the inference
// within that scope
inferences.expression_ty(base_expr.scoped_ast_id(db, bases_scope))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: We could consider to just call base_expr.ty(...) over repeating the entire ceremony of storing the base specialized scope, fingering the scope, and calling expression_ty. However, it would be slightly less efficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going down that path let me remove all the base scope song and dance, at (by my read) not much cost. The resulting if statement is probably very hard to look at though now, since there's no longer quite a clear reason for why I am calling .ty (even with the comment). But simpler is simpler, thanks for the suggestion

crates/red_knot_python_semantic/src/types.rs Outdated Show resolved Hide resolved
Comment on lines 876 to 880
let bases_specialized_scope = type_params.as_ref().map(|_| {
self.index
.node_scope(NodeWithScopeRef::ClassTypeParameters(class))
.to_scope_id(self.db, self.file)
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I suggest making this a method on ClassType. It doesn't seem worth storing (it's very cheap to recompute). Unless we think that it helps with reducing the blast radius of file changes (e.g. so that calling (class A).bases where class A is defined in foo.py from bar.py doesn't get invalidated when foo.py changes). However, I do think that we might want to start caching bases` once we add MRO resolution (which seems expensive)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks to your suggestion on using ty I don't even need this anymore, fortunately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
red-knot Multi-file analysis & type inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants