Add some basic subscript type inference #13562

charliermarsh · 2024-09-30T02:51:48Z

Summary

Just for tuples and strings -- the easiest cases. I think most of the rest require generic support?

crates/red_knot_python_semantic/src/types/infer.rs

charliermarsh · 2024-09-30T02:52:17Z

crates/red_knot_python_semantic/src/types/infer.rs

+                }
+            }
+            // TODO some slices on strings should resolve to type errors rather than literals
+            (Type::StringLiteral(_), _) => Type::LiteralString,


What's the general philosophy around when we should error in a case like this?

There's a few things to note here.

Issues with LiteralString

According to a strict reading of PEP 675, I don't think this is a legal place where we can infer LiteralString, since the PEP states:

As per the Rationale, we also infer LiteralString in the following cases:

[...]

Literal-preserving methods: In Appendix C, we have provided an exhaustive list of str methods that preserve the LiteralString type.

str.__getitem__ is not listed in Appendix C of the PEP, so I suppose that means that we're not allowed to infer LiteralString for x, y or z in a snippet like this:

def foo(a: Literal["foo"], b: LiteralString, c: int): x = a[c] y = b[42] z = b[c]

I'm not sure what the status of Appendix C is, though... it doesn't seem to have been copied into the relevant section of the typing spec, so perhaps it isn't canonical anymore? Or maybe that was just an omission when writing the spec? There's also the question of typeshed's stubs: typeshed doesn't include a LiteralString overload in its stub for str.__getitem__, although it includes LiteralString overloads for many of its other str methods. I'm not sure if there's a reason for that or not -- even if not, it may be desirable to be in sync with typeshed here.

For now I think I'd just delete this branch -- continuing to fallback to Unknown is ~fine for now.

Emitting diagnostics

In general, if we can't infer that an operation creates a Literal type (whether that's LiteralString or StringLiteral), we should fall back to typeshed's stubs. Here that means falling back to builtins.str.__getitem__ -- you'd do something like value_ty.to_meta_type(db).member(db, "__getitem__").call(&[value_ty, slice_ty]) to figure out what the type of the subscript expression is. Except there are some complications ;) Take a look at Type::iterate for a more complete example of how we lookup dunder methods elsewhere, try to call them, and emit a diagnostic if the dunder method eithere doesn't exist or isn't callable.

ruff/crates/red_knot_python_semantic/src/types.rs

Lines 629 to 684 in 5118166

/// Given the type of an object that is iterated over in some way,

/// return the type of objects that are yielded by that iteration.

///

/// E.g., for the following loop, given the type of `x`, infer the type of `y`:

/// ```python

/// for y in x:

/// pass

/// ```

fn iterate(self, db: &'db dyn Db) -> IterationOutcome<'db> {

if let Type::Tuple(tuple_type) = self {

return IterationOutcome::Iterable {

element_ty: UnionType::from_elements(db, &**tuple_type.elements(db)),

};

}

// `self` represents the type of the iterable;

// `__iter__` and `__next__` are both looked up on the class of the iterable:

let iterable_meta_type = self.to_meta_type(db);

let dunder_iter_method = iterable_meta_type.member(db, "__iter__");

if !dunder_iter_method.is_unbound() {

let CallOutcome::Callable {

return_ty: iterator_ty,

} = dunder_iter_method.call(db, &[])

else {

return IterationOutcome::NotIterable {

not_iterable_ty: self,

};

};

let dunder_next_method = iterator_ty.to_meta_type(db).member(db, "__next__");

return dunder_next_method

.call(db, &[])

.return_ty(db)

.map(|element_ty| IterationOutcome::Iterable { element_ty })

.unwrap_or(IterationOutcome::NotIterable {

not_iterable_ty: self,

});

}

// Although it's not considered great practice,

// classes that define `__getitem__` are also iterable,

// even if they do not define `__iter__`.

//

// TODO(Alex) this is only valid if the `__getitem__` method is annotated as

// accepting `int` or `SupportsIndex`

let dunder_get_item_method = iterable_meta_type.member(db, "__getitem__");

dunder_get_item_method

.call(db, &[])

.return_ty(db)

.map(|element_ty| IterationOutcome::Iterable { element_ty })

.unwrap_or(IterationOutcome::NotIterable {

not_iterable_ty: self,

})

}

I don't agree with the pedantic reading of PEP 675 (or typeshed) as a rationale for removing this branch. Many useful things are simply overlooked. If a more precise type inference is clearly correct in terms of the runtime semantics, and there's nothing in the typing spec or conformance suite explicitly prohibiting it, then I think we should feel totally free to go ahead and do it.

It is clearly correct wrt the runtime semantics to infer LiteralString as the result of indexing into a LiteralString or StringLiteral.

On second thought, given the security rationale for LiteralString, there may be an argument that untrusted-user-supplied indices could allow turning a "safe" LiteralString into an "unsafe" one via careful cropping. I have mixed feelings about this, because there's also type-system value to LiteralString (we know it is really a built-in str, not a subclass), and in type-system terms, the inference of LiteralString is correct. And the security issue seems fairly contrived. But that might be why inference of LiteralString on __getitem__ is not in the PEP?

It is clearly correct wrt the runtime semantics to infer LiteralString as the result of indexing into a LiteralString or StringLiteral.

I agree, but the fact that PEP-675 is so clear that the list of string methods in Appendix C is valid is meant to be "exhaustive" made me worried that there was something I was misunderstanding :-) and until recently, PEP-675 was the spec -- it's not clear to me whether Appendix C was deliberately kept out of the typing spec or not.

This branch was anyway incorrect -- StringLiteral cannot be indexed into with objects of arbitraty types, which is what this branch allowed before Charlie removed it.

So to clarify... We need to add a branch for __getitem__ (in general -- I plan to do this in a separate PR), and slices into StringLiteral (with non-int literal indexes) should be handled by that branch? Or do we want to resolve to LiteralString?

After further thought, I don't really buy the possible security argument I mentioned above for not inferring LiteralString: you can easily construct many other cases where the contents of a LiteralString, though always originating from a literal string in the source code, could be influenced in some way by untrusted user input (e.g. determining whether some concatenation does or doesn't occur.)

But that said, I don't think this is critical, and it is probably best to just rely on typeshed for this. So we could at some point consider proposing a change to typeshed with a LiteralString overload for str.__getitem__, but I don't think it's worth an extra special case in red-knot to diverge from typeshed here.

So yeah, I would say let's just add the branch that uses the return type from __getitem__, as Alex suggested. (And yeah, fine for that to be a separate follow-up if you prefer.

(I suspect, but don't know, that the reason Appendix C isn't in the type spec is that it was considered to already be reflected in the typeshed change.)

github-actions · 2024-09-30T10:53:50Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

AlexWaygood · 2024-09-30T11:17:37Z

crates/red_knot_python_semantic/src/types/infer.rs

+                }
+            }
+            // TODO some slices on strings should resolve to type errors rather than literals
+            (Type::StringLiteral(_), _) => Type::LiteralString,


There's a few things to note here.

Issues with LiteralString

According to a strict reading of PEP 675, I don't think this is a legal place where we can infer LiteralString, since the PEP states:

As per the Rationale, we also infer LiteralString in the following cases:

[...]

Literal-preserving methods: In Appendix C, we have provided an exhaustive list of str methods that preserve the LiteralString type.

str.__getitem__ is not listed in Appendix C of the PEP, so I suppose that means that we're not allowed to infer LiteralString for x, y or z in a snippet like this:

def foo(a: Literal["foo"], b: LiteralString, c: int): x = a[c] y = b[42] z = b[c]

I'm not sure what the status of Appendix C is, though... it doesn't seem to have been copied into the relevant section of the typing spec, so perhaps it isn't canonical anymore? Or maybe that was just an omission when writing the spec? There's also the question of typeshed's stubs: typeshed doesn't include a LiteralString overload in its stub for str.__getitem__, although it includes LiteralString overloads for many of its other str methods. I'm not sure if there's a reason for that or not -- even if not, it may be desirable to be in sync with typeshed here.

For now I think I'd just delete this branch -- continuing to fallback to Unknown is ~fine for now.

Emitting diagnostics

In general, if we can't infer that an operation creates a Literal type (whether that's LiteralString or StringLiteral), we should fall back to typeshed's stubs. Here that means falling back to builtins.str.__getitem__ -- you'd do something like value_ty.to_meta_type(db).member(db, "__getitem__").call(&[value_ty, slice_ty]) to figure out what the type of the subscript expression is. Except there are some complications ;) Take a look at Type::iterate for a more complete example of how we lookup dunder methods elsewhere, try to call them, and emit a diagnostic if the dunder method eithere doesn't exist or isn't callable.

ruff/crates/red_knot_python_semantic/src/types.rs

Lines 629 to 684 in 5118166

/// Given the type of an object that is iterated over in some way,

/// return the type of objects that are yielded by that iteration.

///

/// E.g., for the following loop, given the type of `x`, infer the type of `y`:

/// ```python

/// for y in x:

/// pass

/// ```

fn iterate(self, db: &'db dyn Db) -> IterationOutcome<'db> {

if let Type::Tuple(tuple_type) = self {

return IterationOutcome::Iterable {

element_ty: UnionType::from_elements(db, &**tuple_type.elements(db)),

};

}

// `self` represents the type of the iterable;

// `__iter__` and `__next__` are both looked up on the class of the iterable:

let iterable_meta_type = self.to_meta_type(db);

let dunder_iter_method = iterable_meta_type.member(db, "__iter__");

if !dunder_iter_method.is_unbound() {

let CallOutcome::Callable {

return_ty: iterator_ty,

} = dunder_iter_method.call(db, &[])

else {

return IterationOutcome::NotIterable {

not_iterable_ty: self,

};

};

let dunder_next_method = iterator_ty.to_meta_type(db).member(db, "__next__");

return dunder_next_method

.call(db, &[])

.return_ty(db)

.map(|element_ty| IterationOutcome::Iterable { element_ty })

.unwrap_or(IterationOutcome::NotIterable {

not_iterable_ty: self,

});

}

// Although it's not considered great practice,

// classes that define `__getitem__` are also iterable,

// even if they do not define `__iter__`.

//

// TODO(Alex) this is only valid if the `__getitem__` method is annotated as

// accepting `int` or `SupportsIndex`

let dunder_get_item_method = iterable_meta_type.member(db, "__getitem__");

dunder_get_item_method

.call(db, &[])

.return_ty(db)

.map(|element_ty| IterationOutcome::Iterable { element_ty })

.unwrap_or(IterationOutcome::NotIterable {

not_iterable_ty: self,

})

}

crates/red_knot_python_semantic/src/types/infer.rs

charliermarsh · 2024-09-30T14:18:22Z

Okay thanks, will re-request review once I've addressed feedback.

charliermarsh · 2024-09-30T14:20:54Z

Actually, it should be fine to merge in its current state. I'll look at doing the dunder fallback and diagnostics in a separate PR to keep this small.

crates/red_knot_python_semantic/src/types/infer.rs

charliermarsh · 2024-09-30T16:26:36Z

Added diagnostics, though unsure if there are established patterns around the rule names, granularity, etc.

AlexWaygood · 2024-09-30T16:28:15Z

Added diagnostics, though unsure if there are established patterns around the rule names, granularity, etc.

It's a bit of a wild west right now, I wouldn't worry about that too much! I'm sure we'll clean them up at some point (and make the diagnostics less stringly typed etc.). Micha's proposals he shared during the on-site go some way to addressing this.

charliermarsh · 2024-09-30T16:29:09Z

Sounds good, I figured as much :)

crates/red_knot_python_semantic/src/types/infer.rs

carljm

Looks great, thank you! One minor naming nit, and the issue of negative indexing.

crates/red_knot_python_semantic/src/types/infer.rs

carljm · 2024-09-30T17:16:04Z

crates/red_knot_python_semantic/src/types/infer.rs

+            // TODO handle variable-length tuples
+            (Type::Tuple(tuple_ty), Type::IntLiteral(int)) => {
+                let elements = tuple_ty.elements(self.db);
+                usize::try_from(int)


Python allows negative indexing, which indexes from the end of the container:

>>> t = (1, 2, 3) >>> t[-1] 3

So we should handle that here. I would say we don't have to do it in this PR, and we can just add a TODO, but I don't like that in the meantime this would mean we'd wrongly emit an out-of-bounds error.

I'll handle it here, thanks.

carljm

Nice! LGTM.

I think there would be a possible implementation where we extract a helper function for the actual indexing-or-None for each type, and then the negative case could just add .len() to the index and try that function. But there's really not that much duplication in this version, so I'm not sure that would be worth it, happy with this as-is.

charliermarsh · 2024-09-30T20:51:00Z

Thanks for the thorough reviews, much appreciated! I'll look at the __getitem__ part next.

## Summary Follow-up to #13562, to add support for "arbitrary" subscript operations.

charliermarsh requested review from carljm, MichaReiser and AlexWaygood as code owners September 30, 2024 02:51

charliermarsh added the red-knot Multi-file analysis & type inference label Sep 30, 2024

charliermarsh commented Sep 30, 2024

View reviewed changes

crates/red_knot_python_semantic/src/types/infer.rs Outdated Show resolved Hide resolved

charliermarsh commented Sep 30, 2024

View reviewed changes

AlexWaygood requested changes Sep 30, 2024

View reviewed changes

charliermarsh requested a review from AlexWaygood September 30, 2024 14:20

AlexWaygood reviewed Sep 30, 2024

View reviewed changes

crates/red_knot_python_semantic/src/types/infer.rs Outdated Show resolved Hide resolved

charliermarsh force-pushed the charlie/sub branch 2 times, most recently from ba9ee03 to 13561b1 Compare September 30, 2024 16:21

charliermarsh requested a review from AlexWaygood September 30, 2024 16:21

charliermarsh force-pushed the charlie/sub branch 2 times, most recently from 3e425a7 to ef5e252 Compare September 30, 2024 16:22

AlexWaygood reviewed Sep 30, 2024

View reviewed changes

crates/red_knot_python_semantic/src/types/infer.rs Outdated Show resolved Hide resolved

charliermarsh added 3 commits September 30, 2024 13:06

Add some basic subscript type inference

4cb6580

Small nits

70583e0

Remove OOB diagnostic

8b9710c

charliermarsh force-pushed the charlie/sub branch from ef5e252 to 8b9710c Compare September 30, 2024 17:11

carljm reviewed Sep 30, 2024

View reviewed changes

charliermarsh added 2 commits September 30, 2024 14:08

Rename diagnostic methods

ef6a3e4

Add support for negative indexes

5dbaf0f

charliermarsh requested a review from carljm September 30, 2024 18:34

carljm approved these changes Sep 30, 2024

View reviewed changes

charliermarsh merged commit c9c748a into main Sep 30, 2024
20 checks passed

charliermarsh deleted the charlie/sub branch September 30, 2024 20:50

charliermarsh mentioned this pull request Oct 1, 2024

Support __getitem__ type inference for subscripts #13579

Merged

charliermarsh added a commit that referenced this pull request Oct 1, 2024

Support __getitem__ type inference for subscripts (#13579)

0a6dc8e

## Summary Follow-up to #13562, to add support for "arbitrary" subscript operations.

MichaReiser mentioned this pull request Oct 9, 2024

[red-knot] Infer subscript expression types #13689

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add some basic subscript type inference #13562

Add some basic subscript type inference #13562

charliermarsh commented Sep 30, 2024

charliermarsh Sep 30, 2024

AlexWaygood Sep 30, 2024

carljm Sep 30, 2024 •

edited

Loading

carljm Sep 30, 2024

AlexWaygood Sep 30, 2024

charliermarsh Sep 30, 2024

carljm Sep 30, 2024 •

edited

Loading

github-actions bot commented Sep 30, 2024 •

edited

Loading

AlexWaygood Sep 30, 2024

charliermarsh commented Sep 30, 2024

charliermarsh commented Sep 30, 2024

charliermarsh commented Sep 30, 2024

AlexWaygood commented Sep 30, 2024

charliermarsh commented Sep 30, 2024

carljm left a comment

carljm Sep 30, 2024

charliermarsh Sep 30, 2024

carljm left a comment

charliermarsh commented Sep 30, 2024

	/// Given the type of an object that is iterated over in some way,
	/// return the type of objects that are yielded by that iteration.
	///
	/// E.g., for the following loop, given the type of `x`, infer the type of `y`:
	/// ```python
	/// for y in x:
	/// pass
	/// ```
	fn iterate(self, db: &'db dyn Db) -> IterationOutcome<'db> {
	if let Type::Tuple(tuple_type) = self {
	return IterationOutcome::Iterable {
	element_ty: UnionType::from_elements(db, &**tuple_type.elements(db)),
	};
	}

	// `self` represents the type of the iterable;
	// `__iter__` and `__next__` are both looked up on the class of the iterable:
	let iterable_meta_type = self.to_meta_type(db);

	let dunder_iter_method = iterable_meta_type.member(db, "__iter__");
	if !dunder_iter_method.is_unbound() {
	let CallOutcome::Callable {
	return_ty: iterator_ty,
	} = dunder_iter_method.call(db, &[])
	else {
	return IterationOutcome::NotIterable {
	not_iterable_ty: self,
	};
	};

	let dunder_next_method = iterator_ty.to_meta_type(db).member(db, "__next__");
	return dunder_next_method
	.call(db, &[])
	.return_ty(db)
	.map(\|element_ty\| IterationOutcome::Iterable { element_ty })
	.unwrap_or(IterationOutcome::NotIterable {
	not_iterable_ty: self,
	});
	}

	// Although it's not considered great practice,
	// classes that define `__getitem__` are also iterable,
	// even if they do not define `__iter__`.
	//
	// TODO(Alex) this is only valid if the `__getitem__` method is annotated as
	// accepting `int` or `SupportsIndex`
	let dunder_get_item_method = iterable_meta_type.member(db, "__getitem__");

	dunder_get_item_method
	.call(db, &[])
	.return_ty(db)
	.map(\|element_ty\| IterationOutcome::Iterable { element_ty })
	.unwrap_or(IterationOutcome::NotIterable {
	not_iterable_ty: self,
	})
	}

Add some basic subscript type inference #13562

Add some basic subscript type inference #13562

Conversation

charliermarsh commented Sep 30, 2024

Summary

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Issues with LiteralString

Emitting diagnostics

carljm Sep 30, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

carljm Sep 30, 2024 • edited Loading

Choose a reason for hiding this comment

github-actions bot commented Sep 30, 2024 • edited Loading

ruff-ecosystem results

Linter (stable)

Linter (preview)

Choose a reason for hiding this comment

Issues with LiteralString

Emitting diagnostics

charliermarsh commented Sep 30, 2024

charliermarsh commented Sep 30, 2024

charliermarsh commented Sep 30, 2024

AlexWaygood commented Sep 30, 2024

charliermarsh commented Sep 30, 2024

carljm left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

carljm left a comment

Choose a reason for hiding this comment

charliermarsh commented Sep 30, 2024

Issues with `LiteralString`

carljm Sep 30, 2024 •

edited

Loading

carljm Sep 30, 2024 •

edited

Loading

github-actions bot commented Sep 30, 2024 •

edited

Loading

`ruff-ecosystem` results

Issues with `LiteralString`