-
-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add basic end-to-end support for Literal types #5947
Add basic end-to-end support for Literal types #5947
Conversation
This pull request primarily adds support for string, int, bool, and None Literal types in the parsing and semantic analysis layer. It also adds a small smidge of logic to the subtyping and meet parts of the typechecking layer -- just enough so we can start using Literal types end-to-end and write some tests. Basically, my previous diff made a bunch of gross changes in a horizontal way to lay out a foundation; this diff makes a bunch of gross changes in a vertical way to create a proof-of-concept of the proof-of-concept. I'll probably need to submit a few follow-up PRs cleaning up some stuff in the semantic analysis layer, but my hope is to switch focus on fleshing out the type checking layer shortly after this PR lands. Specific changes made: 1. I added a new 'RawLiteralType' synthetic type meant to represent any literal expression that appears in the earliest stages of semantic analysis. If these 'RawLiteralTypes' appear in a "Literal[...]" context, they transformed into regular 'LiteralTypes' during phase 2 of semantic analysis (turning UnboundTypes into actual types). If they appear outside of the correct context, we report an error instead. I also added a string field to UnboundType to keep track of similar information. Basically, if we have `Foo["bar"]`, we don't actually know immediately whether or not that "bar" is meant to be string literal vs a forward reference. (Foo could be a regular class or an alias for 'Literal'). Note: I wanted to avoid having to introduce yet another type, so looked into perhaps attaching even more metadata to UnboundType or perhaps to LiteralType. Both of those approaches ended up looking pretty messy though, so I went with this. 2. As a consequence, some of the syntax checking logic had to move from the parsing layer to the semantic analysis layer, and some of the existing error messages changed slightly. 3. The PEP, at some point, provisionally rejected having Literals contain other Literal types. For example, something like this: RegularIds = Literal[1, 2, 3, 4] AdminIds = Literal[100, 101, 102] AllIds = Literal[RegularIds, AdminIds, 30, 32] This was mainly because I thought this would be challenging to implement, but it turned out to be easier then expected -- it happened almost by accident while I was implementing support for 'Literal[1, 2, 3]'. I can excise this part out if we think supporting this kind of thing is a fundamentally bad idea. 4. I also penciled in some minimal code to the subtyping and meet logic.
(Also: any ideas for more tests would be super-appreciated. I tried adding a bunch, but I'm pretty unfamiliar with the semantic analysis layer in general and so probably don't have a good appreciation for all the different ways things can break there. For example, I'm pretty sure this code has a few bugs related to handling forward references I still need to flush out.) |
This is annoying -- it looks like this function is mangling the types of any Literals that contain a forward or backward slash depending on whether the tests are run on windows vs linux. I could probably hack in some kind of fix to make that function ignore Literal types, but I wonder if it might be better to instead just change the sections of the code responsible for printing out the paths so we can just delete this function. We could hide this behind a secret command line flag or something if we're worried about backwards-compatibility, maybe. I'll think about this more tomorrow and see if an easier solution comes to mind. |
For your #3, I don't have a use case in mind immediately, but it seems like a natural feature and I can easily imagine use cases. For example, perhaps we'll define Agree that it'd be better to print out paths correctly in the first place so we don't need the global replacement that you found. |
This commit lets us specify we want to skip path normalization on some given test case, side-stepping the path-related errors we were seeing. Specifically, the problem was that mypy attempts to normalize paths by replacing all instances of '\' with '/' so that the output when running the tests on Windows matches the specified errors. This is what we want to most of the time, except for a few tests with Literals containing slashes. I thought about maybe changing the output of mypy in general so it always uses '/' for paths in error outputs, even on Windows: this would mean we would no longer have to do path normalization. However, I wasn't convinced this was the right thing to do: using '\' on Windows technically *is* the right thing to do, and I didn't want to complicate the codebase by forcing us to keep track of when to use os.sep vs '/'. I don't think I'll add too many of these test cases, so I decided to just go with a localized solution instead of changing mypy's error output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR here are some comments. Only two larger ones are RawLiteralType
vs RawLiteral
, and error messages that are opposite to what I would expect (excessively suggestive where not necessary, and not enough detailed where they should be).
ListExpr, StrExpr, BytesExpr, UnicodeExpr, EllipsisExpr, CallExpr, | ||
get_member_expr_fullname | ||
) | ||
from mypy.fastparse import parse_type_comment | ||
from mypy.types import ( | ||
Type, UnboundType, TypeList, EllipsisType, AnyType, Optional, CallableArgument, TypeOfAny | ||
Type, UnboundType, TypeList, EllipsisType, AnyType, Optional, CallableArgument, TypeOfAny, | ||
LiteralType, RawLiteralType, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two large scale questions:
- Would it be better to call this just
RawLiteral
? - Does it need to be a subclass of
Type
?
It is OK to still put it in types.py
, but wouldn't it be simpler to update UnboundType.args
to be List[Union[Type, RawLiteral]]
? I understand this may need some work, but currently we must implement several visitors and this type can sneak into type checking phase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can look into doing this, but I believe we currently reference UnboundType.args in approximately 60 difference places throughout our codebase. I'm worried that propagating this change won't be cheap: we'd likely end up having to change a bunch of type signatures so we can pass through Union[Type, RawLiteral]
and/or add a bunch of runtime checks to filter out RawLiteral
.
(We'd also need to modify both fastparse.TypeConverter
and exprtotype
so they return a Union[Type, RawLiteral]
, but I think this is a more tractable problem: we could just implement some sort of wrapper around them that reports an error and returns AnyType
if the top-level type is RawLiteral
)
If we're worried about RawLiterals and similar types sneaking through, I'd rather just add another layer to semantic analysis that does a final scan of all types to make sure that RawLiteral, UnboundType, and any other "bad" types are fully removed. That way, would need to scan and check each type just once, instead of in potentially multiple places whenever we manipulate UnboundType.args
.
Or even better, we could add a new base type named TypeLike
that Type
, UnboundType
, and RawLiteral
all inherit from (along with any other fake types we don't want sneaking into the typechecking phase). UnboundType
would then be modified so that its args contain instances of TypeLike
rather then Type
.
The semantic analysis layer would always work with TypeLike
, and we'd modify TypeAnalyzer
so it traverses over TypeLike
and only ever returns Type
-- this would probably let us skip the extra traversal while still giving us the guarantees we want in a typesafe way.
The last solution would probably be just as much work, if not more, as changing UnboundType.args
to be of type Union[Type, RawLiteral]
, but I think it would be more principled and help us eliminate the root problem in a more fundamental way. (I'd also be willing to tackle making this change if you + the others think it's a good idea.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can look into doing this, but I believe we currently reference UnboundType.args in approximately 60 difference places throughout our codebase. I'm worried that propagating this change won't be cheap
I would recommend just trying this, if it will be soon obvious that it is too hard, then don't spend time on this. The current approach is also OK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, changing the type of UnboundType.args
seems a bit ad hoc to me. We already have some Type
subclasses such as TypeList
that are only meaningful during semantic analysis, and I think that adding another one doesn't make things significantly worse if we are careful. Most visitors will crash if they encounter the new type subclass, so issues will likely be found soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most visitors will crash if they encounter the new type subclass, so issues will likely be found soon.
Fair point.
[case testLiteralDisallowComplexNumbers] | ||
from typing_extensions import Literal | ||
a: Literal[3j] # E: invalid type comment or annotation | ||
b: Literal[3j + 2] # E: invalid type comment or annotation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did this for floats, but I think I want to add better error messages for complex numbers and arbitrary expressions in general in a separate diff. I'll add it to the literal types todo list.
test-data/unit/check-literal.test
Outdated
from typing_extensions import Literal | ||
# TODO: Currently, this test case causes a crash. Fix the crash then re-enable. | ||
# (We currently aren't handling forward references + type aliases very gracefully, | ||
# if at all.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although no 100% perfect, we definitely do, something is wrong with your implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turned out the issue was due to the call to UnionTypes.make_simplified_union
-- changing back to UnionTypes.make_union
resolved this issue.
(It also means we no longer remove duplicate entries from unions -- but it seems like we don't attempt doing that with normal types anyways? It's not something I think is worth worrying about either way, so eh)
…erals This commit: 1. Performs a bunch of cleanup suggested by Ivan 2. Adds a note clarifying that negative numbers are *allowed* 3. Adds a note reiterating that PEP 484 expects type checkers to understand enum values when performing type inference (and suggests this can be implemented by treating enums as roughly equal to the union of their types) 4. Moves "nested literals" into the "supported" category -- as I discovered in python/mypy#5947, implementing support for this is not as bad as I thought. 5. Adds an explicit warning to the "literals and generics" section. 6. Modifies some text to become more firm about disallowing 'Literal[Any]' and related constructs. 7. Deletes discussion about TypeScript's "index types" and "keyof" operator -- I mostly included that only because I got some feedback earlier to discuss TypeScript. It felt pretty shoehorned in, anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did a very quick pass and left a few comments.
test-data/unit/semanal-errors.test
Outdated
@@ -877,7 +877,7 @@ A[TypeVar] # E: Invalid type "typing.TypeVar" | |||
from typing import TypeVar, Generic | |||
t = TypeVar('t') | |||
class A(Generic[t]): pass | |||
A[1] # E: Type expected within [...] | |||
A[1] # E: Invalid type. Try using Literal[1] instead? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency, it's better not to use periods in error messages. Maybe split into an error and a note, which we use for various other suggestions? Here's one way to do it:
x.py:4: error: Invalid type "1"
x.py:4: note: Suggestion: use Literal[1] instead of 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to go with messages like:
x.py:4: error: Invalid type: try using Literal[1] instead?
This seems to match the format of several of the other error messages generated by typeanal.py
(e.g. "Invalid type: ClassVar nested inside other type").
[builtins fixtures/complex.pyi] | ||
[out] | ||
|
||
[case testLiteralDisallowComplexExpressions] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A somewhat meta comment: Since test runtimes are already somewhat painful, I often prefer merging multiple related test cases into a single one, especially if they are pretty straightforward. There is significant per-test case overhead. In the future it may make sense to support explicitly subdividing tests into multiple sections to make it easier to maintain longer test cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I merged a few of the existing tests together, though don't think I was able to for this specific one (some uses of complex numbers raise an error at the parsing stage while others raise an error at the semantic analysis stage -- errors at the parsing stage appear suppress any subsequent one).
In any case, I'll be sure to keep this in mind as I add new tests + tune the existing ones.
This commit makes a variety of changes addressing feedback given regarding the test cases. It does *not* address the "maybe we can remove RawLiteraType from the Type hierarchy" feedback: I'll take a closer look at that tomorrow.
As an update, I believe I've addressed all feedback regarding error messages and tests. I haven't had a chance to seriously try making |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a bunch of minor comments, feel free to merge after you consider them (it is also OK to fix them in the next PR).
mypy/typeanal.py
Outdated
# if the user attempts use an explicit Any as a parameter, if the user | ||
# is trying to use an enum value imported from a module with no type hints, | ||
# giving it an an implicit type of 'Any', or if there's some other underlying | ||
# problem with the parameter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make make it an itemized list using *
? (maybe also note which TypeOfAny
kind corresponds to each situation?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I turned this comment into a list. I think I want to avoid noting which TypeOfAny corresponds to which case for now though -- I haven't actually investigated deeply which kind corresponds to which case and want to avoid documenting false info.
Probably I'll flesh this comment out more once I circle back around to handling enums and do the thing you suggest then. I left a TODO about this.
@@ -980,7 +980,7 @@ e = TypeVar('e', int, str, x=1) # E: Unexpected argument to TypeVar(): x | |||
f = TypeVar('f', (int, str), int) # E: Type expected | |||
g = TypeVar('g', int) # E: TypeVar cannot have only a single constraint | |||
h = TypeVar('h', x=(int, str)) # E: Unexpected argument to TypeVar(): x | |||
i = TypeVar('i', bound=1) # E: TypeVar 'bound' must be a type | |||
i = TypeVar('i', bound=1) # E: Invalid type: try using Literal[1] instead? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like there is a common trend: if there is an invalid type, it is hard to give a more detailed error "upstream". A potential solution for this may be to introduce a new TypeOfAny.invalid_type
, so that caller can still choose to give an error for this Any
. Could you please open an issue for this? (Maybe also link #4030 there.)
W00t! Can't wait to start using this. Is it okay to update typeshed with the overload for |
Shouldn't we wait for other type checkers to gain at least some level of support for literal types? |
The point is apparently moot, since typing_extensions doesn't actually define Literal yet. :-( And Michael still has a long list of unchecked boxes at #5935. But I don't think we should necessarily wait for other type checkers, since AFAIK they all have a process for "vetting" new versions of typeshed. In fact, mypy does too -- it's called "sync typeshed". :-) But mypy is a bit special since typeshed's CI tests with the latest mypy. |
Maybe we need to a way of specifically indicating certain stubs or signatures are for certain type checkers only? E.g. some sort of |
…erals This commit: 1. Performs a bunch of cleanup suggested by Ivan 2. Adds a note clarifying that negative numbers are *allowed* 3. Adds a note reiterating that PEP 484 expects type checkers to understand enum values when performing type inference (and suggests this can be implemented by treating enums as roughly equal to the union of their types) 4. Moves "nested literals" into the "supported" category -- as I discovered in python/mypy#5947, implementing support for this is not as bad as I thought. 5. Adds an explicit warning to the "literals and generics" section. 6. Modifies some text to become more firm about disallowing 'Literal[Any]' and related constructs. 7. Deletes discussion about TypeScript's "index types" and "keyof" operator -- I mostly included that only because I got some feedback earlier to discuss TypeScript. It felt pretty shoehorned in, anyways.
This pull request primarily adds support for string, int, bool, and None Literal types in the parsing and semantic analysis layer. It also adds a small smidge of logic to the subtyping and meet parts of the
typechecking layer -- just enough so we can start using Literal types end-to-end and write some tests.
Basically, my previous diff made a bunch of gross changes in a horizontal way to lay out a foundation; this diff makes a bunch of gross changes in a vertical way to create a proof-of-concept of the proof-of-concept.
I'll probably need to submit a few follow-up PRs cleaning up some stuff in the semantic analysis layer, but my hope is to switch focus on fleshing out the type checking layer shortly after this PR lands.
Specific changes made:
I added a new
RawLiteralType
synthetic type meant to represent any literal expression that appears in the earliest stages of semantic analysis. If theseRawLiteralTypes
appear in aLiteral[...]
context, they transformed into regularLiteralTypes
during phase 2 of semantic analysis (turning UnboundTypes into actual types). If they appear outside of the correct context, we report an error instead.I also added a string field to UnboundType to keep track of similar information. Basically, if we have
Foo["bar"]
, we don't actually know immediately whether or not that "bar" is meant to be string literal vs a forward reference. (Foo could be a regular class or an alias for 'Literal').Note: I wanted to avoid having to introduce yet another type, so looked into perhaps attaching even more metadata to UnboundType or perhaps to LiteralType. Both of those approaches ended up looking pretty messy though, so I went with this.
As a consequence, some of the syntax checking logic had to move from the parsing layer to the semantic analysis layer, and some of the existing error messages changed slightly.
The PEP, at some point, provisionally rejected having Literals contain other Literal types. For example, something like this:
This was mainly because I thought this would be challenging to implement, but it turned out to be easier then expected -- it happened almost by accident while I was implementing support for
Literal[1, 2, 3]
.I can excise this part out if we think supporting this kind of thing is a fundamentally bad idea.
I also penciled in some minimal code to the subtyping and meet logic.