Add basic end-to-end support for Literal types #5947

Michael0x2a · 2018-11-24T03:55:59Z

This pull request primarily adds support for string, int, bool, and None Literal types in the parsing and semantic analysis layer. It also adds a small smidge of logic to the subtyping and meet parts of the
typechecking layer -- just enough so we can start using Literal types end-to-end and write some tests.

Basically, my previous diff made a bunch of gross changes in a horizontal way to lay out a foundation; this diff makes a bunch of gross changes in a vertical way to create a proof-of-concept of the proof-of-concept.

I'll probably need to submit a few follow-up PRs cleaning up some stuff in the semantic analysis layer, but my hope is to switch focus on fleshing out the type checking layer shortly after this PR lands.

Specific changes made:

I added a new RawLiteralType synthetic type meant to represent any literal expression that appears in the earliest stages of semantic analysis. If these RawLiteralTypes appear in a Literal[...] context, they transformed into regular LiteralTypes during phase 2 of semantic analysis (turning UnboundTypes into actual types). If they appear outside of the correct context, we report an error instead.

I also added a string field to UnboundType to keep track of similar information. Basically, if we have Foo["bar"], we don't actually know immediately whether or not that "bar" is meant to be string literal vs a forward reference. (Foo could be a regular class or an alias for 'Literal').

Note: I wanted to avoid having to introduce yet another type, so looked into perhaps attaching even more metadata to UnboundType or perhaps to LiteralType. Both of those approaches ended up looking pretty messy though, so I went with this.
As a consequence, some of the syntax checking logic had to move from the parsing layer to the semantic analysis layer, and some of the existing error messages changed slightly.
The PEP, at some point, provisionally rejected having Literals contain other Literal types. For example, something like this:
```
RegularIds = Literal[1, 2, 3, 4]
AdminIds = Literal[100, 101, 102]
AllIds = Literal[RegularIds, AdminIds, 30, 32]
```
This was mainly because I thought this would be challenging to implement, but it turned out to be easier then expected -- it happened almost by accident while I was implementing support for Literal[1, 2, 3].

I can excise this part out if we think supporting this kind of thing is a fundamentally bad idea.
I also penciled in some minimal code to the subtyping and meet logic.

This pull request primarily adds support for string, int, bool, and None Literal types in the parsing and semantic analysis layer. It also adds a small smidge of logic to the subtyping and meet parts of the typechecking layer -- just enough so we can start using Literal types end-to-end and write some tests. Basically, my previous diff made a bunch of gross changes in a horizontal way to lay out a foundation; this diff makes a bunch of gross changes in a vertical way to create a proof-of-concept of the proof-of-concept. I'll probably need to submit a few follow-up PRs cleaning up some stuff in the semantic analysis layer, but my hope is to switch focus on fleshing out the type checking layer shortly after this PR lands. Specific changes made: 1. I added a new 'RawLiteralType' synthetic type meant to represent any literal expression that appears in the earliest stages of semantic analysis. If these 'RawLiteralTypes' appear in a "Literal[...]" context, they transformed into regular 'LiteralTypes' during phase 2 of semantic analysis (turning UnboundTypes into actual types). If they appear outside of the correct context, we report an error instead. I also added a string field to UnboundType to keep track of similar information. Basically, if we have `Foo["bar"]`, we don't actually know immediately whether or not that "bar" is meant to be string literal vs a forward reference. (Foo could be a regular class or an alias for 'Literal'). Note: I wanted to avoid having to introduce yet another type, so looked into perhaps attaching even more metadata to UnboundType or perhaps to LiteralType. Both of those approaches ended up looking pretty messy though, so I went with this. 2. As a consequence, some of the syntax checking logic had to move from the parsing layer to the semantic analysis layer, and some of the existing error messages changed slightly. 3. The PEP, at some point, provisionally rejected having Literals contain other Literal types. For example, something like this: RegularIds = Literal[1, 2, 3, 4] AdminIds = Literal[100, 101, 102] AllIds = Literal[RegularIds, AdminIds, 30, 32] This was mainly because I thought this would be challenging to implement, but it turned out to be easier then expected -- it happened almost by accident while I was implementing support for 'Literal[1, 2, 3]'. I can excise this part out if we think supporting this kind of thing is a fundamentally bad idea. 4. I also penciled in some minimal code to the subtyping and meet logic.

Michael0x2a · 2018-11-24T03:58:04Z

(Also: any ideas for more tests would be super-appreciated. I tried adding a bunch, but I'm pretty unfamiliar with the semantic analysis layer in general and so probably don't have a good appreciation for all the different ways things can break there. For example, I'm pretty sure this code has a few bugs related to handling forward references I still need to flush out.)

Michael0x2a · 2018-11-24T05:31:54Z

This is annoying -- it looks like this function is mangling the types of any Literals that contain a forward or backward slash depending on whether the tests are run on windows vs linux.

I could probably hack in some kind of fix to make that function ignore Literal types, but I wonder if it might be better to instead just change the sections of the code responsible for printing out the paths so we can just delete this function. We could hide this behind a secret command line flag or something if we're worried about backwards-compatibility, maybe.

I'll think about this more tomorrow and see if an easier solution comes to mind.

JelleZijlstra · 2018-11-24T15:15:45Z

For your #3, I don't have a use case in mind immediately, but it seems like a natural feature and I can easily imagine use cases. For example, perhaps we'll define _OpenBytesFlags = Literal["rb", "rwb", ...], and some hypothetical function that takes these flags plus a few more could take Literal[_OpenBytesFlags, "some_other_flag"].

Agree that it'd be better to print out paths correctly in the first place so we don't need the global replacement that you found.

This commit lets us specify we want to skip path normalization on some given test case, side-stepping the path-related errors we were seeing. Specifically, the problem was that mypy attempts to normalize paths by replacing all instances of '\' with '/' so that the output when running the tests on Windows matches the specified errors. This is what we want to most of the time, except for a few tests with Literals containing slashes. I thought about maybe changing the output of mypy in general so it always uses '/' for paths in error outputs, even on Windows: this would mean we would no longer have to do path normalization. However, I wasn't convinced this was the right thing to do: using '\' on Windows technically *is* the right thing to do, and I didn't want to complicate the codebase by forcing us to keep track of when to use os.sep vs '/'. I don't think I'll add too many of these test cases, so I decided to just go with a localized solution instead of changing mypy's error output.

ilevkivskyi

Thanks for the PR here are some comments. Only two larger ones are RawLiteralType vs RawLiteral, and error messages that are opposite to what I would expect (excessively suggestive where not necessary, and not enough detailed where they should be).

ilevkivskyi · 2018-11-26T11:39:11Z

mypy/exprtotype.py

    ListExpr, StrExpr, BytesExpr, UnicodeExpr, EllipsisExpr, CallExpr,
    get_member_expr_fullname
 )
 from mypy.fastparse import parse_type_comment
 from mypy.types import (
-    Type, UnboundType, TypeList, EllipsisType, AnyType, Optional, CallableArgument, TypeOfAny
+    Type, UnboundType, TypeList, EllipsisType, AnyType, Optional, CallableArgument, TypeOfAny,
+    LiteralType, RawLiteralType,


Two large scale questions:

Would it be better to call this just RawLiteral?

Does it need to be a subclass of Type?

It is OK to still put it in types.py, but wouldn't it be simpler to update UnboundType.args to be List[Union[Type, RawLiteral]]? I understand this may need some work, but currently we must implement several visitors and this type can sneak into type checking phase.

I can look into doing this, but I believe we currently reference UnboundType.args in approximately 60 difference places throughout our codebase. I'm worried that propagating this change won't be cheap: we'd likely end up having to change a bunch of type signatures so we can pass through Union[Type, RawLiteral] and/or add a bunch of runtime checks to filter out RawLiteral.

(We'd also need to modify both fastparse.TypeConverter and exprtotype so they return a Union[Type, RawLiteral], but I think this is a more tractable problem: we could just implement some sort of wrapper around them that reports an error and returns AnyType if the top-level type is RawLiteral)

If we're worried about RawLiterals and similar types sneaking through, I'd rather just add another layer to semantic analysis that does a final scan of all types to make sure that RawLiteral, UnboundType, and any other "bad" types are fully removed. That way, would need to scan and check each type just once, instead of in potentially multiple places whenever we manipulate UnboundType.args.

Or even better, we could add a new base type named TypeLike that Type, UnboundType, and RawLiteral all inherit from (along with any other fake types we don't want sneaking into the typechecking phase). UnboundType would then be modified so that its args contain instances of TypeLike rather then Type.

The semantic analysis layer would always work with TypeLike, and we'd modify TypeAnalyzer so it traverses over TypeLike and only ever returns Type -- this would probably let us skip the extra traversal while still giving us the guarantees we want in a typesafe way.

The last solution would probably be just as much work, if not more, as changing UnboundType.args to be of type Union[Type, RawLiteral], but I think it would be more principled and help us eliminate the root problem in a more fundamental way. (I'd also be willing to tackle making this change if you + the others think it's a good idea.)

I can look into doing this, but I believe we currently reference UnboundType.args in approximately 60 difference places throughout our codebase. I'm worried that propagating this change won't be cheap

I would recommend just trying this, if it will be soon obvious that it is too hard, then don't spend time on this. The current approach is also OK.

FWIW, changing the type of UnboundType.args seems a bit ad hoc to me. We already have some Type subclasses such as TypeList that are only meaningful during semantic analysis, and I think that adding another one doesn't make things significantly worse if we are careful. Most visitors will crash if they encounter the new type subclass, so issues will likely be found soon.

Most visitors will crash if they encounter the new type subclass, so issues will likely be found soon.

Fair point.

mypy/fastparse.py

mypy/subtypes.py

mypy/typeanal.py

test-data/unit/check-literal.test

ilevkivskyi · 2018-11-26T12:56:16Z

test-data/unit/check-literal.test

+[case testLiteralDisallowComplexNumbers]
+from typing_extensions import Literal
+a: Literal[3j]       # E: invalid type comment or annotation
+b: Literal[3j + 2]   # E: invalid type comment or annotation


I did this for floats, but I think I want to add better error messages for complex numbers and arbitrary expressions in general in a separate diff. I'll add it to the literal types todo list.

test-data/unit/check-literal.test

ilevkivskyi · 2018-11-26T12:59:47Z

test-data/unit/check-literal.test

+from typing_extensions import Literal
+# TODO: Currently, this test case causes a crash. Fix the crash then re-enable.
+# (We currently aren't handling forward references + type aliases very gracefully,
+# if at all.)


Although no 100% perfect, we definitely do, something is wrong with your implementation.

It turned out the issue was due to the call to UnionTypes.make_simplified_union -- changing back to UnionTypes.make_union resolved this issue.

(It also means we no longer remove duplicate entries from unions -- but it seems like we don't attempt doing that with normal types anyways? It's not something I think is worth worrying about either way, so eh)

test-data/unit/check-literal.test

…erals This commit: 1. Performs a bunch of cleanup suggested by Ivan 2. Adds a note clarifying that negative numbers are *allowed* 3. Adds a note reiterating that PEP 484 expects type checkers to understand enum values when performing type inference (and suggests this can be implemented by treating enums as roughly equal to the union of their types) 4. Moves "nested literals" into the "supported" category -- as I discovered in python/mypy#5947, implementing support for this is not as bad as I thought. 5. Adds an explicit warning to the "literals and generics" section. 6. Modifies some text to become more firm about disallowing 'Literal[Any]' and related constructs. 7. Deletes discussion about TypeScript's "index types" and "keyof" operator -- I mostly included that only because I got some feedback earlier to discuss TypeScript. It felt pretty shoehorned in, anyways.

JukkaL

Did a very quick pass and left a few comments.

test-data/unit/semanal-errors.test

JukkaL · 2018-11-28T14:25:29Z

test-data/unit/semanal-errors.test

@@ -877,7 +877,7 @@ A[TypeVar] # E: Invalid type "typing.TypeVar"
 from typing import TypeVar, Generic
 t = TypeVar('t')
 class A(Generic[t]): pass
-A[1] # E: Type expected within [...]
+A[1] # E: Invalid type. Try using Literal[1] instead?


For consistency, it's better not to use periods in error messages. Maybe split into an error and a note, which we use for various other suggestions? Here's one way to do it:

x.py:4: error: Invalid type "1" x.py:4: note: Suggestion: use Literal[1] instead of 1?

I decided to go with messages like:

x.py:4: error: Invalid type: try using Literal[1] instead?

This seems to match the format of several of the other error messages generated by typeanal.py (e.g. "Invalid type: ClassVar nested inside other type").

JukkaL · 2018-11-28T14:29:02Z

test-data/unit/check-literal.test

+[builtins fixtures/complex.pyi]
+[out]
+
+[case testLiteralDisallowComplexExpressions]


A somewhat meta comment: Since test runtimes are already somewhat painful, I often prefer merging multiple related test cases into a single one, especially if they are pretty straightforward. There is significant per-test case overhead. In the future it may make sense to support explicitly subdividing tests into multiple sections to make it easier to maintain longer test cases.

I merged a few of the existing tests together, though don't think I was able to for this specific one (some uses of complex numbers raise an error at the parsing stage while others raise an error at the semantic analysis stage -- errors at the parsing stage appear suppress any subsequent one).

In any case, I'll be sure to keep this in mind as I add new tests + tune the existing ones.

mypy/subtypes.py

This commit makes a variety of changes addressing feedback given regarding the test cases. It does *not* address the "maybe we can remove RawLiteraType from the Type hierarchy" feedback: I'll take a closer look at that tomorrow.

Michael0x2a · 2018-11-29T09:50:50Z

As an update, I believe I've addressed all feedback regarding error messages and tests. I haven't had a chance to seriously try making UnboundType.arg's type be Union[UnboundType, RawLiteral] yet -- I'll take a swing at that tomorrow (or we can just land this now and I can try doing that in a separate diff, don't really care which).

ilevkivskyi

There is a bunch of minor comments, feel free to merge after you consider them (it is also OK to fix them in the next PR).

mypy/server/astmerge.py

mypy/indirection.py

mypy/test/testtypes.py

mypy/typeanal.py

ilevkivskyi · 2018-12-02T17:55:54Z

mypy/typeanal.py

+            # if the user attempts use an explicit Any as a parameter, if the user
+            # is trying to use an enum value imported from a module with no type hints,
+            # giving it an an implicit type of 'Any', or if there's some other underlying
+            # problem with the parameter.


Make make it an itemized list using *? (maybe also note which TypeOfAny kind corresponds to each situation?)

I turned this comment into a list. I think I want to avoid noting which TypeOfAny corresponds to which case for now though -- I haven't actually investigated deeply which kind corresponds to which case and want to avoid documenting false info.

Probably I'll flesh this comment out more once I circle back around to handling enums and do the thing you suggest then. I left a TODO about this.

ilevkivskyi · 2018-12-02T18:03:31Z

test-data/unit/semanal-errors.test

@@ -980,7 +980,7 @@ e = TypeVar('e', int, str, x=1)   # E: Unexpected argument to TypeVar(): x
 f = TypeVar('f', (int, str), int) # E: Type expected
 g = TypeVar('g', int)             # E: TypeVar cannot have only a single constraint
 h = TypeVar('h', x=(int, str))    # E: Unexpected argument to TypeVar(): x
-i = TypeVar('i', bound=1)         # E: TypeVar 'bound' must be a type
+i = TypeVar('i', bound=1)         # E: Invalid type: try using Literal[1] instead?


It looks like there is a common trend: if there is an invalid type, it is hard to give a more detailed error "upstream". A potential solution for this may be to introduce a new TypeOfAny.invalid_type, so that caller can still choose to give an error for this Any. Could you please open an issue for this? (Maybe also link #4030 there.)

test-data/unit/check-literal.test

gvanrossum · 2018-12-03T00:34:10Z

W00t! Can't wait to start using this. Is it okay to update typeshed with the overload for open() yet, and then to delete the plugin treatment?

JelleZijlstra · 2018-12-03T00:37:10Z

Shouldn't we wait for other type checkers to gain at least some level of support for literal types?

gvanrossum · 2018-12-03T00:44:08Z

The point is apparently moot, since typing_extensions doesn't actually define Literal yet. :-( And Michael still has a long list of unchecked boxes at #5935.

But I don't think we should necessarily wait for other type checkers, since AFAIK they all have a process for "vetting" new versions of typeshed. In fact, mypy does too -- it's called "sync typeshed". :-) But mypy is a bit special since typeshed's CI tests with the latest mypy.

Michael0x2a · 2018-12-03T00:47:17Z

But I don't think we should necessarily wait for other type checkers, since AFAIK they all have a process for "vetting" new versions of typeshed. In fact, mypy does too -- it's called "sync typeshed".

Maybe we need to a way of specifically indicating certain stubs or signatures are for certain type checkers only? E.g. some sort of if MYPY: directive. Or maybe instead of casing by type checker, we case by features instead and add some sort of if SUPPORTS_LITERAL_TYPES directive? That would be a bit less ad-hoc.

…erals This commit: 1. Performs a bunch of cleanup suggested by Ivan 2. Adds a note clarifying that negative numbers are *allowed* 3. Adds a note reiterating that PEP 484 expects type checkers to understand enum values when performing type inference (and suggests this can be implemented by treating enums as roughly equal to the union of their types) 4. Moves "nested literals" into the "supported" category -- as I discovered in python/mypy#5947, implementing support for this is not as bad as I thought. 5. Adds an explicit warning to the "literals and generics" section. 6. Modifies some text to become more firm about disallowing 'Literal[Any]' and related constructs. 7. Deletes discussion about TypeScript's "index types" and "keyof" operator -- I mostly included that only because I got some feedback earlier to discuss TypeScript. It felt pretty shoehorned in, anyways.

Michael0x2a added 2 commits November 23, 2018 20:56

Correct error strings to what they *should* be

4ff4294

Remove lingering NewType import

4b62235

Michael0x2a added 3 commits November 24, 2018 20:31

Fix lint error

75891b9

Add a test case for when 'Literal' is imported with a different name

9271810

ilevkivskyi reviewed Nov 26, 2018

View reviewed changes

JukkaL reviewed Nov 28, 2018

View reviewed changes

Michael0x2a added 3 commits November 29, 2018 01:33

Respond to feedback regarding tests

c206a1b

This commit makes a variety of changes addressing feedback given regarding the test cases. It does *not* address the "maybe we can remove RawLiteraType from the Type hierarchy" feedback: I'll take a closer look at that tomorrow.

Merge branch 'master' into add-preliminary-literal-types-semantic-logic

293d835

Hoisted by my own petard

fc34c6a

ilevkivskyi approved these changes Dec 2, 2018

View reviewed changes

Michael0x2a added 2 commits December 2, 2018 15:35

Respond to next wave of code review

6aca386

Add an extra error check

73e058a

Michael0x2a mentioned this pull request Dec 2, 2018

Regression: error messages for literal expressions in invalid places are worse #5989

Closed

Michael0x2a merged commit cb0728c into python:master Dec 3, 2018

Michael0x2a deleted the add-preliminary-literal-types-semantic-logic branch December 3, 2018 00:48

Michael0x2a mentioned this pull request Dec 3, 2018

Literal types tracking issue #5935

Closed

42 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add basic end-to-end support for Literal types #5947

Add basic end-to-end support for Literal types #5947

Michael0x2a commented Nov 24, 2018 •

edited

Loading

Michael0x2a commented Nov 24, 2018 •

edited

Loading

Michael0x2a commented Nov 24, 2018

JelleZijlstra commented Nov 24, 2018

ilevkivskyi left a comment

ilevkivskyi Nov 26, 2018

Michael0x2a Nov 27, 2018

ilevkivskyi Nov 27, 2018

JukkaL Nov 29, 2018

ilevkivskyi Nov 29, 2018

ilevkivskyi Nov 26, 2018

Michael0x2a Nov 29, 2018

ilevkivskyi Nov 26, 2018

Michael0x2a Nov 29, 2018

JukkaL left a comment

JukkaL Nov 28, 2018

Michael0x2a Nov 29, 2018

JukkaL Nov 28, 2018

Michael0x2a Nov 29, 2018

Michael0x2a commented Nov 29, 2018

ilevkivskyi left a comment

ilevkivskyi Dec 2, 2018

Michael0x2a Dec 2, 2018

ilevkivskyi Dec 2, 2018

gvanrossum commented Dec 3, 2018

JelleZijlstra commented Dec 3, 2018

gvanrossum commented Dec 3, 2018

Michael0x2a commented Dec 3, 2018 •

edited

Loading

Add basic end-to-end support for Literal types #5947

Add basic end-to-end support for Literal types #5947

Conversation

Michael0x2a commented Nov 24, 2018 • edited Loading

Michael0x2a commented Nov 24, 2018 • edited Loading

Michael0x2a commented Nov 24, 2018

JelleZijlstra commented Nov 24, 2018

ilevkivskyi left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

JukkaL left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Michael0x2a commented Nov 29, 2018

ilevkivskyi left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gvanrossum commented Dec 3, 2018

JelleZijlstra commented Dec 3, 2018

gvanrossum commented Dec 3, 2018

Michael0x2a commented Dec 3, 2018 • edited Loading

Michael0x2a commented Nov 24, 2018 •

edited

Loading

Michael0x2a commented Nov 24, 2018 •

edited

Loading

Michael0x2a commented Dec 3, 2018 •

edited

Loading