Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add basic end-to-end support for Literal types #5947

Conversation

Michael0x2a
Copy link
Collaborator

@Michael0x2a Michael0x2a commented Nov 24, 2018

This pull request primarily adds support for string, int, bool, and None Literal types in the parsing and semantic analysis layer. It also adds a small smidge of logic to the subtyping and meet parts of the
typechecking layer -- just enough so we can start using Literal types end-to-end and write some tests.

Basically, my previous diff made a bunch of gross changes in a horizontal way to lay out a foundation; this diff makes a bunch of gross changes in a vertical way to create a proof-of-concept of the proof-of-concept.

I'll probably need to submit a few follow-up PRs cleaning up some stuff in the semantic analysis layer, but my hope is to switch focus on fleshing out the type checking layer shortly after this PR lands.

Specific changes made:

  1. I added a new RawLiteralType synthetic type meant to represent any literal expression that appears in the earliest stages of semantic analysis. If these RawLiteralTypes appear in a Literal[...] context, they transformed into regular LiteralTypes during phase 2 of semantic analysis (turning UnboundTypes into actual types). If they appear outside of the correct context, we report an error instead.

    I also added a string field to UnboundType to keep track of similar information. Basically, if we have Foo["bar"], we don't actually know immediately whether or not that "bar" is meant to be string literal vs a forward reference. (Foo could be a regular class or an alias for 'Literal').

    Note: I wanted to avoid having to introduce yet another type, so looked into perhaps attaching even more metadata to UnboundType or perhaps to LiteralType. Both of those approaches ended up looking pretty messy though, so I went with this.

  2. As a consequence, some of the syntax checking logic had to move from the parsing layer to the semantic analysis layer, and some of the existing error messages changed slightly.

  3. The PEP, at some point, provisionally rejected having Literals contain other Literal types. For example, something like this:

    RegularIds = Literal[1, 2, 3, 4]
    AdminIds = Literal[100, 101, 102]
    AllIds = Literal[RegularIds, AdminIds, 30, 32]
    

    This was mainly because I thought this would be challenging to implement, but it turned out to be easier then expected -- it happened almost by accident while I was implementing support for Literal[1, 2, 3].

    I can excise this part out if we think supporting this kind of thing is a fundamentally bad idea.

  4. I also penciled in some minimal code to the subtyping and meet logic.

This pull request primarily adds support for string, int, bool, and
None Literal types in the parsing and semantic analysis layer. It also
adds a small smidge of logic to the subtyping and meet parts of the
typechecking layer -- just enough so we can start using Literal types
end-to-end and write some tests.

Basically, my previous diff made a bunch of gross changes in a
horizontal way to lay out a foundation; this diff makes a bunch of
gross changes in a vertical way to create a proof-of-concept of the
proof-of-concept.

I'll probably need to submit a few follow-up PRs cleaning up some
stuff in the semantic analysis layer, but my hope is to switch focus
on fleshing out the type checking layer shortly after this PR lands.

Specific changes made:

1. I added a new 'RawLiteralType' synthetic type meant to represent any
   literal expression that appears in the earliest stages of semantic
   analysis. If these 'RawLiteralTypes' appear in a "Literal[...]" context,
   they transformed into regular 'LiteralTypes' during phase 2 of semantic
   analysis (turning UnboundTypes into actual types). If they appear
   outside of the correct context, we report an error instead.

   I also added a string field to UnboundType to keep track of similar
   information. Basically, if we have `Foo["bar"]`, we don't actually
   know immediately whether or not that "bar" is meant to be string
   literal vs a forward reference. (Foo could be a regular class or
   an alias for 'Literal').

   Note: I wanted to avoid having to introduce yet another type, so
   looked into perhaps attaching even more metadata to UnboundType
   or perhaps to LiteralType. Both of those approaches ended up looking
   pretty messy though, so I went with this.

2. As a consequence, some of the syntax checking logic had to move
   from the parsing layer to the semantic analysis layer, and some of
   the existing error messages changed slightly.

3. The PEP, at some point, provisionally rejected having Literals
   contain other Literal types. For example, something like this:

       RegularIds = Literal[1, 2, 3, 4]
       AdminIds = Literal[100, 101, 102]
       AllIds = Literal[RegularIds, AdminIds, 30, 32]

   This was mainly because I thought this would be challenging to
   implement, but it turned out to be easier then expected -- it
   happened almost by accident while I was implementing support
   for 'Literal[1, 2, 3]'.

   I can excise this part out if we think supporting this kind
   of thing is a fundamentally bad idea.

4. I also penciled in some minimal code to the subtyping and meet logic.
@Michael0x2a
Copy link
Collaborator Author

Michael0x2a commented Nov 24, 2018

(Also: any ideas for more tests would be super-appreciated. I tried adding a bunch, but I'm pretty unfamiliar with the semantic analysis layer in general and so probably don't have a good appreciation for all the different ways things can break there. For example, I'm pretty sure this code has a few bugs related to handling forward references I still need to flush out.)

@Michael0x2a
Copy link
Collaborator Author

This is annoying -- it looks like this function is mangling the types of any Literals that contain a forward or backward slash depending on whether the tests are run on windows vs linux.

I could probably hack in some kind of fix to make that function ignore Literal types, but I wonder if it might be better to instead just change the sections of the code responsible for printing out the paths so we can just delete this function. We could hide this behind a secret command line flag or something if we're worried about backwards-compatibility, maybe.

I'll think about this more tomorrow and see if an easier solution comes to mind.

@JelleZijlstra
Copy link
Member

For your #3, I don't have a use case in mind immediately, but it seems like a natural feature and I can easily imagine use cases. For example, perhaps we'll define _OpenBytesFlags = Literal["rb", "rwb", ...], and some hypothetical function that takes these flags plus a few more could take Literal[_OpenBytesFlags, "some_other_flag"].

Agree that it'd be better to print out paths correctly in the first place so we don't need the global replacement that you found.

This commit lets us specify we want to skip path normalization
on some given test case, side-stepping the path-related errors
we were seeing.

Specifically, the problem was that mypy attempts to normalize
paths by replacing all instances of '\' with '/' so that the
output when running the tests on Windows matches the specified
errors. This is what we want to most of the time, except for a
few tests with Literals containing slashes.

I thought about maybe changing the output of mypy in general so
it always uses '/' for paths in error outputs, even on Windows:
this would mean we would no longer have to do path normalization.
However, I wasn't convinced this was the right thing to do: using
'\' on Windows technically *is* the right thing to do, and I didn't
want to complicate the codebase by forcing us to keep track of when
to use os.sep vs '/'.

I don't think I'll add too many of these test cases, so I decided
to just go with a localized solution instead of changing mypy's
error output.
Copy link
Member

@ilevkivskyi ilevkivskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR here are some comments. Only two larger ones are RawLiteralType vs RawLiteral, and error messages that are opposite to what I would expect (excessively suggestive where not necessary, and not enough detailed where they should be).

ListExpr, StrExpr, BytesExpr, UnicodeExpr, EllipsisExpr, CallExpr,
get_member_expr_fullname
)
from mypy.fastparse import parse_type_comment
from mypy.types import (
Type, UnboundType, TypeList, EllipsisType, AnyType, Optional, CallableArgument, TypeOfAny
Type, UnboundType, TypeList, EllipsisType, AnyType, Optional, CallableArgument, TypeOfAny,
LiteralType, RawLiteralType,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two large scale questions:

  • Would it be better to call this just RawLiteral?
  • Does it need to be a subclass of Type?

It is OK to still put it in types.py, but wouldn't it be simpler to update UnboundType.args to be List[Union[Type, RawLiteral]]? I understand this may need some work, but currently we must implement several visitors and this type can sneak into type checking phase.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can look into doing this, but I believe we currently reference UnboundType.args in approximately 60 difference places throughout our codebase. I'm worried that propagating this change won't be cheap: we'd likely end up having to change a bunch of type signatures so we can pass through Union[Type, RawLiteral] and/or add a bunch of runtime checks to filter out RawLiteral.

(We'd also need to modify both fastparse.TypeConverter and exprtotype so they return a Union[Type, RawLiteral], but I think this is a more tractable problem: we could just implement some sort of wrapper around them that reports an error and returns AnyType if the top-level type is RawLiteral)

If we're worried about RawLiterals and similar types sneaking through, I'd rather just add another layer to semantic analysis that does a final scan of all types to make sure that RawLiteral, UnboundType, and any other "bad" types are fully removed. That way, would need to scan and check each type just once, instead of in potentially multiple places whenever we manipulate UnboundType.args.

Or even better, we could add a new base type named TypeLike that Type, UnboundType, and RawLiteral all inherit from (along with any other fake types we don't want sneaking into the typechecking phase). UnboundType would then be modified so that its args contain instances of TypeLike rather then Type.

The semantic analysis layer would always work with TypeLike, and we'd modify TypeAnalyzer so it traverses over TypeLike and only ever returns Type -- this would probably let us skip the extra traversal while still giving us the guarantees we want in a typesafe way.

The last solution would probably be just as much work, if not more, as changing UnboundType.args to be of type Union[Type, RawLiteral], but I think it would be more principled and help us eliminate the root problem in a more fundamental way. (I'd also be willing to tackle making this change if you + the others think it's a good idea.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can look into doing this, but I believe we currently reference UnboundType.args in approximately 60 difference places throughout our codebase. I'm worried that propagating this change won't be cheap

I would recommend just trying this, if it will be soon obvious that it is too hard, then don't spend time on this. The current approach is also OK.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, changing the type of UnboundType.args seems a bit ad hoc to me. We already have some Type subclasses such as TypeList that are only meaningful during semantic analysis, and I think that adding another one doesn't make things significantly worse if we are careful. Most visitors will crash if they encounter the new type subclass, so issues will likely be found soon.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most visitors will crash if they encounter the new type subclass, so issues will likely be found soon.

Fair point.

mypy/fastparse.py Show resolved Hide resolved
mypy/subtypes.py Show resolved Hide resolved
mypy/subtypes.py Show resolved Hide resolved
mypy/typeanal.py Outdated Show resolved Hide resolved
test-data/unit/check-literal.test Outdated Show resolved Hide resolved
[case testLiteralDisallowComplexNumbers]
from typing_extensions import Literal
a: Literal[3j] # E: invalid type comment or annotation
b: Literal[3j + 2] # E: invalid type comment or annotation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this for floats, but I think I want to add better error messages for complex numbers and arbitrary expressions in general in a separate diff. I'll add it to the literal types todo list.

test-data/unit/check-literal.test Outdated Show resolved Hide resolved
from typing_extensions import Literal
# TODO: Currently, this test case causes a crash. Fix the crash then re-enable.
# (We currently aren't handling forward references + type aliases very gracefully,
# if at all.)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although no 100% perfect, we definitely do, something is wrong with your implementation.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turned out the issue was due to the call to UnionTypes.make_simplified_union -- changing back to UnionTypes.make_union resolved this issue.

(It also means we no longer remove duplicate entries from unions -- but it seems like we don't attempt doing that with normal types anyways? It's not something I think is worth worrying about either way, so eh)

test-data/unit/check-literal.test Show resolved Hide resolved
Michael0x2a pushed a commit to Michael0x2a/peps that referenced this pull request Nov 27, 2018
…erals

This commit:

1. Performs a bunch of cleanup suggested by Ivan
2. Adds a note clarifying that negative numbers are *allowed*
3. Adds a note reiterating that PEP 484 expects type checkers to
   understand enum values when performing type inference (and
   suggests this can be implemented by treating enums as roughly
   equal to the union of their types)
4. Moves "nested literals" into the "supported" category -- as I
   discovered in python/mypy#5947,
   implementing support for this is not as bad as I thought.
5. Adds an explicit warning to the "literals and generics" section.
6. Modifies some text to become more firm about disallowing
   'Literal[Any]' and related constructs.
7. Deletes discussion about TypeScript's "index types" and "keyof"
   operator -- I mostly included that only because I got some feedback
   earlier to discuss TypeScript. It felt pretty shoehorned in,
   anyways.
Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a very quick pass and left a few comments.

test-data/unit/semanal-errors.test Outdated Show resolved Hide resolved
@@ -877,7 +877,7 @@ A[TypeVar] # E: Invalid type "typing.TypeVar"
from typing import TypeVar, Generic
t = TypeVar('t')
class A(Generic[t]): pass
A[1] # E: Type expected within [...]
A[1] # E: Invalid type. Try using Literal[1] instead?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency, it's better not to use periods in error messages. Maybe split into an error and a note, which we use for various other suggestions? Here's one way to do it:

x.py:4: error: Invalid type "1"
x.py:4: note: Suggestion: use Literal[1] instead of 1?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to go with messages like:

x.py:4: error: Invalid type: try using Literal[1] instead?

This seems to match the format of several of the other error messages generated by typeanal.py (e.g. "Invalid type: ClassVar nested inside other type").

[builtins fixtures/complex.pyi]
[out]

[case testLiteralDisallowComplexExpressions]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A somewhat meta comment: Since test runtimes are already somewhat painful, I often prefer merging multiple related test cases into a single one, especially if they are pretty straightforward. There is significant per-test case overhead. In the future it may make sense to support explicitly subdividing tests into multiple sections to make it easier to maintain longer test cases.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I merged a few of the existing tests together, though don't think I was able to for this specific one (some uses of complex numbers raise an error at the parsing stage while others raise an error at the semantic analysis stage -- errors at the parsing stage appear suppress any subsequent one).

In any case, I'll be sure to keep this in mind as I add new tests + tune the existing ones.

mypy/subtypes.py Show resolved Hide resolved
This commit makes a variety of changes addressing feedback given
regarding the test cases. It does *not* address the "maybe we
can remove RawLiteraType from the Type hierarchy" feedback: I'll
take a closer look at that tomorrow.
@Michael0x2a
Copy link
Collaborator Author

As an update, I believe I've addressed all feedback regarding error messages and tests. I haven't had a chance to seriously try making UnboundType.arg's type be Union[UnboundType, RawLiteral] yet -- I'll take a swing at that tomorrow (or we can just land this now and I can try doing that in a separate diff, don't really care which).

Copy link
Member

@ilevkivskyi ilevkivskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a bunch of minor comments, feel free to merge after you consider them (it is also OK to fix them in the next PR).

mypy/server/astmerge.py Outdated Show resolved Hide resolved
mypy/indirection.py Outdated Show resolved Hide resolved
mypy/test/testtypes.py Show resolved Hide resolved
mypy/typeanal.py Outdated Show resolved Hide resolved
mypy/typeanal.py Outdated
# if the user attempts use an explicit Any as a parameter, if the user
# is trying to use an enum value imported from a module with no type hints,
# giving it an an implicit type of 'Any', or if there's some other underlying
# problem with the parameter.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make make it an itemized list using *? (maybe also note which TypeOfAny kind corresponds to each situation?)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I turned this comment into a list. I think I want to avoid noting which TypeOfAny corresponds to which case for now though -- I haven't actually investigated deeply which kind corresponds to which case and want to avoid documenting false info.

Probably I'll flesh this comment out more once I circle back around to handling enums and do the thing you suggest then. I left a TODO about this.

@@ -980,7 +980,7 @@ e = TypeVar('e', int, str, x=1) # E: Unexpected argument to TypeVar(): x
f = TypeVar('f', (int, str), int) # E: Type expected
g = TypeVar('g', int) # E: TypeVar cannot have only a single constraint
h = TypeVar('h', x=(int, str)) # E: Unexpected argument to TypeVar(): x
i = TypeVar('i', bound=1) # E: TypeVar 'bound' must be a type
i = TypeVar('i', bound=1) # E: Invalid type: try using Literal[1] instead?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like there is a common trend: if there is an invalid type, it is hard to give a more detailed error "upstream". A potential solution for this may be to introduce a new TypeOfAny.invalid_type, so that caller can still choose to give an error for this Any. Could you please open an issue for this? (Maybe also link #4030 there.)

test-data/unit/check-literal.test Outdated Show resolved Hide resolved
test-data/unit/check-literal.test Outdated Show resolved Hide resolved
test-data/unit/check-literal.test Show resolved Hide resolved
@gvanrossum
Copy link
Member

W00t! Can't wait to start using this. Is it okay to update typeshed with the overload for open() yet, and then to delete the plugin treatment?

@JelleZijlstra
Copy link
Member

Shouldn't we wait for other type checkers to gain at least some level of support for literal types?

@gvanrossum
Copy link
Member

The point is apparently moot, since typing_extensions doesn't actually define Literal yet. :-( And Michael still has a long list of unchecked boxes at #5935.

But I don't think we should necessarily wait for other type checkers, since AFAIK they all have a process for "vetting" new versions of typeshed. In fact, mypy does too -- it's called "sync typeshed". :-) But mypy is a bit special since typeshed's CI tests with the latest mypy.

@Michael0x2a
Copy link
Collaborator Author

Michael0x2a commented Dec 3, 2018

But I don't think we should necessarily wait for other type checkers, since AFAIK they all have a process for "vetting" new versions of typeshed. In fact, mypy does too -- it's called "sync typeshed".

Maybe we need to a way of specifically indicating certain stubs or signatures are for certain type checkers only? E.g. some sort of if MYPY: directive. Or maybe instead of casing by type checker, we case by features instead and add some sort of if SUPPORTS_LITERAL_TYPES directive? That would be a bit less ad-hoc.

@Michael0x2a Michael0x2a deleted the add-preliminary-literal-types-semantic-logic branch December 3, 2018 00:48
@Michael0x2a Michael0x2a mentioned this pull request Dec 3, 2018
42 tasks
Michael0x2a pushed a commit to Michael0x2a/peps that referenced this pull request Mar 14, 2019
…erals

This commit:

1. Performs a bunch of cleanup suggested by Ivan
2. Adds a note clarifying that negative numbers are *allowed*
3. Adds a note reiterating that PEP 484 expects type checkers to
   understand enum values when performing type inference (and
   suggests this can be implemented by treating enums as roughly
   equal to the union of their types)
4. Moves "nested literals" into the "supported" category -- as I
   discovered in python/mypy#5947,
   implementing support for this is not as bad as I thought.
5. Adds an explicit warning to the "literals and generics" section.
6. Modifies some text to become more firm about disallowing
   'Literal[Any]' and related constructs.
7. Deletes discussion about TypeScript's "index types" and "keyof"
   operator -- I mostly included that only because I got some feedback
   earlier to discuss TypeScript. It felt pretty shoehorned in,
   anyways.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants