Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

We need a better story about Any #3194

Open
ilevkivskyi opened this issue Apr 19, 2017 · 23 comments
Open

We need a better story about Any #3194

ilevkivskyi opened this issue Apr 19, 2017 · 23 comments

Comments

@ilevkivskyi
Copy link
Member

While working on protocols, I noticed that stubs often use Any where object will be more appropriate and vice versa. Because of how protocol inference works and how current mypy solver works, this leads to problems since sometimes mypy infers Any instead of giving an error (maybe we could improve this by "tightening" solver?)

I think the root of the problem is that wee don't have a good story about Any. I believe the rules for interaction between Any and other things should be like this (where C is a normal class like int or str):

(1) join(Any, object) == object
(2) meet(Any, object) == Any
(3) join(Any, C) == C
(4) meet(Any, C) == C
(5) join(Any, NoReturn) == Any
(6) meet(Any, NoReturn) == NoReturn

I use NoReturn for uninhabited type, since it is how it is currently called in PEP 484. The above rules could be summarized in a simple diagram:

        object
          |
         Any (if it appears in a meet)
        / | \
... all other types ...
        \ | /
         Any (if it appear in a join)
          |
       NoReturn

Currently, mypy behaves in a completely different way, and I believe this is not right, let me explain why. We could focus on rules (3) and (4). Currently, mypy returns Any in both these cases. Here is why it is not safe:

from typing import Any

x = 'a'
y: Any = 1

lst = [x, y]
lst[0] >> 1  # not detected by mypy, fails at runtime, will be fixed by rule (3)

def f(x: int) -> None:
    x >> 1
def g(x: Any) -> None:
    pass

funs = [f, g]
funs[0]('a')  # not detected by mypy, fails at runtime, will be fixed by rule (4)

@JukkaL I will be grateful for your opinion.

@gvanrossum
Copy link
Member

Hm, I have to think about this. I'm not sure I agree with everything in the first diagram either. I thought joins with Any always gave Any? (IOW why is rule (2) not the same as rule (4)?)

Off-topic there's also something with y: Any = 1 where I'd like mypy to behave differently: IMO it should infer that y is temporarily an int but allow assigning anything to it later.

@ilevkivskyi
Copy link
Member Author

ilevkivskyi commented Apr 19, 2017

@gvanrossum

I thought joins with Any always gave Any?

The reasoning why I don't like this is similar to why we recently prohibited simplifying Union[int, Any] to Any, this is not safe (see example).

(IOW why is rule (2) not the same as rule (4)?)

Rules (2) and (5) are different form others because I want to preserve the fact that object is an "absolute top" and NoReturn is an "absolute bottom". This is natural, consider this simplified reasoning: Any is something that has all methods and all values, but object is something that has all values but no methods, therefore object is wider that Any. Similarly we find that Any is wider than NoReturn.

@gvanrossum
Copy link
Member

gvanrossum commented Apr 19, 2017

OK, that explains (1), (2), (5) and (6) to me. That leaves (3) and (4). I think you're saying (3) join(Any, C) == C because C has fewer methods than Any, while (4) meet(Any, C) == C because C has fewer values.

It seems the right way to think about the lattice is that up (join) is the direction of fewer methods and more values (with fewer methods being tested first), while down (meet) is the direction of fewer values and more methods (with fewer values being tested first).

PS. Please have pity on my feeble understanding. I have never had any theory classes about this at all, since except for Simula, OO was invented after I graduated!

@JukkaL
Copy link
Collaborator

JukkaL commented Apr 19, 2017

The join with Any is Any to avoid false positives. For example, consider this code:

class A: pass
class B(A): pass

def f(x: B, y: Any) -> None:
    a = [x, y]
    g1(a)  # should this be fine?
    g2(a)  # what about this?

def g1(a: List[A]) -> None: pass
def g2(a: List[B]) -> None: pass

I think that both calls should be accepted, since each could be okay, given a suitable precise type for y. The philosophy is that replacing a type with an Any type should not generate additional errors if we can avoid that. That's why the inferred type of a is List[Any], not List[B].

@gvanrossum
Copy link
Member

gvanrossum commented Apr 19, 2017

So you disagree with (1) and (3) from the OP?

(1) join(Any, object) == object
...
(3) join(Any, C) == C
...

@ilevkivskyi
Copy link
Member Author

ilevkivskyi commented Apr 19, 2017

@JukkaL

The philosophy is that replacing a type with an Any type should not generate additional errors if we can avoid that.

g2(a) in your example passes with both versions of join, so let us focus on g1(a). Indeed, for y: A it will pass, but will start failing if replaced with y: Any in the case of proposed join. However, there is something that I don't like here, namely:

  • I intuitively feel that amount of false positives with proposed join will be smaller than amount of false negatives with current join (this is purely subjective of course, but see next point).
  • I don't like "proliferation" of Any, let me explain my intuition: If I see a list of B and "something that always work", then I would expect that a user wanted to create a list of B, not a list of things that will work everywhere.

Of course, a user could always overrule the inference with an explicit annotation, but it seems to me it is better to silence an error with a: List[Any] (or even better with a: List[A] for the case of g1) rather than have a (possibly subtle and nasty) uncaught error. What do you think?

@ilevkivskyi
Copy link
Member Author

@JukkaL
Also, note that in your example invariance is important, everything will still work with proposed join for

def g1(a: Sequence[A]) -> None:
    ...

You could argue that user-defined classes are invariant by default, we already have a section in "Common issues" about this, so that users will be prepared that sometimes explicit annotations are needed.

@JukkaL
Copy link
Collaborator

JukkaL commented Apr 19, 2017

I may well disagree with the rules for meet as well, but this needs some thought as the issue is subtle so I'm focusing on join first. By the way, I'm not convinced that this is analogous to the union simplification fix and this needs to argued for based on its merits.

One of the reasons why Any types are 'contagious' is that that it can highlight code that is unsafe. If something has type Any, using the value is potentially unsafe. If using something is potentially unsafe, inferring Any is generally the right thing to do, unless the user overrides this with an explicit annotation. The proposed rules for join would make mypy less safe in that sense. Example:

class A:
    def f(self) -> None: pass

class B: pass

def f(a: A, b: Any) -> None:
    x = [a, b]
    x[1].f()   # unsafe if b is an instance of B, for example

With the current semantics, the receiver object on the final line has type Any, so it's clear that the method call is unsafe, which is correct. With the proposed semantics, the type would be A, and the call would look safe, even though it's not. We can argue that inferring List[Any] for x is safer than List[A], since the actual runtime type could well be a supertype of A.

@JukkaL
Copy link
Collaborator

JukkaL commented Apr 19, 2017

Another issue with the rules for joins is the behavior with isinstance:

def f(a: str, b: Any) -> None:
    x = [a, b][0]
    if isinstance(x, str):
        ...
    else:
        x(1 + '')  # should be an error

If we infer str as the type of x, we get a false positive on the final line since mypy would think that the line unreachable because of the isinstance test.

Actually, it looks like inferring Union[str, Any] for x would actually conform to principles, but here the argument is different -- we try to avoid inferring union types since they are very unintuitive for users.

@ilevkivskyi
Copy link
Member Author

One of the reasons why Any types are 'contagious' is that that it can highlight code that is unsafe

But the only way for a user to see this is to use reveal_type, or am I missing something? To me it looks quite opposite, with current semantics Any rather "hides" errors (leaves them unreported). For example, extending your example:

def f(a: A, b: Any) -> None:
    x = [a, b]
    x[1].f()
    x[1].never_had_this()
    if function_that_typically_returns_zero():
        x[0].this_will_fail_late()

Another issue with the rules for joins is the behavior with isinstance:

This example is interesting. Inferring Union[X, Any] for particular join(X, Any) is also interesting. Maybe we can just do this internally but simplify this in error messages?

@ddfisher
Copy link
Collaborator

I don't think this is an issue that should be resolved in isolation. We have a more thorough review of the type system on our roadmap for the next couple months -- we should make sure to consider the role of Any as part of that, but doing it beforehand would be duplicating work and could potentially lead us in the wrong direction.

@ilevkivskyi
Copy link
Member Author

We have a more thorough review of the type system on our roadmap for the next couple months

Ah, thanks, I didn't know this. Then I propose to postpone this discussion, but leave the issue open just to be sure we will not forget about this.

@gvanrossum
Copy link
Member

I wonder if the review should be done in public rather than as Dropbox meetings.

@ilevkivskyi
Copy link
Member Author

I wonder if the review should be done in public rather than as Dropbox meetings.

I can't say for others, but I would very much like it if I had the chance to join :-)

@ilevkivskyi
Copy link
Member Author

our roadmap

By the way, is this roadmap a secret document, or is it available somewhere?

@JukkaL
Copy link
Collaborator

JukkaL commented Apr 21, 2017

Our current roadmap for mypy is not public, but at least I wouldn't mind making it public -- with the caveat that it's continuously evolving.

@gvanrossum
Copy link
Member

Jukka, I think it's a great idea to publish the internal road map (I can't even recall what's on it). Maybe we can put it in a file named ROADMAP.md at the top level of the tree? Alternatively it could be a chapter in the docs. Another approach would be to create issues for each of the items and tie them together with a custom label (or, dare I say, a milestone?). We should recall that road maps are often vague and subject to change -- an item on the road map doesn't promise it will be delivered by a certain date.

@refi64
Copy link
Contributor

refi64 commented Apr 21, 2017

How about putting it in the wiki? Just restrict editing to collaborators only.

@gvanrossum
Copy link
Member

We're actually trying to kill the wiki, it's full of unmaintained stuff.

@refi64
Copy link
Contributor

refi64 commented Apr 21, 2017

We're actually trying to kill the wiki, it's full of unmaintained stuff.

FWIW I agree that the Developer Guides section is probably mostly crap at this point, but I don't really see why it wouldn't be an ideal place for a roadmap. It would have dead-easy editing from either the command line or the GUI, too.

Also:

http://www.mypy-lang.org/roadmap.html

This really needs to be updated:

Milestone 5: Faster, incremental type checking

Support more efficient, incremental type checking. After a code edit, mypy should only check modules affected by that change instead of the whole program. This will be useful especially for large projects.

Schedule: 2016

Milestone 6: Beta release

After this release, work harder to avoid changes that break backward compatibility in significant ways. > Most critical open issues resolved.

Schedule: 2016?

@gvanrossum
Copy link
Member

@JukkaL -- thoughts about the wiki and the outdated road map?

@kirbyfan64 -- that website is in GitHub, you can send PRs here: https://github.com/JukkaL/mypy-website

@JukkaL
Copy link
Collaborator

JukkaL commented Oct 24, 2018

To get the discussion moving on, here is how the type system behaves currently:

(1) join(Any, object) == Any
(2) meet(Any, object) == object
(3) join(Any, C) == Any
(4) meet(Any, C) == C
(5) join(Any, NoReturn) == Any
(6) meet(Any, NoReturn) == NoReturn

If I were to change anything, I'd consider changing meets (this is just an idea, I haven't carefully considered all the implications):

(1) join(Any, object) == Any
(2) meet(Any, object) == Any   <-- changed
(3) join(Any, C) == Any
(4) meet(Any, C) == Any  <-- changed
(5) join(Any, NoReturn) == Any
(6) meet(Any, NoReturn) == NoReturn

And here is how I think about Any types and these can be used to argue for the above rules:

  1. Any type is a type that mypy doesn't know. Usually it's a type the programmer hasn't specified (yet). Sometimes it's a type that can't be represented in the type system.
  2. Operations involving Any types should rarely generate false positives. It's totally okay to generate false negatives. The main purpose of Any is to turn type errors we don't want to see into false negatives.
  3. Any is designed to be used while migrating code to static typing. In fully migrated code it should be used rarely. (In practice, fully migrating code is often impractical, but we are a bit idealistic here and assume that this gets much easier in the future.)
  4. Replacing any non-Any type with Any should not generate additional errors (there are some exceptions but they should be very rare). This allows using libraries without stubs without resulting in many false positives. This also allows type checking only a subset of a codebase during migration to type checking -- imports from missing modules will result in Any types.

@ilevkivskyi
Copy link
Member Author

I think we should add the is_subtype() to the puzzle (in mypy functions terminology, this is what is called compatible in the more common gradual typing terminology). The current rules are:

(1) is_subtype(Any, C) == True
(2) is_subtype(C, Any) == True
(3) is_subtype(NoReturn, C) == True
(4) is_subtype(C, object) == True

The rules for join/meet should satisfy their definitions i.e. is_subtype(meet(A, B), B) must be True etc. I think this is currently OK. The proposed change also seems fine.

I would propose to add a fifth rule to how to think about Any:

  1. There are as few special cases as possible :-) IOW if join(Any, C) == Any then this also holds if
    C is a corner case like object or NoReturn.

This rule already holds. I would propose to keep it, so that meet(Any, C) == Any for all C including NoReturn. Note however this conflicts with the idea proposed in #5818 to return False for is_subtype(Any, NoReturn).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants