Make reachability code understand chained comparisons (v2) #8148

Michael0x2a · 2019-12-15T01:55:15Z

This pull request is v2 (well, more like v10...) of my attempts to make our reachability code better understand chained comparisons.

Unlike #7169, this diff focuses exclusively on adding support for chained operation comparisons and deliberately does not attempt to change any of the semantics of how identity and equality operations are performed.

Specifically, mypy currently only examines the first two operands within a comparison expression when refining types. As a result, none of the following expressions behave as expected:

x: MyEnum
y: MyEnum
if x is y is MyEnum.A:
    # x and y are not narrowed at all

if x is MyEnum.A is y:
    # Only x is narrowed to Literal[MyEnum.A]

This pull request fixes this so we correctly infer the literal type for x and y in both conditionals.
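For readers less familiar with chained comparisons, here is a minimal runnable sketch of the runtime semantics the narrowing has to follow. The `MyEnum` members and both function names are assumptions made up for illustration; only the `x is y is MyEnum.A` pattern comes from the description above.

```python
from enum import Enum


class MyEnum(Enum):
    # Hypothetical members; the description only names MyEnum.A.
    A = 1
    B = 2


def both_are_a(x: MyEnum, y: MyEnum) -> bool:
    # Chained form: Python evaluates this as (x is y) and (y is MyEnum.A),
    # so inside the branch both x and y must be MyEnum.A.
    if x is y is MyEnum.A:
        return True
    return False


def both_are_a_expanded(x: MyEnum, y: MyEnum) -> bool:
    # Hand-expanded form of the same chain.
    return x is y and y is MyEnum.A
```

Because the chain only succeeds when every link holds, a checker that looks at just the first two operands misses what the later links say about `x` and `y`.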

Some additional notes:

1. While analyzing our codebase, I found that while comparison expressions involving two or more `is` or `==` operators were somewhat common, there were almost no comparisons involving chains of `!=` or `is not` operators, and no comparisons involving "disjoint chains" -- e.g. expressions like `a == b < c == b` where there are multiple "disjoint" chains of equality comparisons.

   So, this diff is primarily designed to handle the case where a comparison expression has just one chain of `is` or `==`. For all other cases, I fall back to the more naive strategy of evaluating each comparison individually and and-ing the inferred types together without attempting to propagate any info.
2. I tested this code against one of our internal codebases. This ended up making mypy produce 3 or 4 new errors, but they all seemed legitimate, as far as I can tell.
3. I plan on submitting a follow-up diff that takes advantage of the work done in this diff to complete support for tagged unions using any Literal key, as previously promised.

(I tried adding support for tagged unions in this diff, but attempting to simultaneously add support for chained comparisons while overhauling the semantics of == proved to be a little too overwhelming for me. So, baby steps.)
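To make the "one chain of `is` or `==`" idea from the first note concrete, here is a rough sketch of how adjacent comparisons might be grouped. The function name `group_adjacent_operators` and its signature are hypothetical and are not taken from the diff; the real implementation may differ substantially.

```python
from typing import List, Tuple


def group_adjacent_operators(
    operators: List[str],
    groupable: Tuple[str, ...] = ("is", "=="),
) -> List[Tuple[str, List[int]]]:
    """Group operand indices joined by runs of the same groupable operator.

    operators[i] is the operator between operand i and operand i + 1.
    Non-groupable operators always produce their own two-operand group.
    """
    groups: List[Tuple[str, List[int]]] = []
    for i, op in enumerate(operators):
        if groups and op in groupable and groups[-1][0] == op:
            # Same operator as the previous link: extend the current chain.
            groups[-1][1].append(i + 1)
        else:
            # Start a new group covering just this pairwise comparison.
            groups.append((op, [i, i + 1]))
    return groups
```

Under this sketch, `x is y is MyEnum.A` yields a single group over all three operands, while `a == b < c == b` yields three separate groups, which corresponds to the "disjoint chains" case that falls back to pairwise evaluation.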


ilevkivskyi

Thanks, great work! I think I like the large-scale idea. I didn't check all the details, but here are a bunch of minor comments.

ilevkivskyi · 2019-12-20T19:14:27Z

test-data/unit/check-enum.test

+# TODO: This should behave in the same way as above.
+# However, unlike the above, we currently don't progressively update the type of 'x' as
+# we check each individual comparison. So, when we do 'x is Foo.B', mypy still thinks
+# 'x' is of type 'Foo', which is why we get the below faulty result.


Would it be possible to fix this by merging the groups if they contain expressions with the same literal hash and literal level LITERAL_TYPE? I mean my guess is that this fails because there are two groups in simplified_operator_list that contain x.

I tried implementing this suggestion, and it seems to work!

It did make the grouping algorithm more complicated though -- the line-count roughly doubled in size, I think. I also tried implementing a more direct/naive "merge the groups after making them" approach, and that ended up being similarly complex.

Not sure if that's something we're ok with or not: this feels like a lot of code for what I suspect is ultimately a very rare edge case. Maybe it might be better to remove the changes I made and keep this TODO to try and help minimize the maintenance burden of these changes? LMK what you think.
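As a rough illustration of the "merge groups that share an operand" suggestion, here is a small union-find sketch. It is an assumption-laden toy: operands are plain strings standing in for literal hashes, and `merge_overlapping_groups` is a hypothetical name, not a function from the diff.

```python
from typing import Dict, Iterable, List, Set


def merge_overlapping_groups(groups: Iterable[Set[str]]) -> List[Set[str]]:
    """Merge any groups that share at least one operand key."""
    parent: Dict[str, str] = {}

    def find(key: str) -> str:
        # Register unseen keys as their own root, then walk up to the root.
        parent.setdefault(key, key)
        while parent[key] != key:
            parent[key] = parent[parent[key]]  # path halving
            key = parent[key]
        return key

    def union(a: str, b: str) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for group in groups:
        members = sorted(group)
        for member in members:
            find(member)
        for other in members[1:]:
            union(members[0], other)

    # Collect every key under its final root.
    merged: Dict[str, Set[str]] = {}
    for key in parent:
        merged.setdefault(find(key), set()).add(key)
    return list(merged.values())
```

For example, the groups `{"x", "a"}`, `{"y", "z"}` and `{"x", "b"}` collapse into `{"x", "a", "b"}` and `{"y", "z"}`, since the first and third both mention `x`.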

ilevkivskyi · 2019-12-20T19:18:56Z

test-data/unit/check-enum.test

+   reveal_type(x)   # N: Revealed type is '__main__.Foo'
+reveal_type(x)      # N: Revealed type is '__main__.Foo'
+
+[builtins fixtures/primitives.pyi]


Maybe one test with a tricky grouping, like two triples connected with <? Ideally you could just add a unit test for the grouping algorithm, but it seems to me this is not easy.

Done!

I also refactored out the grouping algorithm into a standalone function and added several unit tests for it to testinfer.py. (Not sure if this is the right home for these tests though. LMK if you want me to move them.)

ilevkivskyi · 2019-12-20T19:23:27Z

mypy/checker.py

-                    if node.operators == ['in']:
-                        return {expr: remove_optional(left_type)}, {}
-                    if node.operators == ['not in']:
-                        return {}, {expr: remove_optional(left_type)}


I can't find where this whole chunk moved. Do we now support a in b in c? If yes, is there a test?

This chunk of code starts on line 3853. I guess the diff ended up being a little messy because I renamed the variables ("left_type" -> "item_type" and "right_type" -> "collection_type") and moved the logic around a bit.

But we don't do anything really special for this case: we treat a in b in c as if it were a in b and b in c and only narrow away Optionals from the LHS item type if it overlaps with whatever's inside the RHS collection.

So in the end, I don't think we've really improved support for this pattern.
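A small runnable sketch of the two behaviors described in this reply, with all function names invented for illustration: a membership test that lets a checker drop `Optional` from the left-hand item, and the expansion of `a in b in c` into two pairwise checks.

```python
from typing import List, Optional


def index_or_default(item: Optional[int], allowed: List[int]) -> int:
    if item in allowed:
        # Every element of `allowed` is an int, so a successful membership
        # test means `item` cannot be None here. The assert mirrors at
        # runtime what the narrowing is meant to conclude statically.
        assert item is not None
        return allowed.index(item)
    return -1


def chained(a: object, b: list, c: list) -> bool:
    # Chained membership test.
    return a in b in c


def expanded(a: object, b: list, c: list) -> bool:
    # The pairwise form the reply says the chain is treated as.
    return a in b and b in c
```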

ilevkivskyi · 2019-12-20T19:25:19Z

mypy/checker.py

+            if any(is_overlapping_erased_types(expr_type, t) for t in non_optional_types):
+                if_map[operands[i]] = remove_optional(expr_type)
+
+        return if_map, {}


Should this refine both x and y if I have:

x: Optional[int]
y: Optional[int]
if x == y == 1:
    ...

If yes, please add a test for this.

It does! I added a test to check-optional.test.
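A runnable variant of the reviewer's snippet, wrapped in a hypothetical function so it can be executed. The body relies on both operands being non-`None` inside the branch, which is the narrowing the reply says now happens.

```python
from typing import Optional


def sum_if_both_one(x: Optional[int], y: Optional[int]) -> int:
    if x == y == 1:
        # The chain only holds when x == y and y == 1, so neither operand
        # can be None here and the addition is safe.
        return x + y
    return 0
```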

ilevkivskyi · 2019-12-20T19:28:09Z

mypy/checker.py

+
+        # Oh well, give up and just arbitrarily pick the last item.
+        if singleton_index == -1:
+            singleton_index = possible_singleton_indices[-1]


Do we have a test for this situation?

Currently, no -- but I did spend some time trying to construct one and eventually ended up convincing myself that we'll always get the same result no matter which index we pick. I updated the comment to include the reasoning.


ilevkivskyi · 2019-12-20T19:34:27Z

mypy/checker.py

@@ -4587,6 +4763,75 @@ def or_conditional_maps(m1: TypeMap, m2: TypeMap) -> TypeMap:
    return result


+def or_partial_conditional_maps(m1: TypeMap, m2: TypeMap) -> TypeMap:


This name doesn't really reflect what this does. Maybe combine_conditional_maps() would be better? I don't have strong feelings here, however.

I think using the word "combine" would be a bit ambiguous because it'd be unclear whether the combining step will end up "and"-ing or "or"-ing the two maps.

Basically, the way I was thinking about this change is that there are now two kinds of TypeMaps -- full ones, which represent all info we know about types within a certain "context", and partial ones which contain only some of the info.

Partial TypeMaps also only exist as an implementation detail of the narrowing logic: it always ends up returning full TypeMaps.

So if and_conditional_maps and or_conditional_maps functions are for full TypeMaps, I was thinking it made sense to add and_partial_conditional_maps and or_partial_conditional_maps functions for the partial ones.

But it also turned out that and_conditional_maps and and_partial_conditional_maps behave in the exact same way/would share the same implementation, so I didn't bother defining the latter.

The other approach I could take is to not define or_partial_conditional_maps and instead just rewrite the reduce_partial_conditional_maps function to produce the output maps more directly and do the (pseudo)-intersecting and unioning itself.

That works just as well (and is actually probably slightly more efficient), but does end up making it harder to see how the function relates to the existing conditional map logic.
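One way to picture the full versus partial distinction is the toy model below. It is only a sketch under stated assumptions: types are modeled as sets of strings, `or_full_maps` and `or_partial_maps` are hypothetical names, and the key-handling rules are one plausible reading of "full" and "partial" rather than a copy of the code in the diff.

```python
from typing import Dict, Set

# Toy stand-in for a TypeMap: expression name -> set of possible type names.
ToyTypeMap = Dict[str, Set[str]]


def or_full_maps(m1: ToyTypeMap, m2: ToyTypeMap) -> ToyTypeMap:
    # Full maps claim to know everything relevant, so an expression missing
    # from one side means "nothing learned" and the key is dropped.
    return {key: m1[key] | m2[key] for key in m1 if key in m2}


def or_partial_maps(m1: ToyTypeMap, m2: ToyTypeMap) -> ToyTypeMap:
    # Partial maps only cover some expressions (assumption), so a key found
    # in just one map is kept; keys found in both are unioned.
    result: ToyTypeMap = {key: set(types) for key, types in m1.items()}
    for key, types in m2.items():
        result[key] = result[key] | types if key in result else set(types)
    return result
```

With `m1 = {"x": {"int"}}` and `m2 = {"x": {"str"}, "y": {"int"}}`, the full variant keeps only `x`, while the partial variant also keeps `y`.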

ilevkivskyi · 2019-12-20T19:35:45Z

mypy/checker.py

+        )
+
+    ...where "PseudoIntersection[X, Y] == Y" because mypy actually doesn't understand intersections
+    yet, so we settle for just arbitrarily picking the right expr's type.


Do we have a test whether this would actually make a difference?

It seems yes -- if I modify and_conditional_maps to bias towards picking elements from the left map, this ends up breaking the testEnumReachabilityWithMultipleEnums test. Basically, when we do:

class Foo(Enum):
    A = 1
    B = 2

class Bar(Enum):
    A = 1
    B = 2

x3: Union[Foo, Bar]
if x3 is Foo.A or x3 is Bar.A:
    reveal_type(x3)
else:
    reveal_type(x3)

...we end up inferring the less precise type of Union[Literal[Foo.B], Bar] instead of Union[Literal[Foo.B], Literal[Bar.B]] in the else case -- the less precise narrowing we get from x3 is Foo.A overrode anything we learned in the next clause.
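The effect of the right bias can be mimicked with a toy model. Everything here is illustrative: types are frozensets of strings, `and_maps` is a hypothetical helper, and the two input maps are hand-written to match the types quoted above (assuming the second clause is analyzed after the first has already narrowed `x3`).

```python
from typing import Dict, FrozenSet

# Toy stand-in for a TypeMap: expression name -> set of possible type names.
ToyTypeMap = Dict[str, FrozenSet[str]]


def and_maps(m1: ToyTypeMap, m2: ToyTypeMap, prefer_right: bool = True) -> ToyTypeMap:
    # "PseudoIntersection[X, Y] == Y": on a conflict, one side simply wins.
    first, second = (m1, m2) if prefer_right else (m2, m1)
    result = dict(first)
    result.update(second)
    return result


# What each clause tells us when it is false (hand-written for illustration).
not_foo_a: ToyTypeMap = {"x3": frozenset({"Foo.B", "Bar"})}
not_bar_a: ToyTypeMap = {"x3": frozenset({"Foo.B", "Bar.B"})}

right_biased = and_maps(not_foo_a, not_bar_a)
left_biased = and_maps(not_foo_a, not_bar_a, prefer_right=False)
```

In this model the right-biased result keeps the more precise `{Foo.B, Bar.B}`, while the left-biased one falls back to `{Foo.B, Bar}`, matching the regression described above.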

ilevkivskyi

> Not sure if that's something we're ok with or not: this feels like a lot of code for what I suspect is ultimately a very rare edge case. Maybe it might be better to remove the changes I made and keep this TODO to try and help minimize the maintenance burden of these changes? LMK what you think.

I think this is generally fine. Also, do I understand correctly that the only reason for the more complex algorithm, instead of a few lines of iterative group merging, is that the naive algorithm is quadratic in the number of comparisons?


This was referenced Dec 15, 2019

Make reachability code understand chained comparisons #7169

Closed

Add support for narrowing Literals using equality #8151

Merged

ilevkivskyi reviewed Dec 20, 2019


Michael0x2a added 2 commits December 22, 2019 19:57

Respond to code review

a9c3b3f

Add annotation for mypyc

34dbe1a

ilevkivskyi approved these changes Dec 23, 2019


Add more comments to DisjointDict; consolidate some methods

c27568d

Michael0x2a merged commit 9101707 into python:master Dec 25, 2019

sthagen added a commit to sthagen/python-mypy that referenced this pull request Dec 25, 2019

Merge pull request #3 from python/master

67fb9fc

Make reachability code understand chained comparisons (v2) (python#8148)

ilevkivskyi mentioned this pull request Jan 2, 2020

Crash in lambda expression as generic argument #8230

Closed
