Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix is_complete_for check #587

Merged
merged 4 commits into from
Feb 28, 2023

Conversation

manopapad
Copy link
Contributor

This is the underlying issue behind nv-legate/cunumeric#721 (comment). Thanks to @aschaffer for doing most of the work to uncover this.

The error in the original logic can be seen when running the following example:

import cunumeric as num
a = num.ones((8, 10, 8))
v = num.ones((3, 5, 3))
num.convolve(a, v, mode="same")

with LEGATE_TEST=1 CUNUMERIC_TEST=1 legate --cpus 3 --event --dataflow, in which case legion spy verification fails with:

ERROR: Index Partition: 15 was labelled complete but there are missing points. This is definitely an application bug and invalidates all of Legion's dependence analysis. You must fix this before Legion Spy can continue.

The call to is_complete_for for the offending partition is misjudging it as complete. Here are the values in that call:

self._tile_shape = Shape((8, 4, 8))
self._color_shape = Shape((1, 3, 1))
self._offset = Shape((0, 0, -2))
extents = Shape((8, 10, 8))
offsets = Shape((0, 0, 0))
my_lo = Shape((0, 0, -2))
my_hi = Shape((8, 12, 6)))

@magnatelee please also take a look at the following potential related issues:

  • The task creation code in deferred.py:convolve is setting PartSym.complete = False, but this information is not honored when creating the partitions. Instead is_complete_for is consulted like normal.
  • Here a (lexicographic) comparison operator on Shape was used where an elementwise comparison was intended. It would be good to check any other uses of these operators, to confirm that this mixup isn't happening there too.

@manopapad manopapad added the category:bug-fix PR is a bug fix and will be classified as such in release notes label Feb 24, 2023
@magnatelee
Copy link
Contributor

magnatelee commented Feb 24, 2023

@manopapad @aschaffer thanks for catching this bug. It's my fault to assume comparison between tuples to be element-wise.

The task creation code in deferred.py:convolve is setting PartSym.complete = False, but this information is not honored when creating the partitions. Instead is_complete_for is consulted like normal.

Note that completeness of a partition symbol is a constraint on the partition and not a command to the solver. It's perfectly fine to pass a complete partition to a place where an incomplete partition is expected (the inverse isn't true though). That said, the solver currently doesn't check if the solution for a given partition symbol abides by the completeness constraint, though I should and will add it at some point.

Here a (lexicographic) comparison operator on Shape was used where an elementwise comparison was intended. It would be good to check any other uses of these operators, to confirm that this mixup isn't happening there too.

Let's change Shape's comparison operator directly. I meant it to be element-wise and thus shouldn't have used tuple comparison. I checked all uses of <= in the code and the one you guys spotted seems like the only one, so there should be no surprise with changing the behavior.

@manopapad
Copy link
Contributor Author

I tried removing the definitions of all inequality comparisons from Shape and running the cuNumeric test suite, and this uncovered another use which I replaced, please see 91576e1.

Let's change Shape's comparison operator directly. I meant it to be element-wise and thus shouldn't have used tuple comparison.

I don't like this, for two reasons. First and less importantly, this would make <= on Shape not a total ordering. Second and more important, this would make <= do something different on Shape than on tuple, and Shape is pretty much some extra functionality on top of tuple, so this is asking for a mixup to happen in the future.

I would rather we define proper elementwise_lt or similar methods on Shape (or as freestanding functions), and possibly remove the inequality operators from Shape, to avoid any possibility of confusion in the future.

CCing @bryevdv for his opinion too.

@bryevdv
Copy link
Contributor

bryevdv commented Feb 24, 2023

I don't have all the context, but

        return all(a <= b for (a, b) in zip(my_lo, offsets)) and all(
            a <= b for (a, b) in zip(offsets + extents, my_hi)
        )

is pretty tedious and verbose, and not very expressive of the fact that some kind of higher level comparison is happening between my_lo, my_hi, and offsets. So I'd agree that an explicit API for these comparisons seems reasonable on that basis.

@magnatelee
Copy link
Contributor

magnatelee commented Feb 24, 2023

I don't like this, for two reasons. First and less importantly, this would make <= on Shape not a total ordering. Second and more important, this would make <= do something different on Shape than on tuple, and Shape is pretty much some extra functionality on top of tuple, so this is asking for a mixup to happen in the future.

Shape behaving like tuple for some operations is accidental and not by design. If anything, Shape meant to behave more like NumPy ndarray; for example, + on tuples concatenate the values, whereas that on Shape and ndarray performs element-wise addition. Also, what does a lexicographical order between Shapes even mean? We shouldn't decide the semantics of an operator based on what an abstraction's closest kin does, but what's best for that abstraction. I agree that there's ambiguity in the operator semantics, but I strongly disagree Shape should do what tuple does.

I would rather we define proper elementwise_lt or similar methods on Shape (or as freestanding functions), and possibly remove the inequality operators from Shape, to avoid any possibility of confusion in the future.

An alternative is to have <= and other comparison operators return a tuple of booleans, which would then be passed to all or any. I don't like spelling out elementwise_lt knowing we'll never want the lexicographical order on shapes.

@manopapad
Copy link
Contributor Author

@magnatelee is suggesting we simply use the built-in __le__ for this:

Shape(a,b) <= Shape(c,d) := a <= c and b <= d

However, I am a little worried about this solution, because this would imply:

holds: (1,2) <= (2,1)  # tuple.__le__ is lexicographic
does not hold: Shape(1,2) <= (2,1)  # Shape.__le__ is elementwise

so IMHO this is breaking the principle of least surprise.

I would instead suggest an explicit "elementwise" comparison function (naming & placement TBD).

@manopapad
Copy link
Contributor Author

We shouldn't decide the semantics of an operator based on what an abstraction's closest kin does, but what's best for that abstraction.

I agree with the principle. However, in this particular case I think there is (justifiably) a high risk of confusion. The class is named "shape", which is bound to be conflated with a NumPy shape, which is an actual tuple. Moreover, a lot of the time we are initializing a Shape based on a NumPy array's shape (i.e. a tuple), so it is very likely that developers will assume that the two are compatible. So to save ourselves some headaches I think we should try to avoid subtle cases of incompatibility.

An alternative is to have <= and other comparison operators return a tuple of booleans, which would then be passed to all or any. I don't like spelling out elementwise_lt knowing we'll never want the lexicographical order on shapes.

That's a better solution, I can proceed with that.

@bryevdv
Copy link
Contributor

bryevdv commented Feb 24, 2023

An alternative is to have <= and other comparison operators return a tuple of booleans,

Just noting that in case someone forgets to do

if all(s1 <= s2):  # if all((True, False, True....)):  [False]

and instead does (the IMO more typical)

if s1 <= s2:  # if (True, False, True....):  [Truthy]

then the comparison will nearly always return True.

@manopapad
Copy link
Contributor Author

then the comparison will nearly always return True.

Will mypy flag it as an error?

@bryevdv
Copy link
Contributor

bryevdv commented Feb 24, 2023

I don't see why it would, it's perfectly valid (if not useful) to do

dev39 ❯ cat foo.py 
a = (True, False)

if all(a): print("all")

if a: print("oops")

~/tmp
dev39 ❯ mypy foo.py
Success: no issues found in 1 source file

~/tmp
dev39 ❯ python foo.py 
oops

@magnatelee
Copy link
Contributor

An alternative is to have <= and other comparison operators return a tuple of booleans,

Just noting that in case someone forgets to do

if all(s1 <= s2):  # if all((True, False, True....)):  [False]

and instead does (the IMO more typical)

if s1 <= s2:  # if (True, False, True....):  [Truthy]

then the comparison will nearly always return True.

I think we can return our own container where __bool__ is overriden to raise an exception if there are more than one value. fyi, that's how numpy ndarray avoids ambiguity.

@magnatelee
Copy link
Contributor

magnatelee commented Feb 25, 2023

I agree with the principle. However, in this particular case I think there is (justifiably) a high risk of confusion. The class is named "shape", which is bound to be conflated with a NumPy shape, which is an actual tuple. Moreover, a lot of the time we are initializing a Shape based on a NumPy array's shape (i.e. a tuple), so it is very likely that developers will assume that the two are compatible. So to save ourselves some headaches I think we should try to avoid subtle cases of incompatibility.

That's fair, but I'd blame inherent ambiguity in the comparison operators defined to return a single boolean. Except for the unfortunate naming collision with ndarray.shape, there's nothing in Shape that would get people thinking that it behaves like a tuple. And that actually makes me think naming it Shape was a mistake and we should rename it. Plus, we could potentially give a more meaningful name to each comparison operator returning boolean; for example, a <= b could be renamed to b.subsumes(a), which is what it's really meant to be.

That's a better solution, I can proceed with that.

See my comment about a custom boolean container. We could do it if we want to avoid all footguns.

Will mypy flag it as an error?

I believe when an object obj passed in a place where a boolean is expected, an implicit obj.__bool__() call is inserted by the interpreter.

@manopapad
Copy link
Contributor Author

I went ahead and implemented the approach of returning a tuple-like of bools, guarding that __bool__ is not called directly on it. MyPy helpfully pointed out one more use of comparison operators.

Assuming that this is a core-internal API, I just used asserts to signal misuse, but happy to switch to something else if you guys prefer.

Plus, we could potentially give a more meaningful name to each comparison operator returning boolean; for example, a <= b could be renamed to b.subsumes(a), which is what it's really meant to be.

We need two types of subsumes, all(<=) and all(<), so I guess we'd need subsumes and strict_subsumes, if we wanted to go with this.

Copy link
Contributor

@bryevdv bryevdv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a proper compare API would be nicer, but if users aren't really expected to be comparing shapes this seems OK at least for the short term to get some guardrails in the next release.

@manopapad manopapad merged commit 33ce363 into nv-legate:branch-23.03 Feb 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category:bug-fix PR is a bug fix and will be classified as such in release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants