Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: keepdims in numpy.sum should not be None #376

Merged
merged 5 commits into from
Sep 7, 2023

Conversation

Saransh-cpp
Copy link
Member

Description

Fixes #372

See #372 (comment)

I've added tests for the default case as well.

Checklist

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't any other open Pull Requests for the required change?
  • Does your submission pass pre-commit? ($ pre-commit run --all-files or $ nox -s lint)
  • Does your submission pass tests? ($ pytest or $ nox -s tests)
  • Does the documentation build with your changes? ($ cd docs; make clean; make html or $ nox -s docs)
  • Does your submission pass the doctests? ($ xdoctest ./src/vector or $ nox -s doctests)

Before Merging

  • Summarize the commit messages into a brief review of the Pull request.

@Saransh-cpp Saransh-cpp added the bug The problem described is something that must be fixed label Sep 6, 2023
@@ -20,6 +20,7 @@
import typing

import numpy
from numpy._globals import _NoValue, _NoValueType
Copy link
Member Author

@Saransh-cpp Saransh-cpp Sep 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if there is a public API for numpy._NoValue? Should we be using the private API or is there a better way to do this? Or should I set keepdims=False by default?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Set keepdims=False by default.

Suggested change
from numpy._globals import _NoValue, _NoValueType

Anything that starts with an underscore is not safe for other libraries to use, so that would include this one.

NumPy's default value for keepdims is effectively False; that's what's used if no keepdims argument is given.

>>> np.sum(np.ones((3, 5)), axis=1)
array([5., 5., 5.])
>>> np.sum(np.ones((3, 5)), axis=1, keepdims=False)
array([5., 5., 5.])
>>> np.sum(np.ones((3, 5)), axis=1, keepdims=True)
array([[5.],
       [5.],
       [5.]])

They have this _NoValue to distinguish "user did not specify a keepdims value" from "user explicitly specified the value that would be the default, anyway." The reason they're making that distinction is given in the np.sum documentation:

If the default value is passed, then keepdims will not be passed through to the sum method of sub-classes of ndarray, however any non-default value will be. If the sub-class’ method does not implement keepdims any exceptions will be raised.

presumably to allow for subclasses that did not handle this argument, maybe because they predated it?

If Vector just sets the default keepdims=False, then it would not carry over this behavior to subclasses of vector.backends.numpy.VectorNumpy*D, but that's probably fine. We can just declare that while NumPy doesn't expect subclasses of np.ndarray to have a keepdims argument on its sum method, we do expect subclasses of vector.backends.numpy.VectorNumpy*D to have such an argument. An inheritance subtree can make stronger assumptions that parents of that subtree.

If this is also the case for other reducers, then this PR should probably fix them as well, taking keepdims=False as the default everywhere.

Note: this is how Awkward Array solves the same problem:

https://github.com/scikit-hep/awkward/blob/684a84deffb51c75ba05ec3ca69ad964c3ff5845/src/awkward/operations/ak_sum.py#L15-L23

It handles every argument that NumPy handles, but it does have stronger constraints on them. (Now that I'm looking at it, we seem to be missing initial...)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They have this _NoValue to distinguish "user did not specify a keepdims value" from "user explicitly specified the value that would be the default, anyway." The reason they're making that distinction is given in the np.sum documentation:

Oh, that makes sense, I missed this.

Thanks for the detailed explanation! This was very helpful!

Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, so it wasn't a missing implementation of the whole function, just one argument.

Does Vector implement any other reducers? If so, the same mistake might be made there, too.

More inline below. (Technically, I'm requesting changes, but they're suggestions that can just be accepted—no need to redo the review afterward.)

@@ -20,6 +20,7 @@
import typing

import numpy
from numpy._globals import _NoValue, _NoValueType
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Set keepdims=False by default.

Suggested change
from numpy._globals import _NoValue, _NoValueType

Anything that starts with an underscore is not safe for other libraries to use, so that would include this one.

NumPy's default value for keepdims is effectively False; that's what's used if no keepdims argument is given.

>>> np.sum(np.ones((3, 5)), axis=1)
array([5., 5., 5.])
>>> np.sum(np.ones((3, 5)), axis=1, keepdims=False)
array([5., 5., 5.])
>>> np.sum(np.ones((3, 5)), axis=1, keepdims=True)
array([[5.],
       [5.],
       [5.]])

They have this _NoValue to distinguish "user did not specify a keepdims value" from "user explicitly specified the value that would be the default, anyway." The reason they're making that distinction is given in the np.sum documentation:

If the default value is passed, then keepdims will not be passed through to the sum method of sub-classes of ndarray, however any non-default value will be. If the sub-class’ method does not implement keepdims any exceptions will be raised.

presumably to allow for subclasses that did not handle this argument, maybe because they predated it?

If Vector just sets the default keepdims=False, then it would not carry over this behavior to subclasses of vector.backends.numpy.VectorNumpy*D, but that's probably fine. We can just declare that while NumPy doesn't expect subclasses of np.ndarray to have a keepdims argument on its sum method, we do expect subclasses of vector.backends.numpy.VectorNumpy*D to have such an argument. An inheritance subtree can make stronger assumptions that parents of that subtree.

If this is also the case for other reducers, then this PR should probably fix them as well, taking keepdims=False as the default everywhere.

Note: this is how Awkward Array solves the same problem:

https://github.com/scikit-hep/awkward/blob/684a84deffb51c75ba05ec3ca69ad964c3ff5845/src/awkward/operations/ak_sum.py#L15-L23

It handles every argument that NumPy handles, but it does have stronger constraints on them. (Now that I'm looking at it, we seem to be missing initial...)

src/vector/backends/numpy.py Outdated Show resolved Hide resolved
src/vector/backends/numpy.py Outdated Show resolved Hide resolved
@Saransh-cpp
Copy link
Member Author

Thanks for the review!

Does Vector implement any other reducers? If so, the same mistake might be made there, too.

Yes, just double checked, none of them makes the same mistake.

I'll create a quick bug-fix release after merging this PR.

@Saransh-cpp Saransh-cpp merged commit abeb0cc into scikit-hep:main Sep 7, 2023
16 checks passed
@Saransh-cpp Saransh-cpp deleted the fix-sum-keepdims branch September 7, 2023 15:31
@Saransh-cpp Saransh-cpp added this to the v1.1.1 milestone Sep 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug The problem described is something that must be fixed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Summation of Vectors (defined from px,py,pz,M data) does not work as expected
2 participants