-
-
Notifications
You must be signed in to change notification settings - Fork 10.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Add annotations to the last 8 functions in numpy.core.fromnumeric #16729
Conversation
The 8 respective functions, all located in `np.core.fromnumeric`, consist of: * `prod` * `cumprod` * `ndim` * `size` * `around` * `mean` * `std` * `var`
np.prod(a, axis=1.0) # E: No overload variant of "prod" matches argument type | ||
np.prod(a, out=False) # E: No overload variant of "prod" matches argument type | ||
np.prod(a, keepdims=1.0) # E: No overload variant of "prod" matches argument type | ||
np.prod(a, initial=int) # E: No overload variant of "prod" matches argument type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is legal for suitable a:
>>> a = np.array([], dtype=object)
>>> np.prod(a, initial=int)
int
Or for an almost plausible use:
>>> from ctypes import c_uint8
>>> shape = np.array([1, 2, 3])
>>> np.prod(shape.astype(object), initial=c_uint8)
# a ctypes array type
numpy.core.fromnumeric.c_ubyte_Array_1_Array_2_Array_3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incidentally, this also proves that the return type of np.prod
can be anything, not just number
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incidentally, this also proves that the return type of
np.prod
can be anything, not justnumber
.
If I'm not mistaken this is limited to, the notoriously difficult to type, object arrays.
If so I feel it'd be better to leave things as they are for now, especially since this appears to be a very specific (seemingly undocumented?) situation.
The alternative would be to set number
and the return type to Any
,
which is practically useless from a typing standpoint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
especially since this appears to be a very specific (seemingly undocumented?) situation.
How so? initial
is documented as a "scalar", not as a number.
Perhaps the compromise is to stick # TODO: This is actually legal
or similar by each test case that currently expects and results in an error, but should not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How so?
initial
is documented as a "scalar", not as a number.
That's actually a good point; I more or less conflated scalar with numerical scalar here.
Perhaps the compromise is to stick # TODO: This is actually legal or similar by each test case that currently expects and results in an error, but should not.
I agree, I'll push an update in a bit.
As for the future:
What I've gathered is, in the case of object arrays, that np.prod()
simply falls back to the .__mult__()
method of the passed arrays individual elements and hence all that's required of initial
is that its type is compatible with aforementioned elements.
For object arrays we may or may not be able to easily express this with a __mult__
Protocol once ndarray is generic over its dtype, though this will depend on the exact implementation details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps the compromise is to stick
# TODO: This is actually legal
or similar by each test case that currently expects and results in an error, but should not.
Added in 3bc51b8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For object arrays we may or may not be able to easily express this with a
__mult__
Protocol once ndarray is generic over its dtype, though this will depend on the exact implementation details.
I wouldn't personally bother. Simply having prod(ndarray[int], initial: int) -> int
, prod(ndarray[object], initial: object) -> object
etc is all that's really needed. The details of exactly what the object loops of ufuncs do and do not support is not really interesting.
numpy/__init__.pyi
Outdated
@overload | ||
def var( | ||
a: ArrayLike, | ||
axis: None = ..., | ||
dtype: DtypeLike = ..., | ||
out: Optional[ndarray] = ..., | ||
ddof: int = ..., | ||
keepdims: Literal[False] = ..., | ||
) -> number: ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
number
isn't correct when out
is specified:
>>> np.var([1], out=np.zeros(()))
array(0.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This applies for most but not all functions in this patch with out
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This applies for most but not all functions in this patch with
out
.
Considering the widespread nature of this issue I'd propose to save it for a future maintenance pull request and fix it all at once (as there are already quite a few older annotated functions with out
parameter).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the meantime then, it would probably be a good idea to annotate these with Union[number, ndarray]
- weak annotations are better than incorrect ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As it turns out is fairly easy way to fix this: out: Optional[ndarray]
must be changed into out: None
when a number
is returned.
I have noticed some inconsistent behavior when using out
in combination with passing a number
though,
as in such case another number
is returned rather than a ndarray
.
Builtin scalars, builtin sequences and ndarray
s do all return an ndarray
(see example below).
In [1]: import numpy as np
In [2]: np.__version__
Out[2]: '1.20.0.dev0+62f26cf'
In [3]: np.var(np.int64(1), out=np.zeros(()))
Out[3]: 0.0
In [4]: np.var(1, out=np.zeros(()))
Out[4]: array(0.)
In [5]: np.var([1], out=np.zeros(()))
Out[5]: array(0.)
In [6]: np.var(np.ones(1), out=np.zeros(()))
Out[6]: array(0.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is arguably a bug in np.var
, when out
is given, it must be the return value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
number
isn't correct whenout
is specified:
Fixed as of 5a42440.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is arguably a bug in
np.var
, whenout
is given, it must be the return value.
I'll double check which (currently annotated) functions are affected by this this and create an issue afterwards (see #16734).
Clarified that the comment holds for all `np.ufunc.reduce()` wrappers
Return `out` if `out != None`
keepdims: Literal[False] = ..., | ||
initial: _NumberLike = ..., | ||
where: _ArrayLikeBool = ..., | ||
) -> number: ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am assuming we are allowing number subtypes for one but not the other why is that ?
In other words, why is there a difference in return types of the above two overloads ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The key difference is that the first one is a TypeVar
while the latter is not.
To illustrate, in the first overload an arbitrary number
subclass (denoted by _Number
)
is provided as input and a value of the same type is returned.
In the second overload we know that an instance of np.int_
is returned,
but there is a catch: np.int_
is a platform-dependent alias for one of the number
subclasses,
which is not something we've been able to easily express as of yet.
Lines 511 to 517 in 92665ab
# TODO(alan): Platform dependent types | |
# longcomplex, longdouble, longfloat | |
# bytes, short, intc, intp, longlong | |
# half, single, double, longdouble | |
# uint_, int_, float_, complex_ | |
# float128, complex256 | |
# float96 |
As a workaround the latter case is now simply annotated as returning a number
instance and
while correct, this is obviously not as specific as it could be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay got it, if my understanding is correct, the functionality is exactly the same if we use number of _Number here (second overload), but it is something we want to change in the future ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
..., but it is something we want to change in the future ?
Correct, once we've dealt with these platform-dependent generic
subclasses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh hang on, I'm actually confusing the issues pertaining int_
and number
here, sorry.
The problem with returningnumber
is more straightforward.
As ArrayLike
is not yet Generic
with respect to the data type we don't have a way yet to correlate the input and output types.
In the future we should be able to use expressions such as the one below (or at least something similar):
>>> from typing import TypeVar
>>> import numpy as np
>>> from numpy.typing import ArrayLike
>>> T = TypeVar('T', bound=np.number)
>>> def func(a: ArrayLike[T]) -> T:
... pass
If you're interested, see #16759 for more details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM thanks ! IMO, it would be nice to have np.prod(...,dtype=None) in tests.
Are there any more remarks or comments on this pull request? |
Co-authored-by: Eric Wieser <[email protected]>
circleci build failure seems to be unrelated. |
Unfortunately, circleci does not merge-from-master before building. You can ignore the error, merge from master, or rebase off master. |
I'd say let's leave it as it is for now in order to not trigger any more (arguably) redundant CI runs. |
Thanks @BvB93 |
This pull requests adds annotations to 8 functions
numpy.core.fromnumeric
,thus adding annotations to the last untyped functions from aforementioned module.
The functions in question are:
prod()
cumprod()
ndim()
size()
around()
mean()
std()
var()