ENH: make dtype generic over scalar type #16759
```diff
-class dtype:
+class dtype(Generic[_DTypeScalar]):
     @overload
     def __new__(
```
In reality this is done by the `_DTypeMeta` metaclass.
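A quick runtime check of the metaclass point, assuming a NumPy release that already ships the NEP 42 DType classes (1.20 or later; an assumption about the installed version). It shows that `np.dtype(np.float64)` is an instance of a dedicated subclass of `np.dtype`, and that the class of these classes is the metaclass.

```python
import numpy as np

dt = np.dtype(np.float64)
float64_dtype_class = type(dt)

# The instance's class is a dedicated subclass of np.dtype, not np.dtype itself.
print(float64_dtype_class)
print(issubclass(float64_dtype_class, np.dtype), float64_dtype_class is np.dtype)

# np.dtype and its per-scalar subclasses are all created by the same metaclass.
print(type(np.dtype).__name__, type(float64_dtype_class) is type(np.dtype))
```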
numpy/tests/typing/reveal/dtype.py
```python
import numpy as np

reveal_type(np.dtype(np.float64))  # E: numpy.dtype[numpy.float64*]
```
Some examples of making dtype generic over scalar types working.
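For readers without the stubs at hand, here is a self-contained toy version of the same idea: a class that is generic over the scalar type it describes, so a checker can infer the parameter from the constructor argument. `ToyGeneric`, `ToyFloat64`, `ToyInt64` and `ToyDType` are hypothetical stand-ins, not NumPy names, and the revealed types in the comments are what a checker such as mypy would be expected to report.

```python
from __future__ import annotations

from typing import Generic, Type, TypeVar


class ToyGeneric:
    """Hypothetical stand-in for np.generic."""


class ToyFloat64(ToyGeneric):
    """Hypothetical stand-in for np.float64."""


class ToyInt64(ToyGeneric):
    """Hypothetical stand-in for np.int64."""


S = TypeVar("S", bound=ToyGeneric)


class ToyDType(Generic[S]):
    """Toy dtype that is generic over the scalar type it describes."""

    scalar_type: Type[S]

    def __new__(cls, scalar: Type[S]) -> ToyDType[S]:
        self = super().__new__(cls)
        self.scalar_type = scalar
        return self


f8 = ToyDType(ToyFloat64)  # expected reveal: ToyDType[ToyFloat64]
i8 = ToyDType(ToyInt64)    # expected reveal: ToyDType[ToyInt64]
```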
numpy/tests/typing/reveal/dtype.py
```python
reveal_type(np.dtype(np.float64))  # E: numpy.dtype[numpy.float64*]
reveal_type(np.dtype(np.int64))  # E: numpy.dtype[numpy.int64*]

# Uh oh, this shouldn't be ok
```
Some examples that would need to be sorted out.
Looking good thus far!
In all likelihood we'll have to go big on the `@overload`s (~1 per initiable generic subclass) to remedy the issues below.
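A toy sketch of why roughly one overload per subclass is needed: a `TypeVar` can capture a scalar *class* passed in, but string aliases such as `"f8"` carry no type information, so each spelling needs its own `Literal` overload, with a `ToyDType[Any]` fallback for anything unrecognized. All names here are hypothetical, and the alias table is a stand-in for NumPy's real parsing logic.

```python
from __future__ import annotations

from typing import Any, Dict, Generic, Literal, Type, TypeVar, Union, overload


class ToyGeneric: ...
class ToyFloat64(ToyGeneric): ...
class ToyInt64(ToyGeneric): ...


S = TypeVar("S", bound=ToyGeneric)

# Hypothetical runtime table from string aliases to scalar types.
_ALIASES: Dict[str, Type[ToyGeneric]] = {
    "float64": ToyFloat64,
    "f8": ToyFloat64,
    "int64": ToyInt64,
    "i8": ToyInt64,
}


class ToyDType(Generic[S]):
    scalar_type: Type[S]

    # mypy picks the first matching overload, so narrower ones come first.
    @overload
    def __new__(cls, spec: Type[S]) -> ToyDType[S]: ...
    @overload
    def __new__(cls, spec: Literal["float64", "f8"]) -> ToyDType[ToyFloat64]: ...
    @overload
    def __new__(cls, spec: Literal["int64", "i8"]) -> ToyDType[ToyInt64]: ...
    @overload
    def __new__(cls, spec: str) -> ToyDType[Any]: ...  # fallback: unknown scalar

    def __new__(cls, spec: Union[Type[ToyGeneric], str]) -> ToyDType[Any]:
        # Runtime implementation: resolve string aliases, pass classes through.
        scalar = _ALIASES[spec] if isinstance(spec, str) else spec
        self = super().__new__(cls)
        self.scalar_type = scalar
        return self
```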
Ping @seberg since the validity of this approach depends on what's going to happen with …

I like that approach enough to mention it in the draft of NEP 42, but there is nothing quite fixed about that. Although possibly it is the other way around here and typing can be the motivation... Note that there can be DTypes without scalar types.

Are you referring to structured data types here or something else? In the former case, isn't this mostly a terminology-related issue?

@BvB93 the issue is around things such as pandas …

When I was thinking about this I was figuring that for something like … somewhere in Pandas and then you could write … Maybe I was thinking about it wrong though?

Yes, sorry, exactly, you would have …
This is more of a comment on the `np.dtype[float64]` syntax and not specifically on this PR. Now that we have type hinting support, this syntax is a bit confusing for end users from a dtype perspective, IMO. For most use cases, a spelling like `List[int]` tells the person reading the code that the parameter is for the type checker and will be ignored or erased at run time. With `np.dtype[float64]`, however, the meaning is different: it has a specific meaning at runtime, distinct from `np.dtype[int64]`.

That is starting to change with PEP 585, though.

Thanks! I missed that completely; that makes sense.
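To make the PEP 585 point concrete (Python 3.9 or later assumed): the parameters of a builtin generic such as `list[int]` are now kept at runtime and can be introspected, but `list[int]` is still not a distinct class. Calling it builds a plain `list`, and it is rejected by `isinstance()`. That is the contrast with NEP 42, where `np.dtype[float64]` is meant to name an actual subclass.

```python
import types
from typing import get_args, get_origin

alias = list[int]

# PEP 585: the parameters of a builtin generic are available at runtime ...
print(type(alias) is types.GenericAlias, get_origin(alias), get_args(alias))

# ... but list[int] is not a distinct class: calling it builds a plain list,
# and it cannot be used as the second argument of isinstance().
print(type(alias()) is list)
try:
    isinstance([1], alias)
except TypeError as exc:
    print(f"TypeError: {exc}")
```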
Force-pushed from dc6f3df to b3af8d9.
Ok, some progress in dc6f3df:
The reason that …

```python
from typing import Generic, Type, TypeVar
from typing_extensions import Literal

T = TypeVar('T')

class A(Generic[T]):
    def __new__(cls, x: Literal['str']) -> "A[str]":
        ...

reveal_type(A('str'))
```

gives …, while

```python
from typing import Generic, Type, TypeVar
from typing_extensions import Literal

T = TypeVar('T')

class A(Generic[T]):
    def __new__(cls, x: Literal['str']) -> "A[str]":
        ...

    def __init__(self, x: Literal['str']) -> None:
        ...

reveal_type(A('str'))
```

gives …. The remaining issue is this test case:

…

which works but shouldn't; this should be because the …
Yeah, something specifically for annotating …

numpy/__init__.pyi
```python
@overload
def __new__(
    cls,
    dtype: Type[int],
```
We'll have to be careful with `int` and `float` here, as the corresponding `np.generic` is platform dependent.
This is particularly true for Windows, as it generally (always?) uses 32-bit integers on both 32- and 64-bit operating systems.
Ah yeah, I think we want `np.intc`, but we haven't added an annotation for it yet!
> Ah yeah, I think we want `np.intc`, but we haven't added an annotation for it yet!

On macOS it seems that `np.intc` is `np.int32`, while `np.dtype(int) == np.int64`.
I believe `np.int_` does the trick, though.
In any case, it appears that the likes of mypy support platform checks (ref), which might be exactly what we need here!
`np.dtype(int)` and `np.dtype(np.int_)` are synonyms on all platforms. `np.intc` is always the C `int`.
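Both claims can be checked on whatever platform is at hand with a few lines; this sketch uses only `np.dtype(...).itemsize` and `ctypes.sizeof`, and prints the bit width of the default integer, which is the part that varies between platforms.

```python
import ctypes

import numpy as np

# np.dtype(int) and np.dtype(np.int_) should compare equal on every platform.
print(np.dtype(int) == np.dtype(np.int_))

# np.intc should always have the size of the platform's C `int`.
print(np.dtype(np.intc).itemsize, ctypes.sizeof(ctypes.c_int))

# The bit width behind `int` itself is what differs between platforms.
print(np.dtype(int).itemsize * 8)
```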
> `np.dtype(int)` and `np.dtype(np.int_)` are synonyms on all platforms.

Good to know; then this is exactly what we're looking for (considering this is about adding an `np.dtype(int)` overload)!
I think the code below should do the trick, unless there are more situations where `np.int_` is `np.int32`.
```python
import sys
from typing import Type

import numpy as np

if sys.platform == "win32":
    int_ = np.int32  # Windows-specific code
else:
    int_ = np.int64  # Other systems

class dtype:
    def __new__(cls, dtype: Type[int]) -> dtype[int_]: ...  # other parameters elided
```
Are `if`s legal like that in stub files? If so, is `sys.platform` whitelisted as one of the things you can access?

A select few of them are (ref).
This includes `sys.version_info` (which we've used previously) and apparently also `sys.platform`, yes.
I think the mostly correct logic is:

```python
import sys

import numpy as np

# best guess of size_t for common platforms
if sys.maxsize > 2**32:
    intp = np.int64
else:
    intp = np.int32

# C long tends to match `size_t`, except on windows where it is always 32 bits.
if sys.platform == "win32":
    int_ = np.int32
else:
    int_ = np.intp
```

You might need to make an upstream patch to mypy to support `sys.maxsize`.
In the meantime, I suppose you can define `intp = Union[np.int32, np.int64]`?
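The `size_t` half of that logic is easy to sanity-check at runtime on a given machine; static support for `sys.maxsize` in a type checker is a separate question, as noted above. This sketch only checks the `intp` guess, since the rule for C `long` depends on the platform ABI and the NumPy version in use.

```python
import ctypes
import sys

import numpy as np

# Best guess of size_t, following the logic in the comment above.
guessed_intp = np.int64 if sys.maxsize > 2**32 else np.int32

# Compare the guess against what NumPy actually uses on this machine.
print(np.dtype(guessed_intp) == np.dtype(np.intp))
print(np.dtype(np.intp).itemsize, ctypes.sizeof(ctypes.c_size_t))
```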
We could also potentially fake `np.intp` and `np.int_` as being their own types. If one were writing code that needed to take variation across systems into account, then the type checker complaining about something like `intp + int64` by default, regardless of platform, might be the most helpful.
Yet another option would be to add some sort of dynamically generated `.pyi` file (perhaps whenever `setup.py` is run?) for handling all these platform-specific `generic` aliases and then import them from there, considering there are quite a number of them (ref).
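A minimal sketch of what such a build-time generator could look like. The alias selection, the header comment, and the function name `render_platform_stub` are all hypothetical; the sketch returns the stub text instead of writing a file, and derives each alias's fixed-width name from `np.dtype(...).kind` and `.itemsize` on the build machine.

```python
import numpy as np

# Hypothetical selection of platform-dependent aliases to pin down.
PLATFORM_ALIASES = ("intc", "intp", "uintp", "int_", "longlong")


def render_platform_stub() -> str:
    """Render .pyi text mapping each alias to a fixed-width scalar type."""
    lines = ["# Auto-generated at build time; do not edit (hypothetical file)."]
    for name in PLATFORM_ALIASES:
        dt = np.dtype(getattr(np, name))
        prefix = "uint" if dt.kind == "u" else "int"
        lines.append(f"{name} = {prefix}{dt.itemsize * 8}")
    return "\n".join(lines) + "\n"


print(render_platform_stub())
```

On a typical 64-bit Linux machine this would map `intp` to `int64`; on other platforms the output differs, which is the point of generating it at build time.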
The dynamically generated `.pyi` file sounds like a nice long-term solution; for now I've demoted those returns to `dtype[Any]` so as not to get blocked here on solving the platform-dependent dtype problem.
Ok, so @BvB93 pointed out that I need some more string aliases and to get the overload for …

So now we need to decide if we actually want to take this path (recall that the goal is making …

Ok, rebased post #16622, which has revealed a few more cases that could be handled better. But let's not put that work in until we're confident we're actually going to keep this.

No further movement here; what should we do to move forward the discussion about using …

Would be good to get @seberg's opinion here, but I suspect that pinging the mailing list won't help much (most experts are already participating here), so a proposed edit to NEP 42 sounds good to me. From the examples given so far, it's not 100% clear to me if this only impacts typing of …

If that feels better to you, you can also move it to NEP 41. It might be better there, since it's more about the concept of DTypes being classes and we have this …
Thanks @person142, that's exactly the answer I was looking for. Side note: I hope we can provide aliases for …

```python
def inner(x: np.ndarray[Float64], y: np.ndarray[Float64]) -> np.float64: ...
```

Typed signatures are annoyingly hard to read, so these convenience shorthands are quite useful.
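For comparison, later NumPy releases (1.21 and up) ship `numpy.typing.NDArray`, which is one way to build the kind of shorthand hoped for here. In this sketch the name `Float64Array` is a hypothetical alias, not a NumPy name, and the `inner` body is only there so the example runs.

```python
import numpy as np
import numpy.typing as npt

# Hypothetical shorthand built on npt.NDArray (available in NumPy 1.21+).
Float64Array = npt.NDArray[np.float64]


def inner(x: Float64Array, y: Float64Array) -> np.float64:
    """Inner product of two 1-D float64 arrays."""
    return np.float64(np.dot(x, y))


print(inner(np.array([1.0, 2.0]), np.array([3.0, 4.0])))
```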
If it's helpful for you, please do, but it's not necessary for me right now. Most PRs have been quite clear. This one seemed to be an exception, so I thought I'd ask.
Adapting … It might be possible, though, to do some magic with PEP 593's new …

```python
from typing import Union, Sequence, TypeVar
import numpy as np
from numpy.typing import _SupportsArray

T = TypeVar("T", bound=np.generic)

ArrayLike = Union[
    T, Sequence[T], _SupportsArray[np.dtype[T]]  # etc...
]

def func(a: ArrayLike[T]) -> np.ndarray[np.dtype[T]]: ...
```
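The key mechanism in that sketch is a generic type alias: a `Union` containing a `TypeVar` can be subscripted later, and the substitution flows through every member. A simplified runtime illustration follows; `ToyArrayLike` is a hypothetical name, and the private `_SupportsArray` member is deliberately left out, so this only covers scalars and sequences.

```python
from typing import Sequence, TypeVar, Union, get_args

import numpy as np

T = TypeVar("T", bound=np.generic)

# Simplified stand-in for the ArrayLike sketch: scalars and sequences only.
ToyArrayLike = Union[T, Sequence[T]]

# Substituting T yields a concrete union a type checker can match against.
print(get_args(ToyArrayLike[np.float64]))
```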
As discussed in numpy#16759, the new DType classes provide a good path forward for making `ndarray` generic over DType. Update NEP 42 to discuss those applications in more detail.
Force-pushed from 0a4c9e4 to 85c846c.
Ok, looks like there's consensus on #17447, so I've taken this out of draft and (hopefully) addressed the outstanding review comments.
Also, there are a number of overloads still missing:
- `complex64`
- `complex128`
- `float16`
- `int8`
- `int16`
- `uint8`
- `uint16`
- `uint32`
- `uint64`
- `str_`
- `bytes_`
* NEP: update NEP 42 with discussion of type hinting applications

  As discussed in #16759, the new DType classes provide a good path forward for making `ndarray` generic over DType. Update NEP 42 to discuss those applications in more detail.
* NEP: discuss typing for use scalar types in NEP 42

  Also clean up the language a bit.
Many more overloads added; some overloads consolidated. I moved the warning about not reordering them willy-nilly up to the top.

numpy/__init__.pyi
```python
"=u8",
"<u8",
">u8",
"L",
```
I'd leave out "L"- and "l"-based char codes for now, as those can correspond to either 64- or 32-bit integers depending on the platform.
Can you give a reference for that? This is surprising given that

```
In [5]: np.dtype('int64').char
Out[5]: 'l'
In [6]: np.dtype('int32').char
Out[6]: 'i'
```
It's maybe not as explicit as it should be, but the `generic` docs do mention platform specifics (the `compatible: ...` statements). A given character code will always correspond to the same platform-independent type/alias (`byte`, `short`, `intc`, etc.), but the same does not hold for their associated numerical precision.
On Windows, for example:

```
In [1]: import numpy as np
In [2]: np.dtype('int64').char
Out[2]: 'q'
In [3]: np.dtype('int32').char
Out[3]: 'l'
```
Ok, ditched the character codes for all int types; since they all correspond to C types, they are all platform dependent. (Or at least the C standard allows them to be platform dependent, though they might not be in practice.)
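The point about character codes naming C types can be checked directly: each integer code resolves to whatever size the corresponding C type has under the local ABI. The mapping below pairs each code with its `ctypes` counterpart, which is an illustration rather than an exhaustive list of NumPy's codes.

```python
import ctypes

import numpy as np

# Each integer character code names a C type; its width follows the platform ABI.
C_TYPES = {
    "b": ctypes.c_byte,      # signed char
    "h": ctypes.c_short,     # short
    "i": ctypes.c_int,       # int
    "l": ctypes.c_long,      # long: 8 bytes on 64-bit Linux/macOS, 4 on Windows
    "q": ctypes.c_longlong,  # long long
}

for code, c_type in C_TYPES.items():
    dt = np.dtype(code)
    print(code, dt.name, dt.itemsize, ctypes.sizeof(c_type))
```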
The only comments here that really need addressing are #16759 (comment) and …. The PR is good to go afterwards, in my opinion.
This allows representing dtype subclasses via constructs like `np.dtype[np.float64]`.
Force-pushed from bcdb251 to 12e3e1a.
The various character codes are for C types, which are all platform-dependent.
Ok, I think all outstanding comments have been addressed.

@seberg any last thoughts?

Thanks @person142.
Related to #16545.

Note: this is in no way complete, or even necessarily the right way forward. I just wanted to have a discussion about typing dtype subclasses with concrete code examples.

Goal: discuss whether making `dtype` generic over scalar types is a good way to move forward with making `ndarray` generic over `dtype`.

First, a rough summary of where things stand:
- … `ndarray` generic over `dtype` (i.e. write things like `ndarray[dtype]`) because there is only one dtype class
- … `ndarray` generic over `dtype`.
- … `List[int]` that means "a list whose elements are instances of `int`", which would not be the case for the dtype classes.

Now, one of the first difficulties you hit in using the new dtype subclasses is that they are created dynamically; i.e. there is no concrete `np.Float64` dtype subclass that you can add types for and then write `ndarray[Float64]`. The subclasses do have a suggestive notation, though: …

Following that, this PR makes `dtype` generic over subclasses of `np.generic`, so that on the typing level we can write `np.dtype[float64]` to mean "the `Float64` subclass of `dtype`". Some thoughts about that:
- This means we are using `dtype[<scalar>]` to represent a subclass of `dtype`. I think that this is compatible with how PEP 585 (https://www.python.org/dev/peps/pep-0585/#parameters-to-generics-are-available-at-runtime) is handling things like `list[int]`.
- If you are implementing a new dtype class by subclassing `dtype`, then it won't be generic over scalar type; consider e.g. this example: … which will be rejected with `error: The type "Type[B]" is not generic and not indexable`. This is good.
- The `dtype[<scalar>]` syntax is not settled yet on the non-typing level; NEP 42 says "Note: This is currently a possible extension and not yet decided."
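The "not generic and not indexable" behaviour has a direct analogue in plain `typing`, sketched here with hypothetical toy classes rather than the elided example: once a subclass fixes the type parameter of a generic base, it has no parameters left, so indexing it fails at runtime just as the checker rejects it statically.

```python
from typing import Generic, TypeVar

T = TypeVar("T")


class ToyDType(Generic[T]):
    """Hypothetical dtype that is generic over a scalar type."""


class ToyFloat64:
    """Hypothetical scalar type."""


# A user-defined DType that fixes the scalar type when subclassing ...
class ToyUserDType(ToyDType[ToyFloat64]):
    pass


# ... has no type parameters left, so indexing it is rejected at runtime too.
print(ToyDType.__parameters__, ToyUserDType.__parameters__)
try:
    ToyUserDType[ToyFloat64]
except TypeError as exc:
    print(f"TypeError: {exc}")
```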