
Use NDArray instead of ArrayLike when dtype is given #442

Open · wants to merge 1 commit into master from typing-ndarray
Conversation

yosh-matsuda
Contributor

For ndarray typing, I would like to suggest using NDArray instead of ArrayLike when the dtype is given.

NDArray carries a data-type annotation, whereas ArrayLike appears to treat the data type as Any.

@yosh-matsuda
Contributor Author

I am not sure why the pypy3.10 test failed; all tests passed on my fork.
https://github.com/yosh-matsuda/nanobind/actions/runs/8123702629

@wjakob
Owner

wjakob commented Mar 3, 2024

Doesn't NDArray imply that this is actually a NumPy array? Whereas ArrayLike is a bit more loose ("could in principle be converted into a numpy array").

@wjakob
Owner

wjakob commented Mar 3, 2024

I'm actually not set on numpy.typing necessarily being the best kind of type to use here; perhaps there are other options as well. This one, for example, seems interesting: https://github.com/patrick-kidger/jaxtyping

@yosh-matsuda
Contributor Author

yosh-matsuda commented Mar 5, 2024

Sorry, I had assumed that stubgen was annotating ndarray for NumPy. However, I think ArrayLike is inappropriate because it would include not only ndarray but also any type from which a numpy.ndarray can be constructed, i.e. Python sequences and scalars.

// C++ binding
module.def("ndarray_func1", [](nb::ndarray<std::int32_t> arr) {});

# Generated stub
def ndarray_func1(arg: Annotated[ArrayLike, dict(dtype='int32')], /) -> None: ...

>>> ndarray_func1([1, 2, 3])  # type checking OK, but...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ndarray_func1(): incompatible function arguments. The following argument types are supported:
    1. ndarray_func1(arg: ndarray[dtype=int32], /) -> None

Invoked with types: list

On the other hand, as you say, NDArray seems to be compatible only with numpy.

# test.py
import torch
import test

test.ndarray_func1(torch.tensor([1, 2, 3]))

mypy output:

test.py:5: error: Argument 1 to "ndarray_func1" has incompatible type "Tensor"; expected "ndarray[Any, dtype[signedinteger[_32Bit]]]"  [arg-type]

I am currently trying out jaxtyping, but I have not yet managed to check multiple array types at once with a dtype specified.

@yosh-matsuda
Contributor Author

@wjakob

Since numpy is the only user module that nanobind can import, what about the following idea for post-processing in stubgen?

When an ndarray framework is specified:

  • Annotated[<framework_type_name>, dict(...)] is used for both input and output.
  • For numpy, numpy.typing.NDArray is used together with the dtype.

When no framework is specified for input arguments:

  • The array types used to annotate nb::ndarray are selected via stubgen arguments.
    • stubgen --numpy --jax --torch --tensorflow (tentative naming) means nb::ndarray will be annotated like Annotated[npt.NDArray[...] | jax.Array | torch.Tensor | tensorflow.(?), dict(...)].
  • numpy.typing.NDArray is enabled by default.

When no framework is specified for return values:

  • The raw "ndarray" name will be used with Annotated.

(A sketch of what these cases might look like in a generated stub follows below.)

@wjakob
Owner

wjakob commented Mar 8, 2024

I think that your answer applies to the output end (a function returning an nd-array in a specific framework).

On the input end, the situation is rather more complex. Nanobind will accept anything that implements the buffer protocol or DLPack protocol as input. That could be encoded as follows:

# Contents of a hypothetical nanobind.typing module

from collections.abc import Buffer  # requires Python 3.12+ (PEP 688)
from typing import Protocol, Any, TypeAlias, Union
from types import CapsuleType       # requires Python 3.13+


class DLPackBuffer(Protocol):
    def __dlpack__(
        self,
        stream: Any = None,
        max_version: tuple[int, int] | None = None,
        dl_device: Any | None = None,
        copy: bool | None = None,
    ) -> CapsuleType: ...


NDArray: TypeAlias = Union[Buffer, DLPackBuffer]

@yosh-matsuda marked this pull request as draft on March 9, 2024 07:54
@yosh-matsuda force-pushed the typing-ndarray branch 3 times, most recently from cf0dd0e to ba12ce8 on March 9, 2024 11:46
@yosh-matsuda marked this pull request as ready for review on March 9, 2024 11:52
@yosh-matsuda
Contributor Author

@wjakob Could you review the last commit (force pushed) in this PR? The changes are as follows:

  • Added an array protocol class NDArray for nd-arrays in the stub file, based on your suggestion (a reconstruction is sketched after this list).
    • Since implementations of __dlpack__ in various frameworks do not seem to strictly follow the Python array API standard, the protocol method DLPackBuffer.__dlpack__ takes no arguments.
    • I checked that numpy, torch, and jax arrays are accepted, but tensorflow is not.
      (I am not sure whether tensorflow.python.framework.ops.EagerTensor has __dlpack__.)
  • If a framework is specified in nb::ndarray, the framework type will be Annotated with metadata.
    • numpy.ndarray will be replaced with numpy.typing.NDArray[<dtype>].
  • If not, the typing of nb::ndarray will be Annotated[NDArray, meta].

Please see the example stub file in tests: https://github.com/wjakob/nanobind/blob/ba12ce8ce410bdea65bf03f205f3e8d990019150/tests/test_ndarray_ext.pyi.ref

@breathe
Contributor

breathe commented Aug 10, 2024

I'm arriving here from this discussion

@wjakob I switched my codebase to build with this fork and was able to get rid of my sed post-processing hack on the .pyi file. The generated types when using this branch appear to have essentially the same semantics as what I got by hackily string-replacing ArrayLike with npt.NDArray in the generated .pyi file -- this solution works for me!

@WKarel
Contributor

WKarel commented Nov 14, 2024

If it is already clear at compile time that nanobind will return a concrete numpy.ndarray at run time (because the binding has nb::ndarray<numpy> as its return type), then I'd be very happy if the stubs said so, too. For other containers this is already the case (e.g. abstract, liberal input: Sequence[int] vs. concrete, strict output: list[int]). Currently, static type checkers will e.g. flag method calls on the returned numpy.ndarray in this case, because ArrayLike does not provide them:

import nbmod
arr = nbmod.numpyArray()
arr.sum()  # flagged

numpy.ndarray does not support being parameterized with all the information that nanobind may have; e.g. one cannot encode whether an ndarray is read-only. However, providing a numpy.ndarray as the return type in a stub seems much tighter to me than "something convertible to a numpy.ndarray", even if that numpy.ndarray is not parameterized, and even if the stubs say that this "something" has a certain shape, etc.

Also, numpy.ndarray supports __class_getitem__, which returns an accordingly parameterized ndarray. So far, the only documented argument has been a numpy.dtype; however, a shape tuple can be passed as well, and the dev branch finally seems to document that.

I think that a concrete numpy.ndarray, possibly even with a fixed data type and/or a certain shape or number of dimensions, would be much more helpful as a return type than the current approach using ArrayLike and Annotated. Finally (using current Python and NumPy versions), I no longer see a reason to use numpy.typing.NDArray in stubs instead of directly indexing numpy.ndarray:

import numpy as np
from typing import Literal

# PEP 695 `type` aliases require Python 3.12+.
type VecF[N: int] = np.ndarray[tuple[N], np.dtype[np.float64]]
type VecF2 = VecF[Literal[2]]
type Mat2xF[N: int] = np.ndarray[tuple[Literal[2], N], np.dtype[np.float64]]

def func(vec: VecF2) -> None:
    pass
