Use `NDArray` instead of `ArrayLike` when `dtype` is given #442

yosh-matsuda · 2024-03-02T15:10:16Z

For ndarray typing, I would like to suggest using NDArray instead of ArrayLike when the dtype is given.

NDArray carries the data type in its annotation, whereas ArrayLike appears to treat the data type as Any.
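To make the difference concrete, here is a minimal sketch, assuming NumPy 1.21 or newer is installed. The function names `to_array` and `total_int32` are hypothetical and not part of nanobind. The annotations only matter to a static type checker; nothing is enforced at run time.

```python
import numpy as np
import numpy.typing as npt


def to_array(values: npt.ArrayLike) -> np.ndarray:
    # ArrayLike admits lists, tuples, scalars, and arrays of any dtype.
    return np.asarray(values)


def total_int32(values: npt.NDArray[np.int32]) -> int:
    # NDArray[np.int32] tells a type checker that an int32 ndarray is expected.
    return int(values.sum())


print(to_array([1, 2, 3]).shape)
print(total_int32(np.array([1, 2, 3], dtype=np.int32)))
```

A checker such as mypy should accept a plain list for `to_array` but reject it for `total_int32`.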

yosh-matsuda · 2024-03-02T15:29:53Z

I am not sure why the test for pypy3.10 failed, but all the tests passed on my fork.
https://github.com/yosh-matsuda/nanobind/actions/runs/8123702629

wjakob · 2024-03-03T08:39:41Z

Doesn't NDArray imply that this is actually a NumPy array? Whereas ArrayLike is a bit more loose ("could in principle be converted into a numpy array").

wjakob · 2024-03-03T08:42:24Z

I'm actually not set on numpy.typing necessarily being the best kind of type to use here, perhaps there are other options as well? This one e.g., seems interesting: https://github.com/patrick-kidger/jaxtyping

yosh-matsuda · 2024-03-05T14:54:03Z

Sorry, I had assumed that stubgen annotated ndarray specifically for NumPy. However, I think ArrayLike is inappropriate because it includes not only ndarray but also any type from which a numpy.ndarray can be constructed, i.e. Python sequences and scalars.

module.def("ndarray_func1", [](nb::ndarray<std::int32_t> arr) {});

def ndarray_func1(arg: Annotated[ArrayLike, dict(dtype='int32')], /) -> None: ...

>>> ndarray_func1([1,2,3])	# type checking OK, but...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ndarray_func1(): incompatible function arguments. The following argument types are supported:
    1. ndarray_func1(arg: ndarray[dtype=int32], /) -> None

Invoked with types: list

On the other hand, as you say, NDArray seems to be compatible only with numpy.

import torch
import test

test.ndarray_func1(torch.tensor([1, 2, 3]))

mypy

test.py:5: error: Argument 1 to "ndarray_func1" has incompatible type "Tensor"; expected "ndarray[Any, dtype[signedinteger[_32Bit]]]"  [arg-type]

I am currently trying to use jaxtyping, but I have not successfully checked multiple array types at once with dtype specified.

yosh-matsuda · 2024-03-08T05:45:23Z

@wjakob

Since numpy is the only user module in nanobind that can be imported, how about the following idea for post-processing in stubgen?

When the ndarray framework is specified:

- Annotated[<framework_type_name>, dict(...)] is used for both inputs and outputs.
- For numpy, numpy.typing.NDArray is used with the dtype.

When the framework is not specified for input arguments:

- The array types used to annotate nb::ndarray are selected with stubgen arguments.
- stubgen --numpy --jax --torch --tensorflow (tentative naming) means nb::ndarray will be annotated with something like Annotated[numpy.typing.NDArray[...] | jax.Array | torch.Tensor | tensorflow.(?), dict(...)].
- numpy.typing.NDArray is enabled by default.

When the framework is not specified for return values:

- The raw "ndarray" will be used with Annotated.
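The input-argument rule above could be sketched roughly as follows. This is only an illustration of the proposal, not stubgen code: the `FRAMEWORK_TYPES` table and the `ndarray_annotation` helper are hypothetical names, and tensorflow is left out because the proposal itself leaves its type open.

```python
# Hypothetical mapping from a stubgen flag name to the type it would enable.
FRAMEWORK_TYPES = {
    "numpy": "numpy.typing.NDArray",
    "jax": "jax.Array",
    "torch": "torch.Tensor",
}


def ndarray_annotation(frameworks: list[str], meta: dict[str, str]) -> str:
    """Build the annotation string for an nb::ndarray input argument."""
    # numpy is the default when no framework flag is given.
    names = frameworks or ["numpy"]
    union = " | ".join(FRAMEWORK_TYPES[name] for name in names)
    args = ", ".join(f"{key}={value!r}" for key, value in meta.items())
    return f"Annotated[{union}, dict({args})]"


print(ndarray_annotation(["numpy", "torch"], {"dtype": "int32"}))
```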

wjakob · 2024-03-08T09:13:37Z

I think that your answer applies to the output end (a function returning an nd-array in a specific framework).

On the input end, the situation is rather more complex. Nanobind will accept anything that implements the buffer protocol or DLPack protocol as input. That could be encoded as follows:

# Contents of a hypothetical nanobind.typing module

from collections.abc import Buffer
from typing import Protocol, Any, TypeAlias, Union
from types import CapsuleType


class DLPackBuffer(Protocol):
    def __dlpack__(
        self,
        stream: Any = None,
        max_version: tuple[int, int] | None = None,
        dl_device: Any | None = None,
        copy: bool | None = None,
    ) -> CapsuleType: ...


NDArray: TypeAlias = Union[Buffer, DLPackBuffer]

yosh-matsuda · 2024-03-09T12:57:08Z

@wjakob Could you review the last commit (force pushed) in this PR? The changes are as follows:

- Add an array protocol class NDArray for nd-arrays in the stub file, based on your suggestion.
  - Since the implementations of __dlpack__ in various frameworks do not seem to strictly follow the Python array API standard, the protocol method DLPackBuffer.__dlpack__ takes no arguments.
  - I checked that numpy, torch, and jax arrays are accepted, but tensorflow arrays are not.
    (I am not sure whether tensorflow.python.framework.ops.EagerTensor has __dlpack__.)
- If the framework is specified in nb::ndarray, the framework type will be wrapped in Annotated with metadata.
  - numpy.ndarray will be replaced with numpy.typing.NDArray[<dtype>].
- If not, the type of nb::ndarray will be Annotated[NDArray, meta].
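As a rough illustration of why a parameterless `__dlpack__` in the protocol is the most permissive choice, here is a minimal sketch using only the standard library. It is not the stub content generated by this PR: `FakeTensor` is a hypothetical stand-in for a framework array type, and the buffer-protocol half of the union is omitted.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class DLPackBuffer(Protocol):
    # No parameters: frameworks disagree on the optional arguments they accept.
    def __dlpack__(self) -> Any: ...


class FakeTensor:
    """Hypothetical stand-in for a framework array type."""

    def __dlpack__(self, stream: Any = None) -> Any:
        return object()  # a real implementation returns a DLPack capsule


print(isinstance(FakeTensor(), DLPackBuffer))
print(isinstance([1, 2, 3], DLPackBuffer))
```

A method that adds optional parameters, like `FakeTensor.__dlpack__`, still matches the parameterless protocol member for a static checker, whereas a protocol that spelled out the full standard signature would reject frameworks that implement only part of it.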

Please see the example stub file in tests: https://github.com/wjakob/nanobind/blob/ba12ce8ce410bdea65bf03f205f3e8d990019150/tests/test_ndarray_ext.pyi.ref

breathe · 2024-08-10T01:08:43Z

I'm arriving here from this discussion

@wjakob I switched my codebase to build with this fork and am able to get rid of my sed post-processing hack on the pyi file. The generated types when using this branch appear to have essentially the same semantics as what I got by hackily string replacing ArrayLike with npt.NDArray in the generated .pyi file -- this solution works for me!

WKarel · 2024-11-14T14:43:42Z

If it is already clear at compile-time that nanobind will return a concrete numpy.ndarray at run-time (because of binding with nb::ndarray<numpy> as return type), then I'd be very happy if the stubs said so, too. For other containers, this is already the case (e.g. (abstract, liberal) input: Sequence[int] vs. (concrete, strict) output: list[int]). Currently, static type checkers will e.g. flag method calls on the returned numpy.ndarray in this case, because ArrayLike does not provide them:

import nbmod
arr = nbmod.numpyArray()
arr.sum()  # flagged

numpy.ndarray cannot be parameterized with all the information that nanobind may have - e.g. one cannot express whether an ndarray is read-only. However, declaring numpy.ndarray as the return type in a stub seems much tighter to me than "something convertible to a numpy.ndarray" - even if that numpy.ndarray is not parameterized, and even if the stubs say that this "something" has a certain shape, etc.

Also, numpy.ndarray supports __class_getitem__, which returns an accordingly parameterized ndarray. So far, the only documented argument has been a numpy.dtype. However, a shape-tuple can be passed, as well, and the dev-branch finally seems to document that.

I think that a concrete numpy.ndarray, possibly even with a fixed data type and/or a certain shape or number of dimensions, would be much more helpful as a return type than the current approach using ArrayLike and Annotated. Finally, with current Python and NumPy versions, I no longer see a reason for using numpy.typing.NDArray in stubs instead of directly indexing numpy.ndarray.

import numpy as np
from typing import Literal
type VecF[N: int] = np.ndarray[tuple[N], np.dtype[np.float64]]
type VecF2 = VecF[Literal[2]]
type Mat2xF[N: int] = np.ndarray[tuple[Literal[2], N], np.dtype[np.float64]]

def func(vec: VecF2):
    pass

wjakob force-pushed the master branch from 4240a97 to e1cb670 Compare March 3, 2024 19:35

oremanj mentioned this pull request Mar 5, 2024

Add support for keyword-only arguments #448

Merged

yosh-matsuda marked this pull request as draft March 9, 2024 07:54

yosh-matsuda force-pushed the typing-ndarray branch 3 times, most recently from cf0dd0e to ba12ce8 Compare March 9, 2024 11:46

yosh-matsuda marked this pull request as ready for review March 9, 2024 11:52

wjakob mentioned this pull request Mar 9, 2024

Tracking issue: stub generation #420

Closed

wjakob force-pushed the master branch 4 times, most recently from 56d7e93 to e80edb1 Compare March 11, 2024 17:04

wjakob force-pushed the master branch 4 times, most recently from c30294a to af57451 Compare March 22, 2024 08:46

wjakob force-pushed the master branch from 5dea297 to 4148e83 Compare April 2, 2024 14:23

wjakob force-pushed the master branch 4 times, most recently from d7117a4 to 983d6c0 Compare May 22, 2024 15:28

yosh-matsuda force-pushed the typing-ndarray branch from ba12ce8 to 2651557 Compare August 17, 2024 09:02

Add NDArray protocol class for nd-array annotations

579a69e

yosh-matsuda force-pushed the typing-ndarray branch from 2651557 to 579a69e Compare August 17, 2024 09:14

wjakob force-pushed the master branch from d022f72 to d78ccba Compare August 21, 2024 01:48

wjakob force-pushed the master branch 2 times, most recently from f9e5e0b to 30e96b7 Compare September 9, 2024 14:54

wjakob force-pushed the master branch from 96cca6c to ee23846 Compare September 20, 2024 02:01

wjakob force-pushed the master branch 2 times, most recently from f3e2796 to bff96e2 Compare October 4, 2024 03:20

wjakob force-pushed the master branch from 046c7a1 to e262b7c Compare October 16, 2024 13:06
