RFC: add a way to convert back to Python (`tolist`) #710

vnmabus · 2023-11-17T08:43:54Z

I think (correct me if I am mistaken) that currently the only way to convert an array object back to a Python representation is to call float, int, bool, etc on 0D arrays. This requires that the user knows the appropriate function to call and does not offer any standard way to retrieve the underlying Python object when the library has additional dtypes, such as object in NumPy.

Moreover, as there is no tolist in the standard, it is also not possible to obtain a list representation of the array (from which the Python object could be retrieved).

I propose to add tolist to the standard, as defined in NumPy and Pytorch to deal with these cases. Although the name is a bit misleading (because for 0D arrays there is no list at all), I think that prior art justifies reusing that name.

leofang · 2023-11-17T12:31:53Z

There're some difficulties that NumPy folks are discussing: numpy/numpy#24989

vnmabus · 2023-11-17T12:43:30Z

Well, maybe the standard should introduce a new name then, such as topython, so that people are not confused by the name.

rgommers · 2023-11-17T13:45:56Z

The semantic issues with tolist are real, and I'm also not sure that this should be supported for n-D arrays. If there's a need for this, it'd be better to add the relevant dunder method so list(x) works, a separate function or method doesn't seem great.

Would one actually need this for arrays of arbitrary dimensionality? Some real-world examples would be good to see.

vnmabus · 2023-11-17T16:47:37Z

I don't know which dunder method that would be. I think currently list(x) would return a list of 0D arrays for arrays that follow the standard (not the current NumPy ndarray), instead of a list of Python types. If you are interested in just a normal list, a possibility would be to offer an iterator over the elements of the array (like NumPy's ndarray.flat), so that you can do list(x.flat). I am not sure if there are use cases for a multidimensional tolist.

My use case was for 0D arrays, more similar to NumPy's ndarray.item(). However, I thought that it was preferable to have one dimension-independent function to retrieve the Python objects (so, similar to how tolist behaves), rather than including just item() only for it to become redundant if something like tolist is added later.

rgommers · 2023-11-17T20:13:27Z

For 0-D arrays, [float(x)] should work already. Extend it a little if it needs to be generic over all dtypes, by checking with isdtype. That's not a bad thing, because it's not clear whether you'd want uint* -> Python int.

vnmabus · 2023-11-17T22:43:20Z

I think there was a misunderstanding... I do not want a list returned for 0D arrays, but a dtype-independent way to convert them to a Python object that can hold them, one that also works for non-standard dtypes.

seberg · 2023-11-20T09:24:51Z

What you need to use is [float(x) for x in arr.ravel()], since iteration behavior is unspecified (assuming you know you want a Python float).

NumPy has the .tolist()/.item() (also object casts actually) which have a preference to convert to the corresponding Python type when deemed reasonable (not saying that what it deems "reasonable" is actually reasonable).

As I said on the NumPy issue, maybe raveling would be the more useful default behavior, although I am not sure... Losing dimensions is also surprising!
But then I also think that iterating over all elements would generally be a nice thing for array objects (although I realize that would require user teaching and better ways to iterate over a single axis).

Whatever the solution, maybe a new name is fine, maybe one should just keep the tolist name but make raveled/flattened=True/False compulsory (i.e. it is undefined if not passed and the "minimal" implementation used for testing would raise).
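
To make the "compulsory flag" idea concrete, here is a minimal pure-Python sketch. Everything in it is an assumption for illustration: the function takes a C-ordered flat sequence plus a shape instead of a real array object, the keyword is called flattened, and all axis lengths are assumed to be non-zero. Making the keyword required means that omitting it raises, as the "minimal" implementation would.

```python
def tolist(flat, shape, *, flattened):
    """Hypothetical helper: 'flat' holds the elements in C order, 'shape' has ndim >= 1.

    'flattened' is keyword-only and has no default, so Python itself raises
    TypeError when the caller does not state which behavior is wanted.
    """
    result = list(flat)
    if flattened:
        return result
    # Fold the flat list from the innermost axis outward.
    for size in reversed(shape[1:]):
        result = [result[i:i + size] for i in range(0, len(result), size)]
    return result


data = [1, 2, 3, 4, 5, 6]
print(tolist(data, (2, 3), flattened=True))   # [1, 2, 3, 4, 5, 6]
print(tolist(data, (2, 3), flattened=False))  # [[1, 2, 3], [4, 5, 6]]
```

Calling tolist(data, (2, 3)) without the keyword fails immediately, which is the point of the suggestion: neither behavior is silently chosen for the user.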

betatim · 2023-11-21T11:47:26Z

This issue made me wonder about converting from one namespace to another. Say from PyTorch to Numpy. This works:

x = array_api_compat.torch.asarray([1,2,3])
array_api_compat.numpy.asarray(x) # -> array([1, 2, 3])

The reason I was thinking about this was that it would be nice to have a consistent way of converting things. Of course, there is no asarray for normal Python, so this is more of a thought experiment.

oleksandr-pavlyk · 2023-11-21T13:53:34Z

We did discuss a possibility to standardize bringing data from any array object to Python. It would make sense to have a function that would transfer content of array into another type that exposes Python buffer protocol. From here the content could be converted to NumPy, or passed to xp.asarray in another library.
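
As a rough illustration of what "a type that exposes the Python buffer protocol" gives a consumer, the sketch below uses the standard library's array.array purely as a stand-in for such an intermediate host object (no array library is involved, and no such transfer function exists in the standard). A memoryview over it can be read back as Python objects, including a nested list for a C-contiguous shape.

```python
from array import array

# Stand-in for the intermediate host object; array.array exposes the buffer protocol.
host = array("d", [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

view = memoryview(host)
flat = view.tolist()

# memoryview.cast needs one side of the cast to be a byte format,
# so go through "B" before reinterpreting the data as a 2x3 block of doubles.
nested = view.cast("B").cast("d", shape=[2, 3]).tolist()

print(flat)    # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(nested)  # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
```

This only covers element formats that memoryview understands, which is the limitation raised later in the thread for dtypes such as datetime.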

rgommers · 2023-11-22T16:31:03Z

> I think there was a misunderstanding... I do not want a list returned for 0D arrays, but a dtype-independent way to convert them to a Python object that can hold them, that works also for non-standard dtypes.

Non-standard dtypes may not have a pure Python equivalent, so that's clearly quite tricky. Things like datetime may work, but for different precisions like float128 it's hard to determine whether it's fine to downcast to 64-bit floats, etc. I don't think there should be a too-magical do-it-all function. The current issues with numpy's .tolist show that that's a problem. For 0-D arrays with the set of dtypes that you care about, it's easy enough to write something like:

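def convert_0D_arrays(x):
    # Look up the array's own namespace instead of relying on a global xp
    xp = x.__array_namespace__()
    if x.ndim != 0:
        raise ValueError('...')

    # isdtype takes a dtype (not an array) as its first argument
    if xp.isdtype(x.dtype, 'real floating'):
        return float(x)
    elif xp.isdtype(x.dtype, 'complex floating'):
        return complex(x)
    # etc.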

Static typing is also easier outside of a magical do-it-all function, because you can add the overloads for the different return types.

> The reason I was thinking about this was that it would be nice to have a consistent way of converting things.

This can be done with from_dlpack, or with asarray.

vnmabus · 2023-11-22T19:53:48Z

> Non-standard dtypes may not have a pure Python equivalent, so that's quite tricky clearly. Things like datetime may work, for different precisions like float128 it's hard to determine whether it's fine to downcast to 64-bit floats, etc. I don't think there should be a too magical do-it-all function. The current issues with numpy's .tolist show that that's a problem. It's easy enough to write for 0-D arrays with the set of dtypes that you care about something like:
>
> def convert_0D_arrays(x):
>     if not x.ndim == 0:
>         raise ValueError('...')
>
>     if xp.isdtype(x.dtype, 'real floating'):
>         return float(x)
>     elif xp.isdtype(x.dtype, 'complex floating'):
>         return complex(x)
>     # etc.

The thing is that if the arrays implement additional dtypes apart from the standard ones (something allowed in the standard, as far as I know), this is not so easy to do from the user side. Consider for example the datetime extension you mentioned. There is no dunder equivalent like __float__ for datetime. Thus:

- datetime(x) will likely not work.
- DLPack does not help here, as it only supports numeric types.
- The buffer protocol won't help you either, as that format is not supported for a memoryview.

So, what can a user do to retrieve the object?

And note that although it is valid to say "this is not a problem for the standard, as it only requires numerical types with dunder methods", I still see the value in standardizing at least the naming and interface of a function similar to NumPy's item that provides the "best" way to represent a scalar quantity as a Python object, as intended by the array developers. Depending on how that is standardized, it could even be the case that the value returned for, say, a float64 dtype is not a Python float. For example, a library that wraps standard-compatible arrays and adds physical units on top, while itself presenting an API similar to the standard, could implement item as returning objects with units attached.
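
A toy sketch of the units scenario described above, to show why item() and float() could legitimately differ. The classes Quantity and UnitArray are hypothetical and exist only for this illustration; they do not correspond to any real library, and the storage (a flat list plus a shape) is a deliberate simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    """Hypothetical Python-level scalar with a unit attached."""
    value: float
    unit: str


class UnitArray:
    """Hypothetical wrapper: flat values, a shape, and a physical unit."""

    def __init__(self, values, shape, unit):
        self._values = list(values)
        self.shape = tuple(shape)
        self.unit = unit

    def __float__(self):
        # The dunder route can only ever return a bare float, dropping the unit.
        if self.shape != ():
            raise TypeError("only 0-D arrays can be converted to a Python float")
        return float(self._values[0])

    def item(self):
        # The library decides what the "best" Python object is: here, a Quantity.
        if len(self._values) != 1:
            raise ValueError("item() requires exactly one element")
        return Quantity(float(self._values[0]), self.unit)


x = UnitArray([2.5], (), "m")
print(float(x))  # 2.5
print(x.item())  # Quantity(value=2.5, unit='m')
```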

rgommers · 2023-11-22T21:02:40Z

> I still see the value in standardizing at least the naming and interface of a function similar to NumPy item that provides the "best" way to represent a scalar quantity as a Python object,

There's a reasonable amount of consensus, among both NumPy devs and devs from other array libraries, that NumPy's scalars were a design mistake. They add a large amount of complexity, and we'd remove them from NumPy if we could (but, backwards compat). So I don't think this is going to fly.

> Consider for example the datetime extension you mentioned. There is no dunder equivalent like __float__ for datetime.

There is only one library that supports datetime dtypes, namely NumPy. So you can explicitly handle that case with a NumPy function.

leofang · 2023-12-06T02:49:39Z

Just thinking out loud... If we agree that a "0D list" is an ill-defined construct, perhaps we can at least have clean .tolist() semantics for the unambiguous cases? From a purist perspective, in addition to always getting a return value of type list, it's also very good to try to preserve the dimension/shape of an array, to facilitate a correct round trip (tolist -> asarray, or vice versa).

For N-D arrays (N > 0):
- if all axes have nonzero lengths, .tolist() returns an N-nested Python list, with lengths of the inner lists dictated by .shape
  - ex: .shape = (2,), output = [1, 2]
  - ex: .shape = (2, 3), output = [[1, 2, 3], [2, 3, 4]]
  - ...
- if any inner-most, fast-running axis has length 0, that axis is an empty list
  - this is because in order to get a well-defined semantics, .tolist() must assume C order
  - ex: .shape = (0,), output = []
  - ex: .shape = (2, 0), output = [[], []]
  - ex: .shape = (2, 3, 0), output = [[[], [], []], [[], [], []]]
  - ...
- otherwise, we either raise an exception, or make the behavior implementation defined (and not standardize it)
  - ex: .shape = (0, 2)
  - ex: .shape = (2, 0, 4)
  - ex: .shape = (0, 3, 4)
  - ...
For 0-D arrays:
- Same, we either raise an exception (and tell users to use builtin functions like int(), float(), ... to get a Python scalar),
- Or make the behavior implementation defined (and not standardize it)
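
The rules above can be sketched in pure Python as follows. This is only an illustration under stated assumptions: the hypothetical helper tolist_strict works on a C-ordered flat sequence plus a shape tuple rather than on a real array, and it picks the "raise an exception" option in both places where the proposal leaves a choice.

```python
from math import prod


def tolist_strict(flat, shape):
    """Hypothetical helper: nest C-ordered 'flat' according to 'shape'."""
    if len(shape) == 0:
        # 0-D case: no list form, point users to the builtin conversions.
        raise TypeError("0-D arrays have no list form; use int(), float(), ...")
    if 0 in shape[:-1]:
        # A zero-length axis that is not the innermost one cannot be
        # represented by nested lists without losing the trailing lengths.
        raise ValueError(f"shape {shape} cannot be preserved by nested lists")
    if len(flat) != prod(shape):
        raise ValueError("number of elements does not match shape")

    def nest(offset, axis):
        if axis == len(shape) - 1:
            return list(flat[offset:offset + shape[axis]])
        stride = prod(shape[axis + 1:])  # elements per step along this axis
        return [nest(offset + i * stride, axis + 1) for i in range(shape[axis])]

    return nest(0, 0)


print(tolist_strict([1, 2, 3, 2, 3, 4], (2, 3)))  # [[1, 2, 3], [2, 3, 4]]
print(tolist_strict([], (2, 0)))                  # [[], []]
print(tolist_strict([], (2, 3, 0)))               # [[[], [], []], [[], [], []]]
```

With these choices, shapes such as (0, 2), (2, 0, 4) and (0, 3, 4) raise ValueError, and the empty shape () raises TypeError.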

rgommers · 2023-12-07T11:40:17Z

That seems reasonable @leofang. However, there are going to be other exceptions aside from 0-D arrays, because leaving array land isn't always possible. E.g., what about non-CPU devices or detaching from an autograd graph?

vnmabus · 2023-12-07T12:56:02Z

In case that this is implemented I would rather have the natural implementation for 0D arrays: returning a non-list (either a Python representation of the scalar value itself or the array unchanged) so that array(0).tolist() == array([0]).tolist()[0]. If the name tolist is considered problematic for something that does not necessarily return a list, I would change the name, rather than raising an exception for a case where the natural behavior is obvious.
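
One way to see why the 0-D result is "natural": if the conversion is written recursively over the shape, the 0-D case is simply the base case. The sketch below is a pure-Python illustration with a hypothetical to_python helper operating on a C-ordered flat list and a shape tuple, not on a real array object.

```python
def to_python(flat, shape):
    """Hypothetical helper: recursive conversion, where 0-D is the base case."""
    if not shape:
        (value,) = flat  # a 0-D array holds exactly one element
        return value
    # Number of elements in each sub-array along the first axis.
    step = len(flat) // shape[0] if shape[0] else 0
    return [to_python(flat[i * step:(i + 1) * step], shape[1:])
            for i in range(shape[0])]


# The invariant from the comment above: array(0).tolist() == array([0]).tolist()[0]
print(to_python([0], ()) == to_python([0], (1,))[0])  # True
print(to_python([0, 1, 2, 3, 4, 5], (2, 3)))          # [[0, 1, 2], [3, 4, 5]]
```

Note that this version returns [] for a shape like (0, 2), so the trailing axis length is lost, which is the round-trip ambiguity raised earlier in the thread.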

fcharras · 2023-12-07T14:19:33Z

Adding to @betatim's remark

> This issue made me wonder about converting from one namespace to another. Say from PyTorch to Numpy. This works:
>
> x = array_api_compat.torch.asarray([1,2,3])
> array_api_compat.numpy.asarray(x) # -> array([1, 2, 3])
>
> The reason I was thinking about this was that it would be nice to have a consistent way of converting things. Of course, there is no asarray for normal Python, so this is more of a thought experiment.

and @oleksandr-pavlyk's remark

> We did discuss a possibility to standardize bringing data from any array object to Python. It would make sense to have a function that would transfer content of array into another type that exposes Python buffer protocol. From here the content could be converted to NumPy, or passed to xp.asarray in another library.

should we use a separate issue that covers inter-namespace conversion specifically, rather than tolist?

I want to emphasize this with a use case I have when trying to adapt code for Array API compliance. There is some code that can't compromise on numerical accuracy and absolutely requires at least float64 precision, but I could use an integrated GPU that supports at most float32 (e.g. using the mps or xpu backends in PyTorch) for everything else. For this I would have to transfer data from the device to the CPU, run the float64 computation, and transfer back to the device. But .to_device("cpu") is not part of the standard and some array libraries might not support it (like cupy arrays), so I can't rely on it. from_dlpack does not support inter-device conversion, so it's not appropriate either.

For this use case, an intermediate object that enables inter-device and inter-namespace conversion would surely be practical.

tolist has been mentioned, but conversion to and from numpy is also commonly supported:

- torch has Tensor.numpy and torch.from_numpy
- cupy has cupy.asnumpy, and cupy.asarray works with numpy arrays
- jax has jax.numpy.array, and np.asarray works with jax arrays
- dask's from_array supports numpy inputs, and np.asarray works with dask arrays
- tensorflow has Tensor.numpy and tf.convert_to_tensor()
- mxnet has NDArray.asnumpy, and its array function supports numpy inputs
- dpctl.tensor has asnumpy, and dpctl.tensor.asarray supports numpy inputs

Wouldn't it be practical to add a conversion to numpy to the Array API, e.g. to_numpy or asnumpy? (from_numpy doesn't seem as necessary, since asarray or from_dlpack commonly already work with numpy inputs.)

asmeurer · 2023-12-11T19:33:35Z

> But .to_device("cpu"), this is not part of the standard and some array libraries might not support it (like cupy arrays) so I can't rely on it.

Related discussion #626

rgommers · 2023-12-11T19:39:41Z

@fcharras thanks for your thoughts! I've copied your comment to gh-626, so we can keep that "to host" topic there, and keep this one for .tolist.

fcharras mentioned this issue Dec 8, 2023

ENH Use Array API in r2_score scikit-learn/scikit-learn#27904

Merged

rgommers mentioned this issue Dec 11, 2023

to_device() -- any way to force back to host "portably?" #626

Closed

rgommers added the API extension Adds new functions or objects to the API. label Dec 11, 2023

kgryte changed the title ~~A way to convert back to Python (tolist)~~ RFC: add a way to convert back to Python (tolist) Apr 4, 2024

kgryte added RFC Request for comments. Feature requests and proposed changes. Needs Discussion Needs further discussion. labels Apr 4, 2024

vnmabus mentioned this issue Jun 20, 2024

RFC: item() to return scalar for arrays with exactly 1 element. #815

Open
