Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression in v2.18.0 when setting an item in an object array #1874

Closed
aplowman opened this issue May 13, 2024 · 13 comments · Fixed by #1875
Closed

Regression in v2.18.0 when setting an item in an object array #1874

aplowman opened this issue May 13, 2024 · 13 comments · Fixed by #1875
Labels
bug Potential issues with the zarr-python library

Comments

@aplowman
Copy link

aplowman commented May 13, 2024

Zarr version

v2.18.0

Numcodecs version

0.12.1

Python Version

3.12

Operating System

Windows

Installation

via poetry install

Description

There seems to be a regression in v2.18.0 (compared to v2.17.2) concerning setting items in an object array.

Steps to reproduce

Setup:

import zarr
from numcodecs import MsgPack
import numpy as np

root = zarr.group()
arr = root.create_dataset(
    name="my_dataset",
    shape=0,
    dtype=object,
    object_codec=MsgPack(),
)
new_items = [
    ["A", 1],
    ["B", 2, "hello"],
]
arr_add = np.empty(len(new_items), dtype=object)
arr_add[:] = new_items
arr.append(arr_add)
arr[:] # --> array([list(['A', 1]), list(['B', 2, 'hello'])], dtype=object)

Problem:

arr[0] = ["C", 3] # raises AttributeError: 'list' object has no attribute 'astype' (from Array._process_for_setitem)

Using Zarr v2.17.2, no such exception is raised, and the first item is successfully modified.

Additional output

No response

@aplowman aplowman added the bug Potential issues with the zarr-python library label May 13, 2024
@dcherian
Copy link
Contributor

FWIW I couldn't reproduce this on python 3.11

@aplowman
Copy link
Author

aplowman commented May 13, 2024

I have tried with a simpler environment in Python 3.12 and 3.11 on Windows. I set up a virtual environment (via python -m venv) and installed zarr and msgpack; pip freeze reports:

asciitree==0.3.3
fasteners==0.19
msgpack==1.0.8
numcodecs==0.12.1
numpy==1.26.4
zarr==2.17.2

for both Python 3.11.3 and Python 3.12.3. This works fine, but if I upgrade Zarr to 2.18.0, I see the exception in both Python versions.

Edited to add traceback from Python 3.12.3:

Traceback (most recent call last):
  File "C:\code_local\pythons\zarr_bug.py", line 21, in <module>
    arr[0] = ["C", 3]
    ~~~^^^
  File "C:\code_local\pythons\venv\Lib\site-packages\zarr\core.py", line 1452, in __setitem__
    self.set_basic_selection(pure_selection, value, fields=fields)
  File "C:\code_local\pythons\venv\Lib\site-packages\zarr\core.py", line 1548, in set_basic_selection
    return self._set_basic_selection_nd(selection, value, fields=fields)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\code_local\pythons\venv\Lib\site-packages\zarr\core.py", line 1938, in _set_basic_selection_nd
    self._set_selection(indexer, value, fields=fields)
  File "C:\code_local\pythons\venv\Lib\site-packages\zarr\core.py", line 1991, in _set_selection
    self._chunk_setitem(chunk_coords, chunk_selection, chunk_value, fields=fields)
  File "C:\code_local\pythons\venv\Lib\site-packages\zarr\core.py", line 2257, in _chunk_setitem
    self._chunk_setitem_nosync(chunk_coords, chunk_selection, value, fields=fields)
  File "C:\code_local\pythons\venv\Lib\site-packages\zarr\core.py", line 2261, in _chunk_setitem_nosync
    cdata = self._process_for_setitem(ckey, chunk_selection, value, fields=fields)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\code_local\pythons\venv\Lib\site-packages\zarr\core.py", line 2285, in _process_for_setitem
    chunk = value.astype(self._dtype, order=self._order, copy=False)
            ^^^^^^^^^^^^
AttributeError: 'list' object has no attribute 'astype'

@jhamman
Copy link
Member

jhamman commented May 13, 2024

@aplowman - would you mind adding a complete traceback to your report above?

@jakirkham
Copy link
Member

Does this happen for you on Linux?

FWIW tried reproducing on macOS with Conda and was unable to

@aplowman
Copy link
Author

I can reproduce the issue on Google Colab, which is Ubuntu: https://colab.research.google.com/drive/1IO3WFOJM5rGJOvve_j-UZahk_t3hke0J?usp=sharing

@jakirkham
Copy link
Member

Thanks for the example! 🙏

Looking through the code paths in the traceback in git blame, the changes appear to be all 2yrs old or older. Maybe I'm missing something?

Am wondering what would have changed that would lead to this issue

The only relevant thing I can think of is this PR ( zarr-developers/numcodecs#417 ), but that landed over 1yr ago

Are others aware of changes that occurred here that might be relevant?

@jakirkham
Copy link
Member

jakirkham commented May 13, 2024

Ok was eventually able to reproduce. The issue goes away when reverting the change in PR: #1800

@dcherian would you be able to take a closer look at the error here?

Edit: Possibly the easiest change is just to revert that code in v2. In v3 breaking changes would be expected (though maybe we could warn about this in v2?). Though not sure if there is a reason we want that optimization in v2 also (if so maybe it needs a closer look)

@dcherian
Copy link
Contributor

dcherian commented May 13, 2024

What did it take to reproduce? Does #1875 fix it? If not, I agree that it's best to revert.

@rabernat
Copy link
Contributor

Since this is about a singleton array (shape=0) I would be very optimistic that #1875 fixes this. Is it really so hard to reproduce?

dcherian added a commit to dcherian/zarr-python that referenced this issue May 14, 2024
@dcherian
Copy link
Contributor

Sorry I misread the reproducer. Now fixed on #1875

@aplowman
Copy link
Author

I'm not sure #1875 fully resolves this issue. I am able to set the item now, but I cannot then retrieve it: https://colab.research.google.com/drive/1hTmv23yTQbTgv2tY0uOBW-Kd1YyasBlu?usp=sharing . The item retrieval stack trace is below.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
[<ipython-input-4-d3e0db857085>](https://localhost:8080/#) in <cell line: 1>()
----> 1 arr[0]

6 frames
[/content/zarr-python/zarr/core.py](https://localhost:8080/#) in __getitem__(self, selection)
    798             result = self.get_orthogonal_selection(pure_selection, fields=fields)
    799         else:
--> 800             result = self.get_basic_selection(pure_selection, fields=fields)
    801         return result
    802 

[/content/zarr-python/zarr/core.py](https://localhost:8080/#) in get_basic_selection(self, selection, out, fields)
    924             return self._get_basic_selection_zd(selection=selection, out=out, fields=fields)
    925         else:
--> 926             return self._get_basic_selection_nd(selection=selection, out=out, fields=fields)
    927 
    928     def _get_basic_selection_zd(self, selection, out=None, fields=None):

[/content/zarr-python/zarr/core.py](https://localhost:8080/#) in _get_basic_selection_nd(self, selection, out, fields)
    966         indexer = BasicIndexer(selection, self)
    967 
--> 968         return self._get_selection(indexer=indexer, out=out, fields=fields)
    969 
    970     def get_orthogonal_selection(self, selection, out=None, fields=None):

[/content/zarr-python/zarr/core.py](https://localhost:8080/#) in _get_selection(self, indexer, out, fields)
   1341             # allow storage to get multiple items at once
   1342             lchunk_coords, lchunk_selection, lout_selection = zip(*indexer)
-> 1343             self._chunk_getitems(
   1344                 lchunk_coords,
   1345                 lchunk_selection,

[/content/zarr-python/zarr/core.py](https://localhost:8080/#) in _chunk_getitems(self, lchunk_coords, lchunk_selection, out, lout_selection, drop_axes, fields)
   2181         for ckey, chunk_select, out_select in zip(ckeys, lchunk_selection, lout_selection):
   2182             if ckey in cdatas:
-> 2183                 self._process_chunk(
   2184                     out,
   2185                     cdatas[ckey],

[/content/zarr-python/zarr/core.py](https://localhost:8080/#) in _process_chunk(self, out, cdata, chunk_selection, drop_axes, out_is_ndarray, fields, out_selection, partial_read_decode)
   2094         except ArrayIndexError:
   2095             cdata = cdata.read_full()
-> 2096         chunk = self._decode_chunk(cdata)
   2097 
   2098         # select data from chunk

[/content/zarr-python/zarr/core.py](https://localhost:8080/#) in _decode_chunk(self, cdata, start, nitems, expected_shape)
   2378         # ensure correct chunk shape
   2379         chunk = chunk.reshape(-1, order="A")
-> 2380         chunk = chunk.reshape(expected_shape or self._chunks, order=self._order)
   2381 
   2382         return chunk

ValueError: cannot reshape array of size 2 into shape (1,)

@jakirkham
Copy link
Member

@aplowman if you have a moment, could you please test PR ( #1875 )?

@aplowman
Copy link
Author

@aplowman if you have a moment, could you please test PR ( #1875 )?

Works for me! Thanks all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Potential issues with the zarr-python library
Projects
None yet
5 participants