Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Possible bug] ContainsGroupError when trying to overwrite SpatialData object (overwriting a backed sdata/element) #137

Closed
melonora opened this issue Feb 14, 2023 · 6 comments
Labels
bug 🚨 Something isn't working I/O 💿

Comments

@melonora
Copy link
Collaborator

I am currently retrying writing tiled images and encounter a ContainsGroupError in version 0.0.1.dev0.

Specifically running something similar to this twice with the first time no overwrite results in the error:

sdata = sd.SpatialData(images=images)
sdata.write(<Path>, storage_options={"chunks": (1,imgs.shape[1]/2,imgs.shape[2]/2)}, overwrite=True)

Is this expected behaviour?

@melonora melonora changed the title [Possible bug] [Possible bug] ContainsGroupError when trying to overwrite SpatialData object Feb 14, 2023
@LucaMarconato LucaMarconato added bug 🚨 Something isn't working I/O 💿 labels Feb 14, 2023
@LucaMarconato
Copy link
Member

Hi, thanks for reporting. The errors is due to the fact that when you save a SpatialData object to disk, and when it contains backed elements (Images, Labels, Points), it doesn't load the data in-memory by default but it reads it lazily from disk.

More generally, for the same reason, this happens even when you want to add a single element to an existing SpatialData object with add_image(), add_labels() and add_points().

This is shown in the following example. All works with Shapes (Polygons and Tables), but the equivalent code breaks with Points.

from spatialdata import SpatialData
from tests.conftest import _get_shapes, _get_points
import tempfile
import os

with tempfile.TemporaryDirectory() as tmpdir:
    f = os.path.join(tmpdir, "data.zarr")
    sdata = SpatialData(shapes=_get_shapes())
    print(sdata)
    sdata.write(f)
    e = sdata.shapes['shapes_0']
    sdata.add_shapes('new_shapes', e, overwrite=True)

    # ok
    sdata.add_shapes('new_shapes', e, overwrite=True)
    # ok, beacause shapes are not backed
    sdata.add_shapes('new_shapes', sdata.shapes['new_shapes'], overwrite=True)

with tempfile.TemporaryDirectory() as tmpdir:
    f = os.path.join(tmpdir, "data.zarr")
    sdata = SpatialData(points=_get_points())
    print(sdata)
    sdata.write(f)
    e = sdata.points['points_0']
    sdata.add_points('new_points', e, overwrite=True)

    # ok
    sdata.add_points('new_points', e, overwrite=True)
    # exception (data is deleted, we should raise an error before this happens)
    sdata.add_points('new_points', sdata.points['new_points'], overwrite=True)

In the latest version from main (I suggest doing a develop install of the repo so that you can install the latest version by just pulling the repo (or changing branch)), the error that you got should not happen: you should get an exception before it happens.

Traceback (most recent call last):
  File "/Users/macbook/Library/Application Support/JetBrains/PyCharm2022.3/scratches/scratch_13.py", line 22, in <module>
    sdata.write(f, overwrite=True)
  File "/Users/macbook/embl/projects/basel/spatialdata/spatialdata/_core/_spatialdata.py", line 663, in write
    raise ValueError("Can't overwrite the original file")
ValueError: Can't overwrite the original file

But for add_image(), add_labels() and add_points() no exception is raised and this leads to data loss. This is a critical bug and I'll address it soon.

@LucaMarconato
Copy link
Member

In the near future we will allow the user to rewrite a backed object in-place. An implementation could be to write the new zarr group in a temporary directory and then do an atomic move of the new data to the original one. In this way we also prevent data loss if something goes wrong in between.

For the moment, a quick and dirty workaround is to save the SpatialData object in a new location, and then the .zarr to the previous location, basically a manual version of the above (or feel free to make a pr).

@LucaMarconato LucaMarconato changed the title [Possible bug] ContainsGroupError when trying to overwrite SpatialData object [Possible bug] ContainsGroupError when trying to overwrite SpatialData object (overwriting a backed sdata/element) Feb 14, 2023
@LucaMarconato
Copy link
Member

LucaMarconato commented Feb 14, 2023

Actually I have fixed the bug your encountered and the one that I described, in this PR #138. I used the approach discussed above of writing to another location and then automatically moving the written data. All the test passes, please also check it if works.

Thanks again for reporting this critical bug.

@melonora
Copy link
Collaborator Author

Thanks for looking into it. I will check!

@melonora
Copy link
Collaborator Author

I can confirm that on main no group error is given anymore and properly overwrites. I checked this by also given a different number of images when overwriting. If you are ok with it, this issue can be closed.

@LucaMarconato
Copy link
Member

As discussed in this PR: #138 we decided not to add the support for overwriting the data into the path that is used for backing. This holds true both for SpatialData objects and backed elements (images, labels and points). I kindly invite you and any other users reading this to provide feedback regarding this behavior, since we want to get back to it in the future. Closing for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug 🚨 Something isn't working I/O 💿
Projects
None yet
Development

No branches or pull requests

2 participants