Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding attrs at the SpatialData object level #711

Merged
merged 13 commits into from
Dec 1, 2024

Conversation

quentinblampey
Copy link
Contributor

@quentinblampey quentinblampey commented Sep 23, 2024

As described in issue #404, it is useful to have attributes at the SpatialData object level. This will also be useful in Sopa, to simplify the API.

Example Usage

import spatialdata

from spatialdata.datasets import blobs

sdata = blobs()

sdata.attrs # empty dictionnary
sdata.attrs["test"] = 12

sdata.write("blobs.zarr") # this saves the attributes on disk

sdata = spatialdata.read_zarr("blobs.zarr") # the attributes are read normally

sdata.attrs["test"] # returns 12

What do you think @LucaMarconato?

Check-list

  • Add attrs attribute to the SpatialData object
  • Can write the attrs on disk when the data is not backed
  • Can write the attrs on disk when the data is already backed
  • Can read the attrs from disk
  • Pass the attrs to new SpatialData objects
  • Merge conflict keys in attrs during concatenation
  • Update the format version (see Luca's comment)

Copy link

codecov bot commented Sep 23, 2024

Codecov Report

Attention: Patch coverage is 88.60759% with 9 lines in your changes missing coverage. Please review.

Project coverage is 91.78%. Comparing base (2a0c23a) to head (4546bea).
Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
src/spatialdata/_core/spatialdata.py 92.00% 4 Missing ⚠️
src/spatialdata/_io/format.py 71.42% 4 Missing ⚠️
src/spatialdata/_io/io_zarr.py 75.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #711      +/-   ##
==========================================
- Coverage   91.80%   91.78%   -0.03%     
==========================================
  Files          45       45              
  Lines        6959     7045      +86     
==========================================
+ Hits         6389     6466      +77     
- Misses        570      579       +9     
Files with missing lines Coverage Δ
src/spatialdata/_core/_deepcopy.py 98.38% <100.00%> (+0.02%) ⬆️
src/spatialdata/_core/concatenate.py 91.80% <100.00%> (+0.20%) ⬆️
src/spatialdata/_core/operations/_utils.py 92.15% <100.00%> (ø)
src/spatialdata/_core/operations/rasterize.py 90.45% <100.00%> (ø)
src/spatialdata/_core/operations/transform.py 90.82% <100.00%> (ø)
src/spatialdata/_core/query/spatial_query.py 95.50% <100.00%> (ø)
src/spatialdata/_io/io_zarr.py 87.77% <75.00%> (-0.60%) ⬇️
src/spatialdata/_core/spatialdata.py 90.87% <92.00%> (-0.01%) ⬇️
src/spatialdata/_io/format.py 83.22% <71.42%> (-1.13%) ⬇️

... and 3 files with indirect coverage changes

@LucaMarconato LucaMarconato self-assigned this Sep 24, 2024
@LucaMarconato
Copy link
Member

LucaMarconato commented Sep 26, 2024

Thanks for the PR.

  • before merging we need to update the format version; we probably should add a format for the SpatialData container; something asked also by several users (we already have a format for the single elements, which would be independent from the format of the container).

@quentinblampey
Copy link
Contributor Author

Hi @LucaMarconato, have you started working on your last comment? If not, do you want me to have a look (although I'm not exactly sure to understand what you want to do)?

@quentinblampey
Copy link
Contributor Author

I noticed we are missing a feature: after we save a SpatialData object, if we add a new attr, there is currently no way to save it on disk. Something like sdata.write_element for the attrs

@Marius1311
Copy link

Just a random note on this: I agree that having SpatialData - level attributes would be super helpful. Beyond that, I think it would be great to be able to annotate coordinate systems - these often correspond to samples, conditions, time-points etc. I can of course annotate all elements within a coordinate system through tables, but I think it would be more direct to somehow directly provide metadata for an entire coordinate system.

@quentinblampey
Copy link
Contributor Author

quentinblampey commented Oct 31, 2024

I also noticed that we want to pass the attrs inside certain functions, e.g. when performing a query (the attrs should be kept). I made a commit for this

The only concern is when we concatenate multiple SpatialData objects, how should we merge the duplicate keys (if any)?

NB: I updated the first post with a check-list

@LucaMarconato
Copy link
Member

@quentinblampey thanks for the update. I haven't updated the format yet, we can work on it this week at the hackathon. Regarding saving to disk, I suggest to extend the write_metadata() method. Finally, regarding .attrs, we could proceed as AnnData does for .uns when calling ad.concat().

@timtreis
Copy link
Member

Extremely interested in this, thanks @quentinblampey !!

@LucaMarconato
Copy link
Member

LucaMarconato commented Dec 1, 2024

@quentinblampey I have finalized this PR, I'm going to merge it now. Thanks for your contribution!

Two comments.

  1. root-level versioning: On disk, I am now adding a version number for the root level metadata, and also I am adding the string of the spatialdata software version that was used to create the metadata (this helps reproducibility). When reading the .zattrs from disk into the SpatialData.attrs slow I am not parsing the two version strings not to clutter the object.
  2. .attrs passed by reference: I have removed the code dict(sdata.attrs) and passing the attrs always by reference. The only exceptions are when I call spatialdata.deepcopy(), where I call copy.deepcopy(), and when the user passes a Mapping that is not a dict. In such case the casting to dict performs a shallow copy.

@LucaMarconato LucaMarconato enabled auto-merge (squash) December 1, 2024 13:21
@LucaMarconato LucaMarconato merged commit 02396ea into scverse:main Dec 1, 2024
7 of 8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

On disk versioning feedback
4 participants