update table model #164

Merged
merged 16 commits into from
Mar 4, 2023

giovp (Member) commented Mar 1, 2023

as discussed in #158

  • introduce mandatory region, region_key and instance_key
  • rewrite concatenate following implementation
  • support string index io for shapes
  • change type of radius to Union[float, ArrayLike]
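
To illustrate the last point, here is a minimal sketch of accepting either a single radius or one radius per shape. Plain Python sequences stand in for `ArrayLike`; the helper name `normalize_radius` and its length check are assumptions made for illustration, not the code in this PR.

```python
from numbers import Real
from typing import List, Sequence, Union


def normalize_radius(radius: Union[float, Sequence[float]], n_shapes: int) -> List[float]:
    """Return one radius per shape (hypothetical helper, not the PR's code)."""
    if isinstance(radius, Real):
        # A scalar radius is repeated for every shape.
        return [float(radius)] * n_shapes
    # Otherwise expect exactly one value per shape.
    values = [float(r) for r in radius]
    if len(values) != n_shapes:
        raise ValueError(f"expected {n_shapes} radii, got {len(values)}")
    return values
```

Callers can then pass `normalize_radius(2, 3)` or `normalize_radius([1, 2.5], 2)` and get a list of floats back in both cases.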

changes

  • the empty table case is no longer supported; we can restore this behaviour later, but for now I don't see it as a priority

things left out of this PR, as discussed in #158

  • validation of the type of the categorical for instances, and of instances in labels (should be done in spatialdata)
    reason: not high priority; postponed to later work on consolidating spatialdata validation (e.g. we need a way to consolidate relationships between elements).

@giovp giovp marked this pull request as ready for review March 2, 2023 21:07
@giovp giovp requested review from LucaMarconato, kevinyamauchi and ivirshup and removed request for LucaMarconato March 2, 2023 21:11
codecov bot commented Mar 2, 2023

Codecov Report

Merging #164 (030dc77) into main (74a033b) will increase coverage by 3.42%.
The diff coverage is 89.76%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #164      +/-   ##
==========================================
+ Coverage   86.79%   90.22%   +3.42%     
==========================================
  Files          23       23              
  Lines        3401     3570     +169     
==========================================
+ Hits         2952     3221     +269     
+ Misses        449      349     -100     
Impacted Files                             Coverage Δ
spatialdata/_core/models.py                86.14% <62.85%> (+1.40%) ⬆️
spatialdata/_core/_spatialdata_ops.py      89.14% <91.11%> (-0.77%) ⬇️
spatialdata/_core/_spatial_query.py        93.23% <92.13%> (+16.02%) ⬆️
spatialdata/_core/_spatialdata.py          95.01% <100.00%> (+19.28%) ⬆️
spatialdata/_core/_transform_elements.py   87.87% <100.00%> (ø)
spatialdata/_core/core_utils.py            91.53% <100.00%> (+0.17%) ⬆️
spatialdata/_io/write.py                   97.36% <100.00%> (+0.05%) ⬆️
spatialdata/utils.py                       87.16% <100.00%> (+10.74%) ⬆️
spatialdata/_io/read.py                    97.45% <0.00%> (-1.92%) ⬇️
... and 2 more

giovp (Member, Author) commented Mar 2, 2023

actually please don't merge as I forgot to add support for non-numeric indexes in geopandas io


Parameters
----------
sdatas
The spatial data objects to concatenate.
omit_table

Member

I would keep this. Use case: we have a xenium and a visium dataset and we want to merge them keeping only the table of one of the two. To do so we can call concatenate(..., omit_table=True) and then do sdata_concatenated.table = xenium.table.

Member Author

how is that different from manually adding the visium image and visium shapes to the xenium spatialdata?

Member

A big one: if you add an image to the xenium data, the new layer is saved to disk. On the contrary, concatenate() creates a new in-memory sdata object that is not backed (but the elements are backed from their respective sdata object)

Member Author

ok, then what about simply passing one of the two spatialdata objects without the table? to me it just complicates the signature, while the behavior is already implicitly implemented

Member Author

> A big one: if you add an image to the xenium data, the new layer is saved to disk. On the contrary, concatenate() creates a new in-memory sdata object that is not backed (but the elements are backed from their respective sdata object)

this hidden behavior also makes me think we might want to be explicit about in-memory vs. backed in concatenate

Member

I would remove this. I don't think the feature being described fits the definition of concatenation. Maybe this is more of a "merge"?

Also, it's super easy for a user to do themselves in the meantime.

example
from functools import reduce
from operator import or_

from spatialdata import SpatialData

sdata_w_table = ...  # the SpatialData object whose table should be kept
other_sdatas: list[SpatialData] = ...  # the remaining objects to merge

sdatas = [sdata_w_table, *other_sdatas]

# `or_` is the `|` operator; on dicts it is a union (Python 3.9+).
new_sdata = SpatialData(
    table=sdata_w_table.table,
    images=reduce(or_, (sdata.images for sdata in sdatas)),
    labels=reduce(or_, (sdata.labels for sdata in sdatas)),
    shapes=reduce(or_, (sdata.shapes for sdata in sdatas)),
    points=reduce(or_, (sdata.points for sdata in sdatas)),
)
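
The `reduce(or_, ...)` pattern above can be tried out with plain dicts standing in for the element mappings. This is only a sketch: the element names are made up, and it assumes the mappings behave like ordinary dicts under `|`. It also shows a caveat worth keeping in mind: when two inputs share a key, the right-most value wins without any warning.

```python
from functools import reduce
from operator import or_

# Plain dicts stand in for element mappings such as `sdata.images`.
images_a = {"xenium_image": "img0"}
images_b = {"visium_image": "img1"}
images_c = {"xenium_image": "img2"}  # same key as in images_a

# Union of mappings with distinct keys: every element is kept.
merged = reduce(or_, [images_a, images_b])

# Union with a repeated key: the value from the right-most mapping wins.
clashing = reduce(or_, [images_a, images_b, images_c])

print(merged)
print(clashing)
```

If element names may repeat across objects, checking for duplicates before merging avoids silently dropping an element.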

giovp (Member, Author) commented Mar 3, 2023

@LucaMarconato once this is merged, can #159 be closed?

# index cannot be string
# https://github.com/zarr-developers/zarr-python/issues/1090
shapes_group.create_dataset(name="Index", data=shapes.index.values)
if shapes.index.dtype.kind == "U" or shapes.index.dtype.kind == "O":

Member

oh nice!

LucaMarconato (Member) commented

I'll quickly double-check the tests that I added in #159, but most of it is now covered here. Btw I reviewed this PR; if you think it's finished, for me you can merge.

giovp (Member, Author) commented Mar 3, 2023

> I'll quickly double-check the tests that I added in #159, but most of it is now covered here. Btw I reviewed this PR; if you think it's finished, for me you can merge.

I think it was mostly work on concatenate which has been re-added here.

kevinyamauchi (Collaborator) left a comment

LGTM! Thanks!

ivirshup (Member) left a comment

Mostly good, a few suggestions and minor things.

- assert type(sdatas) == list
- assert len(sdatas) > 0
+ assert type(sdatas) == list, "sdatas must be a list"
+ assert len(sdatas) > 0, "sdatas must be a non-empty list"

Member

Can we remove the special casing for the "one table passed" case?

It's weird to me that:

  • It returns a reference to the object that was passed in, so any modification of the result would modify the original, breaking the semantics of concatenation.
  • The "renaming" properties of the region_key and instance_key arguments are not respected.

Tests for this case would be great.
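
The aliasing concern can be sketched with plain dicts standing in for the objects being concatenated. Both function names below are hypothetical and only mimic the shape of the special case under discussion; they are not the code in this PR.

```python
from typing import Dict, List


def concat_always_new(items: List[Dict[str, int]]) -> Dict[str, int]:
    """Stand-in for a concatenate() that always builds a fresh result."""
    result: Dict[str, int] = {}
    for item in items:
        result.update(item)
    return result


def concat_with_shortcut(items: List[Dict[str, int]]) -> Dict[str, int]:
    """Stand-in for a concatenate() that special-cases a single input."""
    if len(items) == 1:
        return items[0]  # hands back the caller's object, not a new one
    return concat_always_new(items)


original = {"cells": 10}
aliased = concat_with_shortcut([original])
aliased["cells"] = 0  # also changes `original`
shortcut_leaked = original["cells"] == 0

original = {"cells": 10}
fresh = concat_always_new([original])
fresh["cells"] = 0  # `original` is untouched
fresh_leaked = original["cells"] == 0
```

A test for the single-input case could assert exactly this: modifying the result must leave the input unchanged.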

Member

You are right

giovp (Member, Author), Mar 4, 2023

you mean this case?

if len(sdatas) == 1:
    return sdatas[0]

removed, I agree with the logic


Parameters
----------
adata
:class:`anndata.AnnData`.

Member

Shouldn't the type of the argument class already be included from the type signature?

Suggested change
:class:`anndata.AnnData`.

Member Author

ok, I think it'd be nice to also implement a docstring processor like we have in squidpy

Comment on lines -145 to -146
with pytest.raises(RuntimeError):
    _concatenate_tables([table4, table6])

Member

Why was this removed?

Member Author

redundant with the new one

LucaMarconato (Member) commented

Replying to the comment re omit_table, #164 (comment): maybe a merge would be better here, I'm not sure. I would remove it for now and then see how to provide this functionality. For the moment, in the Xenium + Visium notebook we can either use the reduce() approach (or an equivalent) or never merge the two sdata objects (we can actually do most or all of the operations while keeping them separate).

ivirshup (Member) commented Mar 3, 2023

@giovp, actually one other (maybe major) point.

What happens if (for any of the tables) instance key is None?

LucaMarconato (Member) commented Mar 3, 2023

> What happens if (for any of the tables) instance key is None?

Good point! I see that this is not possible for a table created with the parser (see here https://github.com/scverse/spatialdata/blob/80dc1024d397a06120f9bd5b27237b9772a85997/spatialdata/_core/models.py#LL685-L686C65), but a table with instance_key = None would pass validation. I would update the validation to catch this case and raise an exception.
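
A minimal sketch of the kind of check proposed here, using a plain mapping in place of the table attributes. The function name `validate_table_attrs` is hypothetical, and the list of required keys is taken from the PR description (region, region_key, instance_key); the actual validation in spatialdata may look different.

```python
from typing import Any, Mapping

# Attribute names taken from the PR description; treating all three as
# required and non-None is an assumption of this sketch.
REQUIRED_ATTRS = ("region", "region_key", "instance_key")


def validate_table_attrs(attrs: Mapping[str, Any]) -> None:
    """Raise ValueError if a required attribute is missing or None."""
    for key in REQUIRED_ATTRS:
        if key not in attrs:
            raise ValueError(f"`{key}` is missing from the table attributes")
        if attrs[key] is None:
            raise ValueError(f"`{key}` must not be None")
```

With this in place, a table whose attributes contain `instance_key=None` fails validation instead of passing silently.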

giovp (Member, Author) commented Mar 4, 2023

thanks all for the feedback! I accepted all suggestions and made a couple of modifications:

  • explicitly check for regions in uns and obs (spotted one minor bug in filter_by_coordinate_system, @LucaMarconato, but fixed it);
    that is, adata.uns["spatialdata_attrs"]["regions"] and adata.obs[region_key].unique() always need to match

  • explicitly check that all spatialdata_attrs are valid (e.g. the value of instance_key cannot be None).

  • accepted the code suggestion by @ivirshup but added an additional check on unique region names that the suggested version missed, e.g.:

sdata0 = SpatialData(labels={"sample1": ...}, table=table)
sdata1 = SpatialData(labels={"sample1": ...}, table=table)
concatenate([sdata0, sdata1])
# now raises an error, as there is no way to tell which obs rows refer to which region.

the latter is something that we will ideally change, but it would require keeping track of region names and renaming them in the resulting spatialdata object. If tests pass I'll go ahead and merge
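
The two region checks described in this comment can be sketched on plain lists of region names. Both function names are hypothetical and the error messages are made up; this only illustrates the logic (declared regions must equal the regions observed in obs, and no region name may appear in more than one object being concatenated).

```python
from collections import Counter
from typing import Iterable, Sequence


def check_regions_match(declared: Iterable[str], observed: Iterable[str]) -> None:
    """Raise if the declared regions and the regions found in obs differ."""
    declared_set, observed_set = set(declared), set(observed)
    if declared_set != observed_set:
        raise ValueError(f"regions differ: {sorted(declared_set ^ observed_set)}")


def check_unique_regions(regions_per_object: Sequence[Iterable[str]]) -> None:
    """Raise if the same region name is used by more than one object."""
    # Count each name once per object, so repeats within one object are fine.
    counts = Counter(name for regions in regions_per_object for name in set(regions))
    duplicated = sorted(name for name, n in counts.items() if n > 1)
    if duplicated:
        raise ValueError(f"region names used by several objects: {duplicated}")
```

For the example above, `check_unique_regions([["sample1"], ["sample1"]])` raises, which mirrors the error now thrown when both objects use the region name "sample1".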

4 participants