-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
update table model #164
update table model #164
Conversation
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## main #164 +/- ##
==========================================
+ Coverage 86.79% 90.22% +3.42%
==========================================
Files 23 23
Lines 3401 3570 +169
==========================================
+ Hits 2952 3221 +269
+ Misses 449 349 -100
|
actually please don't merge as I forgot to add support for non-numeric indexes in geopandas io |
|
||
Parameters | ||
---------- | ||
sdatas | ||
The spatial data objects to concatenate. | ||
omit_table |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would keep this. Use case: we have a xenium and a visium dataset and we want to merge them keeping only the table of one of the two. To do so we can call concatenate(..., omit_table=True)
and then do sdata_concatenated.table = xenium.table
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the difference between manually adding the visium image and visium shapes to the xenium spatialdata?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A big one: if you add an image to the xenium data, the new layer is saved to disk. On the contrary, concatenate()
creates a new in-memory sdata
object that is not backed (but the elements are backed from their respective sdata
object)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok what about then simply passing one of the two spatialdata
without the table? to me it just complicates the signature while the behavior is already impliclty implemetned
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A big one: if you add an image to the xenium data, the new layer is saved to disk. On the contrary, concatenate() creates a new in-memory sdata object that is not backed (but the elements are backed from their respective sdata object)
this hidden behavior makes me also think we might want to be explicit about in memory/backed in concatenate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would remove this. I don't think the feature being described fits the definition of concatenation. Maybe this is more of a "merge"
?
Also, is super easy for a user to do themselves in the meantime.
example
from functools import reduce
from operator import or_
sdata_w_table = ...
other_sdatas: list[SpatialData]
sdatas = [sdata_w_table, *other_sdatas]
new_sdata = SpatialData(
table=sdata_w_table.table,
images=reduce(or_, (sdata.images for sdata in sdatas)),
labels=reduce(or_, (sdata.labels for sdata in sdatas)),
shapes=reduce(or_, (sdata.shapes for sdata in sdatas)),
points=reduce(or_, (sdata.points for sdata in sdatas)),
)
@LucaMarconato once this is merged, can #159 be closed? |
# index cannot be string | ||
# https://github.com/zarr-developers/zarr-python/issues/1090 | ||
shapes_group.create_dataset(name="Index", data=shapes.index.values) | ||
if shapes.index.dtype.kind == "U" or shapes.index.dtype.kind == "O": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh nice!
I'll have a quick double check of the tests that I added in #159, but now mostly is covered here. Btw I reviewed this PR, if you it's finished for me you can merge. |
I think it was mostly work on concatenate which has been re-added here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly good, a few suggestions and minor things.
assert type(sdatas) == list | ||
assert len(sdatas) > 0 | ||
assert type(sdatas) == list, "sdatas must be a list" | ||
assert len(sdatas) > 0, "sdatas must be a non-empty list" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove the special casing for the "one table passed" case?
It's weird to me that it:
- A reference to the object that was passed in. So any modification of the result would modify the original, breaking the semantics of concatenation.
- The "renaming" properties of
region_key
andinstance_key
arguments are not respected.
Tests for this case would be great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you mean this case?
if len(sdatas) == 1:
return sdatas[0]
removed,agree with logic
|
||
Parameters | ||
---------- | ||
sdatas | ||
The spatial data objects to concatenate. | ||
omit_table |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would remove this. I don't think the feature being described fits the definition of concatenation. Maybe this is more of a "merge"
?
Also, is super easy for a user to do themselves in the meantime.
example
from functools import reduce
from operator import or_
sdata_w_table = ...
other_sdatas: list[SpatialData]
sdatas = [sdata_w_table, *other_sdatas]
new_sdata = SpatialData(
table=sdata_w_table.table,
images=reduce(or_, (sdata.images for sdata in sdatas)),
labels=reduce(or_, (sdata.labels for sdata in sdatas)),
shapes=reduce(or_, (sdata.shapes for sdata in sdatas)),
points=reduce(or_, (sdata.points for sdata in sdatas)),
)
spatialdata/_core/models.py
Outdated
Parameters | ||
---------- | ||
adata | ||
:class:`anndata.AnnData`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the type of the argument class already be included from the type signature?
:class:`anndata.AnnData`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, think it'd be nice to also implement a docstring processor likw we have in squidpy
with pytest.raises(RuntimeError): | ||
_concatenate_tables([table4, table6]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
redundant with the new one
Answering to the comment re |
@giovp, actually one other (maybe major) point. What happens if (for any of the tables) instance key is |
Good point! I see that this is not possible for a table created with the parser (see here https://github.com/scverse/spatialdata/blob/80dc1024d397a06120f9bd5b27237b9772a85997/spatialdata/_core/models.py#LL685-L686C65), but a table with |
Co-authored-by: Isaac Virshup <[email protected]>
for more information, see https://pre-commit.ci
thanks all for feedback! I accepted all suggestions and made couple of modifications
sdata0 = SpatialData(labels={"sample1":..}, table}
sdata1 = SpatialData(labels={"sample1":..}, table}
concatenate(sdata0, sdata1)
# now throws error, as there is no way to understand to which obs refers which region. the latter is something that ideally we will change, but it'd require keeping track of region names and rename them in the resulting spatialdata object. If tests pass I'll go on and merge |
as discussed in #158
region
region_key
andinstance_key
Union[float, ArrayLike]
changes
things leaving out of this PR as discussed in #158
reason: