[WIP] torch DataSet + utils #145

kevinyamauchi · 2023-02-20T09:20:05Z

This PR adds a torch DataSet with some additional utils. Initially, this implements a spot ROI dataset for replicating this squidpy example. The DataSet is implemented so that it is compatible with monai and pytorch lightning, which gives us access to a ton of tooling (e.g., multi-GPU training, tensorboard logging, learning rate schedulers).

This PR requires #132, #143, and image bounding box query with transforms to be merged.
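For readers unfamiliar with the pattern, here is a minimal sketch of a map-style dataset that cuts fixed-size tiles around spot centroids. It assumes an in-memory (c, y, x) NumPy array and centroids already in pixel coordinates. `SpotTileDataset` is a hypothetical name, not the class added in this PR. Each sample is returned as a dict, the convention used by monai's dictionary-based transforms.

```python
import numpy as np


class SpotTileDataset:
    """Hypothetical map-style dataset of square tiles centered on spot centroids.

    Implements ``__len__`` and ``__getitem__``, the protocol that
    ``torch.utils.data.DataLoader`` uses for map-style datasets; in real code
    it would subclass ``torch.utils.data.Dataset``.
    """

    def __init__(self, image, centroids, half_size, transform=None):
        # image: (c, y, x) array; centroids: (n, 2) integer (y, x) pixel coordinates
        self.half_size = int(half_size)
        h = self.half_size
        # pad once so that tiles touching the border keep a fixed (c, 2h, 2h) shape
        self._padded = np.pad(np.asarray(image), ((0, 0), (h, h), (h, h)))
        self.centroids = np.asarray(centroids, dtype=int)
        self.transform = transform

    def __len__(self):
        return len(self.centroids)

    def __getitem__(self, index):
        y, x = self.centroids[index]
        h = self.half_size
        # pixel (y, x) of the original image sits at (y + h, x + h) in the padded
        # array, so the tile centered on it spans rows y..y+2h and cols x..x+2h
        tile = self._padded[:, y : y + 2 * h, x : x + 2 * h]
        sample = {"image": tile, "index": index}
        if self.transform is not None:
            sample = self.transform(sample)
        return sample
```

With torch installed, wrapping an instance in `torch.utils.data.DataLoader(..., batch_size=8)` would let the default collation batch the dict samples key by key.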

codecov · 2023-02-20T09:23:10Z

Codecov Report

Merging #145 (41d818a) into main (9975a5c) will decrease coverage by 2.64%.
The diff coverage is 34.61%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #145      +/-   ##
==========================================
- Coverage   89.59%   86.95%   -2.64%     
==========================================
  Files          24       27       +3     
  Lines        3854     3995     +141     
==========================================
+ Hits         3453     3474      +21     
- Misses        401      521     +120

Impacted Files	Coverage Δ
spatialdata/_core/data_extent.py	`0.00% <0.00%> (ø)`
spatialdata/_dataloader/transforms.py	`0.00% <0.00%> (ø)`
spatialdata/_dataloader/datasets.py	`6.84% <6.84%> (ø)`
spatialdata/utils.py	`84.52% <12.50%> (-3.61%)`	⬇️
spatialdata/__init__.py	`93.33% <83.33%> (-6.67%)`	⬇️
spatialdata/_core/_spatialdata.py	`91.20% <86.36%> (-0.27%)`	⬇️
spatialdata/_core/_rasterize.py	`83.00% <100.00%> (+0.34%)`	⬆️
spatialdata/_core/_spatial_query.py	`93.82% <100.00%> (-0.42%)`	⬇️
spatialdata/_core/core_utils.py	`91.91% <100.00%> (-0.27%)`	⬇️
spatialdata/_core/models.py	`86.14% <100.00%> (ø)`

kevinyamauchi · 2023-02-20T10:05:05Z

I have created an example notebook showing how the Dataset and transforms work for the SpotCropDataset. We can now generate tiles that are compatible with monai, torchvision, and pytorch lightning, so we can add tons of augmentations, dataloader caching, multi-GPU training, etc. Once #132 lands, I can update the spot centroid fetching to use the polygons item.

https://gist.github.com/kevinyamauchi/3a1d1c375b084732c5f60b19afabf461
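To illustrate why dict samples pair well with augmentation pipelines, here is a pure-NumPy sketch of composable dictionary transforms, in the spirit of monai's `Compose` combined with dictionary-based transforms such as `RandFlipd`. The helpers `compose`, `random_flip`, and `normalize` are hypothetical stand-ins, not monai or torchvision code.

```python
import numpy as np


def compose(*transforms):
    """Chain dictionary transforms, applying them in order."""
    def apply(sample):
        for transform in transforms:
            sample = transform(sample)
        return sample
    return apply


def random_flip(key, axis, rng, prob=0.5):
    """Flip sample[key] along `axis` with probability `prob`."""
    def transform(sample):
        if rng.random() < prob:
            # return a new dict so the caller's sample is not mutated
            sample = {**sample, key: np.flip(sample[key], axis=axis)}
        return sample
    return transform


def normalize(key, eps=1e-8):
    """Zero-mean, unit-variance scaling of sample[key]."""
    def transform(sample):
        x = np.asarray(sample[key], dtype=float)
        return {**sample, key: (x - x.mean()) / (x.std() + eps)}
    return transform
```

Keys that a transform does not touch (for example a label) pass through unchanged, which is what makes it easy to stack many augmentations on the same sample.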

kevinyamauchi · 2023-02-20T15:10:39Z

I've updated it to now use the Shapes element to get the spot locations. See the example notebook below:

https://gist.github.com/kevinyamauchi/77f986889b7626db4ab3c1075a3a3e5e

LucaMarconato · 2023-03-07T00:03:26Z

Minor features of this PR:

- overloading of __getitem__() and __setitem__() for SpatialData.
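Python's item-access protocol is implemented through `__getitem__` and `__setitem__`. Below is a minimal sketch of what such overloading enables, assuming elements are stored in per-type dictionaries and that element names are unique across types. `ElementContainer` is hypothetical and is not SpatialData's implementation.

```python
class ElementContainer:
    """Hypothetical stand-in for a SpatialData-like object with dict-style access.

    Elements live in per-type dicts; names are assumed unique across types.
    """

    def __init__(self, images=None, shapes=None):
        self.images = dict(images or {})
        self.shapes = dict(shapes or {})

    def _group_of(self, name):
        # find the per-type dict that holds `name`
        for group in (self.images, self.shapes):
            if name in group:
                return group
        raise KeyError(name)

    def __getitem__(self, name):
        return self._group_of(name)[name]

    def __setitem__(self, name, value):
        # replaces an existing element only; adding a new one would need a rule
        # for choosing its group (e.g. validating it against an element model)
        self._group_of(name)[name] = value
```

With this, `container["spots"]` returns the element regardless of its type, and `container["spots"] = new_value` replaces it in the right group.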

LucaMarconato · 2023-03-08T22:51:02Z

I have pushed code that still has open todos and bugs to fix, but it is usable. An example of usage is in this script from the sandbox (which, run as is, shows a bug).

But if you use it:

- from the coordinate system global
- only with Visium data (not querying the Xenium image with the Visium circles)
- not cropping the data first, but making the tiles from the full data

it will work. So it should be good enough for the deep learning example.

LucaMarconato · 2023-03-08T22:57:01Z

Current todos:

- wrong tiling result (wrong queried data and wrong content) when tiling a cropped multiscale image (probably due to this: 2 bugs with spatial cropping (with multiscale rasters and when missing the table) #178)
- make tests

Initial plan, postponed to a new PR (see #184):

- extend functionality to get tiles from the raw space (not just from the target space):
  - when querying tiles from the raw space, allow specifying either units or pixels, not both
  - when querying tiles from the target space, allow specifying only units (and inferring pixels) or only pixels. This is possible only when the data is not multiscale; otherwise the pixels could not be determined
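The "units or pixels, not both" rule in the plan above can be expressed as a small validation helper. This is a sketch assuming a single-scale image with a known, isotropic pixel size; `resolve_tile_size` and `pixel_size` are hypothetical names. Since multiscale data has no single pixel size, the helper refuses to convert when `pixel_size` is None, mirroring the restriction noted in the plan.

```python
def resolve_tile_size(pixel_size, tile_dim_in_units=None, tile_dim_in_pixels=None):
    """Return (tile_dim_in_units, tile_dim_in_pixels) given exactly one of them.

    pixel_size: coordinate-system units per pixel of a single-scale image, or
    None when no single pixel size exists (e.g. multiscale data).
    """
    # reject both "neither given" and "both given"
    if (tile_dim_in_units is None) == (tile_dim_in_pixels is None):
        raise ValueError("specify exactly one of tile_dim_in_units or tile_dim_in_pixels")
    if pixel_size is None:
        raise ValueError("no single pixel size (e.g. multiscale data): cannot convert")
    if tile_dim_in_units is not None:
        # infer pixels from units; keep at least one pixel per side
        return tile_dim_in_units, max(1, round(tile_dim_in_units / pixel_size))
    return tile_dim_in_pixels * pixel_size, tile_dim_in_pixels
```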

when tiling

giovp

@kevinyamauchi @LucaMarconato minor (maybe nitpick) comments.

One major one regards the examples: they have to be in spatialdata-notebooks, not here. Everything that is not API should live there. It's ok to have Python files instead of notebooks (although notebooks are better), but I would still move them.

spatialdata/_core/_spatialdata.py

giovp · 2023-03-11T17:22:33Z

spatialdata/_core/data_extent.py

@@ -0,0 +1,35 @@
+"""This file contains functions to compute the bounding box describing the extent of a spatial element,


Why is this not in the bounding-box-related module? I would put it there.

This and other functions that could populate this file are not used for spatial queries, so I would not put them in _spatial_query.py. The complexity of that file will increase when implementing non-bounding-box queries, so I would keep this code in another place.
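As a rough idea of what an extent function computes, here is a minimal sketch for point-like and circle-shaped elements (such as Visium spots), assuming coordinates are already expressed in the target coordinate system. `points_extent` and `circles_extent` are hypothetical names, not the contents of data_extent.py.

```python
import numpy as np


def points_extent(coords):
    """Axis-aligned bounding box of an (n, d) array of point coordinates."""
    coords = np.asarray(coords, dtype=float)
    if coords.ndim != 2 or len(coords) == 0:
        raise ValueError("expected a non-empty (n, d) array")
    return coords.min(axis=0), coords.max(axis=0)


def circles_extent(centers, radii):
    """Bounding box of circles; radii may be a scalar or one value per circle."""
    centers = np.asarray(centers, dtype=float)
    # shape (n, 1) or (1, 1), so each radius broadcasts over all axes
    radii = np.asarray(radii, dtype=float).reshape(-1, 1)
    return (centers - radii).min(axis=0), (centers + radii).max(axis=0)
```

Neither function needs a query geometry, which is consistent with keeping this code outside the spatial-query module.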

spatialdata/_dl/datasets.py

giovp · 2023-03-11T17:31:34Z

spatialdata/_dl/datasets.py

+from geopandas import GeoDataFrame
+from multiscale_spatial_image import MultiscaleSpatialImage
+from spatial_image import SpatialImage
+from torch.utils.data import Dataset


I would make torch an optional dependency. Therefore, in the __init__ of this module, or wherever ImageTilesDataset is imported, something like this would be needed:

    try:
        from spatialdata._dl.datasets import ImageTilesDataset
    except ImportError as e:
        _error: str | None = str(e)
    else:
        _error = None

Would you add torch to pyproject.toml, or would it be the user's responsibility to install it properly? I would go for the second.

spatialdata/_core/_rasterize.py

giovp · 2023-03-11T17:38:50Z

Current todos:

- wrong tiling result (wrong queried data and wrong content) when tiling a cropped multiscale image (probably due to this: 2 bugs with spatial cropping (with multiscale rasters and when missing the table) #178)
- extend functionality to get tiles from the raw space (not just from the target space):
  - when querying tiles from the raw space, allow specifying either units or pixels, not both
  - when querying tiles from the target space, allow specifying only units (and inferring pixels) or only pixels. This is possible only when the data is not multiscale; otherwise the pixels could not be determined
- make tests

@LucaMarconato I would not extend functionality in this PR. Priority-wise, I think we should add some minimal tests and merge right away. We need to do the module conversion of the repo to start testing intersphinx for notebooks and documentation (this change will also impact io/napari/plot, so there will be a lot of work to be done there as well).

LucaMarconato · 2023-03-11T17:42:59Z

Ok, I created an issue to keep track of that. I will make the tests and ask for review (btw I am working on the Xenium + Visium data atm, but I am going to work on this PR right after).

kevinyamauchi

This looks good to me! I can't approve because I opened the PR. For me, the main things to do before merging:

- make torch import optional
- make sure functions have docstrings (at least a description of what the function does)

spatialdata/_core/_rasterize.py

kevinyamauchi · 2023-03-13T19:59:13Z

spatialdata/_dl/datasets.py

+        self,
+        sdata: SpatialData,
+        regions_to_images: dict[str, str],
+        tile_dim_in_units: float,
+        tile_dim_in_pixels: int,
+        target_coordinate_system: str = "global",
+        transform: Optional[Callable[[SpatialData], dict[str, SpatialImage]]] = None,


Please add a docstring. I think the input parameters aren't clear (e.g., tile_dim_in_units vs. tile_dim_in_pixels).
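One possible reading of the two parameters, written as the kind of docstring requested here: `tile_dim_in_units` is assumed to be the side length of the square bounding-box query in target-coordinate-system units, and `tile_dim_in_pixels` the side length of the rasterized tile handed to the model. `tile_query_box` is a hypothetical helper, not part of the PR.

```python
def tile_query_box(centroid, tile_dim_in_units, tile_dim_in_pixels):
    """Describe the tile extracted around one centroid (hypothetical helper).

    Parameters
    ----------
    centroid
        Center of the tile, in units of the target coordinate system.
    tile_dim_in_units
        Side length of the square bounding-box query, in target-coordinate-system units.
    tile_dim_in_pixels
        Side length, in pixels, of the rasterized tile returned to the model.

    Returns
    -------
    (min_coordinate, max_coordinate, units_per_pixel)
    """
    if tile_dim_in_units <= 0 or tile_dim_in_pixels <= 0:
        raise ValueError("tile dimensions must be positive")
    half = tile_dim_in_units / 2
    min_coordinate = [c - half for c in centroid]
    max_coordinate = [c + half for c in centroid]
    # the same physical extent is resampled to a fixed number of pixels
    return min_coordinate, max_coordinate, tile_dim_in_units / tile_dim_in_pixels
```

Under this reading, the first parameter controls how much tissue each tile covers and the second controls its output resolution, so the two can be tuned independently.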

LucaMarconato · 2023-03-14T09:54:42Z

One major one regards the examples: they have to be in spatialdata-notebooks, not here. Everything that is not API should live there. It's ok to have Python files instead of notebooks (although notebooks are better), but I would still move them.

Ok, I deleted the examples folder and moved its content to the sandbox. I made these examples not to show things to other users but to debug and visually test the spatial query, rasterization, and tiler. I am not using notebooks because I use these scripts for debugging, setting breakpoints, etc.

Co-authored-by: Giovanni Palla <[email protected]>

giovp · 2023-03-14T10:44:21Z

@LucaMarconato could you also quickly re-add mypy to CI? It seems to be skipped at the moment; I wasn't aware of that.
https://github.com/scverse/spatialdata/blob/main/.pre-commit-config.yaml


LucaMarconato · 2023-03-14T10:54:45Z

@giovp restored mypy in CI; we have some problems with the installation.

.pre-commit-config.yaml

LucaMarconato · 2023-03-14T11:34:25Z

Going to merge and increase the coverage in a follow-up PR.

kevinyamauchi added 2 commits February 20, 2023 10:14

add sdata-> data dict transform

a3a1a52

add initial dataset

b5fa7c6

fix typos

479d7e2

kevinyamauchi added 2 commits February 20, 2023 15:04

Merge branch 'main' into torch-dataloader

a95ab3c

add shapes to dataset

42706ff

kevinyamauchi and others added 6 commits February 22, 2023 13:58

start multislide

e0bb5d8

Merge branch 'main' into torch-dataloader

3f45b3b

wip, need to merge with rasterize branch

9337549

Merge branch 'feature/rasterize' into torch-dataloader

3ab9398

wip tiling

8902390

added __set_item__() and merge branch 'main' into torch-dataloader

c6bee89

This was referenced Mar 8, 2023

2 bugs with spatial cropping (with multiscale rasters and when missing the table) #178

Closed

(Solved in coming PR) Wrong scale chosen when (re)rasterizing a raster object #179

Closed

tiling still wip, but usable

a307f1b

LucaMarconato added 4 commits March 8, 2023 23:58

Merge branch 'main' into torch-dataloader

354fa3f

fixed mypy

2125bec

type fix

9c2cf75

fixed bug with xarray coordinates in multiscale, fixed wrong centroids

656616b

when tiling

giovp approved these changes Mar 11, 2023


kevinyamauchi marked this pull request as ready for review March 13, 2023 19:52

kevinyamauchi commented Mar 13, 2023


Apply suggestions from code review

db62b71

Co-authored-by: Giovanni Palla <[email protected]>

LucaMarconato added 2 commits March 14, 2023 11:53

implemented suggestions from code review

edf71bd

Merge branch 'torch-dataloader' of https://github.com/kevinyamauchi/s…

b37691d

…patialdata into torch-dataloader

LucaMarconato added 2 commits March 14, 2023 11:55

Merge branch 'main' into torch-dataloader

a7228c2

fixed test

7a27ef0

timtreis reviewed Mar 14, 2023


.pre-commit-config.yaml (outdated, resolved)

LucaMarconato added 2 commits March 14, 2023 12:24

removed numpy=1.22 contraint for mypy

6f4aa7c

mypy now using numpy==1.24

41d818a

LucaMarconato merged commit ff458c8 into scverse:main Mar 14, 2023

giovp mentioned this pull request Mar 14, 2023

modules import hierarchy #103

Closed
