Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] torch DataSet + utils #145

Merged
merged 23 commits into from
Mar 14, 2023
Merged

Conversation

kevinyamauchi
Copy link
Collaborator

@kevinyamauchi kevinyamauchi commented Feb 20, 2023

This PR adds a torch DataSet with some addition utils. Initially, this implements a spot ROI dataset for replicating this squidpy example. The DataSet is implemented such that it is compatible with monai and pytorch lightning, which gives us access to a ton of tooling (e.g., multi-GPU training, tensorboard logging, learning rate schedulers).

This PR requires #132, #143, and image bounding box query with transforms to be merged.

@codecov
Copy link

codecov bot commented Feb 20, 2023

Codecov Report

Merging #145 (41d818a) into main (9975a5c) will decrease coverage by 2.64%.
The diff coverage is 34.61%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #145      +/-   ##
==========================================
- Coverage   89.59%   86.95%   -2.64%     
==========================================
  Files          24       27       +3     
  Lines        3854     3995     +141     
==========================================
+ Hits         3453     3474      +21     
- Misses        401      521     +120     
Impacted Files Coverage Δ
spatialdata/_core/data_extent.py 0.00% <0.00%> (ø)
spatialdata/_dataloader/transforms.py 0.00% <0.00%> (ø)
spatialdata/_dataloader/datasets.py 6.84% <6.84%> (ø)
spatialdata/utils.py 84.52% <12.50%> (-3.61%) ⬇️
spatialdata/__init__.py 93.33% <83.33%> (-6.67%) ⬇️
spatialdata/_core/_spatialdata.py 91.20% <86.36%> (-0.27%) ⬇️
spatialdata/_core/_rasterize.py 83.00% <100.00%> (+0.34%) ⬆️
spatialdata/_core/_spatial_query.py 93.82% <100.00%> (-0.42%) ⬇️
spatialdata/_core/core_utils.py 91.91% <100.00%> (-0.27%) ⬇️
spatialdata/_core/models.py 86.14% <100.00%> (ø)

@kevinyamauchi
Copy link
Collaborator Author

kevinyamauchi commented Feb 20, 2023

I have created an example notebook showing how the Dataset and transforms work for the SpotCropDataset. We can now generate tiles such that they are compatible with monai, torchvision, and pytorch lightning, so we can add tons of augmentations, dataloader cacheing, multiGPU, etc.. Once #132 lands, I can update the spot centroid fetching to use the polygons item.

https://gist.github.com/kevinyamauchi/3a1d1c375b084732c5f60b19afabf461

@kevinyamauchi
Copy link
Collaborator Author

I've updated it to now use the Shapes element to get the spot locations. See the example notebook below:

https://gist.github.com/kevinyamauchi/77f986889b7626db4ab3c1075a3a3e5e

@LucaMarconato
Copy link
Member

minor features of this PR:

  • overloading of __get_item__() and __set_item__() for SpatialData.

@LucaMarconato
Copy link
Member

LucaMarconato commented Mar 8, 2023

I have pushed a code that has still open todos and bugs to fix, but it is usable. An example of usage is in this script from the sandbox (which run like this shows a bug.

But if you use it

  • from the coordinate system global
  • only with visium data (not querying the xenium image with the visium cirlces)
  • not cropping the data first but making the tiles from the full data

it will work. So it should be good enough for the deep learning example.

@LucaMarconato
Copy link
Member

LucaMarconato commented Mar 8, 2023

Current todos:

Initial plan, postponed to a new PR (see #184):

  • extend functionality to get tiles from the raw space (not just from the target space)
    • when querying tiles from the raw space, allow to specify either units, either pixels. Not both
    • when querying tiles from the target space, allow to specify also only units (and infer pixels), or only pixels. This is possible only when the data is not a multiscale, otherwise the pixels could not be determined

Copy link
Member

@giovp giovp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kevinyamauchi @LucaMarconato minor (maybe nitpick) comments.

one major one regarding examples. They have to be in spatialdata-notebooks, not here. Everything that is not API should stay there. Ok to have python files and not notebooks (although better notebooks) but would still move.

spatialdata/_core/_spatialdata.py Outdated Show resolved Hide resolved
@@ -0,0 +1,35 @@
"""This file contains functions to compute the bounding box describing the extent of a spatial element,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this is not in the bounding box related module? I would put it there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this and other functions that could populate this file are not used for spatial queries so I would not put them in _spatial_query.py. The complexity of that file increase when implementing non-bounding box queries, so I would keep this code in another place.

spatialdata/_dl/datasets.py Outdated Show resolved Hide resolved
from geopandas import GeoDataFrame
from multiscale_spatial_image import MultiscaleSpatialImage
from spatial_image import SpatialImage
from torch.utils.data import Dataset
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make torch as optional depedency, therefore I think in Init of this module or where the ImageTilesDataset is import, something like this would be needed

try:
    from spatialdata._dl.datasets import ImageTilesDataset
except ImportError as e:
    _error: str | None = str(e)
else:
    _error = None

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would you add torch somewhere in pyproject.toml or it would be responsibility of the user to install it properly? I would go for the second.

spatialdata/_core/_rasterize.py Outdated Show resolved Hide resolved
spatialdata/_core/_rasterize.py Show resolved Hide resolved
spatialdata/_core/_rasterize.py Outdated Show resolved Hide resolved
@giovp
Copy link
Member

giovp commented Mar 11, 2023

Current todos:

  • wrong tiling result (wrong queried data and wrong content) when tiling a cropped multiscale image (probably due to this: 2 bugs with spatial cropping (with multiscale rasters and when missing the table) #178)

  • extend functionality to get tiles from the raw space (not just from the target space)

    • when querying tiles from the raw space, allow to specify either units, either pixels. Not both
    • when querying tiles from the target space, allow to specify also only units (and infer pixels), or only pixels. This is possible only when the data is not a multiscale, otherwise the pixels could not be determined
  • make tests

@LucaMarconato I would not extend functionality in this PR. Think priority wise is to add some minimal test and merge right away. We need to do the module conversion of the repo to start testing intersphinx for notebooks and documentation (and also this change will impact io/napari/plot so there will be lot of work to be done there as well.

@LucaMarconato
Copy link
Member

Ok, I created an issue to keep track of that. I will make the tests and ask for review (btw I am working on the Xenium + Visium data atm, but I am going to work on this PR right after).

@kevinyamauchi kevinyamauchi marked this pull request as ready for review March 13, 2023 19:52
Copy link
Collaborator Author

@kevinyamauchi kevinyamauchi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me! I can't approve because I opened the PR. For me, the main things to do before merging:

  • make torch import optional
  • make sure functions have docstrings (at least a description of what the function does)

spatialdata/_core/_rasterize.py Outdated Show resolved Hide resolved
Comment on lines 27 to 33
self,
sdata: SpatialData,
regions_to_images: dict[str, str],
tile_dim_in_units: float,
tile_dim_in_pixels: int,
target_coordinate_system: str = "global",
transform: Optional[Callable[[SpatialData], dict[str, SpatialImage]]] = None,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a docstring. I think the input parameters aren't clear (e.g., tile_dim_in_units vs. tile_dim_in_pixels)

@LucaMarconato
Copy link
Member

one major one regarding examples. They have to be in spatialdata-notebooks, not here. Everything that is not API should stay there. Ok to have python files and not notebooks (although better notebooks) but would still move.

ok I deleted the folder examples and moved this to the sandbox. I'll made these example not to show things to other users but to debug/visually test the spatial query, rasterization and tiler. I am not using notebooks because I use these for debugging, setting breakpoint etc.

@giovp
Copy link
Member

giovp commented Mar 14, 2023

@LucaMarconato could you also quickly re add mypy in CI, seems like it's skipped atm, wasn't aware of that
https://github.com/scverse/spatialdata/blob/main/.pre-commit-config.yaml

@LucaMarconato
Copy link
Member

@giovp restored mypy in ci, we have some problems with the installation

.pre-commit-config.yaml Outdated Show resolved Hide resolved
@LucaMarconato
Copy link
Member

Gonna merge and increase the coverage in a next pr.

@LucaMarconato LucaMarconato merged commit ff458c8 into scverse:main Mar 14, 2023
@giovp giovp mentioned this pull request Mar 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants