[WIP] torch DataSet + utils #145
Codecov Report
Additional details and impacted files
@@ Coverage Diff @@
## main #145 +/- ##
==========================================
- Coverage 89.59% 86.95% -2.64%
==========================================
Files 24 27 +3
Lines 3854 3995 +141
==========================================
+ Hits 3453 3474 +21
- Misses 401 521 +120
|
I have created an example notebook showing how the Dataset and transforms work for the SpotCropDataset. We can now generate tiles that are compatible with monai, torchvision, and pytorch lightning, so we can add tons of augmentations, dataloader caching, multi-GPU training, etc. Once #132 lands, I can update the spot centroid fetching to use the polygons item. https://gist.github.com/kevinyamauchi/3a1d1c375b084732c5f60b19afabf461 |
I've updated it to now use the Shapes element to get the spot locations. See the example notebook below: https://gist.github.com/kevinyamauchi/77f986889b7626db4ab3c1075a3a3e5e |
minor features of this PR:
|
I have pushed code that still has open todos and bugs to fix, but it is usable. An example of usage is in this script from the sandbox (which, run like this, shows a bug. But if you use it
it will work). So it should be good enough for the deep learning example. |
Current todos:
Initial plan, postponed to a new PR (see #184):
|
@kevinyamauchi @LucaMarconato minor (maybe nitpick) comments.
One major one regarding examples: they have to be in spatialdata-notebooks, not here. Everything that is not API should stay there. It is OK to have Python files instead of notebooks (although notebooks are better), but I would still move them.
@@ -0,0 +1,35 @@ | |||
"""This file contains functions to compute the bounding box describing the extent of a spatial element, |
Why is this not in the bounding-box-related module? I would put it there.
This and other functions that could populate this file are not used for spatial queries, so I would not put them in _spatial_query.py. The complexity of that file will increase when non-bounding-box queries are implemented, so I would keep this code in another place.
spatialdata/_dl/datasets.py
Outdated
from geopandas import GeoDataFrame
from multiscale_spatial_image import MultiscaleSpatialImage
from spatial_image import SpatialImage
from torch.utils.data import Dataset
I would make torch an optional dependency. Therefore, I think something like this would be needed in the __init__ of this module, or wherever ImageTilesDataset is imported:
try:
    from spatialdata._dl.datasets import ImageTilesDataset
except ImportError as e:
    _error: str | None = str(e)
else:
    _error = None
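A minimal sketch of how the guard suggested above could be generalized, not the code that was merged. The helper names `optional_import` and `load_image_tiles_dataset` are hypothetical; only the module path `spatialdata._dl.datasets` and the class name `ImageTilesDataset` come from the review comment. The idea is to defer the failure until the torch-backed class is actually requested, so that importing the package itself never requires torch.

```python
from __future__ import annotations

import importlib
from typing import Any, Optional, Tuple


def optional_import(module_name: str, attr: str) -> Tuple[Optional[Any], Optional[str]]:
    """Try to import `attr` from `module_name`.

    Returns (object, None) on success and (None, error message) on failure,
    mirroring the `_error` variable in the snippet above.
    """
    try:
        module = importlib.import_module(module_name)
        return getattr(module, attr), None
    except (ImportError, AttributeError) as e:
        return None, str(e)


def load_image_tiles_dataset() -> Any:
    """Hypothetical accessor: fail only when the torch-backed class is requested."""
    cls, error = optional_import("spatialdata._dl.datasets", "ImageTilesDataset")
    if cls is None:
        # Surface the original import problem together with a hint for the user.
        raise ImportError(
            f"ImageTilesDataset is unavailable ({error}); "
            "installing torch is assumed to be the user's responsibility."
        )
    return cls
```

With this shape, code that never touches the deep learning utilities is unaffected by a missing torch installation, and the error message still carries the original import failure.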
Would you add torch somewhere in pyproject.toml, or would it be the responsibility of the user to install it properly? I would go for the second.
@LucaMarconato I would not extend functionality in this PR. I think the priority is to add some minimal tests and merge right away. We need to do the module conversion of the repo to start testing intersphinx for notebooks and documentation (this change will also impact io/napari/plot, so there will be a lot of work to be done there as well). |
Ok, I created an issue to keep track of that. I will make the tests and ask for review (btw I am working on the Xenium + Visium data atm, but I am going to work on this PR right after). |
This looks good to me! I can't approve because I opened the PR. For me, the main things to do before merging:
- make the torch import optional
- make sure functions have docstrings (at least a description of what the function does)
spatialdata/_dl/datasets.py
Outdated
    self,
    sdata: SpatialData,
    regions_to_images: dict[str, str],
    tile_dim_in_units: float,
    tile_dim_in_pixels: int,
    target_coordinate_system: str = "global",
    transform: Optional[Callable[[SpatialData], dict[str, SpatialImage]]] = None,
Please add a docstring. I think the input parameters aren't clear (e.g., tile_dim_in_units vs. tile_dim_in_pixels).
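To illustrate the distinction the requested docstring would need to explain, here is a small sketch under an assumption the thread does not confirm: that tile_dim_in_units is the side length of the tile in the target coordinate system, and tile_dim_in_pixels is the side length of the rasterized tile in pixels. The helper functions `tile_bounds` and `units_per_pixel` are hypothetical and are not part of the PR.

```python
from __future__ import annotations

from typing import Tuple

Point = Tuple[float, float]


def tile_bounds(centroid: Point, tile_dim_in_units: float) -> Tuple[Point, Point]:
    """Return (min_coordinate, max_coordinate) of a square tile centered on `centroid`.

    ASSUMPTION: `tile_dim_in_units` is the tile side length expressed in the
    units of the target coordinate system (e.g. microns), not in pixels.
    """
    half = tile_dim_in_units / 2.0
    min_coordinate = (centroid[0] - half, centroid[1] - half)
    max_coordinate = (centroid[0] + half, centroid[1] + half)
    return min_coordinate, max_coordinate


def units_per_pixel(tile_dim_in_units: float, tile_dim_in_pixels: int) -> float:
    """Size of one output pixel in coordinate-system units.

    ASSUMPTION: `tile_dim_in_pixels` is the side length of the rasterized tile.
    """
    if tile_dim_in_pixels <= 0:
        raise ValueError("tile_dim_in_pixels must be a positive integer")
    return tile_dim_in_units / tile_dim_in_pixels
```

Under this reading, a tile with tile_dim_in_units=32.0 and tile_dim_in_pixels=64 would cover a 32 by 32 unit region sampled at 0.5 units per pixel; a docstring stating this relationship would address the reviewer's concern.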
ok I deleted the folder |
Co-authored-by: Giovanni Palla <[email protected]>
@LucaMarconato could you also quickly re add mypy in CI, seems like it's skipped atm, wasn't aware of that |
@giovp restored mypy in ci, we have some problems with the installation |
Gonna merge and increase the coverage in a next pr. |
This PR adds a torch DataSet with some additional utils. Initially, this implements a spot ROI dataset for replicating this squidpy example. The DataSet is implemented such that it is compatible with monai and pytorch lightning, which gives us access to a ton of tooling (e.g., multi-GPU training, tensorboard logging, learning rate schedulers).
This PR requires #132, #143, and image bounding box query with transforms to be merged.
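As a rough illustration of the map-style protocol that makes such a dataset usable with torch data loaders, the sketch below crops one square tile per spot centroid from a plain nested-list image. It is a simplified stand-in, not the PR's implementation: the class name `SpotTileDataset`, its parameters, and the nested-list image are all hypothetical, and it deliberately avoids importing torch. A map-style dataset only needs `__len__` and `__getitem__`, plus here an optional `transform` hook for augmentations.

```python
from __future__ import annotations

from typing import Callable, List, Optional, Sequence, Tuple

Tile = List[List[float]]


class SpotTileDataset:
    """Hypothetical map-style dataset: one square tile per spot centroid."""

    def __init__(
        self,
        image: Sequence[Sequence[float]],
        centroids: Sequence[Tuple[int, int]],
        tile_size: int,
        transform: Optional[Callable[[Tile], Tile]] = None,
    ) -> None:
        self.image = image
        self.centroids = centroids  # (row, col) pixel positions of the spots
        self.tile_size = tile_size  # tile side length in pixels
        self.transform = transform

    def __len__(self) -> int:
        return len(self.centroids)

    def __getitem__(self, idx: int) -> Tile:
        row, col = self.centroids[idx]
        half = self.tile_size // 2
        # Clamp the top-left corner at the image border; tiles that run past
        # the bottom or right edge simply come out smaller in this sketch.
        r0 = max(row - half, 0)
        c0 = max(col - half, 0)
        rows = self.image[r0 : r0 + self.tile_size]
        tile = [list(r[c0 : c0 + self.tile_size]) for r in rows]
        if self.transform is not None:
            tile = self.transform(tile)
        return tile


# Usage: a 4x4 image whose pixel value encodes its position (row * 4 + col).
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
dataset = SpotTileDataset(image, centroids=[(1, 1), (2, 2)], tile_size=2)
first_tile = dataset[0]
```

Because the class follows the map-style protocol, an object like this could in principle be handed to a torch DataLoader, with the `transform` callable being where augmentations from libraries such as monai or torchvision would plug in.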