Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

relabel block #664

Merged
merged 28 commits into from
Nov 27, 2024
Merged

relabel block #664

merged 28 commits into from
Nov 27, 2024

Conversation

ArneDefauw
Copy link
Contributor

Follow up PR for #588, see discussion #597 for relabeling in case output data is a labels layer (i.e. dims does not contain c) via use of a bit shift.

Copy link

codecov bot commented Aug 7, 2024

Codecov Report

Attention: Patch coverage is 95.55556% with 2 lines in your changes missing coverage. Please review.

Project coverage is 91.83%. Comparing base (8c12732) to head (4f06d59).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
src/spatialdata/_core/operations/map.py 97.61% 1 Missing ⚠️
src/spatialdata/_types.py 50.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #664      +/-   ##
==========================================
+ Coverage   91.80%   91.83%   +0.02%     
==========================================
  Files          45       45              
  Lines        6959     7002      +43     
==========================================
+ Hits         6389     6430      +41     
- Misses        570      572       +2     
Files with missing lines Coverage Δ
src/spatialdata/__init__.py 96.29% <100.00%> (ø)
src/spatialdata/_core/operations/map.py 97.67% <97.61%> (-0.11%) ⬇️
src/spatialdata/_types.py 69.23% <50.00%> (-3.50%) ⬇️

@ArneDefauw ArneDefauw marked this pull request as draft August 7, 2024 14:05
@ArneDefauw
Copy link
Contributor Author

Now also added helper function _relabel_sequential, see https://github.com/ArneDefauw/spatialdata/blob/633a132ddfb6353925bc099114b893df43ad3f2e/src/spatialdata/_core/operations/map.py#L211, as discussed here #597 (comment)

I think the PR is ready for review now.

@ArneDefauw ArneDefauw marked this pull request as ready for review August 8, 2024 09:18
Copy link
Member

@giovp giovp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @ArneDefauw , thanks a lot, super useful. I went back to this function in squidpy, and remember that we also did an additional step to harmonize the labels with dask image

from dask_image.ndmeasure._utils._label import (
            connected_components_delayed,
            label_adjacency_graph,
            relabel_blocks,
        )

# max because labels are not continuous (and won't be continuous)
label_groups = label_adjacency_graph(img, None, img.max())
new_labeling = connected_components_delayed(label_groups)
relabeled = relabel_blocks(img, new_labeling)

any reason for leaving this out? Otherwise for me it is good to go!

@ArneDefauw
Copy link
Contributor Author

ArneDefauw commented Aug 26, 2024

hi @ArneDefauw , thanks a lot, super useful. I went back to this function in squidpy, and remember that we also did an additional step to harmonize the labels with dask image

from dask_image.ndmeasure._utils._label import (
            connected_components_delayed,
            label_adjacency_graph,
            relabel_blocks,
        )

# max because labels are not continuous (and won't be continuous)
label_groups = label_adjacency_graph(img, None, img.max())
new_labeling = connected_components_delayed(label_groups)
relabeled = relabel_blocks(img, new_labeling)

any reason for leaving this out? Otherwise for me it is good to go!

Hi @giovp ,

thanks for the review!

Yes, I did not include the step to harmonize the labels across chunks due to some issues I observed when cell density is high, see also my previous comments.
#597 (comment)
#597 (comment)

I make it a little bit more concrete with an example:

So this is map_raster on a labels layer without harmonizing labels:

#I use the harpy package (https://github.com/saeyslab/harpy) for downloading of some test data
from sparrow.datasets import mibi_example
sdata = mibi_example()
def copy_labels( arr ):
    return arr

results=map_raster( sdata[ "masks_whole" ].chunk( 212 ), func=copy_labels, blockwise=True, relabel=True, )
from spatialdata import read_zarr

sdata[ "masks_whole_copied_relabeled_no_merging" ] = results
sdata.write_element( "masks_whole_copied_relabeled_no_merging" )
sdata = read_zarr( sdata.path )

Then without using the harmonizing of labels I get:
Screenshot 2024-08-26 at 11 35 57

and with harmonizing of the labels using

from dask_image.ndmeasure._utils._label import (
            connected_components_delayed,
            label_adjacency_graph,
            relabel_blocks,
        )

# max because labels are not continuous (and won't be continuous)
label_groups = label_adjacency_graph(img, None, img.max())
new_labeling = connected_components_delayed(label_groups)
relabeled = relabel_blocks(img, new_labeling)

I get:

Screenshot 2024-08-26 at 11 30 33

Which is not great, cells on the boundary of the chunks are merged together.

A solution for this, is using this approach:
https://github.com/gletort/NeubiasPasteur2023_AdvancedCellPose/blob/main/distributed_segmentation.py?rgh-link-date=2024-07-08T08%3A01%3A09Z for merging chunks back together.
Also implemented here https://github.com/saeyslab/harpy/blob/e4966aaf12ff0c36e301ff27df675c2c96f87ed2/src/sparrow/image/segmentation/_segmentation.py#L494,
which fixes much of these issues.

But I do not know if you want this to be included in this PR, see also comment of @LucaMarconato #597 (comment).
I would be happy to prepare a follow up PR.

@giovp
Copy link
Member

giovp commented Sep 2, 2024

hi @ArneDefauw , sorry for late reply.

I took a look at

gletort/NeubiasPasteur2023_AdvancedCellPose@main/distributed_segmentation.py?rgh-link-date=2024-07-08T08%3A01%3A09Z

it's a nice approach I think, and I wonder if it should be included already here. Meaning, I am not sure how useful it is the output of relabelling without the across-block label harmonization. Yes, relabelling fix one step of the chunked segmentation, but the result is not really actionable/useful right? Or am I missing something?

also @LucaMarconato wdyt? you can take a look at the output of the excellent post from @ArneDefauw before to understand visually what is the problem

@ArneDefauw
Copy link
Contributor Author

ArneDefauw commented Sep 17, 2024

hi @ArneDefauw , sorry for late reply.

I took a look at

gletort/NeubiasPasteur2023_AdvancedCellPose@main/distributed_segmentation.py?rgh-link-date=2024-07-08T08%3A01%3A09Z

it's a nice approach I think, and I wonder if it should be included already here. Meaning, I am not sure how useful it is the output of relabelling without the across-block label harmonization. Yes, relabelling fix one step of the chunked segmentation, but the result is not really actionable/useful right? Or am I missing something?

also @LucaMarconato wdyt? you can take a look at the output of the excellent post from @ArneDefauw before to understand visually what is the problem

Yes, indeed, if you do not know how to do the across-block label harmonization, the result will not be very useful. On the other hand, there are probably many ways to do it.
Maybe spatialdata should at least implement one method to do this label harmonization.
But I think it is up to you guys to decide :)

@melonora
Copy link
Collaborator

@ArneDefauw thanks for the PR and apologies for taking some time to get to it. After discussing with @LucaMarconato and @berombau we decided to put the label block harmonization in squidpy as this library is for having user friendly operations like this. We do not necessarily want the user to have to think about blocks and the implications of that, rather the average user just wants a function relabel.

@ArneDefauw
Copy link
Contributor Author

@ArneDefauw thanks for the PR and apologies for taking some time to get to it. After discussing with @LucaMarconato and @berombau we decided to put the label block harmonization in squidpy as this library is for having user friendly operations like this. We do not necessarily want the user to have to think about blocks and the implications of that, rather the average user just wants a function relabel.

Hi @melonora , that makes sense, thanks for having a look at the PR!

@melonora
Copy link
Collaborator

I adjusted the error message a tiny bit. While the function is private, it is called by public API. A less experienced programmer might not know about bit shift.

@melonora
Copy link
Collaborator

Also made relabel_sequential public given that @ArneDefauw put it as suggestion in the error message which in itself is in a private function, but the error message can be shown by calling a public function.

@LucaMarconato LucaMarconato linked an issue Nov 27, 2024 that may be closed by this pull request
@LucaMarconato
Copy link
Member

Thanks @ArneDefauw for the PR and @melonora for addressing the review comments. We are good to merge!

@LucaMarconato LucaMarconato enabled auto-merge (squash) November 27, 2024 17:04
@LucaMarconato LucaMarconato merged commit 72dbffd into scverse:main Nov 27, 2024
7 of 8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

check on c_coords should be one the output data in map_raster
4 participants