check on `c_coords` should be one the output data in `map_raster` #597

giovp · 2024-06-24T13:40:25Z

You are right, the check on c_coords should be one the output data. I merged already because I need to merge #594, which was opened against the current PR. Please can you make a patch for fixing that behavior?

Originally posted by @LucaMarconato in #588 (comment)

The text was updated successfully, but these errors were encountered:

LucaMarconato · 2024-06-24T14:23:14Z

CC @ArneDefauw

ArneDefauw · 2024-06-25T08:00:02Z

I've added PR for this issue, see #599

I wonder if support should also be added to spatialdata for relabeling across chunks to avoid label collisions if going from image->labels

LucaMarconato · 2024-06-25T12:00:20Z

Thanks! That's an interesting API! It could be implemented making use of the new map_raster() right?

ArneDefauw · 2024-06-26T11:23:02Z

yes, indeed. Would you prefer adding it as a parameter to map_raster()? Say a boolean flag, relabel_blocks, ignored if output type is not a labels layer or if blockwise is False.
Or would you prefer wrapping map_raster in a seperate function, which will take care of relabeling chunks if output layer is a labels layer?

Wouldn't be too difficult to implement, I can prepare a PR if you would be interested.

LucaMarconato · 2024-06-26T12:05:24Z

I think I would wrap it in a separate function, @giovp @melonora wdyt?

giovp · 2024-06-26T14:16:31Z

that sounds really useful @ArneDefauw ! I think it depends a bit, if it's just a single argument, then maybe can be done in map_raster? as in, we have already a very extensive list of functions/methods in the repo, but I also understand on striking a balance of composability and fleixiblity

LucaMarconato · 2024-06-27T13:00:14Z

Good point! If using map_raster() for that use case would feel unnatural, or if the functionality would be "too hidden" I would go for a separate function with an obvious name.

LucaMarconato · 2024-06-27T13:02:45Z

Developing more on this, I guess that signature of the separate function would be

relabel(labels: DataArray | DataTree, relabel: dict[str: str])

while the usage with map_raster() would read as

python
map_raster(data=data, func=lambda x: x, relabel: dict[str, str])

So pretty much equivalent.

I propose to:

set the default func in map_raster() to the identity
try to keep it in map_raster() and show it in the docs and as an example of the map_raster() docstring; if we "feel" it's too hidden we can wrap it into an explicit function.

ArneDefauw · 2024-06-28T09:28:40Z

I agree, I think we can first try to wrap it in map_raster.

@LucaMarconato , I don't think I understand the type of relabel (relabel: dict[str:str]) you propose. I think relabel can be a boolean.
I was planning to implement relabeling via use of a bit shift. This is how it was done in squidpy, and is also how I typically implement it to avoid collisions when I do not know which labels will be predicted for each block. I.e. something like:
...

        num_blocks = x.numblocks
        shift = int(np.prod(num_blocks[0] * num_blocks[1] * num_blocks[2]) - 1).bit_length()

...

        block_num = block_id[0] * (num_blocks[1] * num_blocks[2]) + block_id[1] * (num_blocks[2]) + block_id[2]
        mask: NDArray = labels > 0
        labels[mask] = (labels[mask] << shift) | block_num

giovp · 2024-07-04T11:08:54Z

I think it sounds great @ArneDefauw ! I agree on the type of relabel, I think maybe one thing would have to be checked if the label image would be annotated by a table, whether the segmentation mask ids match

LucaMarconato · 2024-07-05T13:35:38Z

I believe we are thinking of different relabeling use cases.

my interpretation

What I was thinking is of having the relabeling operation as a function that takes as input a mapping between the old labels values and the new label values, with the extra assumption that the mapping is injective. The type would be dict[int, int] (rather than dict[str, str] as I mentioned erroneously before).

The idea is that we could have the unique values [0, 1, 3, 5] in the array, and if the dict is {0: 0, 1: 1, 3: 2, 5: 3}, now we obtain as unique values [0, 1, 2, 3].

The relabeling function is straightforward to implement, and I guess the most common use case would be to give a set of indices without gaps, like in the example above (where the gap between 1, 3 and 3, 5, disappears).

your interpretation

I haven't understood the use case for your code, could you please explain it? What are the properties of the new relabeling?
Furthermore, I have two comments.

With the code above I think that if you have two blocks that contain pixels with the same values (let's say the label 0x000000ff); I think that after the relabeling, this number could become, say, 0x00000fff in one block, but could be a different number, for instance 0x0000f0ff in another block due to the different bits added with | block_num (in my example block_num is 0x00000f00 and 0x0000f000 respectively.
Does your code guarantee that the relabeling is injective? I could imagine that if block_num is something like 0xffffffff, then all the labels would collapse to that number.

final comment

I think it sounds great @ArneDefauw ! I agree on the type of relabel, I think maybe one thing would have to be checked if the label image would be annotated by a table, whether the segmentation mask ids match

I agree, if the relabeling is performed we could also take care of ensuring the annotating table still match.

giovp · 2024-07-08T07:19:14Z

The relabelling is needed because the segmentation masks ids are unique within each block, but they are not connected in the global mask, hence you need to relabel based on the connected segmentation masks across blocks, you can see how it is done in squidpy here: https://github.com/scverse/squidpy/blob/7c5c60c47316f3f75f2d73aa684d13bccfbc2662/src/squidpy/im/_segment.py#L105-L135

I am not sure there is a realistic use case when the user wants to pass a set of segmentation masks ids used to relabel the existing ids. In that case however, it could simply be a lambda function + map_blocks, but the relabelling approach from @ArneDefauw would still be needed.

ArneDefauw · 2024-07-08T08:01:09Z

Now I understand the confusion @LucaMarconato , we are indeed thinking of different use cases :).

I think the following use cases can be isolated.

labels_layer -> labels_layer. Relabeling an existing labels layer. Your use case is providing some utility function (or wrap it in map_raster) to relabel a labels layer. Indeed, relabeling to sequential labels would be the most common use case. A bit similar to the scikit image utility function skimage.segmentation.relabel_sequential, but then implemented for dask arrays. I don't think dask-image implements such function, so there could be a use case for it.
I don't know if people would like more fine grained control over this relabeling apart from a sequential relabeling, so I am not sure if we should support passing a dict.
labels_layer -> labels_layer. A func is passed to map_raster that does not alter id's of the labels. E.g. expansion of the masks, or filtering of masks (for example filtering out cells that are too small or too big). I can not think of an operation that would need to alter id's of the labels, so relabeling as I propose in use case 3 would probably not be necessary if you have a single labels layer, but we can make it optional, even in this case.
image_layer -> labels_layer (segmentation), multiple labels_layers -> labels_layer ( merging of different labels layers to a single labels layer).

a) Segmentation. If your callable converts image layers to labels layers, you are basically doing segmentation, and you need a way to handle the predictions done by e.g. cellpose on each block. Cellpose, or any segmentation algorithm will predict labels ranging from 1 to say 100 on each block. So different cells in different blocks can be assigned the same label, so hence you need a way to make the labels unique across blocks.

'Does your code guarantee that the relabeling is injective? I could imagine that if block_num is something like 0xffffffff, then all the labels would collapse to that number.''
No, collision between labels in different chunks is a theoretical possibility. I think a worst case scenario would be that you have max_uint16+1 chunks (I assume we work in uint32).

max_uint16 = np.iinfo(np.uint16).max
shift=int(max_uint16-1).bit_length() # shift is 16 in this case

And if you then would have labels ranging up to max_uint16+1 in a block, then they would be mapped to 0 if you do max_uint16<<16.
But is this not merely theoretical? max_unint16=65535, which is a huge amount of chunks. I think solution would be to put a sanity assert on the number of blocks and the max label in a block. We can check if collisions would happend, and throw an error message.

As @giovp mentioned, after this relabeling, you still need a way to connect them in the global mask. I found that squidpy's way of connecting them, using dask-image functions like label_adjancency_graph, leads to issues for dense cell regions . So either we choose another method, e.g. as in https://github.com/gletort/NeubiasPasteur2023_AdvancedCellPose/blob/main/distributed_segmentation.py (this method worked better); or we let the user connect the predicted mask by the algorithm of choice. I.e., we let the user solve the chunking artifacts on the border of the chunks.

b) Merging labels layers, i.e. [ label_layer_1, label_layer_2 ] -> merged_label_layer is currently not supported by map_raster because we do not allow passing more than one DataArray. But you would want to handle relabeling in same way as in a.

LucaMarconato · 2024-07-09T23:15:12Z

Ok thank you for the explanation, now it's clear.

I don't know if people would like more fine grained control over this relabeling apart from a sequential relabeling, so I am not sure if we should support passing a dict.

I would default for the sequential labeling then. We are thinking of having a package/some code that deals with image feature extraction. This function could live in such a package and be implemented, as Giovanni suggested, by passing the correct lambda function to map_raster(). (CC @melonora. TLDR; since it's a long issue: you can skip all the text above, in short, we could have a function in the raster manipulation package to relabel existing labels objects. You can stop reading now :D ).

LucaMarconato · 2024-07-09T23:16:58Z

But is this not merely theoretical? max_unint16=65535, which is a huge amount of chunks. I think solution would be to put a sanity assert on the number of blocks and the max label in a block. We can check if collisions would happend, and throw an error message.

You are right, it would be ok.

LucaMarconato · 2024-07-09T23:22:38Z

As @giovp mentioned, after this relabeling, you still need a way to connect them in the global mask. I found that squidpy's way of connecting them, using dask-image functions like label_adjancency_graph, leads to issues for dense cell regions . So either we choose another method, e.g. as in gletort/NeubiasPasteur2023_AdvancedCellPose@main/distributed_segmentation.py (this method worked better); or we let the user connect the predicted mask by the algorithm of choice. I.e., we let the user solve the chunking artifacts on the border of the chunks.

If I am getting the problem right, you are trying to have a way to detect cells that overlapped to a block border, and make sure that their label value is the same globally.

In such a case, in SOPA, the external segmentation algorithms return polygons/multipolygons (or when they return raster, the segmentation is converted to polygons/multipolygons), and then a conflict resolution strategy is applied to the borders. IIRC basically with the same short code shown here (or with some equivalent code):

spatialdata/src/spatialdata/_core/operations/vectorize.py

Line 251 in 330caa0

def _dissolve_on_overlaps(label: int, group: GeoDataFrame) -> GeoDataFrame:

.

Practically speaking, I would not worry about connecting adjacent pixels from different blocks here. The user can always do this manually or rely on strategies such as the one implemented in SOPA, to achieve that.

Please let me know what you think about this.

giovp · 2024-07-10T06:49:02Z

This function could live in such a package and be implemented, as Giovanni suggested, by passing the correct lambda function to map_raster(). (CC @melonora. TLDR; since it's a long issue: you can skip all the text above, in short, we could have a function in the raster manipulation package to relabel existing labels objects. You can stop reading now :D ).

mmh I don't think this requires a separate package, since this processing step might be required right after a map_raster operation. Yes, it could also be needed in other methods that are not currently implemented, but my understanding from @ArneDefauw is that this is desirable in some instances of map_raster usage and so it could be added directly there

LucaMarconato · 2024-07-10T18:08:47Z

@giovp I meant the relabeling of raster types that I described under "my intepretation" here #597 (comment) (sorry for the confusion).

Instead the fix described under "your interpretation" should indeed be implemented here, and I am now convinced that the byte shifting approach would be good.

In conclusion, @ArneDefauw if you could implement the bit shifting approach that would be amazing! And to regard to the "relabeling of segmentation masks as of my interpretation", I'd skip the implementation for now.

ArneDefauw · 2024-07-12T05:46:00Z

@giovp I meant the relabeling of raster types that I described under "my intepretation" here #597 (comment) (sorry for the confusion).

Instead the fix described under "your interpretation" should indeed be implemented here, and I am now convinced that the byte shifting approach would be good.

In conclusion, @ArneDefauw if you could implement the bit shifting approach that would be amazing! And to regard to the "relabeling of segmentation masks as of my interpretation", I'd skip the implementation for now.

Ok, I will implement the bit shifting approach! Should not be too difficult to implement, i can recycle some code here and there :)

Re connecting adjacent pixels from different blocks. I have not tested the method implemented by SOPA, but in my opinion having to convert to a shapes layer to fix these chunking artifacts is an unnecessary conversion step; these chunking arifacts can be fixed working on raster data only, and prevents having to go back and forth between a shapes layer and a labels layer. Also, I wonder if a z dimension is supported if you convert to a shapes layer to fix these artefacts.
We can skip the implementation for now, but I would be happy to pick this up at a later time, if required necessary. I would then propose to put it in a separate function map_segment, which would take care of scaling any Callable that converts images to labels using map_overlap, while taking care of relabeling (bit shift) and fixing chunking artefacts.

So if I understand correctly, for now we do not implement the optional sequential relabeling?

LucaMarconato · 2024-07-12T14:14:41Z

Ok, I will implement the bit shifting approach! Should not be too difficult to implement, i can recycle some code here and there :)

Thanks a lot!

I would then propose to put it in a separate function map_segment, which would take care of scaling any Callable that converts images to labels using map_overlap,

Nice, let's keep this open for a follow up PR. I'd go for the name map_labels at that point, but let's think about this later on.

So if I understand correctly, for now we do not implement the optional sequential relabeling?

Maybe let's have this just as internal function but I would not expose this in a separate API now.

ArneDefauw mentioned this issue Jun 25, 2024

map raster c_coords #599

Merged

ArneDefauw mentioned this issue Aug 7, 2024

relabel block #664

Merged

giovp added element: images 🌌 Pictures of stuff method: rasterization labels Sep 7, 2024

LucaMarconato linked a pull request Nov 27, 2024 that will close this issue

relabel block #664

Merged

LucaMarconato closed this as completed in #664 Nov 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

check on `c_coords` should be one the output data in `map_raster` #597

check on `c_coords` should be one the output data in `map_raster` #597

giovp commented Jun 24, 2024 •

edited

Loading

LucaMarconato commented Jun 24, 2024

ArneDefauw commented Jun 25, 2024

LucaMarconato commented Jun 25, 2024

ArneDefauw commented Jun 26, 2024

LucaMarconato commented Jun 26, 2024

giovp commented Jun 26, 2024

LucaMarconato commented Jun 27, 2024

LucaMarconato commented Jun 27, 2024

ArneDefauw commented Jun 28, 2024 •

edited

Loading

giovp commented Jul 4, 2024

LucaMarconato commented Jul 5, 2024

giovp commented Jul 8, 2024 •

edited

Loading

ArneDefauw commented Jul 8, 2024 •

edited

Loading

LucaMarconato commented Jul 9, 2024 •

edited

Loading

LucaMarconato commented Jul 9, 2024

LucaMarconato commented Jul 9, 2024

giovp commented Jul 10, 2024

LucaMarconato commented Jul 10, 2024

ArneDefauw commented Jul 12, 2024

LucaMarconato commented Jul 12, 2024

check on c_coords should be one the output data in map_raster #597

check on c_coords should be one the output data in map_raster #597

Comments

giovp commented Jun 24, 2024 • edited Loading

LucaMarconato commented Jun 24, 2024

ArneDefauw commented Jun 25, 2024

LucaMarconato commented Jun 25, 2024

ArneDefauw commented Jun 26, 2024

LucaMarconato commented Jun 26, 2024

giovp commented Jun 26, 2024

LucaMarconato commented Jun 27, 2024

LucaMarconato commented Jun 27, 2024

ArneDefauw commented Jun 28, 2024 • edited Loading

giovp commented Jul 4, 2024

LucaMarconato commented Jul 5, 2024

my interpretation

your interpretation

final comment

giovp commented Jul 8, 2024 • edited Loading

ArneDefauw commented Jul 8, 2024 • edited Loading

LucaMarconato commented Jul 9, 2024 • edited Loading

LucaMarconato commented Jul 9, 2024

LucaMarconato commented Jul 9, 2024

giovp commented Jul 10, 2024

LucaMarconato commented Jul 10, 2024

ArneDefauw commented Jul 12, 2024

LucaMarconato commented Jul 12, 2024

check on `c_coords` should be one the output data in `map_raster` #597

check on `c_coords` should be one the output data in `map_raster` #597

giovp commented Jun 24, 2024 •

edited

Loading

ArneDefauw commented Jun 28, 2024 •

edited

Loading

giovp commented Jul 8, 2024 •

edited

Loading

ArneDefauw commented Jul 8, 2024 •

edited

Loading

LucaMarconato commented Jul 9, 2024 •

edited

Loading