Attempted fix for Issues 26 and 28 by writing CSVs to different files #29

JoshuaHess12 · 2021-07-22T02:28:59Z

Adjusted code to write separate CSVs for each input mask rather than concatenating quantification output into a single CSV file.

ArtemSokolov · 2021-07-22T14:45:05Z

Thank you, @JoshuaHess12
I will test this today.

ArtemSokolov · 2021-07-22T17:23:15Z

This definitely addresses #26. However, #28 still seems to be an issue.

When doing --masks cytoRingMask.tif by itself, cell 2171 doesn't get quantified because it has zero area (which is correct):

  CellID X_centroid Y_centroid  FDX1 CD357  CD1D
    <dbl>      <dbl>      <dbl> <dbl> <dbl> <dbl>
 1   2170      1003.      1301. 2247. 1483. 1064.
 2   2172      1570.      1304. 3153. 1322. 1337.
 3   2173       675.      1304. 3661. 1151. 1253.

However, when quantifying multiple masks with --masks nucleiRingMask.tif cytoRingMask.tif, cells 2172 onward appear to be shifted up, which causes a mismatch between the expression columns and the cell position:

   CellID X_centroid Y_centroid  FDX1 CD357  CD1D
    <dbl>      <dbl>      <dbl> <dbl> <dbl> <dbl>
 1   2170      1001.      1301. 2247. 1483. 1064.
 2   2171      1040.      1299. 3153. 1322. 1337.  <-- The expression of FDX1, CD357 and CD1D is from Cell 2172 above
 3   2172      1570.      1303. 3661. 1151. 1253.  <-- The expression of FDX1, CD357 and CD1D is from Cell 2173 above
 4   2173       676.      1304. 2779. 1468. 1096.  <-- etc.

It seems that there is "cross-talk" between masks, where the cell position is taken from nucleiRingMask, while the expression is taken from cytoRingMask. Ideally, each mask should be quantified in isolation, without any merging or concatenation against other masks.
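The shift can be reproduced with the FDX1 values from the tables above. This is a minimal sketch, not the tool's code. It assumes that the joint run effectively pairs the Cell IDs taken from nucleiRingMask row by row with the expression rows computed from cytoRingMask, where cell 2171 has no pixels. The variable names are illustrative only.

```python
# Cell IDs present in each mask (cell 2171 has zero cytoplasm area).
nuclei_ids = [2170, 2171, 2172, 2173, 2174]
cyto_ids = [2170, 2172, 2173, 2174]

# FDX1 means computed from cytoRingMask, one row per cytoplasm object.
cyto_fdx1 = [2247.09, 3153.31, 3660.94, 2779.16]

# Buggy pairing: IDs from the first mask are zipped positionally with the
# rows from the second mask, so every row after the gap is shifted by one.
buggy = dict(zip(nuclei_ids, cyto_fdx1))

# Correct pairing: each mask supplies its own IDs.
correct = dict(zip(cyto_ids, cyto_fdx1))

for cell_id in (2172, 2173):
    print(cell_id, "buggy:", buggy[cell_id], "correct:", correct[cell_id])
```

Python's `zip()` stops at the shorter input without complaint, which is one way a length mismatch like this can go unnoticed. In Python 3.10+, `zip(..., strict=True)` raises `ValueError` instead.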

Steps to reproduce:

1. Ensure Nextflow and Docker are installed.
2. Download the exemplar: nextflow run labsyspharm/mcmicro/exemplar.nf --name exemplar-001 --path .
3. Generate segmentation masks: nextflow run labsyspharm/mcmicro --in ./exemplar-001 --stop-at segmentation --s3seg-opts '--segmentCytoplasm segmentCytoplasm --cytoDilation 3 --cytoMethod ring'
4. Quantify cytoRingMask only:

cd exemplar-001/
mkdir cytoOnly
python CommandSingleCellExtraction.py \
 --image registration/exemplar-001.ome.tif \
 --masks segmentation/unmicst-exemplar-001/cytoRingMask.tif \
 --channel_names markers.csv \
 --output cytoOnly

5. Quantify both masks:

mkdir both
python CommandSingleCellExtraction.py \
 --image registration/exemplar-001.ome.tif \
 --masks segmentation/unmicst-exemplar-001/nucleiRingMask.tif segmentation/unmicst-exemplar-001/cytoRingMask.tif \
 --channel_names markers.csv \
 --output both

6. Compare the expression of markers for cells 2172 and 2173:

$ sed -n -e 1p -e '2171,2174p' cytoOnly/exemplar-001_cytoRingMask.csv | cut -d ',' -f 1,11-15 | \
  sed "s/,/\t/g" | sed 's/\(\.[0-9][0-9]\)[0-9]*/\1/g'

CellID  FDX1    CD357   CD1D    X_centroid      Y_centroid
2170    2247.09 1482.88 1064.37 1003.11 1301.05
2172    3153.31 1322.06 1337.24 1569.82 1303.80
2173    3660.94 1150.97 1252.94 675.15  1304.17
2174    2779.16 1468.5  1096.03 815.46  1301.94

$ sed -n -e 1p -e '2171,2174p' both/exemplar-001_cytoRingMask.csv | cut -d ',' -f 1,11-15 | \
  sed "s/,/\t/g" | sed 's/\(\.[0-9][0-9]\)[0-9]*/\1/g'

CellID  FDX1    CD357   CD1D    X_centroid      Y_centroid
2170    2247.09 1482.88 1064.37 1000.95 1301.36
2171    3153.31 1322.06 1337.24 1040.01 1298.77
2172    3660.94 1150.97 1252.94 1569.5  1303.27
2173    2779.16 1468.5  1096.03 675.52  1304.18
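The sed/cut comparison above can also be keyed on CellID, so that rows are matched by ID rather than by line number. This is a minimal sketch that uses only Python's csv module. It assumes the column names shown in the cut output (CellID, FDX1, CD357, CD1D, X_centroid, Y_centroid) and compares values as exact strings, which is only appropriate when checking for identical output. The helper names are hypothetical.

```python
import csv


def load_by_cell_id(path, columns):
    """Read a quantification CSV into {CellID: {column: value}}."""
    with open(path, newline="") as handle:
        return {
            row["CellID"]: {col: row[col] for col in columns}
            for row in csv.DictReader(handle)
        }


def mismatched_cells(path_a, path_b, columns):
    """Return CellIDs whose values differ or that appear in only one file."""
    a = load_by_cell_id(path_a, columns)
    b = load_by_cell_id(path_b, columns)
    differing = (cid for cid in a.keys() | b.keys() if a.get(cid) != b.get(cid))
    # Sort numerically; float() also tolerates IDs written as "2170.0".
    return sorted(differing, key=float)
```

For example, `mismatched_cells("cytoOnly/exemplar-001_cytoRingMask.csv", "both/exemplar-001_cytoRingMask.csv", ["FDX1", "CD357", "CD1D", "X_centroid", "Y_centroid"])` should return an empty list once the two runs agree. On the outputs shown above, even cell 2170 would be flagged, because its centroid differs between the two runs (1003.11 vs 1000.95).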

ArtemSokolov · 2021-07-22T20:59:55Z

Following up on the above, the likely culprit is the following:

Here, IDs are extracted from the first mask:
https://github.com/JoshuaHess12/quantification/blob/6c4addabd5888397eb38cbf4a360171b28edede3/SingleCellDataExtraction.py#L133

but then get concatenated to all other tables:
https://github.com/JoshuaHess12/quantification/blob/6c4addabd5888397eb38cbf4a360171b28edede3/SingleCellDataExtraction.py#L151

This concatenation assumes that the same set of cells is present in every mask. Unfortunately, this assumption is violated when a cell has zero area (as in the cytoplasm example above). A suggested fix is to fully isolate the processing of a single mask file, including the extraction of Cell IDs. The outer loop can then call the corresponding function with a single mask at a time, which will ensure that no "cross-talk" between masks happens.
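One way to express the suggested fix is a function that quantifies a single mask end to end, reading the Cell IDs from that mask's own label values, plus an outer loop that calls it once per mask. The real module uses scikit-image's regionprops for this. The sketch below substitutes plain Python loops over 2-D lists so that it runs without dependencies, computes only mean intensity and centroid, and uses hypothetical names (`quantify_mask`, `quantify_all`).

```python
def quantify_mask(mask, image):
    """Quantify one label mask in isolation.

    mask and image are 2-D lists of equal shape; mask holds integer Cell IDs
    (0 = background). IDs come from this mask only, so a cell with zero area
    here simply has no row.
    """
    stats = {}  # cell_id -> [area, intensity_sum, row_sum, col_sum]
    for r, (mask_row, image_row) in enumerate(zip(mask, image)):
        for c, (cell_id, value) in enumerate(zip(mask_row, image_row)):
            if cell_id == 0:
                continue
            s = stats.setdefault(cell_id, [0, 0.0, 0.0, 0.0])
            s[0] += 1
            s[1] += value
            s[2] += r
            s[3] += c
    return [
        {"CellID": cid, "mean": total / area,
         "X_centroid": col_sum / area, "Y_centroid": row_sum / area}
        for cid, (area, total, row_sum, col_sum) in sorted(stats.items())
    ]


def quantify_all(masks, image):
    """Outer loop: each mask is processed independently; nothing is shared."""
    return {name: quantify_mask(mask, image) for name, mask in masks.items()}


# Toy data: cell 1 has nucleus pixels but no cytoplasm pixels.
image = [[1, 2, 3], [4, 5, 6]]
nuclei = [[1, 0, 2], [1, 0, 3]]
cyto = [[0, 2, 0], [0, 3, 0]]
tables = quantify_all({"nucleiRingMask": nuclei, "cytoRingMask": cyto}, image)
```

Because `quantify_all` never passes IDs or rows from one mask to another, the missing cell 1 in the toy `cyto` mask affects only that mask's table.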

JoshuaHess12 · 2021-07-22T21:40:42Z

I think the processing of each mask is already uncoupled in the for loop -- there isn't any crosstalk between the masks with the way this pull request exports the CSVs. The CellIDs are mismatched because regionprops in Python automatically enumerates the CellIDs for us by sweeping from left to right across the image. If there is no cytoplasm object for a cell, then all the other CellIDs for the cytoplasm mask will be shifted up by a value of one in the CellID column of the cytoplasm CSV compared to the nucleus CSV file.

I think one way to fix this would be to do a 1-nearest neighbor assignment from the other CSV files to the nuclei CSV file based on their spatial coordinates. If we assume that the cytoplasm of each cell is always going to be closest to its own nucleus then this may work. We could relabel all other CellID rows in the mismatched CSVs according to the index of their nearest neighbor in the nuclei CSV.
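The nearest-neighbour relabelling proposed here could look like the following brute-force sketch. It uses centroids copied from the tables above: nucleus centroids from the joint run (which, per the thread, come from nucleiRingMask) and cytoplasm centroids from the cyto-only run, deliberately given shifted IDs. It assumes each cytoplasm centroid lies closest to its own nucleus, which may not hold in dense tissue. As the rest of the thread shows, this step turned out to be unnecessary because segmentation already keeps Cell IDs consistent across masks. The function name is hypothetical.

```python
import math


def relabel_by_nearest_nucleus(nuclei, other):
    """Give each row of `other` the CellID of the nearest nucleus centroid.

    nuclei, other: lists of (cell_id, x, y) tuples. Brute force, O(n * m);
    a k-d tree would be the usual choice for whole-slide cell counts.
    """
    relabelled = []
    for _, x, y in other:
        nearest = min(nuclei, key=lambda n: math.hypot(n[1] - x, n[2] - y))
        relabelled.append((nearest[0], x, y))
    return relabelled


# Nucleus centroids (joint run) and cytoplasm centroids (cyto-only run),
# with the cytoplasm rows given shifted IDs 2170-2172 on purpose.
nuclei = [(2170, 1000.95, 1301.36), (2171, 1040.01, 1298.77),
          (2172, 1569.50, 1303.27), (2173, 675.52, 1304.18)]
cyto = [(2170, 1003.11, 1301.05), (2171, 1569.82, 1303.80),
        (2172, 675.15, 1304.17)]
fixed = relabel_by_nearest_nucleus(nuclei, cyto)
```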

JoshuaHess12 · 2021-07-22T21:48:21Z

Wait, you may be right, @ArtemSokolov. Sorry about that. I will look at this a little more.

ArtemSokolov · 2021-07-22T21:52:35Z

Thanks for looking into it, @JoshuaHess12

> The CellIDs are mismatched because regionprops in Python automatically enumerates the CellIDs for us by sweeping from left to right across the image.

So, I actually had this concern before also, but I verified with Clarence that regionprops() extracts Cell IDs directly from the mask file, and the upstream segmentation module ensures that Cell IDs match between nucleus and cytoplasm masks, even if some cells are not captured by one of those masks. This is why we see skipped IDs, like in this example.

CellID X_centroid Y_centroid  FDX1 CD357  CD1D
    <dbl>      <dbl>      <dbl> <dbl> <dbl> <dbl>
 1   2170      1003.      1301. 2247. 1483. 1064.
 2   2172      1570.      1304. 3153. 1322. 1337.
 3   2173       675.      1304. 3661. 1151. 1253.

I think the end goal is just to ensure that the output exemplar-001_cytoRingMask.csv is the same, regardless of whether the user calls the tool with --masks cytoRingMask.tif alone or jointly with --masks nucleiRingMask.tif cytoRingMask.tif.
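This end goal can be written as a regression check. Run the tool twice, once with cytoRingMask.tif alone and once with both masks, and require the two cytoplasm CSVs to be byte-identical. The sketch below assumes the exemplar-001 layout, script name, flags, and output filename from the reproduction steps above. It also assumes the script is invoked from inside the exemplar directory. `build_command` and `cyto_csv_is_invariant` are hypothetical helpers.

```python
import filecmp
import subprocess
import sys
from pathlib import Path

SEG = "segmentation/unmicst-exemplar-001"


def build_command(script, image, masks, channel_names, output):
    """Argument list mirroring the reproduction steps in this thread."""
    return [sys.executable, script,
            "--image", image,
            "--masks", *masks,
            "--channel_names", channel_names,
            "--output", output]


def cyto_csv_is_invariant(root, script="CommandSingleCellExtraction.py"):
    """Quantify cytoRingMask alone and together with nucleiRingMask, then
    compare the two cytoplasm CSVs byte for byte."""
    runs = {
        "cytoOnly": [f"{SEG}/cytoRingMask.tif"],
        "both": [f"{SEG}/nucleiRingMask.tif", f"{SEG}/cytoRingMask.tif"],
    }
    for out_dir, masks in runs.items():
        Path(root, out_dir).mkdir(exist_ok=True)
        cmd = build_command(script, "registration/exemplar-001.ome.tif",
                            masks, "markers.csv", out_dir)
        subprocess.run(cmd, cwd=root, check=True)  # raises if the tool fails
    name = "exemplar-001_cytoRingMask.csv"
    return filecmp.cmp(Path(root, "cytoOnly", name), Path(root, "both", name),
                       shallow=False)
```

Calling `cyto_csv_is_invariant("exemplar-001")` requires the segmentation outputs from step 3 to exist. `shallow=False` forces a content comparison rather than relying on file size and modification time.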

JoshuaHess12 · 2021-07-23T12:27:30Z

@ArtemSokolov No problem! I think this makes sense now. I moved the extraction of Cell IDs inside the loop so that it gets executed separately for each mask. Let me know if the latest commit addresses the issue.

ArtemSokolov · 2021-07-23T15:17:21Z

Works great, @JoshuaHess12! I can confirm that --masks cytoRingMask.tif and --masks nucleiRingMask.tif cytoRingMask.tif produce identical .csv files for the cytoplasm mask.

6c4adda  Attempted fix for MCMICRO Issues 26 and 28 write csvs to different files
7e1e295  uncouple CellIDs in each mask

ArtemSokolov merged commit 466f34d into labsyspharm:master Jul 23, 2021

This was referenced Jul 23, 2021:

Provide an option to move the mask suffix from column names to filenames #26 (Closed)
[Major bug] Mismatch of cells when quantifying multiple masks #28 (Closed)
