Attempted fix for Issues 26 and 28 by writing CSVs to different files #29

Merged 2 commits on Jul 23, 2021

Conversation

JoshuaHess12
Contributor

Adjusted code to write separate CSVs for each input mask rather than concatenating quantification output into a single CSV file.

@ArtemSokolov
Collaborator

Thank you, @JoshuaHess12
I will test this today.

@ArtemSokolov
Collaborator

ArtemSokolov commented Jul 22, 2021

This definitely addresses #26. However, #28 still seems to be an issue.

When doing --masks cytoRingMask.tif by itself, cell 2171 doesn't get quantified because it has zero area (which is correct):

  CellID X_centroid Y_centroid  FDX1 CD357  CD1D
    <dbl>      <dbl>      <dbl> <dbl> <dbl> <dbl>
 1   2170      1003.      1301. 2247. 1483. 1064.
 2   2172      1570.      1304. 3153. 1322. 1337.
 3   2173       675.      1304. 3661. 1151. 1253.

However, when quantifying multiple masks with --masks nucleiRingMask.tif cytoRingMask.tif, cells 2172 onward appear to be shifted up, which causes a mismatch between the expression columns and the cell position:

   CellID X_centroid Y_centroid  FDX1 CD357  CD1D
    <dbl>      <dbl>      <dbl> <dbl> <dbl> <dbl>
 1   2170      1001.      1301. 2247. 1483. 1064.
 2   2171      1040.      1299. 3153. 1322. 1337.  <-- The expression of FDX1, CD357 and CD1D is from Cell 2172 above
 3   2172      1570.      1303. 3661. 1151. 1253.  <-- The expression of FDX1, CD357 and CD1D is from Cell 2173 above
 4   2173       676.      1304. 2779. 1468. 1096.  <-- etc.

It seems that there is "cross-talk" between masks, where the cell position is taken from nucleiRingMask, while the expression is taken from cytoRingMask. Ideally, each mask should be quantified in isolation, without any merging or concatenation against other masks.

Steps to reproduce:

  1. Ensure Nextflow and Docker are installed
  2. Download the exemplar: nextflow run labsyspharm/mcmicro/exemplar.nf --name exemplar-001 --path .
  3. Generate segmentation masks: nextflow run labsyspharm/mcmicro --in ./exemplar-001 --stop-at segmentation --s3seg-opts '--segmentCytoplasm segmentCytoplasm --cytoDilation 3 --cytoMethod ring'
  4. Quantify cytoRingMask only:
cd exemplar-001/
mkdir cytoOnly
python CommandSingleCellExtraction.py \
 --image registration/exemplar-001.ome.tif \
 --masks segmentation/unmicst-exemplar-001/cytoRingMask.tif \
 --channel_names markers.csv \
 --output cytoOnly
  5. Quantify both masks:
mkdir both
python CommandSingleCellExtraction.py \
 --image registration/exemplar-001.ome.tif \
 --masks segmentation/unmicst-exemplar-001/nucleiRingMask.tif segmentation/unmicst-exemplar-001/cytoRingMask.tif \
 --channel_names markers.csv \
 --output both
  6. Compare the expression of markers for cells 2172 and 2173:
$ sed -n -e 1p -e '2171,2174p' cytoOnly/exemplar-001_cytoRingMask.csv | cut -d ',' -f 1,11-15 | \
  sed "s/,/\t/g" | sed 's/\(\.[0-9][0-9]\)[0-9]*/\1/g'

CellID  FDX1    CD357   CD1D    X_centroid      Y_centroid
2170    2247.09 1482.88 1064.37 1003.11 1301.05
2172    3153.31 1322.06 1337.24 1569.82 1303.80
2173    3660.94 1150.97 1252.94 675.15  1304.17
2174    2779.16 1468.5  1096.03 815.46  1301.94

$ sed -n -e 1p -e '2171,2174p' both/exemplar-001_cytoRingMask.csv | cut -d ',' -f 1,11-15 | \
  sed "s/,/\t/g" | sed 's/\(\.[0-9][0-9]\)[0-9]*/\1/g'

CellID  FDX1    CD357   CD1D    X_centroid      Y_centroid
2170    2247.09 1482.88 1064.37 1000.95 1301.36
2171    3153.31 1322.06 1337.24 1040.01 1298.77
2172    3660.94 1150.97 1252.94 1569.5  1303.27
2173    2779.16 1468.5  1096.03 675.52  1304.18

@ArtemSokolov
Collaborator

Following up on the above, the likely culprit is in the following:

Here, IDs are extracted from the first mask:
https://github.com/JoshuaHess12/quantification/blob/6c4addabd5888397eb38cbf4a360171b28edede3/SingleCellDataExtraction.py#L133

but then get concatenated to all other tables:
https://github.com/JoshuaHess12/quantification/blob/6c4addabd5888397eb38cbf4a360171b28edede3/SingleCellDataExtraction.py#L151

This concatenation assumes that the same set of cells is present in every mask. Unfortunately, this assumption is violated when a cell has zero area (as in the cytoplasm example above). A suggested fix is to fully isolate the processing of a single mask file, including the extraction of Cell IDs. The outer loop can then call the corresponding function with a single mask at a time, which will ensure that no "cross-talk" between masks happens.
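
To make the failure mode concrete, here is a minimal toy sketch. It is not the code from SingleCellDataExtraction.py; the function names and the tiny 1x3 "masks" are hypothetical and exist only for illustration. It contrasts pairing values with IDs taken from the first mask against letting each mask supply its own IDs:

```python
def mean_intensity_by_label(mask, image):
    """Return {cell_id: mean intensity} for every non-zero label in ONE mask."""
    sums, counts = {}, {}
    for mask_row, image_row in zip(mask, image):
        for label, value in zip(mask_row, image_row):
            if label == 0:  # 0 is background
                continue
            sums[label] = sums.get(label, 0) + value
            counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sorted(sums)}


def quantify_with_shared_ids(masks, image):
    """Problematic pattern: IDs come from the first mask and are paired by position."""
    first_ids = sorted(mean_intensity_by_label(masks[0], image))
    tables = []
    for mask in masks:
        values = list(mean_intensity_by_label(mask, image).values())
        # Positional pairing: wrong as soon as a mask is missing a cell.
        tables.append(list(zip(first_ids, values)))
    return tables


def quantify_isolated(masks, image):
    """Suggested pattern: every mask is processed alone and supplies its own IDs."""
    return [list(mean_intensity_by_label(mask, image).items()) for mask in masks]


# Toy data (hypothetical): cell 2 exists in the nuclei mask
# but has zero area in the cytoplasm mask.
nuclei = [[1, 2, 3]]
cyto = [[1, 0, 3]]
image = [[10, 20, 30]]

shared = quantify_with_shared_ids([nuclei, cyto], image)
isolated = quantify_isolated([nuclei, cyto], image)
print(shared[1])    # [(1, 10.0), (2, 30.0)]  <-- value of cell 3 labeled as cell 2
print(isolated[1])  # [(1, 10.0), (3, 30.0)]
```

In the shared-ID variant, the intensity of cell 3 ends up under ID 2, which mirrors the shift seen in the tables above. In the isolated variant, ID 2 is simply skipped.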

@JoshuaHess12
Contributor Author

I think the processing of each mask is already uncoupled in the for loop -- there isn't any crosstalk between the masks with the way this pull request exports the CSVs. The CellIDs are mismatched because regionprops in Python automatically enumerates the CellIDs for us by sweeping from left to right across the image. If there is no cytoplasm object for a cell, then all the other CellIDs for the cytoplasm mask will be shifted up by a value of one in the CellID column of the cytoplasm CSV compared to the nucleus CSV file.

I think one way to fix this would be to do a 1-nearest neighbor assignment from the other CSV files to the nuclei CSV file based on their spatial coordinates. If we assume that the cytoplasm of each cell is always going to be closest to its own nucleus then this may work. We could relabel all other CellID rows in the mismatched CSVs according to the index of their nearest neighbor in the nuclei CSV.
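
A rough sketch of what that 1-nearest-neighbor relabeling could look like, using only the standard library. The helper name is hypothetical, and the centroids are copied from the tables earlier in this thread, under the assumption that the coordinates in the two-mask run are the nuclei centroids:

```python
import math


def relabel_by_nearest_nucleus(nuclei, others):
    """Pair each row in `others` with the CellID of the closest nucleus centroid.

    Both arguments are lists of (cell_id, x_centroid, y_centroid) tuples.
    Returns a list of (original_id, nearest_nucleus_id) pairs.
    """
    pairs = []
    for other_id, ox, oy in others:
        nearest = min(nuclei, key=lambda n: math.hypot(n[1] - ox, n[2] - oy))
        pairs.append((other_id, nearest[0]))
    return pairs


# Centroids from the two-mask run (assumed here to come from nucleiRingMask).
nuclei_rows = [
    (2170, 1000.95, 1301.36),
    (2171, 1040.01, 1298.77),
    (2172, 1569.50, 1303.27),
    (2173, 675.52, 1304.18),
]
# Centroids from the cytoRingMask-only run.
cyto_rows = [
    (2170, 1003.11, 1301.05),
    (2172, 1569.82, 1303.80),
    (2173, 675.15, 1304.17),
]

print(relabel_by_nearest_nucleus(nuclei_rows, cyto_rows))
# [(2170, 2170), (2172, 2172), (2173, 2173)]
```

This brute-force search is quadratic in the number of cells, so a real implementation would probably want a spatial index. It also relies entirely on the assumption that each cytoplasm centroid is closest to its own nucleus.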

@JoshuaHess12
Contributor Author

Wait, you may be right @ArtemSokolov . Sorry about that. I will look at this a little more.

@ArtemSokolov
Collaborator

Thanks for looking into it, @JoshuaHess12

> The CellIDs are mismatched because regionprops in Python automatically enumerates the CellIDs for us by sweeping from left to right across the image.

So, I actually had this concern before also, but I verified with Clarence that regionprops() extracts Cell IDs directly from the mask file, and the upstream segmentation module ensures that Cell IDs match between nucleus and cytoplasm masks, even if some cells are not captured by one of those masks. This is why we see skipped IDs, like in this example.

  CellID X_centroid Y_centroid  FDX1 CD357  CD1D
    <dbl>      <dbl>      <dbl> <dbl> <dbl> <dbl>
 1   2170      1003.      1301. 2247. 1483. 1064.
 2   2172      1570.      1304. 3153. 1322. 1337.
 3   2173       675.      1304. 3661. 1151. 1253.

I think the end goal is just to ensure that the output exemplar-001_cytoRingMask.csv is the same, regardless of whether the user calls the tool with --masks cytoRingMask.tif alone or jointly with --masks nucleiRingMask.tif cytoRingMask.tif.
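
One possible way to check this automatically is to compare the two output files row by row. The sketch below uses only the standard-library csv module; the function names are hypothetical, and the paths in the usage comment are the ones from the reproduction steps above:

```python
import csv


def csv_rows(path):
    """Read a CSV file into a list of rows (each row is a list of strings)."""
    with open(path, newline="") as handle:
        return list(csv.reader(handle))


def first_difference(path_a, path_b):
    """Return None if both files hold identical rows.

    Otherwise return (row_index, row_a, row_b) for the first mismatch,
    where a missing row is reported as None.
    """
    rows_a, rows_b = csv_rows(path_a), csv_rows(path_b)
    for index in range(max(len(rows_a), len(rows_b))):
        row_a = rows_a[index] if index < len(rows_a) else None
        row_b = rows_b[index] if index < len(rows_b) else None
        if row_a != row_b:
            return index, row_a, row_b
    return None


# Example usage:
# diff = first_difference("cytoOnly/exemplar-001_cytoRingMask.csv",
#                         "both/exemplar-001_cytoRingMask.csv")
# print("identical" if diff is None else diff)
```

The comparison is on the parsed strings, so it flags any difference in IDs, coordinates, or expression values. A tolerance-based numeric comparison would need extra handling.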

@JoshuaHess12
Contributor Author

@ArtemSokolov No problem! I think this makes sense now. I moved the extraction of Cell IDs inside the loop so that it gets executed separately for each mask. Let me know if the latest commit addresses the issue.

@ArtemSokolov
Collaborator

Works great, @JoshuaHess12! I can confirm that --masks cytoRingMask.tif and --masks nucleiRingMask.tif cytoRingMask.tif produce identical .csv files for the cytoplasm mask.
