Tentative set of tools and scripts for analysing spatial transcriptomic data from the resolve platform. It includes a Nextflow pipeline which runs image segmentation with cellpose or mesmer and then counts the transcripts in each cell. The pipeline uses Python3 scripts for:
- Filling in gaps left by tile registration.
- Deduplicating transcripts.
- Segmentation.
- Cleaning up the segmentation mask (removing cells that are too small).
- Making ImageJ and Seurat compatible ROIs.
- Expression assignment (counting the transcripts in each cell).
These scripts can be used independently or as part of the Nextflow pipeline provided.
Dependencies
The pipeline automatically fetches this Singularity container.
The definition files are provided here.
- (1) Nextflow pipeline
- (1.1) Parameters
- (1.2) Input
- (1.3) Output
- (1.4) Example Run
- (2) Scripts
- (2.1) Gap Filling
- (2.2) Deduplication
- (2.3) Segmentation
- (2.3.1) Cellpose Segmentation
- (2.3.2) Mesmer Segmentation
- (2.4) Segmentation Mask Cleanup
- (2.5) ROI Generation
- (2.6) Expression assignment
The pipeline can run on the CPU or the GPU. Only the segmentation process (both cellpose and mesmer) is affected by this choice.
To run the pipeline in GPU mode, add gpu to the -profile option of nextflow:
nextflow run main.nf -c run.config -profile gpu
For an example, see the provided example config file.
Input/output Parameters:
- params.input_path = Path to the resolve folder with the Panoramas to be processed.
- params.output_path = Path for output.
Workflow Parameters:
- params.fill_gaps = true. Set to false to skip image gap filling.
- params.deduplicate = true. Set to false to skip transcript deduplication.
- params.do_zip = true. Set to false to skip making ImageJ ROIs (faster).
- params.segmentation_tool = "mesmer" or "cellpose", to select which tool to use for segmentation.
Deduplication Parameters:
- params.tile_size = Tile size (distance between gridlines). Default (2144).
- params.window_size = Window around gridlines to search for duplicates. Default (30).
- params.max_freq = Maximum transcript count to calculate X/Y shifts (better to discard very common genes). Default (400).
- params.min_mode = Minimum occurrences of ~XYZ_shift to consider it valid. Default (10).
cellpose Segmentation Parameters:
- params.model_name = "cyto" (recommended) or any model that uses 1 DNA channel.
- params.probability_threshold = Floating point number between -6 and +6; see the cellpose threshold documentation.
- params.cell_diameter = Cell diameter, or None for automatic estimation; see the cellpose diameter documentation.
mesmer Segmentation Parameters:
- params.maxima_threshold = Decrease for over-segmentation. Default (0.075).
- params.maxima_smooth = Smoothing radius for maxima. Default (0).
- params.interior_threshold = Decrease to identify more cells. Default (0.2).
- params.interior_smooth = Smoothing radius for cell detection. Default (2).
- params.small_objects_threshold = Minimum object size. Default (15).
- params.fill_holes_threshold = Maximum size for hole filling. Default (15).
- params.radius = Undocumented in Mesmer. Default (2).
Folder with the panoramas to be processed. All panoramas are expected to have:
- DAPI image named: Panorama_*_Channel3_R8_.tiff
- Transcript coordinates named: Panorama_*_results_withFP.txt
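The naming convention above can be checked before launching a run. The following is a minimal sketch, not the pipeline's own code: the function names find_samples and write_metadata are hypothetical, and the sample name is assumed to be the DAPI file name without its _Channel3_R8_.tiff suffix. It pairs each DAPI image with its transcript file and writes rows with the sample, dapi and counts columns described for sample_metadata.csv.

```python
import csv
from pathlib import Path

DAPI_SUFFIX = "_Channel3_R8_.tiff"
COUNTS_SUFFIX = "_results_withFP.txt"


def find_samples(input_path):
    """Pair each DAPI image with its transcript file and return metadata rows."""
    rows = []
    for dapi in sorted(Path(input_path).glob("Panorama_*" + DAPI_SUFFIX)):
        # Assumption: the sample name is the file name without the DAPI suffix.
        sample = dapi.name[: -len(DAPI_SUFFIX)]
        counts = dapi.with_name(sample + COUNTS_SUFFIX)
        # Panoramas without a matching transcript file are skipped.
        if counts.exists():
            rows.append({"sample": sample, "dapi": str(dapi), "counts": str(counts)})
    return rows


def write_metadata(rows, csv_path):
    """Write one row per sample with the columns sample, dapi and counts."""
    with open(csv_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["sample", "dapi", "counts"])
        writer.writeheader()
        writer.writerows(rows)
```

Calling find_samples on the input folder and inspecting the returned rows is a quick way to spot panoramas that are missing one of the two expected files.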
In params.output_path:
- sample_metadata.csv = .csv file with one row per sample and 3 columns: sample (sample name), dapi (path to the DAPI image), counts (path to the transcript coordinates).
- For each sample, a folder SAMPLE_NAME with:
  - SAMPLE_NAME-gridfilled.tiff = Image with the registration grid lines smoothed out.
  - SAMPLE_NAME-filtered_transcripts.txt = Transcripts with duplicates marked.
  - SAMPLE_NAME-cellpose-mask.tiff or SAMPLE_NAME-mesmer-mask.tiff = 16 bit segmentation mask (0 = background, N = pixels belonging to the Nth cell).
  - SAMPLE_NAME-roi.zip (optional) = ImageJ ROI file with the ROIs numbered according to the segmentation mask.
  - SAMPLE_NAME-cell_data.csv = Single cell data, numbered according to the segmentation mask.
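To illustrate the mask encoding (0 = background, N = pixels belonging to the Nth cell), here is a small sketch that counts the pixels of each cell. The function name cell_areas is hypothetical, and a nested list stands in for the 16 bit image so that the example has no dependencies; a real mask would first be loaded with an image library.

```python
from collections import Counter


def cell_areas(mask):
    """Return {cell_id: pixel_count} for a label mask, ignoring background (0)."""
    return dict(Counter(label for row in mask for label in row if label != 0))


# Toy 3 x 4 mask with two cells.
mask = [
    [0, 1, 1, 0],
    [0, 1, 0, 2],
    [0, 0, 2, 2],
]
areas = cell_areas(mask)  # cell 1 and cell 2 each cover 3 pixels
```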
nextflow run main.nf -profile cluster,gpu -c test.config
Breakdown:
- -profile cluster,gpu = cluster is for running on a PBS based cluster (default is local execution); gpu is for using the GPU for cell segmentation (default is to use the CPU).
- -c test.config = Use the parameters specified in the test.config file. Alternatively, parameters can be passed from the command line.
The scripts used in the Nextflow pipeline can also be run independently.
python3.9 -u /MindaGap/mindagap.py $dapi_path 3 > gapfilling_log.txt
mv *gridfilled.tif $sample_name-gridfilled.tiff
It requires the following arguments:
- $dapi_path = Path to the image to fix.
- $sample_name = Sample name.
The script:
- Runs MindaGap on the input image with a smoothing box size of 3.
- Renames the output image.
For more info on MindaGap see: https://github.com/ViriatoII/MindaGap
python3.8 -u /MindaGap/duplicate_finder.py $transcript_path $tile_size $window_size \
$max_freq $min_mode > deduplication_log.txt
mv *_markedDups.txt $sample_name-filtered_transcripts.txt 2>&1
It requires the following arguments:
- $transcript_path = Path to the transcripts to deduplicate.
- $tile_size_x = X tile size (distance between gridlines). Default (2144).
- $tile_size_y = Y tile size (distance between gridlines). Default (2144).
- $window_size = Window around gridlines to search for duplicates. Default (30).
- $max_freq = Maximum transcript count to calculate X/Y shifts (better to discard very common genes). Default (400).
- $min_mode = Minimum occurrences of ~XYZ_shift to consider it valid. Default (10).
The script:
- Runs duplicate_finder.py on the input transcripts.
- Renames the output file with the filtered transcripts.
For more info on MindaGap see: https://github.com/ViriatoII/MindaGap
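To make the roles of the tile size and window size more concrete, the sketch below selects transcripts that lie within a window of a gridline. This is only an illustration under stated assumptions and not the logic of duplicate_finder.py: gridlines are assumed to sit at positive multiples of the tile size, the window is assumed to extend window_size pixels on either side of a gridline, and the function names are hypothetical.

```python
def distance_to_gridline(coord, tile_size):
    """Distance from a coordinate to the nearest interior gridline.

    Assumption: gridlines sit at tile_size, 2 * tile_size, and so on;
    the image edge at 0 is not treated as a gridline.
    """
    nearest_index = max(1, round(coord / tile_size))
    return abs(coord - nearest_index * tile_size)


def near_gridline(x, y, tile_size=2144, window_size=30):
    """True if the point is within window_size of a vertical or horizontal gridline."""
    return (
        distance_to_gridline(x, tile_size) <= window_size
        or distance_to_gridline(y, tile_size) <= window_size
    )
```

With the default values, a transcript at x = 2150 is 6 pixels from the first vertical gridline and would be a candidate for the duplicate search, while one at (1000, 1000) would not.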
This Cellpose segmentation script is mostly a wrapper around cellpose. It assumes the input is a single channel grayscale image with the nuclei. It requires the following positional arguments:
- tiff_path = Path to the image to segment.
- model_name = Model to use for the segmentation.
- prob_thresh = Probability threshold.
- cell_diameter = Cell diameter for cellpose.
- output_mask_file = Path to the cell mask output.
It also takes the following optional flag:
- --gpu = Use the first available GPU.
The script:
- Runs CLAHE on the input image.
- Segments with cellpose.
- Sets to 0 all pixels at the image border.
Example
python3.9 cellpose_segmenter.py DAPI_IMAGE cyto 0 70 OUTPUT_SEGMENTATION_MASK_NAME
This Mesmer segmentation script is mainly a wrapper around mesmer. It assumes the input is a single channel grayscale image with the nuclei. It requires the following positional arguments:
- tiff_path = Path to the image to segment.
- output_mask_file = Path to the cell mask output.
It accepts the following optional parameters:
- maxima_threshold = Decrease for over-segmentation. Default (0.075).
- maxima_smooth = Smoothing radius for maxima. Default (0).
- interior_threshold = Decrease to identify more cells. Default (0.2).
- interior_smooth = Smoothing radius for cell detection. Default (2).
- small_objects_threshold = Minimum object size. Default (15).
- fill_holes_threshold = Maximum size for hole filling. Default (15).
- radius = Undocumented in Mesmer. Default (2).
Mesmer will run on the first available GPU if present, no need to specify additional parameters.
The script:
- Runs CLAHE on the input image.
- Segments with mesmer.
- Sets to 0 all pixels at the image border.
Example
python3.8 mesmer_segmenter.py DAPI_IMAGE OUTPUT_SEGMENTATION_MASK_NAME
Segmentation mask cleanup script
It requires the following positional arguments:
- mask_path = Path to the segmentation mask to clean up.
- diameter = Cell diameter used for size filtering. Cells smaller than cell_diameter / 2 are discarded.
- output_mask_file = Path to the cell mask output.
The script prepares the cell mask for:
- ROI generation
- Expression assignment
The script:
- Sets to 0 all pixels at the image border.
- Removes cells smaller than cell_diameter / 2 pixels in diameter.
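The two cleanup steps can be sketched as follows. This is a minimal illustration rather than the script itself: the function name clean_mask is hypothetical, nested lists stand in for the mask image so the sketch has no dependencies, and the diameter of a cell is assumed to be that of a circle with the same area, which may differ from how the script measures it.

```python
import math
from collections import Counter


def clean_mask(mask, cell_diameter):
    """Zero the border pixels, then drop cells narrower than cell_diameter / 2."""
    height, width = len(mask), len(mask[0])

    # Step 1: set every pixel on the image border to 0 (background).
    cleaned = [
        [
            0 if r in (0, height - 1) or c in (0, width - 1) else label
            for c, label in enumerate(row)
        ]
        for r, row in enumerate(mask)
    ]

    # Step 2: estimate each cell's diameter from its pixel area.
    # Assumption: equivalent circle diameter, d = 2 * sqrt(area / pi).
    areas = Counter(label for row in cleaned for label in row if label != 0)
    too_small = {
        label
        for label, area in areas.items()
        if 2 * math.sqrt(area / math.pi) < cell_diameter / 2
    }

    # Cells that are too small are relabelled as background.
    return [[0 if label in too_small else label for label in row] for row in cleaned]
```

Note that the surviving cells keep their original labels in this sketch, so the numbering may contain gaps after filtering.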
Generates a zip file with Fiji ROIs from a segmentation mask.
It requires the following positional arguments:
- mask_path = Path to the segmentation mask.
- output_roi_file = Path to the output zip file with the ROIs.
The script:
- Sets to 0 all pixels at the image boundaries.
- Generates an ROI for each cell.
- Saves the ROIs in a zip file.
Example
python3.9 roi_maker.py MASK_IMAGE OUTPUT_ROI_ZIP_NAME
Expression assignment script. Counts the transcripts in each cell from the segmentation mask. The result is equivalent to the Polylux counts, except for:
- Overlapping ROIs.
- Transcripts outside the border of the image or lying exactly on the ROI border (resolution is 1 pixel).
It requires the following positional arguments:
- mask_file = Path to the input mask file.
- transcript_file = Path to the input transcript file.
- output_file = Path to the output single cell data file.
Notes:
- Removes all transcripts whose coordinates fall outside the size of the mask.
Example
python3.9 extracter.py SEGMENTATION_MASK TRANSCRIPT_COORDINATE_FILE OUTPUT_FILE_PATH.csv
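The counting step can be sketched as a lookup of each transcript position in the segmentation mask. This is an illustration under assumptions, not the code of extracter.py: the function name assign_transcripts is hypothetical, transcripts are assumed to be (x, y, gene) tuples with integer pixel coordinates, and a nested list stands in for the mask image.

```python
from collections import Counter


def assign_transcripts(mask, transcripts):
    """Count transcripts per (cell, gene) pair from a label mask.

    mask: nested list of labels, 0 = background, N = Nth cell.
    transcripts: iterable of (x, y, gene) with integer pixel coordinates (assumed).
    """
    height, width = len(mask), len(mask[0])
    counts = Counter()
    for x, y, gene in transcripts:
        # Remove transcripts whose coordinates fall outside the mask.
        if not (0 <= x < width and 0 <= y < height):
            continue
        cell = mask[y][x]  # row index is y, column index is x
        if cell != 0:  # transcripts on background are not assigned to any cell
            counts[(cell, gene)] += 1
    return counts
```

For example, with the mask [[0, 1], [2, 2]], two transcripts of gene A at (1, 0) are counted for cell 1, a transcript at (0, 0) falls on background and is ignored, and a transcript at (5, 5) is discarded as out of bounds.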
To Do:
- Add option to filter by Z coordinate?
- Add option to filter by transcript quality?
- Add option to count transcripts from ROIs?