Page not found
+The page you requested cannot be found (perhaps it was moved or renamed).
+You may want to try searching to find the page's new location, or use +the table of contents to find the page you are looking for.
+The page you requested cannot be found (perhaps it was moved or renamed).
+You may want to try searching to find the page's new location, or use +the table of contents to find the page you are looking for.
+In Section 7.4 we observed staining/expression differences +between the individual samples. This can arise due to technical (e.g., +differences in sample processing) as well as biological (e.g., differential +expression between patients/indications) effects. However, the combination of these effects +hinders cell phenotyping via clustering as highlighted in Section 9.2.
+To integrate cells across samples, we can use computational +strategies developed for correcting batch effects in single-cell RNA sequencing +data. In the following sections, we will use functions of the +batchelor, +harmony and +Seurat +packages to correct for such batch effects.
+Of note: the correction approaches presented here aim at removing any +differences between samples. This will also remove biological differences +between the patients/indications. Nevertheless, integrating cells across samples +can facilitate the detection of cell phenotypes via clustering.
+First, we will read in the SpatialExperiment
object containing the single-cell
+data.
The batchelor
package provides the mnnCorrect
and fastMNN
functions to
+correct for differences between samples/batches. Both functions build up on
+finding mutual nearest neighbors (MNN) among the cells of different samples and
+correct expression differences between the batches (Haghverdi et al. 2018). The mnnCorrect
function
+returns corrected expression counts while the fastMNN
functions performs the
+correction in reduced dimension space. As such, fastMNN
returns integrated
+cells in form of a low dimensional embedding.
Paper: Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors
+Documentation: batchelor
Here, we apply the fastMNN
function to integrate cells between
+patients. By setting auto.merge = TRUE
the function estimates the best
+batch merging order by maximizing the number of MNN pairs at each merging step.
+This is more time consuming than merging sequentially based on how batches appear in the
+dataset (default). We again select the markers defined in Section 5.2
+for sample correction.
The function returns a SingleCellExperiment
object which contains corrected
+low-dimensional coordinates for each cell in the reducedDim(out, "corrected")
+slot. This low-dimensional embedding can be further used for clustering and
+non-linear dimensionality reduction. We check that the order of cells is the
+same between the input and output object and then transfer the corrected
+coordinates to the main SpatialExperiment
object.
library(batchelor)
+set.seed(220228)
+out <- fastMNN(spe, batch = spe$patient_id,
+ auto.merge = TRUE,
+ subset.row = rowData(spe)$use_channel,
+ assay.type = "exprs")
+
+# Check that order of cells is the same
+stopifnot(all.equal(colnames(spe), colnames(out)))
+
+# Transfer the correction results to the main spe object
+reducedDim(spe, "fastMNN") <- reducedDim(out, "corrected")
The computational time of the fastMNN
function call is
+1.67 minutes.
Of note, the warnings that the fastMNN
function produces can be avoided as follows:
BSPARAM = BiocSingular::ExactParam()
Warning in (function (A, nv = 5, nu = nv, maxit = 1000, work = nv + 7, reorth = TRUE, :
+ You're computing too large a percentage of total singular values, use a standard svd instead.
+d = 30
In check_numbers(k = k, nu = nu, nv = nv, limit = min(dim(x)) - :
+ more singular values/vectors requested than available
+The fastMNN
function further returns outputs that can be used to assess the
+quality of the batch correction. The metadata(out)$merge.info
entry collects
+diagnostics for each individual merging step. Here, the batch.size
and
+lost.var
entries are important. The batch.size
entry reports the relative
+magnitude of the batch effect and the lost.var
entry represents the percentage
+of lost variance per merging step. A large batch.size
and low lost.var
+indicate sufficient batch correction.
## DataFrame with 3 rows and 3 columns
+## left right batch.size
+## <List> <List> <numeric>
+## 1 Patient4 Patient2 0.381635
+## 2 Patient4,Patient2 Patient1 0.581013
+## 3 Patient4,Patient2,Patient1 Patient3 0.767376
+
+## Patient1 Patient2 Patient3 Patient4
+## [1,] 0.000000000 0.031154864 0.00000000 0.046198914
+## [2,] 0.043363546 0.009772150 0.00000000 0.011931892
+## [3,] 0.005394755 0.003023119 0.07219394 0.005366304
+We observe that Patient4 and Patient2 are most similar with a low batch effect. +Merging cells of Patient3 into the combined batch of Patient1, +Patient2 and Patient4 resulted in the highest percentage of lost variance and +the detection of the largest batch effect. In the next paragraph we can +visualize the correction results.
+The simplest option to check if the sample effects were corrected is by using +non-linear dimensionality reduction techniques and observe mixing of cells across +samples. We will recompute the UMAP embedding using the corrected +low-dimensional coordinates for each cell.
+library(scater)
+
+set.seed(220228)
+spe <- runUMAP(spe, dimred= "fastMNN", name = "UMAP_mnnCorrected")
Next, we visualize the corrected UMAP while overlaying patient IDs.
+library(cowplot)
+library(dittoSeq)
+library(viridis)
+
+# visualize patient id
+p1 <- dittoDimPlot(spe, var = "patient_id",
+ reduction.use = "UMAP", size = 0.2) +
+ scale_color_manual(values = metadata(spe)$color_vectors$patient_id) +
+ ggtitle("Patient ID on UMAP before correction")
+p2 <- dittoDimPlot(spe, var = "patient_id",
+ reduction.use = "UMAP_mnnCorrected", size = 0.2) +
+ scale_color_manual(values = metadata(spe)$color_vectors$patient_id) +
+ ggtitle("Patient ID on UMAP after correction")
+
+plot_grid(p1, p2)
We observe an imperfect merging of Patient3 into all other samples. This +was already seen when displaying the merging information above. +We now also visualize the expression of selected markers across all cells +before and after batch correction.
+markers <- c("Ecad", "CD45RO", "CD20", "CD3", "FOXP3", "CD206", "MPO", "SMA", "Ki67")
+
+# Before correction
+plot_list <- multi_dittoDimPlot(spe, var = markers, reduction.use = "UMAP",
+ assay = "exprs", size = 0.2, list.out = TRUE)
+plot_list <- lapply(plot_list, function(x) x + scale_color_viridis())
+plot_grid(plotlist = plot_list)
# After correction
+plot_list <- multi_dittoDimPlot(spe, var = markers, reduction.use = "UMAP_mnnCorrected",
+ assay = "exprs", size = 0.2, list.out = TRUE)
+plot_list <- lapply(plot_list, function(x) x + scale_color_viridis())
+plot_grid(plotlist = plot_list)
We observe that immune cells across patients are merged after batch correction
+using fastMNN
. However, the tumor cells of different patients still cluster
+separately.
The harmony
algorithm performs batch correction by iteratively clustering and
+correcting the positions of cells in PCA space (Korsunsky et al. 2019). We will first
+perform PCA on the asinh-transformed counts and then call the RunHarmony
+function to perform data integration.
Paper: Fast, sensitive and accurate integration of single-cell data with Harmony
+Documentation: harmony
Similar to the fastMNN
function, harmony
returns the corrected
+low-dimensional coordinates for each cell. These can be transfered to the
+reducedDim
slot of the original SpatialExperiment
object.
library(harmony)
+library(BiocSingular)
+
+spe <- runPCA(spe,
+ subset_row = rowData(spe)$use_channel,
+ exprs_values = "exprs",
+ ncomponents = 30,
+ BSPARAM = ExactParam())
+
+set.seed(230616)
+out <- RunHarmony(spe, group.by.vars = "patient_id")
+
+# Check that order of cells is the same
+stopifnot(all.equal(colnames(spe), colnames(out)))
+
+reducedDim(spe, "harmony") <- reducedDim(out, "HARMONY")
The computational time of the HarmonyMatrix
function call is
+0.98 minutes.
We will now again visualize the cells in low dimensions after UMAP embedding.
+ +# visualize patient id
+p1 <- dittoDimPlot(spe, var = "patient_id",
+ reduction.use = "UMAP", size = 0.2) +
+ scale_color_manual(values = metadata(spe)$color_vectors$patient_id) +
+ ggtitle("Patient ID on UMAP before correction")
+p2 <- dittoDimPlot(spe, var = "patient_id",
+ reduction.use = "UMAP_harmony", size = 0.2) +
+ scale_color_manual(values = metadata(spe)$color_vectors$patient_id) +
+ ggtitle("Patient ID on UMAP after correction")
+
+plot_grid(p1, p2)
And we visualize selected marker expression as defined above.
+# Before correction
+plot_list <- multi_dittoDimPlot(spe, var = markers, reduction.use = "UMAP",
+ assay = "exprs", size = 0.2, list.out = TRUE)
+plot_list <- lapply(plot_list, function(x) x + scale_color_viridis())
+plot_grid(plotlist = plot_list)
# After correction
+plot_list <- multi_dittoDimPlot(spe, var = markers, reduction.use = "UMAP_harmony",
+ assay = "exprs", size = 0.2, list.out = TRUE)
+plot_list <- lapply(plot_list, function(x) x + scale_color_viridis())
+plot_grid(plotlist = plot_list)
We observe a more aggressive merging of cells from different patients compared
+to the results after fastMNN
correction. Importantly, immune cell and epithelial
+markers are expressed in distinct regions of the UMAP.
The Seurat
package provides a number of functionalities to analyze single-cell
+data. As such it also allows the integration of cells across different samples.
+Conceptually, Seurat
performs batch correction similarly to fastMNN
by
+finding mutual nearest neighbors (MNN) in low dimensional space before
+correcting the expression values of cells (Stuart et al. 2019).
Paper: Comprehensive Integration of Single-Cell Data
+Documentation: Seurat
To use Seurat
, we will first create a Seurat
object from the SpatialExperiment
+object and add relevant metadata. The object also needs to be split by patient
+prior to integration.
library(Seurat)
+library(SeuratObject)
+seurat_obj <- as.Seurat(spe, counts = "counts", data = "exprs")
+seurat_obj <- AddMetaData(seurat_obj, as.data.frame(colData(spe)))
+
+seurat.list <- SplitObject(seurat_obj, split.by = "patient_id")
To avoid long run times, we will use an approach that relies on reciprocal PCA
+instead of canonical correlation analysis for dimensionality reduction and
+initial alignment. For an extended tutorial on how to use Seurat
for data
+integration, please refer to their
+vignette.
We will first define the features used for integration and perform PCA on cells
+of each patient individually. The FindIntegrationAnchors
function detects MNNs between
+cells of different patients and the IntegrateData
function corrects the
+expression values of cells. We slightly increase the number of neighbors to be
+considered for MNN detection (the k.anchor
parameter). This increases the integration
+strength.
features <- rownames(spe)[rowData(spe)$use_channel]
+seurat.list <- lapply(X = seurat.list, FUN = function(x) {
+ x <- ScaleData(x, features = features, verbose = FALSE)
+ x <- RunPCA(x, features = features, verbose = FALSE, approx = FALSE)
+ return(x)
+})
+
+anchors <- FindIntegrationAnchors(object.list = seurat.list,
+ anchor.features = features,
+ reduction = "rpca",
+ k.anchor = 20)
+
+combined <- IntegrateData(anchorset = anchors)
We now select the integrated
assay and perform PCA dimensionality reduction.
+The cell coordinates in PCA reduced space can then be transferred to the
+original SpatialExperiment
object. Of note: by splitting the object into
+individual batch-specific objects, the ordering of cells in the integrated
+object might not match the ordering of cells in the input object. In this case,
+columns will need to be reordered. Here, we test if the ordering of cells in the
+integrated Seurat
object matches the ordering of cells in the main
+SpatialExperiment
object.
DefaultAssay(combined) <- "integrated"
+
+combined <- ScaleData(combined, verbose = FALSE)
+combined <- RunPCA(combined, npcs = 30, verbose = FALSE, approx = FALSE)
+
+# Check that order of cells is the same
+stopifnot(all.equal(colnames(spe), colnames(combined)))
+
+reducedDim(spe, "seurat") <- Embeddings(combined, reduction = "pca")
The computational time of the Seurat
function calls is
+3.15 minutes.
As above, we recompute the UMAP embeddings based on Seurat
integrated results
+and visualize the embedding.
Visualize patient IDs.
+# visualize patient id
+p1 <- dittoDimPlot(spe, var = "patient_id",
+ reduction.use = "UMAP", size = 0.2) +
+ scale_color_manual(values = metadata(spe)$color_vectors$patient_id) +
+ ggtitle("Patient ID on UMAP before correction")
+p2 <- dittoDimPlot(spe, var = "patient_id",
+ reduction.use = "UMAP_seurat", size = 0.2) +
+ scale_color_manual(values = metadata(spe)$color_vectors$patient_id) +
+ ggtitle("Patient ID on UMAP after correction")
+
+plot_grid(p1, p2)
Visualization of marker expression.
+# Before correction
+plot_list <- multi_dittoDimPlot(spe, var = markers, reduction.use = "UMAP",
+ assay = "exprs", size = 0.2, list.out = TRUE)
+plot_list <- lapply(plot_list, function(x) x + scale_color_viridis())
+plot_grid(plotlist = plot_list)
# After correction
+plot_list <- multi_dittoDimPlot(spe, var = markers, reduction.use = "UMAP_seurat",
+ assay = "exprs", size = 0.2, list.out = TRUE)
+plot_list <- lapply(plot_list, function(x) x + scale_color_viridis())
+plot_grid(plotlist = plot_list)
Similar to the methods presented above, Seurat
integrates immune cells correctly.
+When visualizing the patient IDs, slight patient-to-patient differences within tumor
+cells can be detected.
Choosing the correct integration approach is challenging without having ground truth +cell labels available. It is recommended to compare different techniques and different +parameter settings. Please refer to the documentation of the individual tools +to become familiar with the possible parameter choices. Furthermore, in the following +section, we will discuss clustering and classification approaches in light of +expression differences between samples.
+In general, it appears that MNN-based approaches are less conservative in terms
+of merging compared to harmony
. On the other hand, harmony
could well merge
+cells in a way that regresses out biological signals.
The modified SpatialExperiment
object is saved for further downstream analysis.
## R version 4.3.2 (2023-10-31)
+## Platform: x86_64-pc-linux-gnu (64-bit)
+## Running under: Ubuntu 22.04.3 LTS
+##
+## Matrix products: default
+## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
+## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
+##
+## locale:
+## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
+## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
+## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
+## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
+## [9] LC_ADDRESS=C LC_TELEPHONE=C
+## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
+##
+## time zone: Etc/UTC
+## tzcode source: system (glibc)
+##
+## attached base packages:
+## [1] stats4 stats graphics grDevices utils datasets methods
+## [8] base
+##
+## other attached packages:
+## [1] testthat_3.2.0 SeuratObject_5.0.0
+## [3] Seurat_4.4.0 harmony_1.1.0
+## [5] Rcpp_1.0.11 viridis_0.6.4
+## [7] viridisLite_0.4.2 dittoSeq_1.14.0
+## [9] cowplot_1.1.1 scater_1.30.0
+## [11] ggplot2_3.4.4 scuttle_1.12.0
+## [13] BiocSingular_1.18.0 SpatialExperiment_1.12.0
+## [15] batchelor_1.18.0 SingleCellExperiment_1.24.0
+## [17] SummarizedExperiment_1.32.0 Biobase_2.62.0
+## [19] GenomicRanges_1.54.1 GenomeInfoDb_1.38.0
+## [21] IRanges_2.36.0 S4Vectors_0.40.1
+## [23] BiocGenerics_0.48.0 MatrixGenerics_1.14.0
+## [25] matrixStats_1.0.0
+##
+## loaded via a namespace (and not attached):
+## [1] RcppAnnoy_0.0.21 splines_4.3.2
+## [3] later_1.3.1 bitops_1.0-7
+## [5] tibble_3.2.1 polyclip_1.10-6
+## [7] lifecycle_1.0.3 rprojroot_2.0.3
+## [9] globals_0.16.2 lattice_0.21-9
+## [11] MASS_7.3-60 magrittr_2.0.3
+## [13] plotly_4.10.3 sass_0.4.7
+## [15] rmarkdown_2.25 jquerylib_0.1.4
+## [17] yaml_2.3.7 httpuv_1.6.12
+## [19] sctransform_0.4.1 spam_2.10-0
+## [21] spatstat.sparse_3.0-3 sp_2.1-1
+## [23] reticulate_1.34.0 pbapply_1.7-2
+## [25] RColorBrewer_1.1-3 ResidualMatrix_1.12.0
+## [27] pkgload_1.3.3 abind_1.4-5
+## [29] zlibbioc_1.48.0 Rtsne_0.16
+## [31] purrr_1.0.2 RCurl_1.98-1.12
+## [33] GenomeInfoDbData_1.2.11 ggrepel_0.9.4
+## [35] irlba_2.3.5.1 listenv_0.9.0
+## [37] spatstat.utils_3.0-4 pheatmap_1.0.12
+## [39] goftest_1.2-3 spatstat.random_3.2-1
+## [41] fitdistrplus_1.1-11 parallelly_1.36.0
+## [43] DelayedMatrixStats_1.24.0 leiden_0.4.3
+## [45] codetools_0.2-19 DelayedArray_0.28.0
+## [47] tidyselect_1.2.0 farver_2.1.1
+## [49] ScaledMatrix_1.10.0 spatstat.explore_3.2-5
+## [51] jsonlite_1.8.7 BiocNeighbors_1.20.0
+## [53] ellipsis_0.3.2 progressr_0.14.0
+## [55] ggridges_0.5.4 survival_3.5-7
+## [57] tools_4.3.2 ica_1.0-3
+## [59] glue_1.6.2 gridExtra_2.3
+## [61] SparseArray_1.2.0 xfun_0.40
+## [63] dplyr_1.1.3 withr_2.5.1
+## [65] fastmap_1.1.1 fansi_1.0.5
+## [67] digest_0.6.33 rsvd_1.0.5
+## [69] R6_2.5.1 mime_0.12
+## [71] colorspace_2.1-0 scattermore_1.2
+## [73] tensor_1.5 spatstat.data_3.0-3
+## [75] RhpcBLASctl_0.23-42 utf8_1.2.4
+## [77] tidyr_1.3.0 generics_0.1.3
+## [79] data.table_1.14.8 httr_1.4.7
+## [81] htmlwidgets_1.6.2 S4Arrays_1.2.0
+## [83] uwot_0.1.16 pkgconfig_2.0.3
+## [85] gtable_0.3.4 lmtest_0.9-40
+## [87] XVector_0.42.0 brio_1.1.3
+## [89] htmltools_0.5.6.1 dotCall64_1.1-0
+## [91] bookdown_0.36 scales_1.2.1
+## [93] png_0.1-8 knitr_1.44
+## [95] rstudioapi_0.15.0 reshape2_1.4.4
+## [97] rjson_0.2.21 nlme_3.1-163
+## [99] cachem_1.0.8 zoo_1.8-12
+## [101] stringr_1.5.0 KernSmooth_2.23-22
+## [103] parallel_4.3.2 miniUI_0.1.1.1
+## [105] vipor_0.4.5 desc_1.4.2
+## [107] pillar_1.9.0 grid_4.3.2
+## [109] vctrs_0.6.4 RANN_2.6.1
+## [111] promises_1.2.1 beachmat_2.18.0
+## [113] xtable_1.8-4 cluster_2.1.4
+## [115] waldo_0.5.1 beeswarm_0.4.0
+## [117] evaluate_0.22 magick_2.8.1
+## [119] cli_3.6.1 compiler_4.3.2
+## [121] rlang_1.1.1 crayon_1.5.2
+## [123] future.apply_1.11.0 labeling_0.4.3
+## [125] plyr_1.8.9 ggbeeswarm_0.7.2
+## [127] stringi_1.7.12 deldir_1.0-9
+## [129] BiocParallel_1.36.0 munsell_0.5.0
+## [131] lazyeval_0.2.2 spatstat.geom_3.2-7
+## [133] Matrix_1.6-1.1 patchwork_1.1.3
+## [135] sparseMatrixStats_1.14.0 future_1.33.0
+## [137] shiny_1.7.5.1 ROCR_1.0-11
+## [139] igraph_1.5.1 bslib_0.5.1
+A common step during single-cell data analysis is the annotation of cells based +on their phenotype. Defining cell phenotypes is often subjective and relies +on previous biological knowledge. The Orchestrating Single Cell Analysis with Bioconductor book +presents a number of approaches to phenotype cells detected by single-cell RNA +sequencing based on reference datasets or gene set analysis.
+In highly-multiplexed imaging, target proteins or molecules are manually +selected based on the biological question at hand. It narrows down the feature +space and facilitates the manual annotation of clusters to derive cell +phenotypes. We will therefore discuss and compare a number of clustering +approaches to group cells based on their similarity in marker expression in +Section 9.2.
+Unlike single-cell RNA sequencing or CyTOF data, single-cell data derived from +highly-multiplexed imaging data often suffers from “lateral spillover” between +neighboring cells. This spillover caused by imperfect segmentation often hinders +accurate clustering to define specific cell phenotypes in multiplexed imaging +data. Tools have been developed to correct lateral spillover between cells +(Bai et al. 2021) but the approach requires careful selection of the markers to +correct. In Section 9.3 we will train and apply a random +forest classifier to classify cell phenotypes in the dataset as alternative +approach to clustering-based cell phenotyping. This approach has been previously used to +identify major cell phenotypes in metastatic melanoma and avoids clustering of +cells (Hoch et al. 2022).
+We will first read in the previously generated SpatialExperiment
object and
+sample 2000 cells to visualize cluster membership.
In the first section, we will present clustering approaches to identify cellular +phenotypes in the dataset. These methods group cells based on their similarity +in marker expression or by their proximity in low dimensional space. A number of +approaches have been developed to cluster data derived from single-cell RNA +sequencing technologies (Yu et al. 2022) or CyTOF (Weber and Robinson 2016). For demonstration +purposes, we will highlight common clustering approaches that are available in R +and have been used for clustering cells obtained from IMC. Two approaches rely +on graph-based clustering and one approach uses self organizing maps (SOM).
+The PhenoGraph clustering approach was first described to group cells of a CyTOF
+dataset (Levine et al. 2015). The algorithm first constructs a graph by detecting the
+k
nearest neighbours based on euclidean distance in expression space. In the
+next step, edges between nodes (cells) are weighted by their overlap in nearest
+neighbor sets. To quantify the overlap in shared nearest neighbor sets, the
+jaccard index is used. The Louvain modularity optimization approach is used to
+detect connected communities and partition the graph into clusters of cells.
+This clustering strategy was used by Jackson, Fischer et al. and Schulz et
+al. to cluster IMC data (Jackson et al. 2020; Schulz et al. 2018).
There are several different PhenoGraph implementations available in R. Here, we +use the one available at +https://github.com/i-cyto/Rphenograph. +For large datasets, +https://github.com/stuchly/Rphenoannoy +offers a more performant implementation of the algorithm.
+In the following code chunk, we select the asinh-transformed mean pixel
+intensities per cell and channel and subset the channels to the ones containing
+biological variation. This matrix is transposed to store cells in rows. Within
+the Rphenograph
function, we select the 45 nearest neighbors for graph
+building and louvain community detection (default). The function returns a list
+of length 2, the first entry being the graph and the second entry containing the
+community object. Calling membership
on the community object will return
+cluster IDs for each cell. These cluster IDs are then stored within the
+colData
of the SpatialExperiment
object. Cluster IDs are mapped on top of
+the UMAP embedding and single-cell marker expression within each cluster are
+visualized in form of a heatmap.
It is recommended to test different inputs to k
as shown in the next section.
+Selecting larger values for k
results in larger clusters.
library(Rphenograph)
+library(igraph)
+library(dittoSeq)
+library(viridis)
+
+mat <- t(assay(spe, "exprs")[rowData(spe)$use_channel,])
+
+set.seed(230619)
+out <- Rphenograph(mat, k = 45)
+
+clusters <- factor(membership(out[[2]]))
+
+spe$pg_clusters <- clusters
+
+dittoDimPlot(spe, var = "pg_clusters",
+ reduction.use = "UMAP", size = 0.2,
+ do.label = TRUE) +
+ ggtitle("Phenograph clusters on UMAP")
dittoHeatmap(spe[,cur_cells],
+ genes = rownames(spe)[rowData(spe)$use_channel],
+ assay = "exprs", scale = "none",
+ heatmap.colors = viridis(100),
+ annot.by = c("pg_clusters", "patient_id"),
+ annot.colors = c(dittoColors(1)[1:length(unique(spe$pg_clusters))],
+ metadata(spe)$color_vectors$patient_id))
The Rphenograph
function call took
+1.67 minutes.
We can observe that some of the clusters only contain cells of a single patient.
+This can often be observed in the tumor compartment. In the next step, we
+use the integrated cells (see Section 8) in low dimensional
+embedding for clustering. Here, the low dimensional embedding can
+be directly accessed from the reducedDim
slot.
mat <- reducedDim(spe, "fastMNN")
+
+set.seed(230619)
+out <- Rphenograph(mat, k = 45)
+
+clusters <- factor(membership(out[[2]]))
+
+spe$pg_clusters_corrected <- clusters
+
+dittoDimPlot(spe, var = "pg_clusters_corrected",
+ reduction.use = "UMAP_mnnCorrected", size = 0.2,
+ do.label = TRUE) +
+ ggtitle("Phenograph clusters on UMAP, integrated cells")
dittoHeatmap(spe[,cur_cells],
+ genes = rownames(spe)[rowData(spe)$use_channel],
+ assay = "exprs", scale = "none",
+ heatmap.colors = viridis(100),
+ annot.by = c("pg_clusters_corrected","patient_id"),
+ annot.colors = c(dittoColors(1)[1:length(unique(spe$pg_clusters_corrected))],
+ metadata(spe)$color_vectors$patient_id))
Clustering using the integrated embedding leads to clusters that contain cells +of different patients. Cluster annotation can now be performed by manually +labeling cells based on their marker expression (see Notes in Section +9.2.5).
+The bluster +package provides a simple interface to cluster cells using a number of different +clustering approaches and different metrics to access cluster stability.
+For simplicity, we will focus on graph based clustering as this is the most
+popular and a fast method for single-cell clustering. The bluster
package
+provides functionalities to build k-nearest neighbor (KNN) graphs and its weighted
+version, shared nearest neighbor (SNN) graphs where nodes represent cells.
+The user can chose the number of neighbors to consider (parameter k
),
+the edge weighting method (parameter type
) and the community detection
+function to use (parameter cluster.fun
). As all parameters affect the clustering
+results, the bluster
package provides the clusterSweep
function to test
+a number of parameter settings in parallel. In the following code chunk,
+we select the asinh-transformed mean pixel intensities and subset the markers
+of interest. The resulting matrix is transposed to fit to the requirements of
+the bluster package (cells in rows).
We test two different settings for k
, two for type
and fix the cluster.fun
+to louvain
as this is one of the most common approaches for community detection.
+This function call is parallelized by setting the BPPARAM
parameter.
library(bluster)
+library(BiocParallel)
+library(ggplot2)
+
+mat <- t(assay(spe, "exprs")[rowData(spe)$use_channel,])
+
+combinations <- clusterSweep(mat,
+ BLUSPARAM=SNNGraphParam(),
+ k=c(10L, 20L),
+ type = c("rank", "jaccard"),
+ cluster.fun = "louvain",
+ BPPARAM = MulticoreParam(RNGseed = 220427))
We next calculate two metrics to estimate cluster stability: the average +silhouette width and the neighborhood purity.
+We use the approxSilhouette
function to compute the silhouette width for each
+cell and compute the average across all cells per parameter setting. Please see
+?silhouette
for more information on how the silhouette width is computed for
+each cell. A large average silhouette width indicates a cluster parameter
+setting for which cells that are well clustered.
The neighborPurity
function computes the fraction of cells around each cell
+with the same cluster ID. Per parameter setting, we compute the average
+neighborhood purity across all cells. A large average neighborhood purity
+indicates a cluster parameter setting for which cells that are well clustered.
sil <- vapply(as.list(combinations$clusters),
+ function(x) mean(approxSilhouette(mat, x)$width),
+ 0)
+
+ggplot(data.frame(method = names(sil),
+ sil = sil)) +
+ geom_point(aes(method, sil)) +
+ theme_classic(base_size = 15) +
+ theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
+ xlab("Cluster parameter combination") +
+ ylab("Average silhouette width")
pur <- vapply(as.list(combinations$clusters),
+ function(x) mean(neighborPurity(mat, x)$purity),
+ 0)
+
+ggplot(data.frame(method = names(pur),
+ pur = pur)) +
+ geom_point(aes(method, pur)) +
+ theme_classic(base_size = 15) +
+ theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
+ xlab("Cluster parameter combination") +
+ ylab("Average neighborhood purity")
The cluster parameter sweep took +4.67 minutes.
+Performing a cluster sweep takes some time as multiple function calls are run in parallel. +We do however recommend testing a number of different parameter settings to +assess clustering performance.
+Once parameter settings are known, we can either use the clusterRows
function
+of the bluster
package to cluster cells or its convenient wrapper function
+exported by the
+scran package.
+The scran::clusterCells
function accepts a SpatialExperiment
(or
+SingleCellExperiment
) object which stores cells in columns. By default, the
+function detects the 10 nearest neighbours for each cell, performs rank-based
+weighting of edges (see ?makeSNNGraph
for more information) and uses the
+cluster_walktrap
function to detect communities in the graph.
As we can see above, the clustering approach in this dataset with k
being 20
+and rank-based edge weighting leads to the highest silhouette width and highest
+neighborhood purity.
library(scran)
+
+set.seed(220620)
+clusters <- clusterCells(spe[rowData(spe)$use_channel,],
+ assay.type = "exprs",
+ BLUSPARAM = SNNGraphParam(k=20,
+ cluster.fun = "louvain",
+ type = "rank"))
+
+spe$nn_clusters <- clusters
+
+dittoDimPlot(spe, var = "nn_clusters",
+ reduction.use = "UMAP", size = 0.2,
+ do.label = TRUE) +
+ ggtitle("SNN clusters on UMAP")
dittoHeatmap(spe[,cur_cells],
+ genes = rownames(spe)[rowData(spe)$use_channel],
+ assay = "exprs", scale = "none",
+ heatmap.colors = viridis(100),
+ annot.by = c("nn_clusters", "patient_id"),
+ annot.colors = c(dittoColors(1)[1:length(unique(spe$nn_clusters))],
+ metadata(spe)$color_vectors$patient_id))
The shared nearest neighbor graph clustering approach took +1.15 minutes.
+This function was used by (Tietscher et al. 2022) to cluster cells obtained by IMC. Setting
+type = "jaccard"
performs clustering similar to Rphenograph
above and Seurat.
Similar to the results obtained by Rphenograph
, some of the clusters are
+patient-specific. We can now perform clustering of the integrated cells
+by directly specifying which low-dimensional embedding to use:
set.seed(220621)
+clusters <- clusterCells(spe,
+ use.dimred = "fastMNN",
+ BLUSPARAM = SNNGraphParam(k = 20,
+ cluster.fun = "louvain",
+ type = "rank"))
+
+spe$nn_clusters_corrected <- clusters
+
+dittoDimPlot(spe, var = "nn_clusters_corrected",
+ reduction.use = "UMAP_mnnCorrected", size = 0.2,
+ do.label = TRUE) +
+ ggtitle("SNN clusters on UMAP, integrated cells")
dittoHeatmap(spe[,cur_cells],
+ genes = rownames(spe)[rowData(spe)$use_channel],
+ assay = "exprs", scale = "none",
+ heatmap.colors = viridis(100),
+ annot.by = c("nn_clusters_corrected","patient_id"),
+ annot.colors = c(dittoColors(1)[1:length(unique(spe$nn_clusters_corrected))],
+ metadata(spe)$color_vectors$patient_id))
An alternative to graph-based clustering is offered by the
+CATALYST
+package. The cluster
function internally uses the
+FlowSOM
+package to group cells into 100 (default) clusters based on self organizing maps
+(SOM). In the next step, the
+ConsensusClusterPlus
+package is used to perform hierarchical consensus clustering of the previously
+detected 100 SOM nodes into 2 to maxK
clusters. Cluster stability for each k
+can be assessed by plotting the delta_area(spe)
. The optimal number
+of clusters can be found by selecting the k
at which a plateau is reached.
+In the example below, an optimal k
lies somewhere around 13.
library(CATALYST)
+
+# Run FlowSOM and ConsensusClusterPlus clustering
+set.seed(220410)
+spe <- cluster(spe,
+ features = rownames(spe)[rowData(spe)$use_channel],
+ maxK = 30)
+
+# Assess cluster stability
+delta_area(spe)
spe$som_clusters <- cluster_ids(spe, "meta13")
+
+dittoDimPlot(spe, var = "som_clusters",
+ reduction.use = "UMAP", size = 0.2,
+ do.label = TRUE) +
+ ggtitle("SOM clusters on UMAP")
dittoHeatmap(spe[,cur_cells],
+ genes = rownames(spe)[rowData(spe)$use_channel],
+ assay = "exprs", scale = "none",
+ heatmap.colors = viridis(100),
+ annot.by = c("som_clusters", "patient_id"),
+ annot.colors = c(dittoColors(1)[1:length(unique(spe$som_clusters))],
+ metadata(spe)$color_vectors$patient_id))
Running FlowSOM clustering took 0.18 minutes.
+The CATALYST
package does not provide functionality to perform FlowSOM
and
+ConsensusClusterPlus
clustering directly on the batch-corrected, integrated cells. As an
+alternative to the CATALYST
package, the bluster
package provides SOM
+clustering when specifying the SomParam()
parameter. Similar to the CATALYST
+approach, we will first cluster the dataset into 100 clusters (also called
+“codes”). These codes are then further clustered into a maximum of 30 clusters
+using ConsensusClusterPlus
(using hierarchical clustering and euclidean
+distance). The delta area plot can be accessed using the (not exported)
+.plot_delta_area
function from CATALYST
. Here, it seems that the plateau is
+reached at a k
of 16 and we will store the final cluster IDs within the
+SpatialExperiment
object.
library(kohonen)
+library(ConsensusClusterPlus)
+
+# Select integrated cells
+mat <- reducedDim(spe, "fastMNN")
+
+# Perform SOM clustering
+set.seed(220410)
+som.out <- clusterRows(mat, SomParam(100), full = TRUE)
+
+# Cluster the 100 SOM codes into larger clusters
+ccp <- ConsensusClusterPlus(t(som.out$objects$som$codes[[1]]),
+ maxK = 30,
+ reps = 100,
+ distance = "euclidean",
+ seed = 220410,
+ plot = NULL)
# Link ConsensusClusterPlus clusters with SOM codes and save in object
+som.cluster <- ccp[[16]][["consensusClass"]][som.out$clusters]
+spe$som_clusters_corrected <- as.factor(som.cluster)
+
+dittoDimPlot(spe, var = "som_clusters_corrected",
+ reduction.use = "UMAP_mnnCorrected", size = 0.2,
+ do.label = TRUE) +
+ ggtitle("SOM clusters on UMAP, integrated cells")
dittoHeatmap(spe[,cur_cells],
+ genes = rownames(spe)[rowData(spe)$use_channel],
+ assay = "exprs", scale = "none",
+ heatmap.colors = viridis(100),
+ annot.by = c("som_clusters_corrected","patient_id"),
+ annot.colors = c(dittoColors(1)[1:length(unique(spe$som_clusters_corrected))],
+ metadata(spe)$color_vectors$patient_id))
The FlowSOM
clustering approach has been used by (Hoch et al. 2022) to sub-cluster tumor
+cells as measured by IMC.
Finally, we can compare the results of different clustering approaches. For +this, we visualize the number of cells that are shared between different +clustering results in a pairwise fashion. In the following heatmaps a high match +between clustering results can be seen for those clusters that are uniquely +detected in both approaches.
+First, we will visualize the match between the three different approaches +applied to the asinh-transformed counts.
+library(patchwork)
+library(pheatmap)
+library(gridExtra)
+
+tab1 <- table(paste("Rphenograph", spe$pg_clusters),
+ paste("SNN", spe$nn_clusters))
+tab2 <- table(paste("Rphenograph", spe$pg_clusters),
+ paste("SOM", spe$som_clusters))
+tab3 <- table(paste("SNN", spe$nn_clusters),
+ paste("SOM", spe$som_clusters))
+
+pheatmap(log10(tab1 + 10), color = viridis(100))
We observe that Rphenograph
and the shared nearest neighbor (SNN) approach by
+scran
show similar results (first heatmap above). For example, Rphenograph
+cluster 20 (a tumor cluster) is perfectly captured by SNN cluster 12. On the
+other hand, the Neutrophil cluster (SNN cluster 6) is split into Rphenograph
+cluster 2 and Rphenograph cluster 6. A common approach
+is to now merge clusters that contain similar cell types and annotate them by
+hand (see below).
Below, a comparison between the clustering results of the integrated cells +is shown.
+tab1 <- table(paste("Rphenograph", spe$pg_clusters_corrected),
+ paste("SNN", spe$nn_clusters_corrected))
+tab2 <- table(paste("Rphenograph", spe$pg_clusters_corrected),
+ paste("SOM", spe$som_clusters_corrected))
+tab3 <- table(paste("SNN", spe$nn_clusters_corrected),
+ paste("SOM", spe$som_clusters_corrected))
+
+pheatmap(log10(tab1 + 10), color = viridis(100))
In comparison to clustering on the non-integrated cells, the clustering results +of the integrated cells show higher overlap. The SNN approach resulted in fewer +clusters and therefore matches better with the SOM clustering approach.
+The bluster
package provides a number of metrics to assess cluster stability
+here.
+For brevity we only highlighted the use of the silhouette width and the
+neighborhood purity but different metrics should be tested to assess cluster
+stability.
To assign cell types to clusters, we manually annotate clusters based on their +marker expression. For example, SNN cluster 12 (clustering of the integrated +cells) shows high, homogeneous expression of CD20 and we might therefore label +this cluster as B cells. The next chapter 10 will +highlight single-cell visualization methods that can be helpful for manual +cluster annotations.
+An example how to label clusters can be seen below:
+library(dplyr)
+cluster_celltype <- recode(spe$nn_clusters_corrected,
+ "1" = "Tumor_proliferating",
+ "2" = "Myeloid",
+ "3" = "Tumor",
+ "4" = "Tumor",
+ "5" = "Stroma",
+ "6" = "Proliferating",
+ "7" = "Myeloid",
+ "8" = "Plasma_cell",
+ "9" = "CD8",
+ "10" = "CD4",
+ "11" = "Neutrophil",
+ "12" = "Bcell",
+ "13" = "Stroma")
+
+spe$cluster_celltype <- cluster_celltype
In this section, we will highlight a cell type classification approach based +on ground truth labeling and random forest classification. The rational for +this supervised cell phenotyping approach is to use the information contained +in the pre-defined markers to detect cells of interest. This approach was +used by Hoch et al. to classify cell types in a metastatic melanoma IMC +dataset (Hoch et al. 2022).
+The antibody panel used in the example data set mainly focuses on immune cell +types and little on tumor cell phenotypes. Therefore we will label the following +cell types:
+The “B next to T cell” phenotype (BnTcell
) is commonly observed in immune
+infiltrated regions measured by IMC. We include this phenotype to account for B
+cell/T cell interactions where precise classification into B cells or T cells is
+not possible. The exact gating scheme can be seen at
+img/Gating_scheme.pdf.
As related approaches, Astir and +Garnett use pre-defined panel +information to classify cell phenotypes based on their marker expression.
+The cytomapper
+package provides the cytomapperShiny
function that allows gating of cells
+based on their marker expression and visualization of selected cells directly
+on the images.
library(cytomapper)
+if (interactive()) {
+
+ images <- readRDS("data/images.rds")
+ masks <- readRDS("data/masks.rds")
+
+ cytomapperShiny(object = spe, mask = masks, image = images,
+ cell_id = "ObjectNumber", img_id = "sample_id")
+}
The labeled cells for this data set can be accessed at
+10.5281/zenodo.6554544 and were downloaded
+in Section 4. Gating is performed per image and the
+cytomapperShiny
function allows the export of gated cells in form of a
+SingleCellExperiment
or SpatialExperiment
object. The cell label is stored
+in colData(object)$cytomapper_CellLabel
and the gates are stored in
+metadata(object)
. In the next section, we will read in and consolidate the
+labeled data.
For consistent visualization of cell types, we will now pre-define their colors:
+celltype <- setNames(c("#3F1B03", "#F4AD31", "#894F36", "#1C750C", "#EF8ECC",
+ "#6471E2", "#4DB23B", "grey", "#F4800C", "#BF0A3D", "#066970"),
+ c("Tumor", "Stroma", "Myeloid", "CD8", "Plasma_cell",
+ "Treg", "CD4", "undefined", "BnTcell", "Bcell", "Neutrophil"))
+
+metadata(spe)$color_vectors$celltype <- celltype
Here, we will read in the individual SpatialExperiment
objects containing the
+labeled cells and concatenate them. In the process of concatenating the
+SpatialExperiment
objects along their columns, the sample_id
entry is
+appended by .1, .2, .3, ...
due to replicated entries.
library(SingleCellExperiment)
+label_files <- list.files("data/gated_cells",
+ full.names = TRUE, pattern = ".rds$")
+
+# Read in SPE objects
+spes <- lapply(label_files, readRDS)
+
+# Merge SPE objects
+concat_spe <- do.call("cbind", spes)
In the following code chunk we will identify cells that were labeled multiple +times. This occurs when different cell phenotypes are gated per image and can +affect immune cells that are located inside the tumor compartment.
+We will first identify those cells that were uniquely labeled. In the next step,
+we will identify those cells that were labeled twice AND were labeled as Tumor
+cells. These cells will be assigned their immune cell label. Finally, we will
+save the unique labels within the original SpatialExperiment
object.
Of note: this concatenation strategy is specific for cell phenotypes contained in this +example dataset. The gated cell labels might need to be processed in a slightly +different way when working with other samples.
+For these tasks, we will define a filter function:
+filter_labels <- function(object,
+ label = "cytomapper_CellLabel") {
+ cur_tab <- unclass(table(colnames(object), object[[label]]))
+
+ cur_labels <- colnames(cur_tab)[apply(cur_tab, 1, which.max)]
+ names(cur_labels) <- rownames(cur_tab)
+
+ cur_labels <- cur_labels[rowSums(cur_tab) == 1]
+
+ return(cur_labels)
+}
This function is now applied to all cells and then only non-tumor cells.
+labels <- filter_labels(concat_spe)
+
+cur_spe <- concat_spe[,concat_spe$cytomapper_CellLabel != "Tumor"]
+
+non_tumor_labels <- filter_labels(cur_spe)
+
+additional_cells <- setdiff(names(non_tumor_labels), names(labels))
+
+final_labels <- c(labels, non_tumor_labels[additional_cells])
+
+# Transfer labels to SPE object
+spe_labels <- rep("unlabeled", ncol(spe))
+names(spe_labels) <- colnames(spe)
+spe_labels[names(final_labels)] <- final_labels
+spe$cell_labels <- spe_labels
+
+# Number of cells labeled per patient
+table(spe$cell_labels, spe$patient_id)
##
+## Patient1 Patient2 Patient3 Patient4
+## Bcell 152 131 234 263
+## BnTcell 396 37 240 1029
+## CD4 45 342 167 134
+## CD8 60 497 137 128
+## Myeloid 183 378 672 517
+## Neutrophil 97 4 17 16
+## Plasma_cell 34 536 87 59
+## Stroma 84 37 85 236
+## Treg 139 149 49 24
+## Tumor 2342 906 1618 1133
+## unlabeled 7214 9780 7826 9580
+Based on these labels, we can now train a random forest classifier to classify +all remaining, unlabeled cells.
+In this section, we will use the +caret framework for machine +learning in R. This package provides an interface to train a number of +regression and classification models in a coherent fashion. We use a random +forest classifier due to low number of parameters, high speed and an observed +high performance for cell type classification (Hoch et al. 2022).
+In the following section, we will first split the SpatialExperiment
object
+into labeled and unlabeled cells. Based on the labeled cells, we split
+the data into a train (75% of the data) and test (25% of the data) dataset.
+We currently do not provide an independently labeled validation dataset.
The caret
package provides the trainControl
function, which specifies model
+training parameters and the train
function, which performs the actual model
+training. While training the model, we also want to estimate the best model
+parameters. In the case of the chosen random forest model (method = "rf"
), we
+only need to estimate a single parameters (mtry
) which corresponds to the
+number of variables randomly sampled as candidates at each split. To estimate
+the best parameter, we will perform a 5-fold cross validation (set within
+trainControl
) over a tune length of 5 entries to mtry
. In the following
+code chunk, the createDataPartition
and the train
function are not deterministic,
+meaning they return different results across different runs. We therefore set
+a seed
here for both functions.
library(caret)
+
+# Split between labeled and unlabeled cells
+lab_spe <- spe[,spe$cell_labels != "unlabeled"]
+unlab_spe <- spe[,spe$cell_labels == "unlabeled"]
+
+# Randomly split into train and test data
+set.seed(221029)
+trainIndex <- createDataPartition(factor(lab_spe$cell_labels), p = 0.75)
+
+train_spe <- lab_spe[,trainIndex$Resample1]
+test_spe <- lab_spe[,-trainIndex$Resample1]
+
+# Define fit parameters for 5-fold cross validation
+fitControl <- trainControl(method = "cv",
+ number = 5)
+
+# Select the arsinh-transformed counts for training
+cur_mat <- t(assay(train_spe, "exprs")[rowData(train_spe)$use_channel,])
+
+# Train a random forest classifier
+rffit <- train(x = cur_mat,
+ y = factor(train_spe$cell_labels),
+ method = "rf", ntree = 1000,
+ tuneLength = 5,
+ trControl = fitControl)
+
+rffit
## Random Forest
+##
+## 10049 samples
+## 37 predictor
+## 10 classes: 'Bcell', 'BnTcell', 'CD4', 'CD8', 'Myeloid', 'Neutrophil', 'Plasma_cell', 'Stroma', 'Treg', 'Tumor'
+##
+## No pre-processing
+## Resampling: Cross-Validated (5 fold)
+## Summary of sample sizes: 8040, 8039, 8038, 8038, 8041
+## Resampling results across tuning parameters:
+##
+## mtry Accuracy Kappa
+## 2 0.9643726 0.9524051
+## 10 0.9780071 0.9707483
+## 19 0.9801973 0.9736577
+## 28 0.9787052 0.9716635
+## 37 0.9779095 0.9705890
+##
+## Accuracy was used to select the optimal model using the largest value.
+## The final value used for the model was mtry = 19.
+Training the classifier took +9.66 minutes.
+We next observe the accuracy of the classifer when predicting cell phenotypes +across the cross-validation and when applying the classifier to the test +dataset.
+First, we can visualize the classification accuracy during parameter +tuning:
+ggplot(rffit) +
+ geom_errorbar(data = rffit$results,
+ aes(ymin = Accuracy - AccuracySD,
+ ymax = Accuracy + AccuracySD),
+ width = 0.4) +
+ theme_classic(base_size = 15)
The best value for mtry
is 19 and is used when predicting new data.
It is often recommended to visualize the variable importance of the +classifier. The following plot specifies which variables (markers) are +most important for classifying the data.
+ + +As expected, the markers that were used for gating (Ecad, CD3, CD20, HLADR, +CD8a, CD38, FOXP3) were important for classification.
+To assess the accuracy, sensitivity, specificity, among other quality measures of +the classifier, we will now predict cell phenotypes in the test data.
+# Select the arsinh-transformed counts of the test data
+cur_mat <- t(assay(test_spe, "exprs")[rowData(test_spe)$use_channel,])
+
+# Predict the cell phenotype labels of the test data
+set.seed(231019)
+cur_pred <- predict(rffit, newdata = cur_mat)
While the overall classification accuracy can appear high, we also want
+to check if each cell phenotype class is correctly predicted.
+For this, we will calculate the confusion matrix between predicted and actual
+cell labels. This measure may highlight individual cell phenotype classes that
+were not correctly predicted by the classifier. When setting mode = "everything"
,
+the confusionMatrix
function returns all available prediction measures including
+sensitivity, specificity, precision, recall and the F1 score per cell
+phenotype class.
cm <- confusionMatrix(data = cur_pred,
+ reference = factor(test_spe$cell_labels),
+ mode = "everything")
+
+cm
## Confusion Matrix and Statistics
+##
+## Reference
+## Prediction Bcell BnTcell CD4 CD8 Myeloid Neutrophil Plasma_cell Stroma
+## Bcell 186 2 0 0 0 0 6 0
+## BnTcell 4 423 1 0 0 0 0 0
+## CD4 0 0 163 0 0 2 3 2
+## CD8 0 0 0 199 0 0 8 0
+## Myeloid 0 0 2 1 437 0 0 0
+## Neutrophil 0 0 0 0 0 30 0 0
+## Plasma_cell 1 0 3 2 0 0 158 0
+## Stroma 0 0 2 0 0 0 0 108
+## Treg 0 0 0 0 0 0 3 0
+## Tumor 4 0 1 3 0 1 1 0
+## Reference
+## Prediction Treg Tumor
+## Bcell 0 1
+## BnTcell 0 1
+## CD4 0 5
+## CD8 0 3
+## Myeloid 0 0
+## Neutrophil 0 0
+## Plasma_cell 1 0
+## Stroma 0 0
+## Treg 89 2
+## Tumor 0 1487
+##
+## Overall Statistics
+##
+## Accuracy : 0.9806
+## 95% CI : (0.9753, 0.985)
+## No Information Rate : 0.4481
+## P-Value [Acc > NIR] : < 2.2e-16
+##
+## Kappa : 0.9741
+##
+## Mcnemar's Test P-Value : NA
+##
+## Statistics by Class:
+##
+## Class: Bcell Class: BnTcell Class: CD4 Class: CD8
+## Sensitivity 0.95385 0.9953 0.94767 0.97073
+## Specificity 0.99714 0.9979 0.99622 0.99650
+## Pos Pred Value 0.95385 0.9860 0.93143 0.94762
+## Neg Pred Value 0.99714 0.9993 0.99716 0.99809
+## Precision 0.95385 0.9860 0.93143 0.94762
+## Recall 0.95385 0.9953 0.94767 0.97073
+## F1 0.95385 0.9906 0.93948 0.95904
+## Prevalence 0.05830 0.1271 0.05142 0.06129
+## Detection Rate 0.05561 0.1265 0.04873 0.05949
+## Detection Prevalence 0.05830 0.1283 0.05232 0.06278
+## Balanced Accuracy 0.97549 0.9966 0.97195 0.98361
+## Class: Myeloid Class: Neutrophil Class: Plasma_cell
+## Sensitivity 1.0000 0.909091 0.88268
+## Specificity 0.9990 1.000000 0.99779
+## Pos Pred Value 0.9932 1.000000 0.95758
+## Neg Pred Value 1.0000 0.999095 0.99340
+## Precision 0.9932 1.000000 0.95758
+## Recall 1.0000 0.909091 0.88268
+## F1 0.9966 0.952381 0.91860
+## Prevalence 0.1306 0.009865 0.05351
+## Detection Rate 0.1306 0.008969 0.04723
+## Detection Prevalence 0.1315 0.008969 0.04933
+## Balanced Accuracy 0.9995 0.954545 0.94024
+## Class: Stroma Class: Treg Class: Tumor
+## Sensitivity 0.98182 0.98889 0.9920
+## Specificity 0.99938 0.99846 0.9946
+## Pos Pred Value 0.98182 0.94681 0.9933
+## Neg Pred Value 0.99938 0.99969 0.9935
+## Precision 0.98182 0.94681 0.9933
+## Recall 0.98182 0.98889 0.9920
+## F1 0.98182 0.96739 0.9927
+## Prevalence 0.03288 0.02691 0.4481
+## Detection Rate 0.03229 0.02661 0.4445
+## Detection Prevalence 0.03288 0.02810 0.4475
+## Balanced Accuracy 0.99060 0.99368 0.9933
+To easily visualize these results, we can now plot the true positive rate +(sensitivity) versus the false positive rate (1 - specificity). The size of the +point is determined by the number of true positives divided by the total number +of cells.
+library(tidyverse)
+
+data.frame(cm$byClass) %>%
+ mutate(class = sub("Class: ", "", rownames(cm$byClass))) %>%
+ ggplot() +
+ geom_point(aes(1 - Specificity, Sensitivity,
+ size = Detection.Rate,
+ fill = class),
+ shape = 21) +
+ scale_fill_manual(values = metadata(spe)$color_vectors$celltype) +
+ theme_classic(base_size = 15) +
+ ylab("Sensitivity (TPR)") +
+ xlab("1 - Specificity (FPR)")
We observe high sensitivity and specificity for most cell types. Plasma cells +show the lowest true positive rate with 88% being sufficiently high.
+Finally, to observe which cell phenotypes were wrongly classified, we can visualize +the distribution of classification probabilities per cell phenotype class:
+set.seed(231019)
+cur_pred <- predict(rffit,
+ newdata = cur_mat,
+ type = "prob")
+cur_pred$truth <- factor(test_spe$cell_labels)
+
+cur_pred %>%
+ pivot_longer(cols = Bcell:Tumor) %>%
+ ggplot() +
+ geom_boxplot(aes(x = name, y = value, fill = name), outlier.size = 0.5) +
+ facet_wrap(. ~ truth, ncol = 1) +
+ scale_fill_manual(values = metadata(spe)$color_vectors$celltype) +
+ theme(panel.background = element_blank(),
+ axis.text.x = element_text(angle = 45, hjust = 1))
The boxplots indicate the classification probabilities per class. The classifier +is well trained if classification probabilities are only high for the one +specific class.
+In the final section, we will now use the tuned and tested random forest +classifier to predict the cell phenotypes of the unlabeled data.
+First, we predict the cell phenotypes and extract their classification +probabilities.
+# Select the arsinh-transformed counts of the unlabeled data for prediction
+cur_mat <- t(assay(unlab_spe, "exprs")[rowData(unlab_spe)$use_channel,])
+
+# Predict the cell phenotype labels of the unlabeled data
+set.seed(231014)
+cell_class <- as.character(predict(rffit,
+ newdata = cur_mat,
+ type = "raw"))
+names(cell_class) <- rownames(cur_mat)
+
+table(cell_class)
## cell_class
+## Bcell BnTcell CD4 CD8 Myeloid Neutrophil
+## 817 979 3620 2716 6302 559
+## Plasma_cell Stroma Treg Tumor
+## 2692 4904 1170 10641
+# Extract prediction probabilities for each cell
+set.seed(231014)
+cell_prob <- predict(rffit,
+ newdata = cur_mat,
+ type = "prob")
Each cell is assigned to the class with highest probability. There are however
+cases, where the highest probability is low meaning the cell can not be uniquely
+assigned to a class. We next want to identify these cells and label them as
+“undefined”. Here, we select a maximum classification probability threshold
+of 40% but this threshold needs to be adjusted for other datasets. The adjusted
+cell labels are then stored in the SpatialExperiment
object.
library(ggridges)
+
+# Distribution of maximum probabilities
+tibble(max_prob = rowMax(as.matrix(cell_prob)),
+ type = cell_class) %>%
+ ggplot() +
+ geom_density_ridges(aes(x = max_prob, y = cell_class, fill = cell_class)) +
+ scale_fill_manual(values = metadata(spe)$color_vectors$celltype) +
+ theme_classic(base_size = 15) +
+ xlab("Maximum probability") +
+ ylab("Cell type") +
+ xlim(c(0,1.2))
## Picking joint bandwidth of 0.0238
+
+# Label undefined cells
+cell_class[rowMax(as.matrix(cell_prob)) < 0.4] <- "undefined"
+
+# Store labels in SpatialExperiment onject
+cell_labels <- spe$cell_labels
+cell_labels[colnames(unlab_spe)] <- cell_class
+spe$celltype <- cell_labels
+
+table(spe$celltype, spe$patient_id)
##
+## Patient1 Patient2 Patient3 Patient4
+## Bcell 179 527 431 458
+## BnTcell 416 586 594 1078
+## CD4 391 1370 699 1385
+## CD8 518 1365 479 1142
+## Myeloid 1369 2197 1723 2731
+## Neutrophil 348 9 148 176
+## Plasma_cell 650 2122 351 274
+## Stroma 633 676 736 3261
+## Treg 553 409 243 310
+## Tumor 5560 3334 5648 2083
+## undefined 129 202 80 221
+We can now compare the cell labels derived by classification to the different +clustering strategies. The first comparison is against the clustering results +using the asinh-transformed counts.
+tab1 <- table(spe$celltype,
+ paste("Rphenograph", spe$pg_clusters))
+tab2 <- table(spe$celltype,
+ paste("SNN", spe$nn_clusters))
+tab3 <- table(spe$celltype,
+ paste("SOM", spe$som_clusters))
+
+pheatmap(log10(tab1 + 10), color = viridis(100))
We can see that Tumor and Myeloid cells span multiple clusters while +Neutrophiles are detected as an individual cluster by all clustering approaches.
+We next compare the cell classification against clustering results using the +integrated cells.
+tab1 <- table(spe$celltype,
+ paste("Rphenograph", spe$pg_clusters_corrected))
+tab2 <- table(spe$celltype,
+ paste("SNN", spe$nn_clusters_corrected))
+tab3 <- table(spe$celltype,
+ paste("SOM", spe$som_clusters_corrected))
+
+pheatmap(log10(tab1 + 10), color = viridis(100))
We observe a high agreement between the shared nearest neighbor clustering +approach using the integrated cells and the cell phenotypes derived by +classification.
+In the next sections, we will highlight visualization strategies to verify the +correctness of the phenotyping approach. Specifically, Section +11.2.3 shows how to outline identified cell phenotypes on +composite images.
+Finally, we save the updated SpatialExperiment
object.
## R version 4.3.2 (2023-10-31)
+## Platform: x86_64-pc-linux-gnu (64-bit)
+## Running under: Ubuntu 22.04.3 LTS
+##
+## Matrix products: default
+## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
+## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
+##
+## locale:
+## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
+## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
+## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
+## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
+## [9] LC_ADDRESS=C LC_TELEPHONE=C
+## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
+##
+## time zone: Etc/UTC
+## tzcode source: system (glibc)
+##
+## attached base packages:
+## [1] stats4 stats graphics grDevices utils datasets methods
+## [8] base
+##
+## other attached packages:
+## [1] testthat_3.2.0 ggridges_0.5.4
+## [3] lubridate_1.9.3 forcats_1.0.0
+## [5] stringr_1.5.0 purrr_1.0.2
+## [7] readr_2.1.4 tidyr_1.3.0
+## [9] tibble_3.2.1 tidyverse_2.0.0
+## [11] caret_6.0-94 lattice_0.21-9
+## [13] cytomapper_1.14.0 EBImage_4.44.0
+## [15] dplyr_1.1.3 gridExtra_2.3
+## [17] pheatmap_1.0.12 patchwork_1.1.3
+## [19] ConsensusClusterPlus_1.66.0 kohonen_3.0.12
+## [21] CATALYST_1.26.0 scran_1.30.0
+## [23] scuttle_1.12.0 BiocParallel_1.36.0
+## [25] bluster_1.12.0 viridis_0.6.4
+## [27] viridisLite_0.4.2 dittoSeq_1.14.0
+## [29] ggplot2_3.4.4 igraph_1.5.1
+## [31] Rphenograph_0.99.1.9003 SpatialExperiment_1.12.0
+## [33] SingleCellExperiment_1.24.0 SummarizedExperiment_1.32.0
+## [35] Biobase_2.62.0 GenomicRanges_1.54.1
+## [37] GenomeInfoDb_1.38.0 IRanges_2.36.0
+## [39] S4Vectors_0.40.1 BiocGenerics_0.48.0
+## [41] MatrixGenerics_1.14.0 matrixStats_1.0.0
+##
+## loaded via a namespace (and not attached):
+## [1] bitops_1.0-7 RColorBrewer_1.1-3
+## [3] doParallel_1.0.17 tools_4.3.2
+## [5] backports_1.4.1 utf8_1.2.4
+## [7] R6_2.5.1 HDF5Array_1.30.0
+## [9] rhdf5filters_1.14.0 GetoptLong_1.0.5
+## [11] withr_2.5.1 sp_2.1-1
+## [13] cli_3.6.1 sandwich_3.0-2
+## [15] labeling_0.4.3 sass_0.4.7
+## [17] nnls_1.5 mvtnorm_1.2-3
+## [19] randomForest_4.7-1.1 proxy_0.4-27
+## [21] systemfonts_1.0.5 colorRamps_2.3.1
+## [23] svglite_2.1.2 scater_1.30.0
+## [25] parallelly_1.36.0 plotrix_3.8-2
+## [27] limma_3.58.0 flowCore_2.14.0
+## [29] rstudioapi_0.15.0 generics_0.1.3
+## [31] shape_1.4.6 gtools_3.9.4
+## [33] car_3.1-2 Matrix_1.6-1.1
+## [35] RProtoBufLib_2.14.0 waldo_0.5.1
+## [37] ggbeeswarm_0.7.2 fansi_1.0.5
+## [39] abind_1.4-5 terra_1.7-55
+## [41] lifecycle_1.0.3 multcomp_1.4-25
+## [43] yaml_2.3.7 edgeR_4.0.1
+## [45] carData_3.0-5 rhdf5_2.46.0
+## [47] recipes_1.0.8 SparseArray_1.2.0
+## [49] Rtsne_0.16 grid_4.3.2
+## [51] promises_1.2.1 dqrng_0.3.1
+## [53] crayon_1.5.2 shinydashboard_0.7.2
+## [55] beachmat_2.18.0 cowplot_1.1.1
+## [57] magick_2.8.1 pillar_1.9.0
+## [59] knitr_1.44 ComplexHeatmap_2.18.0
+## [61] metapod_1.10.0 rjson_0.2.21
+## [63] future.apply_1.11.0 codetools_0.2-19
+## [65] glue_1.6.2 data.table_1.14.8
+## [67] vctrs_0.6.4 png_0.1-8
+## [69] gtable_0.3.4 cachem_1.0.8
+## [71] gower_1.0.1 xfun_0.40
+## [73] S4Arrays_1.2.0 mime_0.12
+## [75] prodlim_2023.08.28 survival_3.5-7
+## [77] timeDate_4022.108 iterators_1.0.14
+## [79] cytolib_2.14.0 hardhat_1.3.0
+## [81] lava_1.7.2.1 statmod_1.5.0
+## [83] ellipsis_0.3.2 TH.data_1.1-2
+## [85] ipred_0.9-14 nlme_3.1-163
+## [87] rprojroot_2.0.3 bslib_0.5.1
+## [89] irlba_2.3.5.1 svgPanZoom_0.3.4
+## [91] vipor_0.4.5 rpart_4.1.21
+## [93] colorspace_2.1-0 raster_3.6-26
+## [95] nnet_7.3-19 tidyselect_1.2.0
+## [97] compiler_4.3.2 BiocNeighbors_1.20.0
+## [99] desc_1.4.2 DelayedArray_0.28.0
+## [101] bookdown_0.36 scales_1.2.1
+## [103] tiff_0.1-11 digest_0.6.33
+## [105] fftwtools_0.9-11 rmarkdown_2.25
+## [107] XVector_0.42.0 htmltools_0.5.6.1
+## [109] pkgconfig_2.0.3 jpeg_0.1-10
+## [111] sparseMatrixStats_1.14.0 fastmap_1.1.1
+## [113] rlang_1.1.1 GlobalOptions_0.1.2
+## [115] htmlwidgets_1.6.2 shiny_1.7.5.1
+## [117] DelayedMatrixStats_1.24.0 farver_2.1.1
+## [119] jquerylib_0.1.4 zoo_1.8-12
+## [121] jsonlite_1.8.7 ModelMetrics_1.2.2.2
+## [123] BiocSingular_1.18.0 RCurl_1.98-1.12
+## [125] magrittr_2.0.3 GenomeInfoDbData_1.2.11
+## [127] Rhdf5lib_1.24.0 munsell_0.5.0
+## [129] Rcpp_1.0.11 ggnewscale_0.4.9
+## [131] pROC_1.18.4 stringi_1.7.12
+## [133] brio_1.1.3 zlibbioc_1.48.0
+## [135] MASS_7.3-60 plyr_1.8.9
+## [137] listenv_0.9.0 parallel_4.3.2
+## [139] ggrepel_0.9.4 splines_4.3.2
+## [141] hms_1.1.3 circlize_0.4.15
+## [143] locfit_1.5-9.8 ggpubr_0.6.0
+## [145] ggsignif_0.6.4 pkgload_1.3.3
+## [147] reshape2_1.4.4 ScaledMatrix_1.10.0
+## [149] XML_3.99-0.14 drc_3.0-1
+## [151] evaluate_0.22 tzdb_0.4.0
+## [153] foreach_1.5.2 tweenr_2.0.2
+## [155] httpuv_1.6.12 RANN_2.6.1
+## [157] polyclip_1.10-6 future_1.33.0
+## [159] clue_0.3-65 ggforce_0.4.1
+## [161] rsvd_1.0.5 broom_1.0.5
+## [163] xtable_1.8-4 e1071_1.7-13
+## [165] rstatix_0.7.2 later_1.3.1
+## [167] class_7.3-22 FlowSOM_2.10.0
+## [169] beeswarm_0.4.0 cluster_2.1.4
+## [171] timechange_0.2.0 globals_0.16.2
+The following section discusses possible quality indicators for data obtained +by IMC and other highly multiplexed imaging technologies. Here, we will focus +on describing quality metrics on the single-cell as well as image level.
+ +The first step after image segmentation is to observe its accuracy.
+Without having ground-truth data readily available, a common approach to
+segmentation quality control is to overlay segmentation masks on composite images
+displaying channels that were used for segmentation.
+The cytomapper
+package supports exactly this tasks by using the plotPixels
function.
Here, we select 3 random images and perform image- and channel-wise +normalization (channels are first min-max normalized and scaled to a range of +0-1 before clipping the maximum intensity to 0.2).
+library(cytomapper)
+set.seed(20220118)
+img_ids <- sample(seq_along(images), 3)
+
+# Normalize and clip images
+cur_images <- images[img_ids]
+cur_images <- cytomapper::normalize(cur_images, separateImages = TRUE)
+cur_images <- cytomapper::normalize(cur_images, inputRange = c(0, 0.2))
+
+plotPixels(cur_images,
+ mask = masks[img_ids],
+ img_id = "sample_id",
+ missing_colour = "white",
+ colour_by = c("CD163", "CD20", "CD3", "Ecad", "DNA1"),
+ colour = list(CD163 = c("black", "yellow"),
+ CD20 = c("black", "red"),
+ CD3 = c("black", "green"),
+ Ecad = c("black", "cyan"),
+ DNA1 = c("black", "blue")),
+ image_title = NULL,
+ legend = list(colour_by.title.cex = 0.7,
+ colour_by.labels.cex = 0.7))
We can see that nuclei are centered within the segmentation masks and all cell
+types are correctly segmented (note: to zoom into the image you can right click
+and select Open Image in New Tab
). A common challenge here is to segment large (e.g.,
+epithelial cells - in cyan) versus small (e.g., B cells - in red). However, the
+segmentation approach here appears to correctly segment cells across different
+sizes.
An easier and interactive way of observing segmentation quality is to use the +interactive image viewer provided by the +cytoviewer R/Bioconductor +package (Meyer, Eling, and Bodenmiller 2023). Under “Image-level” > “Basic controls”, up to six markers +can be selected for visualization. The contrast of each marker can be adjusted. +Under “Image-level” > “Advanced controls”, click the “Show cell outlines” box +to outline segmented cells on the images.
+library(cytoviewer)
+
+app <- cytoviewer(image = images,
+ mask = masks,
+ cell_id = "ObjectNumber",
+ img_id = "sample_id")
+
+if (interactive()) {
+ shiny::runApp(app, launch.browser = TRUE)
+}
An additional approach to observe cell segmentation quality and potentially also +antibody specificity issues is to visualize single-cell expression in form of a +heatmap. Here, we sub-sample the dataset to 2000 cells for visualization +purposes and overlay the cancer type from which the cells were extracted.
+library(dittoSeq)
+library(viridis)
+cur_cells <- sample(seq_len(ncol(spe)), 2000)
+
+dittoHeatmap(spe[,cur_cells],
+ genes = rownames(spe)[rowData(spe)$use_channel],
+ assay = "exprs",
+ cluster_cols = TRUE,
+ scale = "none",
+ heatmap.colors = viridis(100),
+ annot.by = "indication",
+ annotation_colors = list(indication = metadata(spe)$color_vectors$indication))
We can differentiate between epithelial cells (Ecad+) and immune cells +(CD45RO+). Some of the markers are detected in specific cells (e.g., Ki67, CD20, +Ecad) while others are more broadly expressed across cells (e.g., HLADR, B2M, +CD4).
+Image-level quality control is often performed using tools that offer a +graphical user interface such as QuPath, +FIJI and the previously mentioned +cytoviewer package. Viewers +that were specifically developed for IMC data can be seen +here. In this +section, we will specifically focus on quantitative metrics to assess image +quality.
+It is often of interest to calculate the signal-to-noise ratio (SNR) for +individual channels and markers. Here, we define the SNR as:
+\[SNR = I_s/I_n\]
+where \(I_s\) is the intensity of the signal (mean intensity of pixels with true
+signal) and \(I_n\) is the intensity of the noise (mean intensity of pixels
+containing noise). This definition of the SNR is just one of many and other
+measures can be applied. Finding a threshold that separates pixels containing
+signal and pixels containing noise is not trivial and different approaches can
+be chosen. Here, we use the otsu
thresholding approach to find pixels of the
+“foreground” (i.e., signal) and “background” (i.e., noise). The SNR is then
+defined as the mean intensity of foreground pixels divided by the mean intensity
+of background pixels. We compute this measure as well as the mean signal
+intensity per image. The plot below shows the average SNR versus the average
+signal intensity across all images.
library(tidyverse)
+library(ggrepel)
+library(EBImage)
+
+cur_snr <- lapply(names(images), function(x){
+ img <- images[[x]]
+ mat <- apply(img, 3, function(ch){
+ # Otsu threshold
+ thres <- otsu(ch, range = c(min(ch), max(ch)), levels = 65536)
+ # Signal-to-noise ratio
+ snr <- mean(ch[ch > thres]) / mean(ch[ch <= thres])
+ # Signal intensity
+ ps <- mean(ch[ch > thres])
+
+ return(c(snr = snr, ps = ps))
+ })
+ t(mat) %>% as.data.frame() %>%
+ mutate(image = x,
+ marker = colnames(mat)) %>%
+ pivot_longer(cols = c(snr, ps))
+})
+
+cur_snr <- do.call(rbind, cur_snr)
+
+cur_snr %>%
+ group_by(marker, name) %>%
+ summarize(log_mean = log2(mean(value))) %>%
+ pivot_wider(names_from = name, values_from = log_mean) %>%
+ ggplot() +
+ geom_point(aes(ps, snr)) +
+ geom_label_repel(aes(ps, snr, label = marker)) +
+ theme_minimal(base_size = 15) + ylab("Signal-to-noise ratio [log2]") +
+ xlab("Signal intensity [log2]")
We observe PD1, LAG3 and cleaved PARP to have high SNR but low signal intensity +meaning that in general these markers are not abundantly expressed. The Iridium +intercalator (here marked as DNA1 and DNA2) has the highest signal intensity +but low SNR. This might be due to staining differences between individual nuclei +where some nuclei are considered as background. We do however observe high +SNR and sufficient signal intensity for the majority of markers.
+Otsu thesholding and SNR calculation does not perform well if the markers are +lowly abundant. In the next code chunk, we will remove markers that have +a positive signal of below 2 per image.
+cur_snr <- cur_snr %>%
+ pivot_wider(names_from = name, values_from = value) %>%
+ filter(ps > 2) %>%
+ pivot_longer(cols = c(snr, ps))
+
+cur_snr %>%
+ group_by(marker, name) %>%
+ summarize(log_mean = log2(mean(value))) %>%
+ pivot_wider(names_from = name, values_from = log_mean) %>%
+ ggplot() +
+ geom_point(aes(ps, snr)) +
+ geom_label_repel(aes(ps, snr, label = marker)) +
+ theme_minimal(base_size = 15) + ylab("Signal-to-noise ratio [log2]") +
+ xlab("Signal intensity [log2]")
This visualization shows a reduces SNR for PD1, LAG3 and cleaved PARP which was +previously inflated due to low signal.
+Another quality indicator is the image area covered by cells (or biological
+tissue). This metric identifies ROIs where little cells are present, possibly
+hinting at incorrect selection of the ROI. We can compute the percentage of
+covered image area using the metadata contained in the SpatialExperiment
+object:
cell_density <- colData(spe) %>%
+ as.data.frame() %>%
+ group_by(sample_id) %>%
+ # Compute the number of pixels covered by cells and
+ # the total number of pixels
+ summarize(cell_area = sum(area),
+ no_pixels = mean(width_px) * mean(height_px)) %>%
+ # Divide the total number of pixels
+ # by the number of pixels covered by cells
+ mutate(covered_area = cell_area / no_pixels)
+
+# Visualize the image area covered by cells per image
+ggplot(cell_density) +
+ geom_point(aes(reorder(sample_id,covered_area), covered_area)) +
+ theme_minimal(base_size = 15) +
+ theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 15)) +
+ ylim(c(0, 1)) +
+ ylab("% covered area") + xlab("")
We observe that two of the 14 images show unusually low cell coverage. These
+two images can now be visualized using cytomapper
.
# Normalize and clip images
+cur_images <- images[c("Patient4_005", "Patient4_007")]
+cur_images <- cytomapper::normalize(cur_images, separateImages = TRUE)
+cur_images <- cytomapper::normalize(cur_images, inputRange = c(0, 0.2))
+
+plotPixels(cur_images,
+ mask = masks[c("Patient4_005", "Patient4_007")],
+ img_id = "sample_id",
+ missing_colour = "white",
+ colour_by = c("CD163", "CD20", "CD3", "Ecad", "DNA1"),
+ colour = list(CD163 = c("black", "yellow"),
+ CD20 = c("black", "red"),
+ CD3 = c("black", "green"),
+ Ecad = c("black", "cyan"),
+ DNA1 = c("black", "blue")),
+ legend = list(colour_by.title.cex = 0.7,
+ colour_by.labels.cex = 0.7))
These two images display less dense tissue structure but overall the images are +intact and appear to be segmented correctly.
+Finally, it can be beneficial to visualize the mean marker expression per image
+to identify images with outlying marker expression. This check does not
+indicate image quality per se but can highlight biological differences. Here,
+we will use the aggregateAcrossCells
function of the
+scuttle package to compute the mean expression per
+image. For visualization purposes, we again asinh
transform the mean expression
+values.
library(scuttle)
+
+image_mean <- aggregateAcrossCells(spe,
+ ids = spe$sample_id,
+ statistics="mean",
+ use.assay.type = "counts")
+assay(image_mean, "exprs") <- asinh(counts(image_mean))
+
+dittoHeatmap(image_mean, genes = rownames(spe)[rowData(spe)$use_channel],
+ assay = "exprs", cluster_cols = TRUE, scale = "none",
+ heatmap.colors = viridis(100),
+ annot.by = c("indication", "patient_id", "ROI"),
+ annotation_colors = list(indication = metadata(spe)$color_vectors$indication,
+ patient_id = metadata(spe)$color_vectors$patient_id,
+ ROI = metadata(spe)$color_vectors$ROI),
+ show_colnames = TRUE)
We observe extensive biological variation across the 14 images specifically for +some of the cell phenotype markers including the macrophage marker CD206, the B +cell marker CD20, the neutrophil marker CD15, and the proliferation marker Ki67. +These differences will be further studied in the following chapters.
+In the following paragraphs we will look at different metrics and visualization +approaches to assess data quality (as well as biological differences) on the +single-cell level.
+Related to the signal-to-noise ratio (SNR) calculated above on the pixel-level, +a similar measure can be derived on the single-cell level. Here, we will use +a two component Gaussian mixture model for each marker to find cells +with positive and negative expression. The SNR is defined as:
+\[SNR = I_s/I_n\]
+where \(I_s\) is the intensity of the signal (mean intensity of cells with
+positive signal) and \(I_n\) is the intensity of the noise (mean intensity of
+cells lacking expression). To define cells with positive and negative marker
+expression, we fit the mixture model across the transformed counts of all cells
+contained in the SpatialExperiment
object. Next, for each marker we calculate
+the mean of the non-transformed counts for the positive and the negative cells.
+The SNR is then the ratio between the mean of the positive signal and the mean
+of the negative signal.
library(mclust)
+
+set.seed(220224)
+mat <- sapply(seq_len(nrow(spe)), function(x){
+ cur_exprs <- assay(spe, "exprs")[x,]
+ cur_counts <- assay(spe, "counts")[x,]
+
+ cur_model <- Mclust(cur_exprs, G = 2)
+ mean1 <- mean(cur_counts[cur_model$classification == 1])
+ mean2 <- mean(cur_counts[cur_model$classification == 2])
+
+ signal <- ifelse(mean1 > mean2, mean1, mean2)
+ noise <- ifelse(mean1 > mean2, mean2, mean1)
+
+ return(c(snr = signal/noise, ps = signal))
+})
+
+cur_snr <- t(mat) %>% as.data.frame() %>%
+ mutate(marker = rownames(spe))
+
+cur_snr %>% ggplot() +
+ geom_point(aes(log2(ps), log2(snr))) +
+ geom_label_repel(aes(log2(ps), log2(snr), label = marker)) +
+ theme_minimal(base_size = 15) + ylab("Signal-to-noise ratio [log2]") +
+ xlab("Signal intensity [log2]")
Next, we observe the distributions of cell size across the individual images. +Differences in cell size distributions can indicate segmentation biases due to +differences in cell density or can indicate biological differences due to cell +type compositions (tumor cells tend to be larger than immune cells).
+dittoPlot(spe, var = "area",
+ group.by = "sample_id",
+ plots = "boxplot") +
+ ylab("Cell area") + xlab("")
## Min. 1st Qu. Median Mean 3rd Qu. Max.
+## 3.00 47.00 70.00 76.38 98.00 466.00
+The median cell size is 70 pixels with a median major axis
+length of 11.3. The largest cell
+has an area of 466 pixels which relates to a diameter of
+21.6 pixels assuming a circular shape.
+Overall, the distribution of cell sizes is similar across images with images from
+Patient4_005
and Patient4_007
showing a reduced average cell size. These
+images contain fewer tumor cells which can explain the smaller average cell size.
We detect very small cells in the dataset and will remove them. +The chosen threshold is arbitrary and needs to be adjusted per dataset.
+ +## [1] 65
+
+Another quality indicator can be an absolute measure of cell density often +reported in cells per mm\(^2\).
+cell_density <- colData(spe) %>%
+ as.data.frame() %>%
+ group_by(sample_id) %>%
+ summarize(cell_count = n(),
+ no_pixels = mean(width_px) * mean(height_px)) %>%
+ mutate(cells_per_mm2 = cell_count/(no_pixels/1000000))
+
+ggplot(cell_density) +
+ geom_point(aes(reorder(sample_id,cells_per_mm2), cells_per_mm2)) +
+ theme_minimal(base_size = 15) +
+ theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 8)) +
+ ylab("Cells per mm2") + xlab("")
The number of cells per mm\(^2\) varies across images which also depends on the +number of tumor/non-tumor cells. As we can see in the following sections, some +immune cells appear in cell dense regions while other stromal regions are less +dense.
+The data presented here originate from samples from different locations with +potential differences in pre-processing and each sample was stained individually. +These (and other) technical aspects can induce staining differences between +samples or batches of samples. Observing potential staining differences can be +crucial to assess data quality. We will use ridgeline visualizations to check +differences in staining patterns:
+multi_dittoPlot(spe, vars = rownames(spe)[rowData(spe)$use_channel],
+ group.by = "patient_id", plots = "ridgeplot",
+ assay = "exprs",
+ color.panel = metadata(spe)$color_vectors$patient_id)
We observe variations in the distributions of marker expression across patients. +These variations may arise partly from different abundances of cells in +different images (e.g., Patient3 may have higher numbers of CD11c+ and PD1+ +cells) as well as staining differences between samples. While most of the +selected markers are specifically expressed in immune cell subtypes, we can see +that E-Cadherin (a marker for epithelial (tumor) cells) shows a similar +expression range across all patients.
+Finally, we will use non-linear dimensionality reduction methods to project
+cells from a high-dimensional (40) down to a low-dimensional (2) space. For this
+the scater package provides the runUMAP
and
+runTSNE
function. To ensure reproducibility, we will need to set a seed;
+however different seeds and different parameter settings (e.g., the perplexity
+parameter in the runTSNE
function) need to be tested to avoid
+over-interpretation of visualization artefacts. For dimensionality reduction, we
+will use all channels that show biological variation across the dataset.
+However, marker selection can be performed with different biological questions
+in mind. Here, both the runUMAP
and runTSNE
function are not deterministic,
+meaning they produce different results across different runs. We therefore
+set a seed
in this chunk for reproducibility purposes.
library(scater)
+
+set.seed(220225)
+spe <- runUMAP(spe, subset_row = rowData(spe)$use_channel, exprs_values = "exprs")
+spe <- runTSNE(spe, subset_row = rowData(spe)$use_channel, exprs_values = "exprs")
After dimensionality reduction, the low-dimensional embeddings are stored in the
+reducedDim
slot.
## List of length 2
+## names(2): UMAP TSNE
+
+## UMAP1 UMAP2
+## Patient1_001_1 -4.810167 -3.777362
+## Patient1_001_2 -4.397347 -3.456036
+## Patient1_001_3 -4.369883 -3.445561
+## Patient1_001_4 -4.081614 -3.162119
+## Patient1_001_5 -6.234012 -2.433976
+## Patient1_001_6 -5.666597 -3.428058
+Visualization of the low-dimensional embedding facilitates assessment of
+potential “batch effects”. The dittoDimPlot
+function allows flexible visualization. It returns ggplot
objects which
+can be further modified.
library(patchwork)
+
+# visualize patient id
+p1 <- dittoDimPlot(spe, var = "patient_id", reduction.use = "UMAP", size = 0.2) +
+ scale_color_manual(values = metadata(spe)$color_vectors$patient_id) +
+ ggtitle("Patient ID on UMAP")
+p2 <- dittoDimPlot(spe, var = "patient_id", reduction.use = "TSNE", size = 0.2) +
+ scale_color_manual(values = metadata(spe)$color_vectors$patient_id) +
+ ggtitle("Patient ID on TSNE")
+
+# visualize region of interest id
+p3 <- dittoDimPlot(spe, var = "ROI", reduction.use = "UMAP", size = 0.2) +
+ scale_color_manual(values = metadata(spe)$color_vectors$ROI) +
+ ggtitle("ROI ID on UMAP")
+p4 <- dittoDimPlot(spe, var = "ROI", reduction.use = "TSNE", size = 0.2) +
+ scale_color_manual(values = metadata(spe)$color_vectors$ROI) +
+ ggtitle("ROI ID on TSNE")
+
+# visualize indication
+p5 <- dittoDimPlot(spe, var = "indication", reduction.use = "UMAP", size = 0.2) +
+ scale_color_manual(values = metadata(spe)$color_vectors$indication) +
+ ggtitle("Indication on UMAP")
+p6 <- dittoDimPlot(spe, var = "indication", reduction.use = "TSNE", size = 0.2) +
+ scale_color_manual(values = metadata(spe)$color_vectors$indication) +
+ ggtitle("Indication on TSNE")
+
+(p1 + p2) / (p3 + p4) / (p5 + p6)
# visualize marker expression
+p1 <- dittoDimPlot(spe, var = "Ecad", reduction.use = "UMAP",
+ assay = "exprs", size = 0.2) +
+ scale_color_viridis(name = "Ecad") +
+ ggtitle("E-Cadherin expression on UMAP")
+p2 <- dittoDimPlot(spe, var = "CD45RO", reduction.use = "UMAP",
+ assay = "exprs", size = 0.2) +
+ scale_color_viridis(name = "CD45RO") +
+ ggtitle("CD45RO expression on UMAP")
+p3 <- dittoDimPlot(spe, var = "Ecad", reduction.use = "TSNE",
+ assay = "exprs", size = 0.2) +
+ scale_color_viridis(name = "Ecad") +
+ ggtitle("Ecad expression on TSNE")
+p4 <- dittoDimPlot(spe, var = "CD45RO", reduction.use = "TSNE",
+ assay = "exprs", size = 0.2) +
+ scale_color_viridis(name = "CD45RO") +
+ ggtitle("CD45RO expression on TSNE")
+
+(p1 + p2) / (p3 + p4)
We observe a strong separation of tumor cells (Ecad+ cells) between the +patients. Here, each patient was diagnosed with a different tumor type. The +separation of tumor cells could be of biological origin since tumor cells tend +to display differences in expression between patients and cancer types and/or of +technical origin: the panel only contains a single tumor marker (E-Cadherin) and +therefore slight technical differences in staining causes visible separation +between cells of different patients. Nevertheless, the immune compartment +(CD45RO+ cells) mix between patients and we can rule out systematic staining +differences between patients.
+The modified SpatialExperiment
object is saved for further downstream analysis.
## R version 4.3.2 (2023-10-31)
+## Platform: x86_64-pc-linux-gnu (64-bit)
+## Running under: Ubuntu 22.04.3 LTS
+##
+## Matrix products: default
+## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
+## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
+##
+## locale:
+## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
+## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
+## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
+## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
+## [9] LC_ADDRESS=C LC_TELEPHONE=C
+## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
+##
+## time zone: Etc/UTC
+## tzcode source: system (glibc)
+##
+## attached base packages:
+## [1] stats4 stats graphics grDevices utils datasets methods
+## [8] base
+##
+## other attached packages:
+## [1] testthat_3.2.0 patchwork_1.1.3
+## [3] scater_1.30.0 mclust_6.0.0
+## [5] scuttle_1.12.0 ggrepel_0.9.4
+## [7] lubridate_1.9.3 forcats_1.0.0
+## [9] stringr_1.5.0 dplyr_1.1.3
+## [11] purrr_1.0.2 readr_2.1.4
+## [13] tidyr_1.3.0 tibble_3.2.1
+## [15] tidyverse_2.0.0 SpatialExperiment_1.12.0
+## [17] viridis_0.6.4 viridisLite_0.4.2
+## [19] dittoSeq_1.14.0 ggplot2_3.4.4
+## [21] cytoviewer_1.2.0 cytomapper_1.14.0
+## [23] SingleCellExperiment_1.24.0 SummarizedExperiment_1.32.0
+## [25] Biobase_2.62.0 GenomicRanges_1.54.1
+## [27] GenomeInfoDb_1.38.0 IRanges_2.36.0
+## [29] S4Vectors_0.40.1 BiocGenerics_0.48.0
+## [31] MatrixGenerics_1.14.0 matrixStats_1.0.0
+## [33] EBImage_4.44.0
+##
+## loaded via a namespace (and not attached):
+## [1] RColorBrewer_1.1-3 rstudioapi_0.15.0
+## [3] jsonlite_1.8.7 magrittr_2.0.3
+## [5] ggbeeswarm_0.7.2 magick_2.8.1
+## [7] farver_2.1.1 rmarkdown_2.25
+## [9] zlibbioc_1.48.0 vctrs_0.6.4
+## [11] memoise_2.0.1 DelayedMatrixStats_1.24.0
+## [13] RCurl_1.98-1.12 terra_1.7-55
+## [15] svgPanZoom_0.3.4 htmltools_0.5.6.1
+## [17] S4Arrays_1.2.0 BiocNeighbors_1.20.0
+## [19] raster_3.6-26 Rhdf5lib_1.24.0
+## [21] SparseArray_1.2.0 rhdf5_2.46.0
+## [23] sass_0.4.7 bslib_0.5.1
+## [25] desc_1.4.2 htmlwidgets_1.6.2
+## [27] fontawesome_0.5.2 cachem_1.0.8
+## [29] mime_0.12 lifecycle_1.0.3
+## [31] pkgconfig_2.0.3 rsvd_1.0.5
+## [33] colourpicker_1.3.0 Matrix_1.6-1.1
+## [35] R6_2.5.1 fastmap_1.1.1
+## [37] GenomeInfoDbData_1.2.11 shiny_1.7.5.1
+## [39] digest_0.6.33 colorspace_2.1-0
+## [41] shinycssloaders_1.0.0 rprojroot_2.0.3
+## [43] irlba_2.3.5.1 pkgload_1.3.3
+## [45] beachmat_2.18.0 labeling_0.4.3
+## [47] fansi_1.0.5 nnls_1.5
+## [49] timechange_0.2.0 abind_1.4-5
+## [51] compiler_4.3.2 withr_2.5.1
+## [53] tiff_0.1-11 BiocParallel_1.36.0
+## [55] HDF5Array_1.30.0 DelayedArray_0.28.0
+## [57] rjson_0.2.21 tools_4.3.2
+## [59] vipor_0.4.5 beeswarm_0.4.0
+## [61] httpuv_1.6.12 glue_1.6.2
+## [63] rhdf5filters_1.14.0 promises_1.2.1
+## [65] grid_4.3.2 Rtsne_0.16
+## [67] generics_0.1.3 gtable_0.3.4
+## [69] tzdb_0.4.0 hms_1.1.3
+## [71] ScaledMatrix_1.10.0 BiocSingular_1.18.0
+## [73] sp_2.1-1 utf8_1.2.4
+## [75] XVector_0.42.0 RcppAnnoy_0.0.21
+## [77] pillar_1.9.0 later_1.3.1
+## [79] lattice_0.21-9 tidyselect_1.2.0
+## [81] locfit_1.5-9.8 miniUI_0.1.1.1
+## [83] knitr_1.44 gridExtra_2.3
+## [85] bookdown_0.36 svglite_2.1.2
+## [87] xfun_0.40 shinydashboard_0.7.2
+## [89] brio_1.1.3 pheatmap_1.0.12
+## [91] stringi_1.7.12 fftwtools_0.9-11
+## [93] yaml_2.3.7 evaluate_0.22
+## [95] codetools_0.2-19 archive_1.1.6
+## [97] BiocManager_1.30.22 cli_3.6.1
+## [99] uwot_0.1.16 xtable_1.8-4
+## [101] systemfonts_1.0.5 munsell_0.5.0
+## [103] jquerylib_0.1.4 Rcpp_1.0.11
+## [105] png_0.1-8 parallel_4.3.2
+## [107] ellipsis_0.3.2 jpeg_0.1-10
+## [109] sparseMatrixStats_1.14.0 bitops_1.0-7
+## [111] scales_1.2.1 ggridges_0.5.4
+## [113] crayon_1.5.2 BiocStyle_2.30.0
+## [115] rlang_1.1.1 cowplot_1.1.1
+The following section describes how to visualize the abundance of biomolecules +(e.g., protein or RNA) as well as cell-specific metadata on images. Section +11.1 focuses on visualizing pixel-level information +including the generation of pseudo-color composite images. Section +11.2 highlights the visualization of cell metadata (e.g., +cell phenotype) as well as summarized pixel intensities on cell segmentation +masks.
+The +cytomapper +R/Bioconductor package was developed to support the handling and visualization +of multiple multi-channel images and segmentation masks (Eling et al. 2020). The main +data object for image handling is the +CytoImageList +container which we used in Section 5 to store multi-channel +images and segmentation masks.
+We will first read in the previously processed data and randomly select 3 images +for visualization purposes.
+library(SpatialExperiment)
+library(cytomapper)
+spe <- readRDS("data/spe.rds")
+images <- readRDS("data/images.rds")
+masks <- readRDS("data/masks.rds")
+
+# Sample images
+set.seed(220517)
+cur_id <- sample(unique(spe$sample_id), 3)
+cur_images <- images[names(images) %in% cur_id]
+cur_masks <- masks[names(masks) %in% cur_id]
The following section gives examples for visualizing individual channels or
+multiple channels as pseudo-color composite images. For this the cytomapper
+package exports the plotPixels
function which expects a CytoImageList
object
+storing one or multiple multi-channel images. In the simplest use case, a
+single channel can be visualized as follows:
The plot above shows the tissue expression of the epithelial tumor marker
+E-cadherin on the 3 selected images. The bcg
parameter (default c(0, 1, 1)
)
+stands for “background”, “contrast”, “gamma” and controls these attributes of
+the image. This parameter takes a named list where each entry specifies these
+attributes per channel. The first value of the numeric vector will be added to
+the pixel intensities (background); pixel intensities will be multiplied by the
+second entry of the vector (contrast); pixel intensities will be exponentiated
+by the third entry of the vector (gamma). In most cases, it is sufficient to
+adjust the second (contrast) entry of the vector.
The following example highlights the visualization of 6 markers (maximum allowed +number of markers) at once per image. The markers indicate the spatial +distribution of tumor cells (E-cadherin), T cells (CD3), B cells (CD20), CD8+ T +cells (CD8a), plasma cells (CD38) and proliferating cells (Ki67).
+plotPixels(cur_images,
+ colour_by = c("Ecad", "CD3", "CD20", "CD8a", "CD38", "Ki67"),
+ bcg = list(Ecad = c(0, 5, 1),
+ CD3 = c(0, 5, 1),
+ CD20 = c(0, 5, 1),
+ CD8a = c(0, 5, 1),
+ CD38 = c(0, 8, 1),
+ Ki67 = c(0, 5, 1)))
The default colors for visualization are chosen by the additive RGB (red, green,
+blue) color model. For six markers the default colors are: red, green, blue,
+cyan (green + blue), magenta (red + blue), yellow (green + red). These colors
+are the easiest to distinguish by eye. However, you can select other colors for
+each channel by setting the colour
parameter:
plotPixels(cur_images,
+ colour_by = c("Ecad", "CD3", "CD20"),
+ bcg = list(Ecad = c(0, 5, 1),
+ CD3 = c(0, 5, 1),
+ CD20 = c(0, 5, 1)),
+ colour = list(Ecad = c("black", "burlywood1"),
+ CD3 = c("black", "cyan2"),
+ CD20 = c("black", "firebrick1")))
The colour
parameter takes a named list in which each entry specifies the
+colors from which a color gradient is constructed via colorRampPalette
. These
+are usually vectors of length 2 in which the first entry is "black"
and the
+second entry specifies the color of choice. Although not recommended, you can
+also specify more than two colors to generate a more complex color gradient.
As an alternative to setting the bcg
parameter, images can first be
+normalized. Normalization here means to scale the pixel intensities per channel
+between 0 and 1 (or a range specified by the ft
parameter in the normalize
+function). By default, the normalize
function scales pixel intensities across
+all images contained in the CytoImageList
object (separateImages = FALSE
).
+Each individual channel is scaled independently (separateChannels = TRUE
).
After 0-1 normalization, maximum pixel intensities can be clipped to enhance the
+contrast of the image (setting the inputRange
parameter). In the following
+example, the clipping to 0 and 0.2 is the same as multiplying the pixel
+intensities by a factor of 5.
# 0 - 1 channel scaling across all images
+norm_images <- cytomapper::normalize(cur_images)
+
+# Clip channel at 0.2
+norm_images <- cytomapper::normalize(norm_images, inputRange = c(0, 0.2))
+
+plotPixels(norm_images,
+ colour_by = c("Ecad", "CD3", "CD20", "CD8a", "CD38", "Ki67"))
The default setting of scaling pixel intensities across all images ensures +comparable intensity levels across images. Pixel intensities can also be +scaled per image therefore correcting for staining/expression differences +between images:
+# 0 - 1 channel scaling per image
+norm_images <- cytomapper::normalize(cur_images, separateImages = TRUE)
+
+# Clip channel at 0.2
+norm_images <- cytomapper::normalize(norm_images, inputRange = c(0, 0.2))
+
+plotPixels(norm_images,
+ colour_by = c("Ecad", "CD3", "CD20", "CD8a", "CD38", "Ki67"))
As we can see, the marker Ki67 appears brighter on image 2 and 3 in comparison +to scaling the channel across all images.
+Finally, the normalize
function also accepts a named list input for the
+inputRange
argument. In this list, the clipping range per channel can be set
+individually:
# 0 - 1 channel scaling per image
+norm_images <- cytomapper::normalize(cur_images,
+ separateImages = TRUE,
+ inputRange = list(Ecad = c(0, 50),
+ CD3 = c(0, 30),
+ CD20 = c(0, 40),
+ CD8a = c(0, 50),
+ CD38 = c(0, 10),
+ Ki67 = c(0, 70)))
+
+plotPixels(norm_images,
+ colour_by = c("Ecad", "CD3", "CD20", "CD8a", "CD38", "Ki67"))
In the following section, we will show examples on how to visualize single +cells either as segmentation masks or outlined on composite images. This type +of visualization allows to observe the spatial distribution of cell phenotypes, +the visual assessment of morphological features and quality control in terms +of cell segmentation and phenotyping.
+The cytomapper
package provides the plotCells
function that accepts a
+CytoImageList
object containing segmentation masks. These are defined as
+single channel images where sets of pixels with the same integer ID identify
+individual cells. This integer ID can be found as an entry in the colData(spe)
+slot and as pixel information in the segmentation masks. The entry in
+colData(spe)
needs to be specified via the cell_id
argument to the
+plotCells
function. In that way, data contained in the SpatialExperiment
+object can be mapped to segmentation masks. For the current dataset, the cell
+IDs are stored in colData(spe)$ObjectNumber
.
As cell IDs are only unique within a single image, plotCells
also requires
+the img_id
argument. This argument specifies the colData(spe)
as well as the
+mcols(masks)
entry that stores the unique image name from which each cell was
+extracted. In the current dataset the unique image names are stored in
+colData(spe)$sample_id
and mcols(masks)$sample_id
.
Providing these two entries that allow mapping between the SpatialExperiment
+object and segmentation masks, we can now color individual cells based on their
+cell type:
plotCells(cur_masks,
+ object = spe,
+ cell_id = "ObjectNumber",
+ img_id = "sample_id",
+ colour_by = "celltype")
For consistent visualization, the plotCells
function takes a named list as
+color
argument. The entry name must match the colour_by
argument.
plotCells(cur_masks,
+ object = spe,
+ cell_id = "ObjectNumber",
+ img_id = "sample_id",
+ colour_by = "celltype",
+ colour = list(celltype = metadata(spe)$color_vectors$celltype))
If only individual cell types should be visualized, the SpatialExperiment
+object can be subsetted (e.g., to only contain CD8+ T cells). In the following
+example CD8+ T cells are colored in red and all other cells that are not
+contained in the dataset are colored in white (as set by the missing_color
+argument).
CD8 <- spe[,spe$celltype == "CD8"]
+
+plotCells(cur_masks,
+ object = CD8,
+ cell_id = "ObjectNumber",
+ img_id = "sample_id",
+ colour_by = "celltype",
+ colour = list(celltype = c(CD8 = "red")),
+ missing_colour = "white")
In terms of visualizing metadata, any entry in the colData(spe)
slot can be
+visualized. The plotCells
function automatically detects if the entry
+is continuous or discrete. In this fashion, we can now visualize the area of each
+cell:
Similar to visualizing single-cell metadata on segmentation masks, we can
+use the plotCells
function to visualize the aggregated pixel intensities
+per cell. In the current dataset pixel intensities were aggregated by computing
+the mean pixel intensity per cell and per channel. The plotCells
function
+accepts the exprs_values
argument (default counts
) that allows selecting
+the assay which stores the expression values that should be visualized.
In the following example, we visualize the asinh-transformed mean pixel +intensities of the epithelial marker E-cadherin on segmentation masks.
+plotCells(cur_masks,
+ object = spe,
+ cell_id = "ObjectNumber",
+ img_id = "sample_id",
+ colour_by = "Ecad",
+ exprs_values = "exprs")
We will now visualize the maximum number of +allowed markers as composites on the segmentation masks. As above the markers +indicate the spatial distribution of tumor cells (E-cadherin), T cells (CD3), B +cells (CD20), CD8+ T cells (CD8a), plasma cells (CD38) and proliferating cells +(Ki67).
+plotCells(cur_masks,
+ object = spe,
+ cell_id = "ObjectNumber",
+ img_id = "sample_id",
+ colour_by = c("Ecad", "CD3", "CD20", "CD8a", "CD38", "Ki67"),
+ exprs_values = "exprs")
While visualizing 6 markers on the pixel-level may still allow the distinction +of different tissue structures, observing single-cell expression levels is +difficult when visualizing many markers simultaneously due to often overlapping +expression.
+Similarly to adjusting marker colors when visualizing pixel intensities, we
+can change the color gradients per marker by setting the color
argument:
The following section highlights the combined visualization of pixel- and
+cell-level information at once. For this, besides the SpatialExperiment
object,
+the plotPixels
function accepts two CytoImageList
objects. One for the
+multi-channel images and one for the segmentation masks. By specifying the
+outline_by
parameter, the outlines of cells can now be colored based on their
+metadata.
The following example first generates a 3-channel composite images displaying +the expression of E-cadherin, CD3 and CD20 before coloring the cells’ outlines +by their cell phenotype.
+plotPixels(image = cur_images,
+ mask = cur_masks,
+ object = spe,
+ cell_id = "ObjectNumber",
+ img_id = "sample_id",
+ colour_by = c("Ecad", "CD3", "CD20"),
+ outline_by = "celltype",
+ bcg = list(Ecad = c(0, 5, 1),
+ CD3 = c(0, 5, 1),
+ CD20 = c(0, 5, 1)),
+ colour = list(celltype = metadata(spe)$color_vectors$celltype),
+ thick = TRUE)
Distinguishing individual cell phenotypes is nearly impossible in the images +above.
+However, the SpatialExperiment
object can be subsetted to only contain cells
+of a single or few phenotypes. This allows the selective visualization of cell
+outlines on composite images.
Here, we select all CD8+ T cells from the dataset and outline them on a 2-channel +composite image displaying the expression of CD3 and CD8a.
+CD8 <- spe[,spe$celltype == "CD8"]
+
+plotPixels(image = cur_images,
+ mask = cur_masks,
+ object = CD8,
+ cell_id = "ObjectNumber", img_id = "sample_id",
+ colour_by = c("CD3", "CD8a"),
+ outline_by = "celltype",
+ bcg = list(CD3 = c(0, 5, 1),
+ CD8a = c(0, 5, 1)),
+ colour = list(celltype = c("CD8" = "white")),
+ thick = TRUE)
This type of visualization allows the quality control of two things: 1. +segmentation quality of individual cell types can be checked and 2. cell +phenotyping accuracy can be visually assessed against expected marker expression.
+The cytomapper
package provides a number of function arguments to adjust the
+visual appearance of figures that are shared between the plotPixels
and
+plotCells
function.
For a full overview of the arguments please refer to ?plotting-param
.
We use the following example to highlight how to adjust the scale bar, the image +title, the legend appearance and the margin between images.
+plotPixels(cur_images,
+ colour_by = c("Ecad", "CD3", "CD20", "CD8a", "CD38", "Ki67"),
+ bcg = list(Ecad = c(0, 5, 1),
+ CD3 = c(0, 5, 1),
+ CD20 = c(0, 5, 1),
+ CD8a = c(0, 5, 1),
+ CD38 = c(0, 8, 1),
+ Ki67 = c(0, 5, 1)),
+ scale_bar = list(length = 100,
+ label = expression("100 " ~ mu * "m"),
+ cex = 0.7,
+ lwidth = 10,
+ colour = "grey",
+ position = "bottomleft",
+ margin = c(5,5),
+ frame = 3),
+ image_title = list(text = mcols(cur_images)$indication,
+ position = "topright",
+ colour = "grey",
+ margin = c(5,5),
+ font = 2,
+ cex = 2),
+ legend = list(colour_by.title.cex = 0.7,
+ margin = 10),
+ margin = 40)
By default, all images are displayed on the same graphics device. This can be
+useful when saving all images at once (see next section) to zoom into the
+individual images instead of opening each image individually. However, when
+displaying images in a markdown document these are more accessible when
+visualized individually. For this, the plotPixels
and plotCells
function
+accepts the display
parameter that when set to "single"
displays each
+resulting image in its own graphics device:
The final section addresses how to save composite images and how to return them +for integration with other plots.
+The plotPixels
and plotCells
functions accept the save_plot
argument which
+takes a named list of the following entries: filename
indicates the location
+and file type of the image saved to disk; scale
adjusts the resolution of the
+saved image (this only needs to be adjusted for small images).
plotCells(cur_masks,
+ object = spe,
+ cell_id = "ObjectNumber",
+ img_id = "sample_id",
+ colour_by = "celltype",
+ colour = list(celltype = metadata(spe)$color_vectors$celltype),
+ save_plot = list(filename = "data/celltype_image.png"))
The composite images (together with their annotation) can also be returned. In
+the following code chunk we save two example plots to variables (out1
and
+out2
).
out1 <- plotCells(cur_masks,
+ object = spe,
+ cell_id = "ObjectNumber",
+ img_id = "sample_id",
+ colour_by = "celltype",
+ colour = list(celltype = metadata(spe)$color_vectors$celltype),
+ return_plot = TRUE)
out2 <- plotCells(cur_masks,
+ object = spe,
+ cell_id = "ObjectNumber",
+ img_id = "sample_id",
+ colour_by = c("Ecad", "CD3", "CD20"),
+ exprs_values = "exprs",
+ return_plot = TRUE)
The composite images are stored in out1$plot
and out2$plot
and can be
+converted into a graph object recognized by the
+cowplot
+package.
The final function call of the following chunk plots both object next to each +other.
+ + +The +cytoviewer +package allows the interactive visualization of multi-channel images and +segmentation masks. It also allows to map cellular metadata onto segmentation +masks and outlining of cells on composite images. For a full introduction to the +package, please refer to the vignette.
+ +## R version 4.3.2 (2023-10-31)
+## Platform: x86_64-pc-linux-gnu (64-bit)
+## Running under: Ubuntu 22.04.3 LTS
+##
+## Matrix products: default
+## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
+## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
+##
+## locale:
+## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
+## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
+## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
+## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
+## [9] LC_ADDRESS=C LC_TELEPHONE=C
+## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
+##
+## time zone: Etc/UTC
+## tzcode source: system (glibc)
+##
+## attached base packages:
+## [1] grid stats4 stats graphics grDevices utils datasets
+## [8] methods base
+##
+## other attached packages:
+## [1] cytoviewer_1.2.0 gridGraphics_0.5-1
+## [3] cowplot_1.1.1 cytomapper_1.14.0
+## [5] EBImage_4.44.0 SpatialExperiment_1.12.0
+## [7] SingleCellExperiment_1.24.0 SummarizedExperiment_1.32.0
+## [9] Biobase_2.62.0 GenomicRanges_1.54.1
+## [11] GenomeInfoDb_1.38.0 IRanges_2.36.0
+## [13] S4Vectors_0.40.1 BiocGenerics_0.48.0
+## [15] MatrixGenerics_1.14.0 matrixStats_1.0.0
+##
+## loaded via a namespace (and not attached):
+## [1] splines_4.3.2 later_1.3.1
+## [3] bitops_1.0-7 tibble_3.2.1
+## [5] svgPanZoom_0.3.4 polyclip_1.10-6
+## [7] XML_3.99-0.14 lifecycle_1.0.3
+## [9] rstatix_0.7.2 doParallel_1.0.17
+## [11] lattice_0.21-9 MASS_7.3-60
+## [13] backports_1.4.1 magrittr_2.0.3
+## [15] sass_0.4.7 rmarkdown_2.25
+## [17] jquerylib_0.1.4 yaml_2.3.7
+## [19] plotrix_3.8-2 httpuv_1.6.12
+## [21] sp_2.1-1 RColorBrewer_1.1-3
+## [23] ConsensusClusterPlus_1.66.0 multcomp_1.4-25
+## [25] abind_1.4-5 zlibbioc_1.48.0
+## [27] Rtsne_0.16 purrr_1.0.2
+## [29] RCurl_1.98-1.12 TH.data_1.1-2
+## [31] tweenr_2.0.2 sandwich_3.0-2
+## [33] circlize_0.4.15 GenomeInfoDbData_1.2.11
+## [35] ggrepel_0.9.4 irlba_2.3.5.1
+## [37] CATALYST_1.26.0 terra_1.7-55
+## [39] svglite_2.1.2 DelayedMatrixStats_1.24.0
+## [41] codetools_0.2-19 DelayedArray_0.28.0
+## [43] scuttle_1.12.0 ggforce_0.4.1
+## [45] tidyselect_1.2.0 shape_1.4.6
+## [47] raster_3.6-26 farver_2.1.1
+## [49] ScaledMatrix_1.10.0 viridis_0.6.4
+## [51] jsonlite_1.8.7 BiocNeighbors_1.20.0
+## [53] GetoptLong_1.0.5 ellipsis_0.3.2
+## [55] ggridges_0.5.4 survival_3.5-7
+## [57] scater_1.30.0 iterators_1.0.14
+## [59] systemfonts_1.0.5 foreach_1.5.2
+## [61] tools_4.3.2 ggnewscale_0.4.9
+## [63] Rcpp_1.0.11 glue_1.6.2
+## [65] gridExtra_2.3 SparseArray_1.2.0
+## [67] xfun_0.40 dplyr_1.1.3
+## [69] HDF5Array_1.30.0 shinydashboard_0.7.2
+## [71] withr_2.5.1 fastmap_1.1.1
+## [73] rhdf5filters_1.14.0 fansi_1.0.5
+## [75] rsvd_1.0.5 digest_0.6.33
+## [77] R6_2.5.1 mime_0.12
+## [79] colorspace_2.1-0 gtools_3.9.4
+## [81] jpeg_0.1-10 utf8_1.2.4
+## [83] tidyr_1.3.0 generics_0.1.3
+## [85] data.table_1.14.8 htmlwidgets_1.6.2
+## [87] S4Arrays_1.2.0 pkgconfig_2.0.3
+## [89] gtable_0.3.4 ComplexHeatmap_2.18.0
+## [91] RProtoBufLib_2.14.0 XVector_0.42.0
+## [93] htmltools_0.5.6.1 carData_3.0-5
+## [95] bookdown_0.36 fftwtools_0.9-11
+## [97] clue_0.3-65 scales_1.2.1
+## [99] png_0.1-8 colorRamps_2.3.1
+## [101] knitr_1.44 rstudioapi_0.15.0
+## [103] reshape2_1.4.4 rjson_0.2.21
+## [105] cachem_1.0.8 zoo_1.8-12
+## [107] rhdf5_2.46.0 GlobalOptions_0.1.2
+## [109] stringr_1.5.0 shinycssloaders_1.0.0
+## [111] miniUI_0.1.1.1 parallel_4.3.2
+## [113] vipor_0.4.5 pillar_1.9.0
+## [115] vctrs_0.6.4 promises_1.2.1
+## [117] ggpubr_0.6.0 BiocSingular_1.18.0
+## [119] car_3.1-2 cytolib_2.14.0
+## [121] beachmat_2.18.0 xtable_1.8-4
+## [123] cluster_2.1.4 archive_1.1.6
+## [125] beeswarm_0.4.0 evaluate_0.22
+## [127] magick_2.8.1 mvtnorm_1.2-3
+## [129] cli_3.6.1 locfit_1.5-9.8
+## [131] compiler_4.3.2 rlang_1.1.1
+## [133] crayon_1.5.2 ggsignif_0.6.4
+## [135] FlowSOM_2.10.0 flowCore_2.14.0
+## [137] plyr_1.8.9 ggbeeswarm_0.7.2
+## [139] stringi_1.7.12 viridisLite_0.4.2
+## [141] BiocParallel_1.36.0 nnls_1.5
+## [143] munsell_0.5.0 tiff_0.1-11
+## [145] colourpicker_1.3.0 Matrix_1.6-1.1
+## [147] sparseMatrixStats_1.14.0 ggplot2_3.4.4
+## [149] Rhdf5lib_1.24.0 shiny_1.7.5.1
+## [151] fontawesome_0.5.2 drc_3.0-1
+## [153] memoise_2.0.1 igraph_1.5.1
+## [155] broom_1.0.5 bslib_0.5.1
+Compiled: 2023-11-28
+This workflow highlights the use of common R/Bioconductor packages +to analyze single-cell data obtained from segmented multi-channel images. We will not perform multi-channel image processing and segmentation in R +but rather link to available approaches in Section 3. While we +use imaging mass cytometry (IMC) data as an example, the concepts presented here can be applied to images +obtained by other highly-multiplexed imaging technologies (e.g. CODEX, MIBI, +mIF, etc.).
+We will give an introduction to IMC in Section 2 and highlight +strategies to extract single-cell data from multi-channel images in Section +3.
+Reproducible code written in R is available from Section 4 +onwards and the workflow can be largely divided into the following parts:
+Multi-channel image and spatial, single-cell analysis is complex and we +highlight an example workflow here. However, this workflow is not complete and +does not cover all possible aspects of exploratory data analysis. Instead, we +demonstrate this workflow as a solid basis that supports other aspects of data +analysis. It offers interoperability with other packages for single-cell and +spatial analysis and the user will need to become familiar with the general +framework to efficiently analyse data obtained from multiplexed imaging +technologies.
+This workflow has been actively developed until December 2023. At that time
+we used the most recent (v.0.16.0
) version of steinbock
to process the
+example data. If you are having issues when using newer versions of steinbock
+please open an issue here.
We provide the workflow as an open-source resource. It does not mean that +this workflow is tested on all possible datasets or biological questions and +there exist multiple ways of analysing data. It is therefore recommended to +check the results and question their biological interpretation.
+If you notice an issue or missing information, please report an issue +here. We also +welcome contributions in form of pull requests or feature requests in form of +issues. Have a look at the source code at:
+ +Nils Eling
+Vito Zanotelli
+Daniel Schulz
+Jonas Windhager
+Michelle Daniel
+Lasse Meyer
The workflow has been published in
+https://www.nature.com/articles/s41596-023-00881-0
+which you can cite as follows:
Windhager, J., Zanotelli, V.R.T., Schulz, D. et al. An end-to-end workflow for multiplexed image processing and analysis.
+ Nat Protoc (2023).
+Version 1.0.0 [2023-06-30]
+Version 1.0.1 [2023-10-19]
+predict
call after training a classifierVersion 1.0.2 [2023-11-27]
+* nils.eling@uzh.ch
+1: Department for Quantitative Biomedicine, University of Zurich
+2: Institute for Molecular Health Sciences, ETH Zurich
Version 1.0.0 [2023-06-30]
+Version 1.0.1 [2023-10-19]
+predict
call after training a classifierVersion 1.0.2 [2023-11-27]
+Highly multiplexed imaging (HMI) enables the simultaneous detection of dozens of +biological molecules (e.g., proteins, transcripts; also referred to as +“markers”) in tissues. Recently established multiplexed tissue imaging +technologies rely on cyclic staining with fluorescently-tagged antibodies +(Lin et al. 2018; Gut, Herrmann, and Pelkmans 2018), or the use of oligonucleotide-tagged (Goltsev et al. 2018; Saka et al. 2019) or metal-tagged (Giesen et al. 2014; Angelo et al. 2014) antibodies, among others. +The key strength of these technologies is that they allow in-depth analysis of +single cells within their spatial tissue context. As a result, these methods +have enabled analysis of the spatial architecture of the tumor microenvironment +(Lin et al. 2018; Jackson et al. 2020; Ali et al. 2020; Schürch et al. 2020), determination of nucleic acid +and protein abundances for assessment of spatial co-localization of cell types +and chemokines (Hoch et al. 2022) and spatial niches of virus infected cells (Jiang et al. 2022), +and characterization of pathological features during COVID-19 infection +(Rendeiro et al. 2021; Mitamura et al. 2021), Type 1 diabetes progression (Damond et al. 2019) and +autoimmune disease (Ferrian et al. 2021).
+Imaging mass cytometry (IMC) utilizes metal-tagged antibodies to detect over 40 +proteins and other metal-tagged molecules in biological samples. IMC can be used +to perform highly multiplexed imaging and is particularly suited to profiling +selected areas of tissues across many samples.
++Overview of imaging mass cytometry data acquisition. Taken from (Giesen et al. 2014)
+IMC has first been published in 2014 (Giesen et al. 2014) and has been commercialized by +Standard BioToolsTM to be distributed as the Hyperion Imaging +SystemTM (documentation is available +here). +Similar to other HMI technologies such as MIBI (Angelo et al. 2014), CyCIF (Lin et al. 2018), +4i (Gut, Herrmann, and Pelkmans 2018), CODEX (Goltsev et al. 2018) and SABER (Saka et al. 2019), IMC captures the spatial +expression of multiple proteins in parallel. With a nominal 1 μm resolution, +IMC is able to detect cytoplasmic and nuclear localization of proteins. The +current ablation frequency of IMC is 200Hz, meaning that a 1 mm\(^2\) area +can be imaged within about 2 hours.
+Technical aspects of how data acquisition works can be found in the original +publication (Giesen et al. 2014). Briefly, antibodies to detect targets in biological +material are labeled with heavy metals (e.g., lanthanides) that do not occur in +biological systems and thus can be used upon binding to their target as a +readout similar to fluorophores in fluorescence microscopy. Thin sections of the +biological sample on a glass slide are stained with an antibody cocktail. +Stained microscopy slides are mounted on a precise motor-driven stage inside the +ablation chamber of the IMC instrument. A high-energy UV laser is focused on the +tissue, and each individual laser shot ablates tissue from an area of roughly 1 +μm\(^2\). The energy of the laser is absorbed by the tissue resulting +in vaporization followed by condensation of the ablated material. The ablated +material from each laser shot is transported in the gas phase into the plasma of +the mass cytometer, where first atomization of the particles and then ionization +of the atoms occurs. The ion cloud is then transferred into a vacuum, and all +ions below a mass of 80 m/z are filtered using a quadrupole mass filter. The +remaining ions (mostly those used to tag antibodies) are analyzed in a +time-of-flight mass spectrometer to ultimately obtain an accumulated mass +spectrum from all ions that correspond to a single laser shot. One can regard +this spectrum as the information underlying a 1 μm\(^2\) pixel. With +repetitive laser shots (e.g., at 200 Hz) and a simultaneous lateral sample +movement, a tissue can be ablated pixel by pixel. Ultimately an image is +reconstructed from each pixel mass spectrum.
+In principle, IMC can be applied to the same type of samples as conventional +fluorescence microscopy. The largest distinction from fluorescence microscopy is +that for IMC, primary-labeled antibodies are commonly used, whereas in +fluorescence microscopy secondary antibodies carrying fluorophores are widely +applied. Additionally, for IMC, samples are dried before acquisition and can be +stored for years. Formalin-fixed and paraffin-embedded (FFPE) samples are widely +used for IMC. The FFPE blocks are cut to 2-5 μm thick sections and are +stained, dried, and analyzed with IMC.
+Metal-labeled antibodies are used to stain molecules in tissues enabling to +delineate tissue structures, cells, and subcellular structures. Metal-conjugated +antibodies can either be purchased directly from Standard BioToolsTM (MaxPar IMC Antibodies), +or antibodies can be purchased and labeled individually (MaxPar Antibody +Labeling). +Antibody labeling using the MaxPar kits is performed via TCEP antibody reduction +followed by crosslinking with sulfhydryl-reactive maleimide-bearing metal +polymers. For each antibody it is essential to validate its functionality, +specificity and optimize its usage to provide optimal signal to noise. To +facilitate antibody handling, a database is highly useful. +Airlab is such a platform; it +allows antibody lot tracking, validation data uploads, and panel generation for +subsequent upload to the IMC acquisition software from Standard BioToolsTM
+Depending on the sample type, different staining protocols can be used. +Generally, once antibodies of choice have been conjugated to a metal tag, +titration experiments are performed to identify the optimal staining +concentration. For FFPE samples, different staining protocols have been +described, and different antibodies show variable staining with different +protocols. Protocols such as the one provided by Standard BioToolsTM or the one describe by +(Ijsselsteijn et al. 2019) are recommended. Briefly, for FFPE tissues, a dewaxing +step is performed to remove the paraffin used to embed the material, followed by +a graded re-hydration of the samples. Thereafter, heat-induced epitope retrieval +(HIER), a step aiming at the reversal of formalin-based fixation, is used to +unmask epitopes within tissues and make them accessible to antibodies. Epitope +unmasking is generally performed in either basic, EDTA-based buffers (pH 9.2) or +acidic, citrate-based buffers (pH 6). Next, a buffer containing bovine serum +albumin (BSA) is used to block non-specific binding. This buffer is also used to +dilute antibody stocks for the actual antibody staining. Staining time and +temperature may vary and optimization must be performed to ensure that each +single antibody performs well. However, overnight staining at 4°C or 3-5 +hours at room temperature seem to be suitable in many cases.
+Following antibody incubation, unbound antibodies are washed away and a +counterstain comparable to DAPI is applied to enable the identification of +nuclei. The Iridium intercalator +from Standard BioToolsTM is a reagent of choice and applied in a brief 5 minute staining. +Finally, the samples are washed again and then dried under an airflow. Once +dried, the samples are ready for analysis using IMC and are +usually stable for a long period of time (at least one year).
+Data is acquired using the CyTOF software from Standard BioToolsTM (see manuals +here).
+The regions of interest are selected by providing coordinates for ablation. To +determine the region to be imaged, so called “panoramas” can be generated. These +are stitched images of single fields of views of about 200 μm in diameter. +Panoramas provide an optical overview of the tissue with a resolution similar to +10x in microscopy and are intended to help with the selection of regions of +interest for ablation. The tissue should be centered on the glass side, since +the imaging mass cytometer cannot access roughly 5 mm from each of the slide +edges. Currently, the instruments can process one slide at a time and usually one MCD +file per sample slide is generated.
+Many regions of interest can be defined on a single slide and acquisition +parameters such as channels to acquire, acquisition speed (100 Hz or 200 Hz), +ablation energy, and other parameters are user-defined. It is recommended that +all isotope channels are recorded. This will result in larger raw data files but valuable information such as +potential contamination of the argon gas (e.g., Xenon) or of the samples (e.g., +lead, barium) is stored.
+To process a large number of slides or to select regions on whole-slide samples, +panoramas may not provide sufficient information. If this is the case, +multi-color immunofluorescence of the same slide prior to staining with +metal-labeled antibodies may be performed. To allow for region selection based +on immunofluorescence images and to align those images with a panorama of the +same or consecutive sections of the sample, we developed +napping.
+Acquisition time is directly proportional to the total size of ablation, and run +times for samples of large area or for large sample numbers can roughly be calculated by +dividing the ablation area in square micrometer by the ablation speed (e.g., +200Hz). In addition to the proprietary MCD file format, TXT files can also +be generated for each region of interest. This is recommended as a back-up +option in case of errors that may corrupt MCD files but not TXT files.
+Upon completion of the acquisition an MCD file of variable size is generated. A +single MCD file can hold raw acquisition data for multiple regions of interest, +optical images providing a slide level overview of the sample (“panoramas”), and +detailed metadata about the experiment. Additionally, for each acquisition a +TXT file is generated which holds the same pixel information as the matched +acquisition in the MCD file.
+The Hyperion Imaging SystemTM produces files in the following folder structure:
+.
++-- {XYZ}_ROI_001_1.txt
++-- {XYZ}_ROI_002_2.txt
++-- {XYZ}_ROI_003_3.txt
++-- {XYZ}.mcd
+Here, {XYZ}
defines the filename, ROI_001
, ROI_002
, ROI_003
are
+user-defined names (descriptions) for the selected regions of interest (ROI),
+and 1
, 2
, 3
indicate the unique acquisition identifiers. The ROI
+description entry can be specified in the Standard BioTools software when
+selecting ROIs. The MCD file contains the raw imaging data and the full metadata
+of all acquired ROIs, while each TXT file contains data of a single ROI without
+metadata. To follow a consistent naming scheme and to bundle all metadata, we
+recommend to zip the folder. Each ZIP file should only contain data from a
+single MCD file, and the name of the ZIP file should match the name of the MCD
+file.
We refer to this data as raw data and the further +processing of this data is described in Section 3.
+ +