deploy: 6cef56e

BodenmillerGroup · Oct 19, 2023 · 03f0b77 · 03f0b77
commit 03f0b77

Showing 195 changed files with 22,511 additions and 0 deletions.
diff --git a/.nojekyll b/.nojekyll
diff --git a/01-intro.md b/01-intro.md
@@ -0,0 +1,185 @@
+# Introduction {#intro}
+
+Highly multiplexed imaging (HMI) enables the simultaneous detection of dozens of
+biological molecules (e.g., proteins, transcripts; also referred to as
+“markers”) in tissues. Recently established multiplexed tissue imaging
+technologies rely on cyclic staining with fluorescently-tagged antibodies
+[@Lin2018; @Gut2018], or the use of oligonucleotide-tagged [@Goltsev2018;
+@Saka2019] or metal-tagged [@Giesen2014; @Angelo2014] antibodies, among others.
+The key strength of these technologies is that they allow in-depth analysis of
+single cells within their spatial tissue context. As a result, these methods
+have enabled analysis of the spatial architecture of the tumor microenvironment
+[@Lin2018; @Jackson2020; @Ali2020; @Schurch2020], determination of nucleic acid
+and protein abundances for assessment of spatial co-localization of cell types
+and chemokines [@Hoch2022] and spatial niches of virus-infected cells [@Jiang2022],
+and characterization of pathological features during COVID-19 infection
+[@Rendeiro2021; @Mitamura2021], Type 1 diabetes progression [@Damond2019] and
+autoimmune disease [@Ferrian2021].
+
+Imaging mass cytometry (IMC) utilizes metal-tagged antibodies to detect over 40
+proteins and other metal-tagged molecules in biological samples. IMC can be used
+to perform highly multiplexed imaging and is particularly suited to profiling
+selected areas of tissues across many samples.
+
+![IMC_workflow](img/IMC_workflow.png)
+*Overview of imaging mass cytometry data acquisition. Taken from [@Giesen2014]*
+
+IMC was first published in 2014 [@Giesen2014] and has been commercialized by
+Standard BioTools<sup><font size="1">TM</font></sup> to be distributed as the Hyperion Imaging
+System<sup><font size="1">TM</font></sup> (documentation is available
+[here](https://www.fluidigm.com/products-services/instruments/hyperion)).
+Similar to other HMI technologies such as MIBI [@Angelo2014], CyCIF [@Lin2018],
+4i [@Gut2018], CODEX [@Goltsev2018] and SABER [@Saka2019], IMC captures the spatial
+expression of multiple proteins in parallel. With a nominal 1 &mu;m resolution,
+IMC is able to detect cytoplasmic and nuclear localization of proteins. The
+current ablation frequency of IMC is 200 Hz, meaning that a 1 mm$^2$ area
+can be imaged within about 2 hours.
+
+## Technical details of IMC
+
+Technical aspects of how data acquisition works can be found in the original
+publication [@Giesen2014]. Briefly, antibodies to detect targets in biological
+material are labeled with heavy metals (e.g., lanthanides) that do not occur in
+biological systems and thus can be used upon binding to their target as a
+readout similar to fluorophores in fluorescence microscopy. Thin sections of the
+biological sample on a glass slide are stained with an antibody cocktail.
+Stained microscopy slides are mounted on a precise motor-driven stage inside the
+ablation chamber of the IMC instrument. A high-energy UV laser is focused on the
+tissue, and each individual laser shot ablates tissue from an area of roughly 1
+&mu;m$^2$. The energy of the laser is absorbed by the tissue resulting
+in vaporization followed by condensation of the ablated material. The ablated
+material from each laser shot is transported in the gas phase into the plasma of
+the mass cytometer, where first atomization of the particles and then ionization
+of the atoms occurs. The ion cloud is then transferred into a vacuum, and all
+ions with a mass-to-charge ratio below 80 m/z are filtered out using a quadrupole
+mass filter. The remaining ions (mostly those used to tag antibodies) are analyzed in a
+time-of-flight mass spectrometer to ultimately obtain an accumulated mass
+spectrum from all ions that correspond to a single laser shot. One can regard
+this spectrum as the information underlying a 1 &mu;m$^2$ pixel. With
+repetitive laser shots (e.g., at 200 Hz) and a simultaneous lateral sample
+movement, a tissue can be ablated pixel by pixel. Ultimately, an image is
+reconstructed from the mass spectra of all pixels.
+
+In principle, IMC can be applied to the same type of samples as conventional
+fluorescence microscopy. The largest distinction from fluorescence microscopy is
+that for IMC, primary-labeled antibodies are commonly used, whereas in
+fluorescence microscopy secondary antibodies carrying fluorophores are widely
+applied. Additionally, for IMC, samples are dried before acquisition and can be
+stored for years. Formalin-fixed and paraffin-embedded (FFPE) samples are widely
+used for IMC. The FFPE blocks are cut into 2-5 &mu;m thick sections and are
+stained, dried, and analyzed with IMC.
+
+### Metal-conjugated antibodies and staining
+
+Metal-labeled antibodies are used to stain molecules in tissues, which makes it possible to
+delineate tissue structures, cells, and subcellular structures. Metal-conjugated
+antibodies can either be purchased directly from Standard BioTools<sup><font size="1">TM</font></sup> ([MaxPar IMC Antibodies](https://store.fluidigm.com/Cytometry/ConsumablesandReagentsCytometry/MaxparAntibodies?cclcl=en_US)),
+or antibodies can be purchased and labeled individually ([MaxPar Antibody
+Labeling](https://store.fluidigm.com/Cytometry/ConsumablesandReagentsCytometry/MaxparAntibodyLabelingKits?cclcl=en_US)).
+Antibody labeling using the MaxPar kits is performed via TCEP antibody reduction
+followed by crosslinking with sulfhydryl-reactive maleimide-bearing metal
+polymers. For each antibody, it is essential to validate its functionality and
+specificity and to optimize its usage to obtain the best signal-to-noise ratio. To
+facilitate antibody handling, a database is highly useful.
+[Airlab](https://github.com/BodenmillerGroup/airlab-web) is such a platform; it
+allows antibody lot tracking, validation data uploads, and panel generation for
+subsequent upload to the IMC acquisition software from Standard BioTools<sup><font size="1">TM</font></sup>.
+
+Depending on the sample type, different staining protocols can be used.
+Generally, once antibodies of choice have been conjugated to a metal tag,
+titration experiments are performed to identify the optimal staining
+concentration. For FFPE samples, different staining protocols have been
+described, and different antibodies show variable staining with different
+protocols. Protocols such as the one provided by Standard BioTools<sup><font size="1">TM</font></sup> or the one described by
+[@Ijsselsteijn2019] are recommended. Briefly, for FFPE tissues, a dewaxing
+step is performed to remove the paraffin used to embed the material, followed by
+a graded re-hydration of the samples. Thereafter, heat-induced epitope retrieval
+(HIER), a step aiming at the reversal of formalin-based fixation, is used to
+unmask epitopes within tissues and make them accessible to antibodies. Epitope
+unmasking is generally performed in either basic, EDTA-based buffers (pH 9.2) or
+acidic, citrate-based buffers (pH 6). Next, a buffer containing bovine serum
+albumin (BSA) is used to block non-specific binding. This buffer is also used to
+dilute antibody stocks for the actual antibody staining. Staining time and
+temperature may vary and optimization must be performed to ensure that each
+single antibody performs well. However, staining overnight at 4&deg;C or for 3-5
+hours at room temperature appears to be suitable in many cases.
+
+Following antibody incubation, unbound antibodies are washed away and a
+counterstain comparable to DAPI is applied to enable the identification of
+nuclei. The [Iridium intercalator](https://store.fluidigm.com/Cytometry/ConsumablesandReagentsCytometry/MassCytometryReagents/Cell-ID%E2%84%A2%20Intercalator-Ir%E2%80%94125%20%C2%B5M)
+from Standard BioTools<sup><font size="1">TM</font></sup> is a reagent of choice and is applied in a brief 5-minute staining step.
+Finally, the samples are washed again and then dried under an airflow. Once
+dried, the samples are ready for analysis using IMC and are
+usually stable for a long period of time (at least one year).
+
+### Data acquisition
+
+Data is acquired using the CyTOF software from Standard BioTools<sup><font size="1">TM</font></sup> (see manuals
+[here](https://go.fluidigm.com/hyperion-support-documents)).
+
+The regions of interest are selected by providing coordinates for ablation. To
+determine the region to be imaged, so-called "panoramas" can be generated. These
+are stitched images of single fields of view of about 200 &mu;m in diameter.
+Panoramas provide an optical overview of the tissue with a resolution similar to
+10x in microscopy and are intended to help with the selection of regions of
+interest for ablation. The tissue should be centered on the glass slide, since
+the imaging mass cytometer cannot access roughly 5 mm from each of the slide
+edges. Currently, the instruments can process one slide at a time and usually one MCD
+file per sample slide is generated.
+
+Many regions of interest can be defined on a single slide and acquisition
+parameters such as channels to acquire, acquisition speed (100 Hz or 200 Hz),
+ablation energy, and other parameters are user-defined. It is recommended that
+all isotope channels are recorded. This results in larger raw data files, but valuable information such as
+potential contamination of the argon gas (e.g., xenon) or of the samples (e.g.,
+lead, barium) is retained.
+
+To process a large number of slides or to select regions on whole-slide samples,
+panoramas may not provide sufficient information. If this is the case,
+multi-color immunofluorescence of the same slide prior to staining with
+metal-labeled antibodies may be performed. To allow for region selection based
+on immunofluorescence images and to align those images with a panorama of the
+same or consecutive sections of the sample, we developed
+[napping](https://github.com/BodenmillerGroup/napping).
+
+Acquisition time is directly proportional to the total ablated area, and run
+times for samples of large area or for large sample numbers can roughly be calculated by
+dividing the ablation area in square micrometers by the ablation frequency (e.g.,
+200 Hz). In addition to the proprietary MCD file format, TXT files can also
+be generated for each region of interest. This is recommended as a back-up
+option in case of errors that may corrupt MCD files but not TXT files.
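+
+The run-time estimate described in this section can be written out as a short
+worked example. This is a rough sketch that assumes one laser shot per
+1 &mu;m$^2$ pixel and ignores any overhead beyond the ablation itself; the
+symbols $t$, $A$, $a$ and $f$ are introduced here for illustration only:
+
+$$
+t \approx \frac{A}{a \cdot f}
+= \frac{10^{6}\ \mu\mathrm{m}^{2}}{1\ \mu\mathrm{m}^{2} \times 200\ \mathrm{s}^{-1}}
+= 5000\ \mathrm{s} \approx 83\ \mathrm{min}
+$$
+
+Here $A$ is the ablated area (1 mm$^2$ in this example), $a$ is the area covered
+by a single laser shot, and $f$ is the ablation frequency. At 100 Hz, the same
+area would take about twice as long.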
+
+## IMC data format {#data-format}
+
+Upon completion of the acquisition an MCD file of variable size is generated. A
+single MCD file can hold raw acquisition data for multiple regions of interest,
+optical images providing a slide-level overview of the sample ("panoramas"), and
+detailed metadata about the experiment. Additionally, for each acquisition a
+TXT file is generated which holds the same pixel information as the matched
+acquisition in the MCD file. 
+
+The Hyperion Imaging System<sup><font size="1">TM</font></sup> produces files in the following folder structure:
+
+```
+.
++-- {XYZ}_ROI_001_1.txt
++-- {XYZ}_ROI_002_2.txt
++-- {XYZ}_ROI_003_3.txt
++-- {XYZ}.mcd
+```
+
+Here, `{XYZ}` defines the filename, `ROI_001`, `ROI_002`, `ROI_003` are
+user-defined names (descriptions) for the selected regions of interest (ROI),
+and `1`, `2`, `3` indicate the unique acquisition identifiers. The ROI
+description entry can be specified in the Standard BioTools software when
+selecting ROIs. The MCD file contains the raw imaging data and the full metadata
+of all acquired ROIs, while each TXT file contains data of a single ROI without
+metadata. To follow a consistent naming scheme and to bundle all metadata, we
+recommend zipping the folder. Each ZIP file should only contain data from a
+single MCD file, and the name of the ZIP file should match the name of the MCD
+file.
+
+We refer to this data as raw data and the further
+processing of this data is described in Section \@ref(processing).
+
+
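
The naming scheme described in the "IMC data format" section of `01-intro.md` (`{XYZ}_{description}_{acquisition id}.txt` next to `{XYZ}.mcd`) can be taken apart programmatically, for example to match TXT files to acquisitions. The following is a minimal sketch using only the Python standard library. The function name `parse_roi_txt_name`, the `RoiTxtName` type, and the example stem `Patient1` are hypothetical and not part of any Bodenmiller Group package; the sketch also assumes that the stem of the MCD file is already known.

```python
import re
from typing import NamedTuple, Optional


class RoiTxtName(NamedTuple):
    """Parts of a per-acquisition TXT file name (field names are illustrative)."""
    mcd_stem: str        # the {XYZ} part, shared with the MCD file
    description: str     # user-defined ROI description, e.g. "ROI_001"
    acquisition_id: int  # unique acquisition identifier, e.g. 1


def parse_roi_txt_name(file_name: str, mcd_stem: str) -> Optional[RoiTxtName]:
    """Split '{XYZ}_{description}_{id}.txt' into its parts, or return None."""
    # Assumed pattern: known stem, free-text description, numeric ID, ".txt".
    pattern = re.compile(
        rf"^{re.escape(mcd_stem)}_(?P<description>.+)_(?P<acq_id>\d+)\.txt$"
    )
    match = pattern.match(file_name)
    if match is None:
        return None  # e.g. the MCD file itself or an unrelated file
    return RoiTxtName(mcd_stem, match["description"], int(match["acq_id"]))


# Hypothetical folder content following the structure shown in the chapter.
names = ["Patient1_ROI_001_1.txt", "Patient1_ROI_002_2.txt", "Patient1.mcd"]
parsed = [parse_roi_txt_name(name, "Patient1") for name in names]
```

Because the description is user-defined and may itself contain underscores, the sketch anchors on the known MCD stem at the start and on the numeric acquisition identifier at the end, rather than splitting on every underscore.
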
diff --git a/02-processing.md b/02-processing.md
@@ -0,0 +1,122 @@
+# Multi-channel image processing {#processing}
+
+This book focuses on common analysis steps of spatially-resolved single-cell data
+**after** image segmentation and feature extraction. In this chapter, the sections
+describe the processing of multiplexed imaging data, including file type
+conversion, image segmentation, feature extraction and data export. To obtain
+more detailed information on the individual image processing approaches, please
+visit their repositories:
+
+[steinbock](https://github.com/BodenmillerGroup/steinbock): The `steinbock`
+toolkit offers tools for multi-channel image processing using the command-line
+or Python code [@Windhager2021]. Supported tasks include IMC data pre-processing,
+multi-channel image segmentation, object quantification and data
+export to a variety of file formats. It supports functionality similar to that
+of the IMC Segmentation Pipeline (see below) and further allows deep learning-enabled image
+segmentation. The toolkit is available as a platform-independent Docker
+container, ensuring reproducibility and user-friendly installation. Read more in
+the [Docs](https://bodenmillergroup.github.io/steinbock/latest/).
+
+[IMC Segmentation
+Pipeline](https://github.com/BodenmillerGroup/ImcSegmentationPipeline): The IMC
+segmentation pipeline offers a rather manual way of segmenting multi-channel
+images using a pixel classification-based approach. We continue to maintain the
+pipeline but recommend the use of the `steinbock` toolkit for multi-channel
+image processing. Raw IMC data pre-processing is performed using the
+[readimc](https://github.com/BodenmillerGroup/readimc) Python package to convert
+raw MCD files into OME-TIFF and TIFF files. After image cropping, an
+[Ilastik](https://www.ilastik.org/) pixel classifier is trained for image
+classification prior to image segmentation using
+[CellProfiler](https://cellprofiler.org/). Features (e.g., mean pixel intensity)
+of segmented objects (i.e., cells) are quantified and exported. Read more in the
+[Docs](https://bodenmillergroup.github.io/ImcSegmentationPipeline/).
+
+## Image pre-processing (IMC specific)
+
+Image pre-processing is technology dependent. While most multiplexed imaging
+technologies generate TIFF or OME-TIFF files which can be directly segmented
+using the `steinbock` toolkit, IMC produces data in the proprietary
+data format MCD. 
+
+To facilitate IMC data pre-processing, the
+[readimc](https://github.com/BodenmillerGroup/readimc) open-source Python
+package allows extracting the multi-modal (IMC acquisitions, panoramas),
+multi-region, multi-channel information contained in raw IMC images. Both the
+IMC Segmentation Pipeline and the `steinbock` toolkit use the `readimc`
+package for IMC data pre-processing. Starting from IMC raw data and a "panel"
+file, individual acquisitions are extracted as TIFF files and, when using the IMC
+Segmentation Pipeline, also as OME-TIFF files. The panel contains information on the
+antibodies used in the experiment, and the user can specify which channels to
+keep for downstream analysis. When using the IMC Segmentation Pipeline, random
+tiles are cropped from images for convenience of pixel labelling.
+
+## Image segmentation
+
+The IMC Segmentation Pipeline supports pixel classification-based image
+segmentation while `steinbock` supports pixel classification-based and deep
+learning-based segmentation.
+
+**Pixel classification-based** image segmentation is performed by training a 
+random forest classifier using [Ilastik](https://www.ilastik.org/) on the
+randomly extracted image crops and selected image channels. Pixels are
+classified as nuclear, cytoplasmic, or background. Employing a customizable
+[CellProfiler](https://cellprofiler.org/) pipeline, the probabilities are then
+thresholded for segmenting nuclei, and nuclei are expanded into cytoplasmic
+regions to obtain cell masks.
+
+**Deep learning-based** image segmentation is performed as presented by
+[@Greenwald2021]. Briefly, `steinbock` first aggregates user-defined
+image channels to generate two-channel images representing nuclear and
+cytoplasmic signals. Next, the
+[DeepCell](https://github.com/vanvalenlab/intro-to-deepcell) Python package is
+used to run `Mesmer`, a deep learning-enabled segmentation algorithm pre-trained
+on `TissueNet`, to automatically obtain cell masks without any further user
+input.
+
+Segmentation masks are single-channel images that match the input images in
+size, with non-zero grayscale values indicating the IDs of segmented objects
+(e.g., cells). These masks are written out as TIFF files after segmentation.
+
+## Feature extraction {#feature-extraction}
+
+Using the segmentation masks together with their corresponding multi-channel
+images, the IMC Segmentation Pipeline as well as the `steinbock` toolkit extract
+object-specific features. These include the mean pixel intensity per object and
+channel, morphological features (e.g., object area) and the objects' locations.
+Object-specific features are written out as CSV files where rows represent
+individual objects and columns represent features.
+
+Furthermore, the IMC Segmentation Pipeline and the `steinbock` toolkit compute
+_spatial object graphs_, in which nodes correspond to objects, and nodes in
+spatial proximity are connected by an edge. These graphs serve as a proxy for
+interactions between neighboring cells. They are stored as edge lists in the form of
+one CSV file per image.
+
+Both approaches also write out image-specific metadata (e.g., width and height)
+as a CSV file.
+
+## Data export
+
+To further facilitate compatibility with downstream analysis, `steinbock`
+exports data to a variety of file formats such as OME-TIFF for images, FCS for
+single-cell data, the _anndata_ format [@Virshup2021] for data analysis in Python,
+and various graph file formats for network analysis using software such as
+[Cytoscape](https://cytoscape.org/) [@Shannon2003]. For export to OME-TIFF,
+`steinbock` uses [xtiff](https://github.com/BodenmillerGroup/xtiff), a Python
+package developed for writing multi-channel TIFF stacks.
+
+## Data import into R
+
+In Section \@ref(read-data), we will highlight the use of the
+[imcRtools](https://github.com/BodenmillerGroup/imcRtools) and
+[cytomapper](https://github.com/BodenmillerGroup/cytomapper) R/Bioconductor
+packages to read spatially resolved single-cell data and images as generated by the
+IMC Segmentation Pipeline and the `steinbock` toolkit into the statistical
+programming language R. All further downstream analyses are performed in R and
+detailed in the following sections.
+
+
+
+
+
+
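
The "Feature extraction" section of `02-processing.md` describes how a segmentation mask (non-zero values are object IDs) is combined with the matching multi-channel image to obtain per-object features. The following dependency-free sketch illustrates that idea for mean intensity, area, and location. It is not the implementation used by `steinbock` or the IMC Segmentation Pipeline; the function name `measure_objects`, the nested-list data layout, and the toy values are assumptions made for illustration, and real pipelines would typically operate on array types instead.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

Mask = Sequence[Sequence[int]]               # mask[y][x] -> object ID, 0 = background
Image = Sequence[Sequence[Sequence[float]]]  # image[channel][y][x] -> pixel intensity


def measure_objects(image: Image, mask: Mask) -> Dict[int, Dict[str, object]]:
    """Return area, centroid and per-channel mean intensity for each object ID."""
    n_channels = len(image)
    area: Dict[int, int] = defaultdict(int)
    sum_x: Dict[int, int] = defaultdict(int)
    sum_y: Dict[int, int] = defaultdict(int)
    sums: Dict[int, List[float]] = defaultdict(lambda: [0.0] * n_channels)

    for y, row in enumerate(mask):
        for x, obj_id in enumerate(row):
            if obj_id == 0:  # zero marks background, not an object
                continue
            area[obj_id] += 1
            sum_x[obj_id] += x
            sum_y[obj_id] += y
            for c in range(n_channels):
                sums[obj_id][c] += image[c][y][x]

    return {
        obj_id: {
            "area": n,  # number of pixels belonging to the object
            "centroid": (sum_x[obj_id] / n, sum_y[obj_id] / n),  # (x, y)
            "mean_intensity": [s / n for s in sums[obj_id]],  # one value per channel
        }
        for obj_id, n in sorted(area.items())
    }


# Toy 2 x 3 example with two objects and two channels (made-up values).
mask = [[0, 1, 1],
        [2, 2, 0]]
image = [
    [[5.0, 1.0, 3.0], [2.0, 4.0, 9.0]],    # channel 0
    [[0.0, 10.0, 20.0], [6.0, 6.0, 0.0]],  # channel 1
]
features = measure_objects(image, mask)
```

Each entry of `features` corresponds to one row of the per-object table described in the section: one object per row, with intensity, morphology, and location features as columns.
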