Imaging mass cytometry

A package for processing and analysis of imaging mass cytometry (IMC) data.

It implements image- and channel-wise quality control, quantification of cell intensity and morphology, cell type discovery through clustering, automated cell type labeling, community and super-community finding, and differential comparisons between sample groups, in addition to many handy visualization tools. Above all, it is a tool for the use of IMC data at scale.

Development is still underway, so use at your own risk.

Requirements and installation

Requires Python >= 3.8. imc is configured through pyproject.toml only, so you'll need an up-to-date version of pip before installing. Base build tools such as gcc and g++ also need to be installed on the system, for example with sudo apt install g++ or the equivalent for your platform. We also highly recommend installing the package in a conda environment to avoid dependency issues.
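The requirements above can be checked before installing. The following is a minimal sketch using only the Python standard library; the helper name preflight is made up for this illustration and is not part of imc.

```python
import shutil
import sys


def preflight(min_python=(3, 8), tools=("gcc", "g++")):
    """Return a dict mapping each requirement to True (met) or False."""
    report = {"python": sys.version_info[:2] >= tuple(min_python)}
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH
        report[tool] = shutil.which(tool) is not None
    return report


for name, ok in preflight().items():
    print(f"{name}: {'ok' if ok else 'MISSING'}")
```

It only reports what is missing; installing the compilers is still done through your system package manager.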

To install the latest development version from source:

git clone https://github.com/ElementoLab/imc.git
cd imc
make install

Install from PyPI with pip or with poetry:

pip install imc
# or
poetry add imc

Quick start

Install the package from PyPI with extra packages required for all steps:

pip install "imc[extra]"
# or
poetry add "imc[extra]"

Use case 1 (pipeline processing)

Example: Lung sample processing from MCD to single-cell h5ad

One-line IMC data processing:

# Run pipeline in one step with remote MCD file
MCD_URL=https://zenodo.org/record/4110560/files/data/20200612_FLU_1923/20200612_FLU_1923.mcd
imc process $MCD_URL

imc also accepts TXT or TIFF files as input, whether local or remote:

# Run pipeline in one step with remote TXT file
TXT_URL=https://zenodo.org/record/5018260/files/COVID19_brain_Patient03_ROI3_COVID19_olfactorybulb.txt?download=1
imc process $TXT_URL

Input can be MCD, TIFF, or TXT files, and several files can be given to imc process at once. See the --help option for more.
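When many inputs are collected programmatically, it can help to check them before launching a long run. The sketch below is an illustration only: it assumes the input type can be recognised from the file extension (how imc itself detects the type is not stated here), and the helper names input_kind and process_command are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extension-to-type table; an assumption made for this sketch
SUPPORTED = {".mcd": "MCD", ".tiff": "TIFF", ".txt": "TXT"}


def input_kind(path_or_url):
    """Return 'MCD', 'TIFF' or 'TXT' for a local path or URL, else None."""
    # urlparse moves a trailing '?download=1' into .query, so .path is clean
    path = urlparse(path_or_url).path
    return SUPPORTED.get(PurePosixPath(path).suffix.lower())


def process_command(inputs):
    """Build the argument list for one `imc process` call over all inputs."""
    inputs = list(inputs)
    unsupported = [item for item in inputs if input_kind(item) is None]
    if unsupported:
        raise ValueError(f"Unsupported input(s): {unsupported}")
    return ["imc", "process", *inputs]
```

The returned list can be passed to subprocess.run(..., check=True) to launch the run.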

imc is nonetheless very modular and allows the user to run any of the steps separately as well.

The above is also equivalent to the following:

MCD_URL=https://zenodo.org/record/4110560/files/data/20200612_FLU_1923/20200612_FLU_1923.mcd
SAMPLE=20200612_FLU_1923

mkdir -p data
wget -O data/${SAMPLE}.mcd $MCD_URL

## output description of acquired data
imc inspect data/${SAMPLE}.mcd

## convert MCD to TIFFs and auxiliary files
imc prepare \
  --ilastik \
  --n-crops 0 \
  --ilastik-compartment nuclear \
  data/${SAMPLE}.mcd

## For each TIFF file, output prediction of mask probabilities and segment them 
TIFFS=processed/${SAMPLE}/tiffs/${SAMPLE}*_full.tiff

## Output pixel probabilities of nucleus, membrane and background using ilastik
imc predict $TIFFS

## Segment cell instances with DeepCell
imc segment \
  --from-probabilities \
  --model deepcell \
  --compartment both $TIFFS

## Quantify channel intensity and morphology for each single cell in every image
imc quantify $TIFFS
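The same sequence can be driven from Python, for example to loop over many samples. The sketch below only assembles the commands shown above, in order, and prints them as a dry run. The helper name pipeline_commands and the example TIFF file name are hypothetical. In a real run the TIFF list would be obtained with glob after imc prepare has finished, since the files do not exist before that step.

```python
import shlex


def pipeline_commands(mcd_file, tiffs):
    """List the per-step `imc` calls for one MCD file, in execution order."""
    tiffs = list(tiffs)
    return [
        ["imc", "inspect", mcd_file],
        ["imc", "prepare", "--ilastik", "--n-crops", "0",
         "--ilastik-compartment", "nuclear", mcd_file],
        ["imc", "predict", *tiffs],
        ["imc", "segment", "--from-probabilities", "--model", "deepcell",
         "--compartment", "both", *tiffs],
        ["imc", "quantify", *tiffs],
    ]


sample = "20200612_FLU_1923"
# Hypothetical TIFF name matching the ${SAMPLE}*_full.tiff pattern
tiffs = [f"processed/{sample}/tiffs/{sample}-01_full.tiff"]
for cmd in pipeline_commands(f"data/{sample}.mcd", tiffs):
    print(shlex.join(cmd))  # dry run: print instead of executing
```

To execute instead of printing, each list can be handed to subprocess.run(cmd, check=True) so that a failing step stops the loop.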

Once all MCD files of the project have been processed, create a concatenated AnnData object containing all cells within the project:

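from glob import glob
import os

import anndata

# Collect the h5ad files found in the processed directory
files = sorted(glob('processed/*.h5ad'))
adatas = [anndata.read_h5ad(f) for f in files]
# Concatenate all cells into a single object and save it
adata = anndata.concat(adatas)
os.makedirs('results', exist_ok=True)
adata.write('results/quant.h5ad')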

To perform batch correction and cell clustering:

## Phenotype cells into clusters
imc phenotype results/quant.h5ad

There are many customization options for each step. Run imc --help or imc <subcommand> --help to see them all.

imc also includes a lightweight interactive image viewer:

imc view $TIFFS

There is also an interface to the more full-fledged napari image viewer:

imc view --napari data/${SAMPLE}.mcd  # view MCD file
napari $TIFFS  # view TIFF files directly with napari. Requires napari

A quick example of downstream analysis of the single-cell data in an IPython/Jupyter notebook:

import scanpy as sc
a = sc.read('processed/quantification.h5ad')
sc.pp.log1p(a)
sc.pp.pca(a)
sc.pp.neighbors(a)
sc.tl.umap(a)
sc.pl.umap(a, color=a.var.index)

Use case 2 (API usage)

Demo data (synthetic)

>>> from imc.demo import generate_project
>>> prj = generate_project(n_samples=2, n_rois_per_sample=3, shape=(8, 8))
>>> prj
Project 'project' with 2 samples and 6 ROIs in total.

>>> prj.samples  # type: List[IMCSample]
[Sample 'test_sample_01' with 3 ROIs,
 Sample 'test_sample_02' with 3 ROIs]

>>> prj.rois  # type: List[ROI]
[Region 1 of sample 'test_sample_01',
 Region 2 of sample 'test_sample_01',
 Region 3 of sample 'test_sample_01',
 Region 1 of sample 'test_sample_02',
 Region 2 of sample 'test_sample_02',
 Region 3 of sample 'test_sample_02']

>>> prj.samples[0].rois  # type: List[ROI]
[Region 1 of sample 'test_sample_01',
 Region 2 of sample 'test_sample_01',
 Region 3 of sample 'test_sample_01']

>>> roi = prj.rois[0]  # Let's assign one ROI to explore it
>>> roi.channel_labels  # type: pandas.Series; `channel_names`, `channel_metals` also available
0    Ch01(Ch01)
1    Ch02(Ch02)
2    Ch03(Ch03)
Name: channel, dtype: object
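Each label above combines a channel name and a metal tag in the form Name(Metal). As noted, channel_names and channel_metals already expose the two parts, so the following is only a standard-library sketch of the label format, useful when handling label files outside of imc. The helper names are hypothetical, and the regular expression assumes the metal tag is letters followed by digits.

```python
import re

# Name, then "(", a metal tag made of letters plus digits, then ")"
LABEL_RE = re.compile(r"^(?P<name>.+)\((?P<metal>[A-Za-z]+\d+)\)$")


def split_label(label):
    """Split 'HistoneH3(In113)' into ('HistoneH3', 'In113')."""
    match = LABEL_RE.match(label)
    if match is None:
        raise ValueError(f"Unexpected channel label: {label!r}")
    return match["name"], match["metal"]


def metals_by_name(labels):
    """Map each channel name to the list of metals it was measured with."""
    index = {}
    for label in labels:
        name, metal = split_label(label)
        index.setdefault(name, []).append(metal)
    return index


labels = ["Ch01(Ch01)", "HistoneH3(In113)", "HistoneH3(La139)", "DNA1(Ir191)"]
print(metals_by_name(labels))
```

Grouping by name makes it easy to spot markers that appear on more than one metal.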

>>> roi.mask  # type: numpy.ndarray
array([[0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 1, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 2, 0, 0, 0, 3, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 4, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0]], dtype=int32)

>>> roi.stack.shape  # roi.stack -> type: numpy.ndarray
(3, 8, 8)
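To make the relation between the stack and the mask concrete, here is a minimal NumPy sketch of per-cell quantification. It is not the implementation used by imc. It assumes the stack is ordered as (channels, height, width), that each positive integer in the mask labels one cell, and that 0 is background. The function name mean_intensity_per_cell is hypothetical.

```python
import numpy as np


def mean_intensity_per_cell(stack, mask):
    """Return (cell_ids, means); means has shape (n_cells, n_channels)."""
    cell_ids = np.unique(mask)
    cell_ids = cell_ids[cell_ids != 0]  # drop the assumed background label
    # stack[:, mask == cid] selects that cell's pixels in every channel
    means = np.array([stack[:, mask == cid].mean(axis=1) for cid in cell_ids])
    return cell_ids, means


# Rebuild the 8x8 mask shown above: four single-pixel cells
mask = np.zeros((8, 8), dtype=np.int32)
mask[2, 6], mask[4, 1], mask[4, 5], mask[6, 2] = 1, 2, 3, 4

# Placeholder intensities with the same shape as roi.stack
stack = np.arange(3 * 8 * 8, dtype=float).reshape(3, 8, 8)

cell_ids, means = mean_intensity_per_cell(stack, mask)
print(len(cell_ids), means.shape)
```

With the real objects, the same function would be called as mean_intensity_per_cell(roi.stack, roi.mask).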

>>> # QC
>>> prj.channel_correlation()
>>> prj.channel_summary()

>>> # Cell type discovery
>>> prj.cluster_cells()
>>> prj.find_communities()

Demo data (real)

>>> import imc.demo
>>> imc.demo.datasets
['jackson_2019_short', 'jackson_2019_short_joint']

>>> prj = imc.demo.get_dataset('jackson_2019_short')
>>> prj  # type: Project
Project 'jackson_2019_short' with 4 samples and 4 ROIs in total.

>>> prj.samples  # type: List[IMCSample]
[Sample 'BaselTMA_SP41_15.475kx12.665ky_10000x8500_5_20170905_90_88_X11Y5_242_a0' with 1 ROI,
 Sample 'BaselTMA_SP41_25.475kx12.665ky_8000x8500_3_20170905_90_88_X11Y5_235_a0' with 1 ROI,
 Sample 'BaselTMA_SP41_33.475kx12.66ky_8500x8500_2_20170905_24_61_X3Y4_207_a0' with 1 ROI,
 Sample 'BaselTMA_SP41_33.475kx12.66ky_8500x8500_2_20170905_33_61_X4Y4_215_a0' with 1 ROI]

>>> prj.samples[0].channel_labels  # type: pandas.Series
chanel
0                                  Ar80(Ar80)
1                                  Ru96(Ru96)
2                                  Ru98(Ru98)
3                                  Ru99(Ru99)
4                                Ru100(Ru100)
5                                Ru101(Ru101)
6                                Ru102(Ru102)
7                                Ru104(Ru104)
8                            HistoneH3(In113)
9                                EMPTY(Xe126)
10                                EMPTY(I127)
11                           HistoneH3(La139)
...
42                            vWF-CD31(Yb172)
43                                mTOR(Yb173)
44                        Cytokeratin7(Yb174)
45    PanCytokeratin-KeratinEpithelial(Lu175)
46         CleavedPARP-CleavedCaspase3(Yb176)
47                                DNA1(Ir191)
48                                DNA2(Ir193)
49                               EMPTY(Pb206)
50                               EMPTY(Pb207)
51                               EMPTY(Pb208)
Name: BaselTMA_SP41_15.475kx12.665ky_10000x8500_5_20170905_90_88_X11Y5_242_a0, dtype: object
>>> prj.plot_channels(['DNA2', 'Ki67', "Cytokeratin7"])
<Figure size 400x1200 with 12 Axes>

Your own data

The recommended way is to provide a CSV file with one row per sample, or one row per ROI. That ensures additional sample/ROI metadata is passed to the objects and used later in analysis. Pass the path to the CSV file to the Project object constructor:

from imc import Project

prj = Project()  # will search current directory for Samples/ROIs

prj = Project(processed_dir="processed")  # will search `processed` for Samples/ROIs

prj = Project("path/to/sample/annotation.csv", processed_dir="processed")
# ^^ will use metadata from CSV and use the files in `processed`.

However, if one is not given, Project will search the current directory or the argument of processed_dir for IMCSamples and ROIs.
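When an annotation file is used, it can be written with the standard library. The sketch below is illustrative only: the column names sample_name and condition are assumptions (check which column names Project expects), and the helper write_annotation is hypothetical. The file is written to a temporary directory here just to keep the example self-contained.

```python
import csv
import tempfile
from pathlib import Path


def write_annotation(path, rows, fieldnames):
    """Write one CSV row per sample, with a header line."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


# Assumed column names; "condition" stands in for any extra metadata
rows = [
    {"sample_name": "test_sample_01", "condition": "control"},
    {"sample_name": "test_sample_02", "condition": "treated"},
]

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "annotation.csv"
    write_annotation(csv_path, rows, ["sample_name", "condition"])
    # Read the file back to confirm there is one row per sample
    with open(csv_path, newline="") as handle:
        rows_back = list(csv.DictReader(handle))

print([row["sample_name"] for row in rows_back])
```

Any further columns in the file would travel along as sample metadata in the same way.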

The processed_dir directory can be structured in two ways:

1. One directory per sample.

   Inside there is a directory "tiffs" which contains the stack "*_full.tiff", channel labels "*_full.csv" and optionally a segmentation "*_full_mask.tiff".

2. All samples in the same directory processed_dir.

   Inside the one directory there are stack "*_full.tiff", channel label "*_full.csv" and optionally segmentation "*_full_mask.tiff" files.

The default is option one. If you choose option two, set subfolder_per_sample to False:

prj = Project(subfolder_per_sample=False)

The expected files are produced by common preprocessing pipelines such as imcpipeline or imcyto.
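Before building a Project, it can be useful to check that a directory actually follows one of the two layouts. The following is a standard-library sketch under the file naming described above. The function find_rois and its layout argument are hypothetical and unrelated to the imc API, and the file names in the demonstration are made up.

```python
import tempfile
from pathlib import Path


def find_rois(processed_dir, layout="per_sample"):
    """Return (stack, labels_or_None, mask_or_None) for each stack found."""
    root = Path(processed_dir)
    # per_sample: <sample>/tiffs/*_full.tiff ; flat: *_full.tiff in one place
    pattern = "*/tiffs/*_full.tiff" if layout == "per_sample" else "*_full.tiff"
    found = []
    for stack in sorted(root.glob(pattern)):
        prefix = stack.name[: -len("_full.tiff")]
        labels = stack.with_name(prefix + "_full.csv")
        mask = stack.with_name(prefix + "_full_mask.tiff")
        found.append((stack,
                      labels if labels.exists() else None,
                      mask if mask.exists() else None))
    return found


# Demonstration on a throwaway directory with made-up file names
with tempfile.TemporaryDirectory() as tmp:
    tiffs = Path(tmp) / "sample1" / "tiffs"
    tiffs.mkdir(parents=True)
    for name in ("sample1-01_full.tiff", "sample1-01_full.csv",
                 "sample1-01_full_mask.tiff", "sample1-02_full.tiff"):
        (tiffs / name).touch()
    rois = find_rois(tmp, layout="per_sample")
    summary = [(s.name, l is not None, m is not None) for s, l, m in rois]

print(summary)
```

A stack reported without a labels file points to an incomplete preprocessing run; a missing mask is acceptable because segmentation is optional.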

Documentation

The documentation is currently mostly a skeleton but will be expanded soon. To build it:

make docs

Testing

Tests are still very limited, but you can run them this way:

pip install pytest  # install testing package
python -m pytest --pyargs imc

For data processing, running the example lung data should confirm that everything is running smoothly.
