A package for processing and analysis of imaging mass cytometry (IMC) data.
It implements image- and channel-wise quality control, quantification of cell intenstity and morphology, cell type discovery through clustering, automated cell type labeling, community and super-community finding and differential comparisons between sample groups, in addition to many handy visualization tools. Above all, it is a tool for the use of IMC data at scale.
Development is still underway, so use at your own risk.
Requires Python >= 3.8
. imc
uses a pyproject.toml
configuration only, so you'll need a up-to-date version of pip
before installing. Base packages as gcc
and g++
will also need to be installed on system using the command sudo apt install g++
or likewise. We also highly recommend installing the package on a conda
environment to avoid dependency issues.
To install the most updated version of the program:
git clone https://github.com/ElementoLab/imc.git
cd imc
make install
Install from PyPI with pip
or with poetry:
pip install imc
# or
poetry install imc
Install the package from PyPI with extra packages required for all steps:
pip install imc[extra]
# or
poetry install imc[extra]
One-line IMC data processing:
# Run pipeline in one step with remote MCD file
MCD_URL=https://zenodo.org/record/4110560/files/data/20200612_FLU_1923/20200612_FLU_1923.mcd
imc process $MCD_URL
imc
also supports TXT or TIFF files as input, local or remote files:
# Run pipeline in one step with remote TXT file
TXT_URL=https://zenodo.org/record/5018260/files/COVID19_brain_Patient03_ROI3_COVID19_olfactorybulb.txt?download=1
imc process $TXT_URL
Input can be MCD, TIFF, or TXT files.
Several files can be given to imc process
at once. See more with the --help
option.
imc
is nonetheless very modular and allows the user to run any of the step seperately as well.
The above is also equivalent to the following:
MCD_URL=https://zenodo.org/record/4110560/files/data/20200612_FLU_1923/20200612_FLU_1923.mcd
SAMPLE=20200612_FLU_1923
wget -O data/${SAMPLE}.mcd $MCD_URL
## output description of acquired data
imc inspect data/${SAMPLE}.mcd
## convert MCD to TIFFs and auxiliary files
imc prepare \
--ilastik \
--n-crops 0 \
--ilastik-compartment nuclear \
data/${SAMPLE}.mcd
## For each TIFF file, output prediction of mask probabilities and segment them
TIFFS=processed/${SAMPLE}/tiffs/${SAMPLE}*_full.tiff
## Output pixel probabilities of nucleus, membrane and background using ilastik
imc predict $TIFFS
## Segment cell instances with DeepCell
imc segment \
--from-probabilities \
--model deepcell \
--compartment both $TIFFS
## Quantify channel intensity and morphology for each single cell in every image
imc quantify $TIFFS
Once all MCD files have been processed for the project, create a concatenated AnnData object containing all cells within a project.
from glob import glob
import os
import anndata
pattern = glob('processed/*.h5ad')
adatas = [anndata.read(f) for f in pattern if os.path.exists(f)]
adata = anndata.concat(adatas)
adata.write('results/quant.h5ad')
To perform batch correction and cell clustering:
## Phenotype cells into clusters
imc phenotype processed/quant.h5ad
There are many customization options for each step. Do imc --help
or imc <subcommand> --help
to see all.
imc
also includes a lightweight interactive image viewer:
imc view $TIFFS
There is also an interface to the more full fledged napari
image viwer:
imc view --napari data/${SAMPLE}.mcd # view MCD file
napari $TIFFS # view TIFF files directly with napari. Requires napari
A quick example of further analysis steps of single cell data downstream in IPython/Jupyter notebook:
import scanpy as sc
a = sc.read('processed/quantification.h5ad')
sc.pp.log1p(a)
sc.pp.pca(a)
sc.pp.neighbors(a)
sc.tl.umap(a)
sc.pl.umap(a, color=a.var.index)
>>> from imc.demo import generate_project
>>> prj = generate_project(n_samples=2, n_rois_per_sample=3, shape=(8, 8))
>>> prj
Project 'project' with 2 samples and 6 ROIs in total.
>>> prj.samples # type: List[IMCSample]
[Sample 'test_sample_01' with 3 ROIs,
Sample 'test_sample_02' with 3 ROIs]
>>> prj.rois # type: List[ROI]
[Region 1 of sample 'test_sample_01',
Region 2 of sample 'test_sample_01',
Region 3 of sample 'test_sample_01',
Region 1 of sample 'test_sample_02',
Region 2 of sample 'test_sample_02',
Region 3 of sample 'test_sample_02']
>>> prj.samples[0].rois # type: List[ROI]
[Region 1 of sample 'test_sample_01',
Region 2 of sample 'test_sample_01',
Region 3 of sample 'test_sample_01']
>>> roi = prj.rois[0] # Let's assign one ROI to explore it
>>> roi.channel_labels # type: pandas.Series; `channel_names`, `channel_metals` also available
0 Ch01(Ch01)
1 Ch02(Ch02)
2 Ch03(Ch03)
Name: channel, dtype: object
>>> roi.mask # type: numpy.ndarray
array([[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 2, 0, 0, 0, 3, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 4, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0]], dtype=int32)
>>> roi.stack.shape # roi.stack -> type: numpy.ndarray
(3, 8, 8)
>>> # QC
>>> prj.channel_correlation()
>>> prj.channel_summary()
>>> # Cell type discovery
>>> prj.cluster_cells()
>>> prj.find_communities()
>>> import imc.demo
>>> imc.demo.datasets
['jackson_2019_short', 'jackson_2019_short_joint']
>>> prj = imc.demo.get_dataset('jackson_2019_short')
>>> prj # type: Project
Project 'jackson_2019_short' with 4 samples and 4 ROIs in total.
>>> prj.samples # type: List[IMCSample]
[Sample 'BaselTMA_SP41_15.475kx12.665ky_10000x8500_5_20170905_90_88_X11Y5_242_a0' with 1 ROI,
Sample 'BaselTMA_SP41_25.475kx12.665ky_8000x8500_3_20170905_90_88_X11Y5_235_a0' with 1 ROI,
Sample 'BaselTMA_SP41_33.475kx12.66ky_8500x8500_2_20170905_24_61_X3Y4_207_a0' with 1 ROI,
Sample 'BaselTMA_SP41_33.475kx12.66ky_8500x8500_2_20170905_33_61_X4Y4_215_a0' with 1 ROI]
>>> prj.samples[0].channel_labels # type: pandas.Series
chanel
0 Ar80(Ar80)
1 Ru96(Ru96)
2 Ru98(Ru98)
3 Ru99(Ru99)
4 Ru100(Ru100)
5 Ru101(Ru101)
6 Ru102(Ru102)
7 Ru104(Ru104)
8 HistoneH3(In113)
9 EMPTY(Xe126)
10 EMPTY(I127)
11 HistoneH3(La139)
...
42 vWF-CD31(Yb172)
43 mTOR(Yb173)
44 Cytokeratin7(Yb174)
45 PanCytokeratin-KeratinEpithelial(Lu175)
46 CleavedPARP-CleavedCaspase3(Yb176)
47 DNA1(Ir191)
48 DNA2(Ir193)
49 EMPTY(Pb206)
50 EMPTY(Pb207)
51 EMPTY(Pb208)
Name: BaselTMA_SP41_15.475kx12.665ky_10000x8500_5_20170905_90_88_X11Y5_242_a0, dtype: object
>>> prj.plot_channels(['DNA2', 'Ki67', "Cytokeratin7"])
<Figure size 400x1200 with 12 Axes>
The best way is to have a CSV file with one row per sample, or one row per ROI.
That will ensure additional sample/ROI metadata is passed to the objects and used later in analysis.
Pass the path to the CSV file to the Project
object constructor:
from imc import Project
prj = Project() # will search current directory for Samples/ROIs
prj = Project(processed_dir="processed") # will search `processed` for Samples/ROIs
prj = Project("path/to/sample/annotation.csv", processed_dir="processed")
# ^^ will use metadata from CSV and use the files in `processed`.
However, if one is not given, Project
will search the current directory or the
argument of processed_dir
for IMCSamples and ROIs.
The processed_dir
directory can be structured in two ways:
- One directory per sample.
- Inside there is a directory
"tiffs"
which contains the stack"*_full.tiff"
, channel labels"*_full.csv"
and optionally a segmentation"*_full_mask.tiff"
.
- All samples in the same directory
processed_dir
.
- Inside the one directory there are stack
"*_full.tiff"
, channel label"*_full.csv"
and optionally segmentation"*_full_mask.tiff"
files.
The default is option one. If you choose 2
, simply pass subfolder_per_sample
:
prj = Project(subfolder_per_sample=True)
The expected files are produced by common preprocessing pipelines such as imcpipeline or imcyto.
Documentation is for now mostly a skeleton but will be expanded soon:
make docs
Tests are still very limited, but you can run tests this way:
pip install pytest # install testing package
python -m pytest --pyargs imc
For data processing, running the example lung data should make sure eveything is running smoothly.