- Reproducing results in the "Harmonization and Annotation of Single-cell Transcriptomics data with Deep Generative Models" paper
- Demonstration of how to use scVI and scANVI for the harmonization and annotation problem
chenlingantelope [at] berkeley [dot] edu
Analysis | Associated Script | Datasets | Technology | Number of Cells | Ref. |
---|---|---|---|---|---|
Figure 2: Benchmark | PBMC8KCITE.py | PBMC-8K; PBMC-CITE | 10x | 8,381; 7,667 | 10x DatasetsStoeckius, Marlon, et al. 2017 |
Supplementary Figure 2: UMAP Visualization | PBMC8KCITE.py | PBMC-8K; PBMC-CITE | 10x | 8,381; 7,667 | 10x Datasets; Stoeckius, Marlon, et al. 2017 |
Figure 2: Benchmark | MarrowTM.py Tech1.pretty.ipynb | MarrowTM-10x; MarrowTM-ss2 | 10x; SmartSeq2 | 4,112;5,351 | Quake, Stephen R., et al. 2018 |
Supplementary Figure 1: Robustness Analysis for Hyperparameter Choice | Robustness_study.ipynb | MarrowTM-10x; MarrowTM-ss2 | 10x; SmartSeq2 | 4,112;5,351 | Quake, Stephen R., et al. 2018 |
Supplementary Figure 3: UMAP Visualization | MarrowTM.py | MarrowTM-10x; MarrowTM-ss2 | 10x; SmartSeq2 | 4,112;5,351 | |
Figure 2: Benchmark | Pancreas.py | Pancreas-InDrop; Pancreas-CEL-Seq2 | inDrop; CEL-Seq2 | 8,569; 2,449 | Baron, Maayan, et al. 2016; Muraro, Mauro J., et al. 2016 |
Supplementary Figure 4: UMAP Visualization | Pancreas.py | Pancreas-InDrop; Pancreas-CEL-Seq2 | inDrop; CEL-Seq2 | 8,569; 2,449 | Baron, Maayan, et al. 2016; Muraro, Mauro J., et al. 2016 |
Figure 2: Benchmark | DentateGyrus.py | DentateGyrus-10x; DentateGyrus-C1 | 10x; Fluidigm C1 | 5,454; 2,303 | Hochgerner, Hannah, et al. 2018 |
Supplementary Figure 5: UMAP Visualization | DentateGyrus.py | DentateGyrus-10x; DentateGyrus-C1 | 10x; Fluidigm C1 | 5,454; 2,303 | Hochgerner, Hannah, et al. 2018 |
Figure 3: Robustness Analysis by subsampling cells Supplementary Figure 10 | NoOverlapSCANVI.py PopRemoveSCANVI.py SCANVI_posterior-NoOverlap.ipynb SCANVI_posterior_poprm.ipynb | PBMC-8K; PBMC-CITE | 10x | 8,381; 7,667 | 10x Datasets; Stoeckius, Marlon, et al. 2017 |
Figure 4: Continuous Trajectory Supplementary Supplementary Figure 6: UMAP | continuous.ipynb | HEMATO-Tusi; HEMATO-Paul | inDrop; MARS-seq | 4,016 ; 2,730 | Tusi, Betsabeh Khoramian, et al. 2018; Paul, Franziska, et al. 2015 |
Figure 5: External Validation by Experimentally Derived Labels, Supplementary Figure 11 | harmonization-CitePure-SCANVI.ipynb | PBMC-68K; PBMC-Sorted; PBMC-CITE | 10x | 68,579; 94,655; 7,667 | Zheng, Grace XY, et al. 2017; Stoeckius, Marlon, et al. 2017 |
Figure 6: Semi-Supervised Annotation of T Cell Subtypes, Supplementary Figure 12 | SCANVI-mild-annot-Clustering.ipynb | PBMC-Sorted T cell Subtypes | 10x | 42919 | Zheng, Grace XY, et al. 2017; Stoeckius, Marlon, et al. 2017 |
Hierarchical Semi-Supervised Annotation | Hierarchical.ipynb | CORTEX | 10x | 160,796 | Zeisel, Amit, et al. "Molecular architecture of the mouse nervous system." bioRxiv (2018): 294918. |
Supplementary Figure 7: Scalability Analysis | scanorama.ipynb | SCANORAMA | Mixed | 105,476 | Hie, Brian L., Bryan Bryson, and Bonnie Berger. "Panoramic stitching of heterogeneous single-cell transcriptomic data." bioRxiv (2018): 371179. |
Supplementary Figure 13: Differential Expression | DE-final.ipynb | PBMC-8K; PBMC-68K | 10x | 8,381; 68,579 | 10x Datasets; Zheng, Grace XY, et al. 2017 |
- Supplemtary Figure 2,3,4,5,8,9 are generated using scripts in Additional_Scripts/ using output from the analysis python scripts including scanvi_acc.R, KNNcurves.py and BE_curves.py.
- Boxplots for Figure 3 are generated using poprm_boxplot.R in Additional_Scripts/
- The Additional_Scripts also contains code for running Seurat directly from commandline runSeurat.R and SeuratPCA.R.
- All .gmt files in Additional_Scripts/ are gene signatures.
- Clone the github repository, install the dependencies and call functions from the modules scVI
- Install time (< 10 min)
- Pytorch V0.4.1
- Python 3
- scikit-learn V0.19.1
- To reproduce results from the paper, look up the relevant datasets, python notebooks (located in notebooks/), or python scripts (located in the root directory).
- Download the relevant datasets except for the ones already wrapped for the scVI package (PBMC-8K, PBMC-CITE, PBMC-68K, PBMC-Sorted, MarrowTM-10x, MarrowTM-ss2 can be loaded directly with the dataloader functions)
- Annotation files generated by us when the original study did not provide annotation (cite.seurat.labels) can be found in the scvi-data repository
- Run the analysis and results should match those of the paper.
- This repository contains functions written uniquely to produce some of the analysis in this paper. For more up-to-date package refer to main scVI repository