Multiple Input Multiple Output Single Cell Analysis

Named after MIMO control systems, in which a combination of perturbations and measurements helps decode the dynamics of a system, this package is designed to assist those trying to understand biological dynamics by designing, performing, and analyzing perturbation scRNA-seq experiments.

Related Resources

FAQ

Q: How many perturbations can I do?
A: Commercial droplet scRNA-seq methods cost around $0.2/cell. It takes ~10 cells/perturbation to observe signature effects and ~100 cells/perturbation to see individual gene-level effects robustly. Based on your budget, you can crunch the numbers (also see the Design of Experiments section below).
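As a rough illustration of "crunching the numbers", the sketch below divides a budget by the per-cell cost and the cells needed per perturbation. The function name and the example budget are hypothetical, and the $0.2/cell figure is the approximate value quoted above; library prep, sequencing depth, and uneven library representation are ignored.

```python
def max_perturbations(budget_usd, cells_per_perturbation, cost_per_cell_usd=0.2):
    """Estimate how many perturbations a budget covers (hypothetical helper).

    Only the per-cell cost is counted; sequencing and other reagents are not.
    """
    if cells_per_perturbation <= 0 or cost_per_cell_usd <= 0:
        raise ValueError("cells_per_perturbation and cost_per_cell_usd must be positive")
    # The small epsilon guards against floating-point error in the division.
    total_cells = int(budget_usd / cost_per_cell_usd + 1e-9)
    return total_cells // cells_per_perturbation


# Example with an assumed budget of $10,000.
signature_level = max_perturbations(10_000, cells_per_perturbation=10)
gene_level = max_perturbations(10_000, cells_per_perturbation=100)
print(signature_level, gene_level)
```

The tenfold gap between the two results reflects the tenfold difference in cells/perturbation between signature-level and gene-level analyses.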

Design of Experiments & Power Calculations

For a rough comparison of our pilot scRNA-seq to population RNA-seq of the same perturbation, see this iPython notebook.

When designing Perturb-seq-like experiments, there are a few key factors to keep in mind:

Signatures vs. individual transcript-level phenotypes

Are you interested in broad transcriptional signatures or individual gene-level differential expression? If the former, a rough approximation may be around 10 cells/perturbation. If the latter, 100 or more cells may be required, depending on the effect size.

A similar approximation for reads/cell would be a couple thousand for signatures and tens of thousands for gene-level.

Library Size and Representation

As in any pooled screen, the representation of each perturbation in the library will vary. With genome-wide CRISPR libraries, the difference between the 10th and 90th percentile of a library is roughly 6-fold (Wang, 2013). Depending on how strictly you want to ensure that every member of the library is represented, the cells/perturbation number should be multiplied by an additional factor to reflect this variance.
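One possible way to pick that additional factor is sketched below. It rests on an assumption that is not from the text: a typical library member sits at the geometric midpoint of the 10th and 90th percentiles, so a 10th-percentile member is under-represented by the square root of the 90th/10th ratio. The function names are hypothetical, and a real library's measured distribution should be used where available.

```python
import math


def representation_factor(skew_90_10=6.0):
    """Oversampling factor so a 10th-percentile member reaches the target.

    Assumes (for illustration only) that the typical member lies at the
    geometric midpoint between the 10th and 90th percentiles.
    """
    if skew_90_10 < 1:
        raise ValueError("skew_90_10 must be at least 1")
    return math.sqrt(skew_90_10)


def cells_to_profile(n_perturbations, cells_per_perturbation, skew_90_10=6.0):
    """Total cells to profile after correcting for uneven representation."""
    factor = representation_factor(skew_90_10)
    return math.ceil(n_perturbations * cells_per_perturbation * factor)


# Example: 100 perturbations at 100 cells each with a 6-fold 90th/10th spread.
print(cells_to_profile(100, 100, skew_90_10=6.0))
```

Under this assumption a 6-fold spread translates into roughly 2.4 times as many cells as the uncorrected estimate.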

Using High MOI to infer genetic interactions

Our approach of using high MOI, instead of either a single vector with multiple sgRNAs or vectors with different selection methods, benefits from ease of implementation and the ability to represent a large diversity of combinations (limited only by the number of cells).

However, challenges include a Poisson-like variance in the number of sgRNAs per cell, limited sgRNA detection sensitivity, and the formation of PCR chimeras during the enrichment PCR procedure, which can create misassignments.
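To get a feel for the first of these challenges, the number of sgRNAs per cell can be approximated as Poisson-distributed with mean equal to the MOI. The sketch below assumes an exact Poisson model (the text only says Poisson-like) and ignores detection sensitivity and chimeras; `guides_per_cell_distribution` is a hypothetical helper, not part of the package.

```python
import math


def poisson_pmf(k, moi):
    """Probability of exactly k integrations at a given mean MOI."""
    return math.exp(-moi) * moi ** k / math.factorial(k)


def guides_per_cell_distribution(moi, max_k=5):
    """Return [P(0), P(1), ..., P(max_k - 1), P(>= max_k)] under a Poisson model."""
    probs = [poisson_pmf(k, moi) for k in range(max_k)]
    probs.append(1.0 - sum(probs))  # lump the tail into one final bin
    return probs


# Example: at an assumed MOI of 2, report the fraction of cells in each bin.
distribution = guides_per_cell_distribution(2.0)
for k, p in enumerate(distribution):
    label = f"{k}+" if k == len(distribution) - 1 else str(k)
    print(f"{label} sgRNAs: {p:.3f}")
```

Even at a fixed MOI, a sizable fraction of cells carries zero or one sgRNA, so the number of cells available for estimating pairwise interactions is smaller than the total profiled.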

All three of these factors should be assessed in pilot experiments to troubleshoot. An example of such a pilot would look as follows (modified from the Drop-seq style species mixing experiments):

Guide Barcode, Cell Barcode Pairing

The distribution of reads going to the Perturb-seq vector (antiparallel) from 10X RNA-seq is shown above. Note that while the expression of the construct is comparable to that of a housekeeping gene, only a fraction of the reads overlap with the 18 bp barcode (colored section in the coverage track). As such, when you have a short barcode it is advisable in most cases to perform enrichment PCR to obtain sensitive GBC/CBC pairing.
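Once enrichment PCR reads are available, pairing comes down to deciding which guide barcodes genuinely belong to each cell barcode. The sketch below is a generic thresholding heuristic, not the package's pairing method: it keeps a GBC for a cell only if it is supported by a minimum number of reads and a minimum fraction of that cell's GBC reads, which is one simple guard against low-level chimeric reads. The function name, input format, and both thresholds are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def pair_gbc_to_cbc(read_pairs, min_reads=2, min_fraction=0.2):
    """Assign guide barcodes to cell barcodes from (cbc, gbc) read pairs.

    A GBC is kept for a cell only if it has at least `min_reads` reads and
    accounts for at least `min_fraction` of that cell's GBC reads.
    """
    counts = defaultdict(Counter)
    for cbc, gbc in read_pairs:
        counts[cbc][gbc] += 1

    assignments = {}
    for cbc, gbc_counts in counts.items():
        total = sum(gbc_counts.values())
        kept = sorted(
            gbc for gbc, n in gbc_counts.items()
            if n >= min_reads and n / total >= min_fraction
        )
        if kept:
            assignments[cbc] = kept
    return assignments


# Toy example: cellA has one dominant guide plus a single stray read;
# cellB carries two well-supported guides (as expected at high MOI).
reads = (
    [("cellA", "g1")] * 8 + [("cellA", "g2")] * 1
    + [("cellB", "g2")] * 5 + [("cellB", "g3")] * 4
)
print(pair_gbc_to_cbc(reads))
```

In practice the thresholds would be tuned against the pilot experiments, since they trade sensitivity for sgRNA detection against the misassignment rate.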

Computational Workflow

Inputs

An expression matrix output by a high-throughput scRNA-seq protocol (such as Drop-seq or 10X cellranger).
Guide barcode (GBC) PCR data to pair perturbations with cell barcodes (for certain applications this can be obtained directly from the RNA-seq data).
A database of preassociated sgRNA/GBC pairs (obtained either by Sanger sequencing or NGS).

Intermediate Computation

A simple fitness calculation is possible by comparing the initial abundance of a GBC with the number of cells it appeared in.
Guide barcodes and cell barcodes have to be paired accurately.
A cell state classifier is defined on wildtype or control cells and then applied to all cells in an experiment. These classifications can be used as outputs to be predicted (instead of gene expression) or as covariates in the model.
The linear model integrating all covariates (and interaction terms as desired) is fit. An EM-like approach filters cells that look much more like control cells than perturbed cells.
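The fitness calculation in the first item above can be sketched as a log2 ratio between each GBC's share of cells and its share of the initial library. This particular formula, the pseudocount, and the function name are assumptions for illustration; the text only specifies comparing initial abundance with the number of cells observed.

```python
import math


def gbc_fitness(initial_counts, cell_counts, pseudocount=1.0):
    """log2(fraction of cells / fraction of initial library) per GBC.

    `initial_counts` maps GBC -> abundance in the starting library;
    `cell_counts` maps GBC -> number of cells it was detected in.
    A pseudocount avoids taking the log of zero for dropped-out guides.
    """
    gbcs = sorted(initial_counts)
    init_total = sum(initial_counts[g] + pseudocount for g in gbcs)
    cell_total = sum(cell_counts.get(g, 0) + pseudocount for g in gbcs)

    scores = {}
    for g in gbcs:
        init_frac = (initial_counts[g] + pseudocount) / init_total
        cell_frac = (cell_counts.get(g, 0) + pseudocount) / cell_total
        scores[g] = math.log2(cell_frac / init_frac)
    return scores


# Toy example: two guides start equally abundant, but one is depleted in cells.
print(gbc_fitness({"gA": 99, "gB": 99}, {"gA": 29, "gB": 9}))
```

Negative scores indicate guides that are under-represented among cells relative to the starting library, positive scores the opposite.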

Outputs

The regulatory coefficients obtained from the model are the most informative output, giving an estimate of the extent to which each covariate (perturbation, cell state, pairwise interaction between perturbations, etc.) impacted a given gene.
Cell state effects are obtained by predicting the cell states based on the linear model instead of predicting gene expression.
Cell size effects (genes detected or transcripts detected) can be predicted as well.
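To make the idea of regulatory coefficients concrete, the sketch below fits one gene's expression against a design matrix of covariates using plain ordinary least squares. OLS is a stand-in chosen for brevity: the package's own fitting procedure, including any regularization and the EM-like cell filtering, is not reproduced here, and all function names and the toy data are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_gene(design, expression):
    """OLS coefficients for one gene via the normal equations X^T X b = X^T y."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(design, expression))
           for i in range(p)]
    return solve_linear_system(xtx, xty)


# Columns: intercept, guide A present, guide B present, A:B interaction.
design = [
    [1, 0, 0, 0],  # control cell
    [1, 0, 0, 0],  # control cell
    [1, 1, 0, 0],  # guide A only
    [1, 0, 1, 0],  # guide B only
    [1, 1, 1, 1],  # both guides
]
expression = [5.0, 5.0, 3.0, 6.0, 2.0]  # toy values for a single gene
coefficients = fit_gene(design, expression)
print(coefficients)
```

Each coefficient estimates how much the corresponding covariate shifts the gene's expression; repeating the fit across genes gives a covariate-by-gene matrix of regulatory coefficients. Cell state or cell size covariates would enter as additional columns of the design matrix.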
