This repository has been archived by the owner on Nov 29, 2023. It is now read-only.
/ MIMOSCA Public archive
forked from asncd/MIMOSCA

A repository for the design and analysis of pooled single cell RNA-seq perturbation experiments.


TiffanyAmariuta/MIMOSCA

 
 


MIMOSCA

Multiple Input Multiple Output Single Cell Analysis

Named after MIMO control systems, in which a combination of perturbations and measurements can help decode the dynamics of a system, this package is designed to assist those trying to understand biological dynamics by designing, performing, and analyzing perturbation scRNA-seq experiments.

Related Resources

Contents

Please let me know (here or in the Google Forum) if there are any areas that you'd like to see improved or new items to be added!

FAQ

  • Q: How many perturbations can I do?
  • A: Commercial droplet scRNA-seq methods cost around $0.20/cell. It takes ~10 cells/perturbation to observe signature effects and ~100 cells/perturbation to see individual gene-level effects robustly. Based on your budget, you can crunch the numbers (also see the Design of Experiments section below).
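As a rough illustration of "crunching the numbers", here is a minimal sketch using the per-cell cost and cells/perturbation figures quoted above. The function name `max_perturbations` and the $10,000 budget are hypothetical, and the calculation ignores library representation and sequencing costs.

```python
def max_perturbations(budget_usd, cost_cents_per_cell=20, cells_per_perturbation=100):
    """Estimate how many perturbations a budget can cover.

    cost_cents_per_cell=20 reflects the ~$0.20/cell figure quoted above;
    integer cents are used to avoid floating-point rounding surprises.
    """
    total_cells = int(round(budget_usd * 100)) // cost_cents_per_cell
    return total_cells // cells_per_perturbation


# Hypothetical $10,000 budget (an assumption, not a recommendation).
gene_level = max_perturbations(10_000, cells_per_perturbation=100)
signature_level = max_perturbations(10_000, cells_per_perturbation=10)
print(gene_level, signature_level)
```

Under these assumptions the same budget covers ten times as many perturbations at signature-level depth as at gene-level depth.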

Design of Experiments & Power Calculations

Experimental Design

For a rough comparison of our pilot scRNA-seq to population RNA-seq of the same perturbation, see this iPython notebook.

In designing Perturb-seq like experiments, there are a few key factors to keep in mind:

Signatures vs. individual transcript-level phenotypes

Are you interested in broad transcriptional signatures or individual gene-level differential expression? If the former, a rough approximation may be around 10 cells/perturbation. If the latter, 100 or more cells/perturbation may be required, depending on the effect size.

A similar approximation for reads/cell would be a couple thousand for signatures and tens of thousands for gene-level.

Library Size and Representation

As in any pooled screen, the representation of each perturbation in the library will vary. With genome-wide CRISPR libraries, the difference between the 10th and 90th percentile of a library is roughly 6-fold (Wang, 2013). Depending on how strongly you want to ensure that every member of the library is represented, the cells/perturbation figure should be multiplied by an additional factor to account for this variance.
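One way to put a number on that additional factor is sketched below. It assumes guide abundances are log-normally distributed, which is an assumption made here for illustration and is not stated in the source; only the 6-fold 90th/10th percentile ratio comes from the text above. The function name `representation_factor` is hypothetical.

```python
import math
from statistics import NormalDist


def representation_factor(ratio_90_10=6.0, protect_quantile=0.10):
    """Oversampling factor so that a guide sitting at `protect_quantile` of the
    abundance distribution still receives the target number of cells.

    ASSUMPTION: abundances are log-normal. The default 6-fold ratio is the
    figure quoted above for genome-wide CRISPR libraries.
    """
    z90 = NormalDist().inv_cdf(0.90)
    # For a log-normal, ln(q90 / q10) = 2 * z90 * sigma.
    sigma = math.log(ratio_90_10) / (2 * z90)
    z_q = NormalDist().inv_cdf(protect_quantile)
    # Abundance at the protected quantile, relative to the library mean.
    relative_abundance = math.exp(sigma * z_q - sigma ** 2 / 2)
    return 1.0 / relative_abundance


factor = representation_factor()
print(f"multiply cells/perturbation by about {factor:.1f}")
```

Under the log-normal assumption, protecting the 10th-percentile guide works out to roughly a 3-fold increase in cells; protecting rarer guides (a smaller `protect_quantile`) raises the factor further.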

Using High MOI to infer genetic interactions

Our approach of using high MOI, instead of either a single vector with multiple sgRNAs or vectors with different selection methods, benefits from ease of implementation and the ability to represent a large diversity of combinations (limited only by the number of cells).

However, challenges include a Poisson-like variance in the number of sgRNA/cell, sgRNA detection sensitivity, and the formation of PCR chimeras during the enrichment PCR procedure that can create misassignments.
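To get a feel for the first of these challenges, the sketch below treats the number of sgRNAs per cell as exactly Poisson with mean equal to the MOI. This is an idealization (the text above only says "Poisson-like"), the function names are hypothetical, and the MOI values are arbitrary examples.

```python
import math


def fraction_with_exactly(k, moi):
    """Poisson probability that a cell carries exactly k sgRNAs at a given MOI."""
    return math.exp(-moi) * moi ** k / math.factorial(k)


def fraction_with_at_least(k, moi):
    """Poisson probability that a cell carries k or more sgRNAs."""
    return 1.0 - sum(fraction_with_exactly(i, moi) for i in range(k))


# Arbitrary example MOIs, for illustration only.
for moi in (0.3, 1.0, 2.0):
    print(
        f"MOI {moi}: "
        f"no guide {fraction_with_exactly(0, moi):.2f}, "
        f"one guide {fraction_with_exactly(1, moi):.2f}, "
        f"two or more {fraction_with_at_least(2, moi):.2f}"
    )
```

The fraction of cells with two or more guides is the pool available for inferring pairwise interactions, so it can be multiplied by the planned cell count when sizing an experiment. Detection sensitivity and chimeras will reduce the usable fraction in practice.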

All three of these factors should be assessed in pilot experiments to troubleshoot. An example of such a pilot would look as follows (modified from the Drop-seq style species mixing experiments):

[Figure: SMIX]

Guide Barcode, Cell Barcode Pairing

[Figure: pseq_plasmid]

The distribution of reads going to the Perturb-seq vector (antiparallel) from 10X RNA-seq is shown above. Note that while the expression of the construct is comparable to that of a housekeeping gene, only a fraction of the reads overlap with the 18 bp barcode (colored section in the coverage track). As such, when you have a short barcode, it is advisable in most cases to perform enrichment PCR to obtain sensitive GBC/CBC pairing.
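Once enrichment PCR reads are available, GBC/CBC pairing can be sketched roughly as follows. This is not the repository's pairing code: the function name, the input format (one `(cell_barcode, guide_barcode)` tuple per read), and both thresholds are assumptions chosen for illustration. The idea is that a guide supported by very few reads in a cell is more likely to be a PCR chimera than a real integration.

```python
from collections import Counter, defaultdict


def pair_guides_to_cells(reads, min_reads=2, min_fraction=0.2):
    """Assign guide barcodes (GBCs) to cell barcodes (CBCs).

    reads: iterable of (cell_barcode, guide_barcode) tuples, one per read.
    A GBC is kept for a cell only if it has at least `min_reads` reads and
    makes up at least `min_fraction` of that cell's GBC reads.
    Both thresholds are hypothetical and should be tuned on pilot data.
    """
    counts = defaultdict(Counter)
    for cbc, gbc in reads:
        counts[cbc][gbc] += 1

    assignments = {}
    for cbc, gbc_counts in counts.items():
        total = sum(gbc_counts.values())
        kept = sorted(
            gbc
            for gbc, n in gbc_counts.items()
            if n >= min_reads and n / total >= min_fraction
        )
        if kept:
            assignments[cbc] = kept
    return assignments


# Toy reads: CELL1 has one well-supported guide plus a single stray read;
# CELL2 has two well-supported guides (as expected at high MOI).
toy_reads = (
    [("CELL1", "GBC_A")] * 8
    + [("CELL1", "GBC_B")] * 1
    + [("CELL2", "GBC_A")] * 5
    + [("CELL2", "GBC_C")] * 4
)
assignments = pair_guides_to_cells(toy_reads)
print(assignments)
```

In the toy data the single stray `GBC_B` read in `CELL1` is dropped, while both guides in `CELL2` are retained.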

Computational Workflow

Overview

Inputs

  • An expression matrix output by a high-throughput scRNA-seq protocol (such as Drop-seq or 10X cellranger)
  • Guide barcode (GBC) PCR data to pair perturbations with cell barcodes (for certain applications, this can be obtained directly from the RNA-seq data)
  • A database of preassociated sgRNA/GBC pairs (either by Sanger sequencing or NGS)

Intermediate Computation

  • A simple fitness calculation is possible by determining the difference between the initial abundances of a GBC and how many cells it appeared in.
  • Guide barcodes and cell barcodes have to be paired accurately
  • A cell state classifier is defined on wildtype or control cells and then applied to all cells in an experiment. These classifications can be used as outputs to be predicted (instead of gene expression) or as covariates in the model
  • The linear model integrating all covariates (and interaction terms as desired) is fit. An EM-like approach filters cells that look much more like control cells than perturbed cells
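The simple fitness calculation in the first bullet above could be sketched as a log2 ratio between the fraction of cells carrying a GBC and its fraction in the initial library. The source only describes a "difference"; the log2 ratio, the pseudocount, and the function name `gbc_fitness` are choices made here for illustration.

```python
import math


def gbc_fitness(initial_counts, cell_counts, pseudocount=1.0):
    """log2(fraction of cells with a GBC / fraction of the initial library).

    initial_counts: dict of GBC -> abundance in the starting library
    cell_counts:    dict of GBC -> number of cells the GBC was detected in
    The pseudocount (an assumption) keeps dropped-out GBCs finite.
    """
    gbcs = sorted(set(initial_counts) | set(cell_counts))
    init = {g: initial_counts.get(g, 0) + pseudocount for g in gbcs}
    cells = {g: cell_counts.get(g, 0) + pseudocount for g in gbcs}
    init_total = sum(init.values())
    cell_total = sum(cells.values())
    return {
        g: math.log2((cells[g] / cell_total) / (init[g] / init_total))
        for g in gbcs
    }


# Toy counts: GBC_A is enriched in cells, GBC_B is depleted.
fitness = gbc_fitness(
    initial_counts={"GBC_A": 99, "GBC_B": 99},
    cell_counts={"GBC_A": 199, "GBC_B": 49},
)
print(fitness)
```

Positive values suggest a GBC is over-represented among recovered cells relative to the starting library, and negative values suggest depletion.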

Outputs

  • The regulatory coefficients obtained from the model are the most informative output, giving an estimate of the extent to which each covariate (perturbation, cell state, pairwise interaction between perturbations, etc.) impacted a given gene.
  • Cell state effects are obtained by predicting the cell states based on the linear model instead of predicting gene expression
  • Cell size effects (genes detected or transcripts detected) can be predicted as well
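To make the idea of regulatory coefficients concrete, here is a toy sketch of a linear model with two perturbations and their pairwise interaction, fit to one gene. Plain ordinary least squares in pure Python is swapped in for illustration; it is not the repository's fitting code, it omits the EM-like filtering and cell state covariates, and all names and numbers are made up.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_ols(design, y):
    """Least-squares coefficients from the normal equations (X^T X) beta = X^T y."""
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def design_row(has_a, has_b):
    """Covariates: intercept, guide A, guide B, and the A x B interaction."""
    return [1.0, float(has_a), float(has_b), float(has_a and has_b)]


# Toy cells: (guide A present, guide B present, expression of one gene).
cells = [
    (0, 0, 5.0), (0, 0, 5.0),
    (1, 0, 3.0), (1, 0, 3.0),
    (0, 1, 6.0), (0, 1, 6.0),
    (1, 1, 2.0), (1, 1, 2.0),
]
design = [design_row(a, b) for a, b, _ in cells]
expression = [e for _, _, e in cells]
intercept, beta_a, beta_b, beta_ab = fit_ols(design, expression)
print(intercept, beta_a, beta_b, beta_ab)
```

Here `beta_a` and `beta_b` play the role of single-perturbation coefficients and `beta_ab` captures how much the double-perturbed cells deviate from the sum of the two individual effects. With real data, a numerical library and some form of regularization would normally replace the hand-rolled solver.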
