CRISPR-SURF (Screening of Uncharacterized Region Function) is an exploratory and interactive computational framework for the design and analysis of CRISPR-Cas, CRISPRi, and CRISPRa tiling screens.
CRISPR-SURF is available as user-friendly, open-source software and can be used interactively as a web application at crisprsurf.pinellolab.org or as a stand-alone command-line tool with Docker (https://github.com/pinellolab/CRISPR-SURF).
With Docker, no installation is required: the only dependency is Docker itself, so users do not need to deal with installation and configuration issues. Docker will do all the dirty work for you!
Docker can be downloaded freely here: https://store.docker.com/search?offering=community&type=edition
To get a local copy of CRISPR-SURF, simply execute the following command:
docker pull pinellolab/crisprsurf
The CRISPR-SURF Design script allows users to design sgRNAs for their CRISPR tiling screen. Run CRISPR-SURF Design in the terminal with the command:
docker run -v ${PWD}/:/DATA -w /DATA pinellolab/crisprsurf SURF_design [options]
Users can specify the following options:
-bed, --bed
Input bed file to design tiling sgRNAs. (Required)
-genome, --genome
Input genome 2bit file. (Required)
-pams, --pams
Specification of different CRISPR PAMs, where brackets [] allow for multiple nucleotides at a given position (e.g. [ATCG]GG -> NGG, TTT[ACG] -> TTTV, [ATCG]G -> NG). Multiple PAMs can be provided, separated by spaces (e.g. [ATCG]GG TTT[ACG]). (Required)
-orient, --orientations
Orientation of the spacer sequence relative to the PAM. The number of orientations must match the number of PAMs given to the -pams option, as an orientation must be specified for each PAM. Multiple orientations are separated by spaces (e.g. left right). (Options: left, right | Required)
-guide_l, --guide_length
Length of the sgRNA to design. (Default: 20)
-g_constraint, --g_constraint
Constraint forcing the 5' base of the sgRNA to be G. All guides without a 5' G will be filtered out. (Options: true, false | Default: false)
-out, --out_dir
Name of output directory. (Default: ./)
Running CRISPR-SURF Design Yourself
docker run -v ${PWD}/:/DATA -w /DATA pinellolab/crisprsurf SURF_design -bed BED_FILE -genome 2BIT_GENOME_FILE -pams [ATCG]GG TTT[ACG] -orient left right -out example_run
IMPORTANT: The BED_FILE and 2BIT_GENOME_FILE must be in the working directory where the command-line code is run.
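The bracket notation used by -pams reads like a regular-expression character class, which makes it easy to sanity-check a PAM against a few sequences before designing guides. The sketch below assumes that equivalence purely for illustration (it is not taken from the SURF_design source), and pam_matches is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if a sequence matches a bracketed PAM.
# Assumption: the bracket notation behaves like a regex character class.
pam_matches() {
  # -E: extended regex, -x: the whole line must match, -q: no output
  printf '%s\n' "$2" | grep -Eqx "$1"
}

pam_matches '[ATCG]GG' 'TGG' && echo "TGG matches [ATCG]GG (NGG)"
pam_matches 'TTT[ACG]' 'TTTT' || echo "TTTT does not match TTT[ACG] (TTTV)"
```

Brackets are also glob characters in most shells, so quoting each PAM on the command line (for example '[ATCG]GG') avoids accidental filename expansion.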
Running Cas-OFFinder
The off-targets of the designed sgRNAs can be enumerated with Cas-OFFinder by isolating the 4th column in the CRISPR-SURF Design output file, SURF_designed_sgRNAs.csv. Instructions on running Cas-OFFinder can be found here: http://www.rgenome.net/cas-offinder/portable
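Isolating the 4th column can be done with standard command-line tools. The following is a minimal sketch that assumes the output file is plain comma-separated text without quoted fields; fourth_column and spacers.txt are hypothetical names, and the toy line uses made-up placeholder fields.

```shell
# Hypothetical helper: print the 4th comma-separated field of each line on stdin.
fourth_column() {
  cut -d, -f4
}

# Toy line with placeholder fields around a spacer sequence.
printf 'f1,f2,f3,AGCTCTGGAATGATGGCTTA,f5\n' | fourth_column

# On a real design run (assumed layout), something like:
#   fourth_column < SURF_designed_sgRNAs.csv > spacers.txt
```

If the file starts with a header row, piping it through tail -n +2 first would drop that row.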
The CRISPR-SURF Count script generates a required input file, sgRNAs_summary_table.csv, for both the CRISPR-SURF interactive website and command-line interface. Run CRISPR-SURF Count in the terminal with the command:
docker run -v ${PWD}/:/DATA -w /DATA pinellolab/crisprsurf SURF_count [options]
Users can specify the following options:
-f, --sgRNA_library
Input sgRNA library file. Formatting specified below. (Required)
-control_fastqs, --control_fastqs
List of control FASTQs with sgRNA sequencing prior to selection, separated by spaces (e.g. rep1_control.fastq rep2_control.fastq rep3_control.fastq). (Default: None)
-sample_fastqs, --sample_fastqs
List of sample FASTQs with sgRNA sequencing following selection, separated by spaces (e.g. rep1_sample.fastq rep2_sample.fastq rep3_sample.fastq). (Default: None)
-nuclease, --nuclease
Nuclease used in the CRISPR tiling screen experiment. This information is used to determine the cleavage index if indels are specified as the perturbation. (Options: cas9, cpf1 | Default: cas9)
-pert, --perturbation
Perturbation type used in the CRISPR tiling screen experiment. This information is used to determine the perturbation index for a given sgRNA. (Options: indel, crispri, crispra | Default: indel)
-norm, --normalization
Normalization method between sequencing libraries. (Options: none, median, total | Default: median)
-count_method, --count_method
Counting method for sgRNAs from FASTQ. The tracrRNA option aligns a consensus sequence directly downstream of the sgRNA. The index option uses provided indices to grab sgRNA sequence from the sequencing reads. (Options: tracrRNA, index | Default: tracrRNA)
-tracrRNA, --tracrRNA
If -count_method == tracrRNA. The consensus tracrRNA sequence directly downstream of the sgRNA for counting from FASTQ. (Default: GTTTTAG)
-sgRNA_index, --sgRNA_index
If -count_method == index. The sgRNA start and stop indices (0-indexed) within the sequencing reads (e.g. 0 20). (Default: 0 20)
-count_min, --count_minimum
The minimum number of counts for a given sgRNA in each control sample. (Default: 50)
-dropout, --dropout_penalty
The dropout penalty removes sgRNAs that have a 0 count in any of the control/sample replicates. (Default: True)
-TTTT, --TTTT_penalty
The TTTT penalty removes sgRNAs that have a homopolymer stretch of Ts >= 4. (Default: True)
-sgRNA_length, --sgRNA_length
Length of sgRNAs used in the CRISPR tiling screen experiment. This must match the sgRNA length provided in the sgRNA library file. (Default: 20)
-reverse, --reverse_score
Reverse the enrichment score. Generally applied to depletion screens where a positive score is associated with depletion of an sgRNA. (Default: False)
-out_dir, --out_directory
The output directory for CRISPR-SURF counts. (Default: ./)
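To make the -sgRNA_index option more concrete, the sketch below pulls a spacer out of a read by position. It assumes the indices are 0-based with an exclusive stop, which is an inference from the default 0 20 matching the default sgRNA length of 20, not something confirmed from the SURF_count source. slice_read is a hypothetical helper and the read is made up.

```shell
# Hypothetical helper: extract read[start:stop] with 0-based, end-exclusive indices.
# Assumption: -sgRNA_index follows this convention (0 20 -> first 20 bases).
slice_read() {
  read_seq="$1"; start="$2"; stop="$3"
  # cut -c is 1-based and inclusive, so only the start position is shifted
  printf '%s\n' "$read_seq" | cut -c "$((start + 1))-$stop"
}

# Made-up read: a 20-nt spacer followed by the default tracrRNA prefix GTTTTAG.
slice_read 'AGCTCTGGAATGATGGCTTAGTTTTAG' 0 20
```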
To start, you will need one of the following:
- Option (1) sgRNA Library File with FASTQs
- Option (2) sgRNA Library File with counts
sgRNA Library File Format Example (.CSV):
Chr | Start | Stop | sgRNA_Sequence | Strand | sgRNA_Type |
---|---|---|---|---|---|
chr2 | 60717499 | 60717519 | AGCTCTGGAATGATGGCTTA | - | observation |
chr2 | 60717506 | 60717526 | ATTGTGGAGCTCTGGAATGA | + | observation |
chr2 | 60717514 | 60717534 | GGAGTTGGATTGTGGAGCTC | + | observation |
chr2 | 60717522 | 60717542 | AGAAAATTGGAGTTGGATTG | - | negative_control |
chr2 | 60717529 | 60717549 | CTGGAATAGAAAATTGGAGT | + | positive_control |
Required Column Names:
- Chr - Chromosome
- Start - sgRNA Start Genomic Coordinate
- Stop - sgRNA Stop Genomic Coordinate
- sgRNA_Sequence - sgRNA sequence not including PAM sequence
- Strand - Targeting strand of the sgRNA
- sgRNA_Type - Label for sgRNA type (observation, negative_control, positive_control)
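Before running SURF_count, it can help to confirm that the library file header carries every required column. The check below is a rough sketch that assumes a comma-separated file whose first row holds the column names exactly as listed above; check_library_header is a hypothetical helper, not part of CRISPR-SURF.

```shell
# Hypothetical helper: report required columns missing from a CSV header.
check_library_header() {
  head -n 1 "$1" | tr -d '\r' | awk -F, '
    BEGIN { n = split("Chr Start Stop sgRNA_Sequence Strand sgRNA_Type", req, " ") }
    { for (i = 1; i <= NF; i++) seen[$i] = 1 }
    END {
      for (j = 1; j <= n; j++)
        if (!(req[j] in seen)) { print "missing column: " req[j]; bad = 1 }
      exit bad + 0
    }'
}

# Toy library file built from the first two rows of the example table.
lib=$(mktemp)
cat > "$lib" <<'EOF'
Chr,Start,Stop,sgRNA_Sequence,Strand,sgRNA_Type
chr2,60717499,60717519,AGCTCTGGAATGATGGCTTA,-,observation
chr2,60717506,60717526,ATTGTGGAGCTCTGGAATGA,+,observation
EOF
check_library_header "$lib" && echo "header looks complete"
rm -f "$lib"
```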
Example CRISPR-SURF Count on Canver et al. 2015 for Option (1)
The following command will run CRISPR-SURF Count for Option (1) on provided example data:
docker run -v ${PWD}/:/DATA -w /DATA pinellolab/crisprsurf SURF_count -f /SURF/command_line/exampleDataset/sgRNA_library_file.csv -control_fastqs /SURF/command_line/exampleDataset/rep1_neg.fastq.gz /SURF/command_line/exampleDataset/rep2_neg.fastq.gz -sample_fastqs /SURF/command_line/exampleDataset/rep1_pos.fastq.gz /SURF/command_line/exampleDataset/rep2_pos.fastq.gz -nuclease cas9 -pert indel
Running CRISPR-SURF Count Option (1) Yourself
Place the sgRNA library file and FASTQs in the same directory. The control FASTQs represent the sgRNA distribution prior to selection, while the sample FASTQs represent the sgRNA distribution following selection. Assuming the sgRNA library file is named sgRNA_library_file.csv, the FASTQs (2 replicates) are named rep1_control.fastq, rep2_control.fastq, rep1_sample.fastq, rep2_sample.fastq, and it's a CRISPR-Cas9 tiling screen, the command-line call would look like:
docker run -v ${PWD}/:/DATA -w /DATA pinellolab/crisprsurf SURF_count -f sgRNA_library_file.csv -control_fastqs rep1_control.fastq rep2_control.fastq -sample_fastqs rep1_sample.fastq rep2_sample.fastq -nuclease cas9 -pert indel
Simply change -pert indel to -pert crispri or -pert crispra for CRISPRi and CRISPRa screens, respectively.
IMPORTANT: The number of control FASTQs must equal the number of sample FASTQs. If a single control FASTQ (e.g. plasmid count) is used for multiple sample FASTQs, repeat that same control FASTQ in the -control_fastqs option once per sample FASTQ.
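As an illustration of the single-control case, the sketch below repeats one control FASTQ name so that the control and sample lists have equal length, then prints the resulting SURF_count arguments. repeat_arg and plasmid.fastq are hypothetical names chosen for the example.

```shell
# Hypothetical helper: print VALUE repeated N times, separated by spaces.
repeat_arg() {
  value="$1"; n="$2"; out=""
  i=0
  while [ "$i" -lt "$n" ]; do
    out="${out:+$out }$value"
    i=$((i + 1))
  done
  printf '%s\n' "$out"
}

# One shared control (made-up file name) for two sample replicates.
controls=$(repeat_arg plasmid.fastq 2)
echo "SURF_count -f sgRNA_library_file.csv -control_fastqs $controls -sample_fastqs rep1_sample.fastq rep2_sample.fastq"
```

The printed arguments would then follow the usual docker run prefix shown in the commands above.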
sgRNA Library File with Counts Format Example (.CSV):
Chr | Start | Stop | sgRNA_Sequence | Strand | sgRNA_Type | Replicate1_Control_Count | Replicate2_Control_Count | Replicate1_Sample_Count | Replicate2_Sample_Count |
---|---|---|---|---|---|---|---|---|---|
chr2 | 60717499 | 60717519 | AGCTCTGGAATGATGGCTTA | - | observation | 322 | 615 | 131 | 403 |
chr2 | 60717506 | 60717526 | ATTGTGGAGCTCTGGAATGA | + | observation | 365 | 812 | 448 | 227 |
chr2 | 60717514 | 60717534 | GGAGTTGGATTGTGGAGCTC | + | observation | 86 | 169 | 13 | 129 |
chr2 | 60717522 | 60717542 | AGAAAATTGGAGTTGGATTG | - | negative_control | 1823 | 381 | 1923 | 321 |
chr2 | 60717529 | 60717549 | CTGGAATAGAAAATTGGAGT | + | positive_control | 54 | 124 | 355 | 521 |
Required Column Names:
- Chr - Chromosome
- Start - sgRNA Start Genomic Coordinate
- Stop - sgRNA Stop Genomic Coordinate
- sgRNA_Sequence - sgRNA sequence not including PAM sequence
- Strand - Targeting strand of the sgRNA
- sgRNA_Type - Label for sgRNA type (observation, negative_control, positive_control)
- Replicate1_Control_Count - sgRNA Count in Replicate 1 Control FASTQ (pre-selection)
- Replicate2_Control_Count - sgRNA Count in Replicate 2 Control FASTQ (pre-selection)
- Replicate1_Sample_Count - sgRNA Count in Replicate 1 Sample FASTQ (post-selection)
- Replicate2_Sample_Count - sgRNA Count in Replicate 2 Sample FASTQ (post-selection)
IMPORTANT: A minimum of two experimental replicates is needed. Additional columns (ReplicateN_Control_Count, ReplicateN_Sample_Count) can be included for more experimental replicates.
Example CRISPR-SURF Count on Canver et al. 2015 for Option (2)
The following command will run CRISPR-SURF Count for Option (2) on provided example data:
docker run -v ${PWD}/:/DATA -w /DATA pinellolab/crisprsurf SURF_count -f /SURF/command_line/exampleDataset/sgRNA_library_file_w_counts.csv -nuclease cas9 -pert indel
Running CRISPR-SURF Count Option (2) Yourself
Go into the directory where the sgRNA library file is located. Assuming the sgRNA library file with counts is named sgRNA_library_file_w_counts.csv and it's a CRISPR-Cas9 tiling screen, the command-line call would look like:
docker run -v ${PWD}/:/DATA -w /DATA pinellolab/crisprsurf SURF_count -f sgRNA_library_file_w_counts.csv -nuclease cas9 -pert indel
Simply change -pert indel to -pert crispri or -pert crispra for CRISPRi and CRISPRa screens, respectively.
IMPORTANT: Additional ReplicateN_Control_Count and ReplicateN_Sample_Count columns can be added depending on the number of replicates used in the experiment. The number of ReplicateN_Control_Count columns must equal the number of ReplicateN_Sample_Count columns. If a single control column (e.g. plasmid count) is used for multiple sample counts, just duplicate the single control column with the appropriate column names.
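Duplicating a single control column can be scripted rather than done by hand. The sketch below handles the two-replicate case and assumes a simple comma-separated file without quoted fields; duplicate_control and the Plasmid_Count column name are hypothetical, and the toy row uses made-up counts.

```shell
# Hypothetical helper: replace one control column with two identical
# ReplicateN_Control_Count columns. Reads CSV on stdin, writes CSV to stdout.
duplicate_control() {
  awk -F, -v OFS=, -v col="$1" '
    NR == 1 {
      # Locate the control column by its header name
      for (i = 1; i <= NF; i++) if ($i == col) c = i
      if (!c) exit 1
      $c = "Replicate1_Control_Count" OFS "Replicate2_Control_Count"
      print; next
    }
    { $c = $c OFS $c; print }'
}

# Toy input with a single made-up control column named Plasmid_Count.
printf 'sgRNA_Type,Plasmid_Count,Replicate1_Sample_Count,Replicate2_Sample_Count\nobservation,5,7,9\n' |
  duplicate_control Plasmid_Count
```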
The CRISPR-SURF Deconvolution command-line tool takes sgRNAs_summary_table.csv (generated from CRISPR-SURF Count) as input. File requirements are stated below.
Required Column Names:
- Chr - Chromosome
- Start - sgRNA Start Genomic Coordinate
- Stop - sgRNA Stop Genomic Coordinate
- sgRNA_Sequence - sgRNA sequence not including PAM sequence
- Strand - Targeting strand of the sgRNA
- sgRNA_Type - Label for sgRNA type (observation, negative_control, positive_control)
- Log2FC_Replicate1 - Replicate 1 Log2FC enrichment score of sgRNA
- Log2FC_Replicate2 - Replicate 2 Log2FC enrichment score of sgRNA
IMPORTANT: A minimum of two experimental replicates is needed. Additional columns (Log2FC_ReplicateN) can be included for more experimental replicates.
Run CRISPR-SURF Deconvolution in the terminal with the command:
docker run -v ${PWD}/:/DATA -w /DATA pinellolab/crisprsurf SURF_deconvolution [options]
Users can specify the following options:
-f, --sgRNAs_summary_table
Input sgRNAs summary table. Direct output of CRISPR-SURF Count. (Required)
-pert, --perturbation_type
The CRISPR perturbation type used in the tiling experiment. (Options: cas9, cpf1, crispri, crispra | Required)
-range, --characteristic_perturbation_range
Characteristic perturbation length. If 0 (default), the -pert argument will be used to set an appropriate perturbation range. (Default: 0)
-scale, --scale
Scaling factor to efficiently perform deconvolution with negligible consequences. If 0 (default), the -range argument will be used to set an appropriate scaling factor. (Default: 0)
-limit, --limit
Maximum distance between two sgRNAs to perform inference on bp in-between. Sets the boundaries of the gaussian profile to perform efficient deconvolution. If 0 (default), the -pert argument will be used to set an appropriate limit. (Default: 0)
-avg, --averaging_method
The averaging method to be performed to combine biological replicates. (Options: mean, median | Default: median)
-null_dist, --null_distribution
The method of building a null distribution for each smoothed beta score. (Options: negative_control, gaussian, laplace | Default: gaussian)
-sim_n, --simulation_n
The number of simulations to perform for construction of the null distribution. (Default: 1000)
-test_type, --test_type
Parametric or non-parametric test for betas. (Options: parametric, nonparametric | Default: parametric)
-lambda_list, --lambda_list
List of lambdas (regularization parameter) separated by spaces to use during the deconvolution step (e.g. 1 2 3 4 5 6 7 8 9 10). If 0 (default), the -pert argument will be used to set a reasonable lambda list. (Default: 0)
-lambda_val, --lambda_val
The lambda value to be used during the deconvolution step. If 0 (default), the -lambda_list argument will be used. (Default: 0)
-corr, --correlation
The Pearson's r correlation coefficient between biological replicates to determine a reasonable lambda for the deconvolution operation. If 0 (default), the -range argument will be used to set an appropriate correlation. (Default: 0)
-genome, --genome
The genome to be used to create the IGV session file. (Options: hg19, hg38, mm9, mm10, etc. | Default: hg19)
-effect_size, --effect_size
Effect size to estimate statistical power. (Default: 1)
-estimate_power, --estimate_statistical_power
Whether or not to compute a track estimating statistical power for the CRISPR tiling screen data. (Options: yes, no | Default: no)
-padjs, --padj_cutoffs
List of p-adj. (Benjamini-Hochberg) cut-offs separated by spaces for determining significance of regulatory regions in the CRISPR tiling screen (e.g. 0.05 0.01 0.001 0.0001). (Default: 0.05 0.01 0.001 0.0001)
-out_dir, --out_directory
The name of the output directory to place CRISPR-SURF analysis files. (Default: CRISPR_SURF_Analysis_TIMESTAMP)
Example CRISPR-SURF Deconvolution on Canver et al. 2015
The following command will run CRISPR-SURF analysis on provided example data:
docker run -v ${PWD}/:/DATA -w /DATA pinellolab/crisprsurf SURF_deconvolution -f /SURF/command_line/exampleDataset/sgRNAs_summary_table.csv -pert cas9
Running CRISPR-SURF Deconvolution Yourself
Go into the directory where the sgRNAs summary table is located. Assuming the sgRNAs summary table is named sgRNAs_summary_table.csv and it's a CRISPR-Cas9 tiling screen, the command-line call would look like:
docker run -v ${PWD}/:/DATA -w /DATA pinellolab/crisprsurf SURF_deconvolution -f sgRNAs_summary_table.csv -pert cas9
Simply change -pert cas9 to -pert crispri or -pert crispra for CRISPRi and CRISPRa screens, respectively.
1. sgRNAs_summary_table_updated.csv: An updated sgRNAs summary table with deconvolution and p-adj. values.
2. igv_session.xml: An IGV session for the following tracks
- raw_scores.bedgraph - sgRNA enrichment scores
- deconvolved_scores.bedgraph - deconvolution beta profile
- positive_significant_regions.bed - positive significant regions at set FDR
- negative_significant_regions.bed - negative significant regions at set FDR
- neglog10_pvals.bedgraph - negative log10 p-values for betas
- statistical_power.bedgraph - statistical power track at set effect size and FDR (-estimate_power yes)
3. significant_regions.csv: List of the significant regions with their associated statistics and supporting sgRNAs.
4. beta_profile.csv: Full deconvolution beta profile with associated statistics.
5. correlation_curve_lambda.csv: The correlation curve generated for determining lambda.
6. crispr-surf_parameters.csv: The CRISPR-SURF analysis parameters used during the analysis session.
7. crispr-surf.log: The log file for CRISPR-SURF analysis.
In order to make CRISPR-SURF more user-friendly and accessible, we have created an interactive website: http://crisprsurf.pinellolab.org. The website implements all the features of the CRISPR-SURF command-line tool (except CRISPR-SURF Count) and, in addition, provides interactive and exploratory plots to visualize your CRISPR tiling screen data.
The website offers two functions: 1) Running CRISPR-SURF on data provided by the user and 2) Visualizing CRISPR-SURF analysis on several published data sets, serving as the first database dedicated to CRISPR tiling screen data. There is a 10,000 sgRNA limitation for analysis with the web application due to server capacity. Analysis of CRISPR tiling screen data with >10,000 sgRNAs requires the use of the command-line tool or provided Docker image.
The web application can also be run on a local machine using the provided Docker image. After installing Docker, execute the following command from the command line:
docker run -p 9993:9993 pinellolab/crisprsurf SURF_webapp
After execution of the command, the user will have a local instance of the website accessible at the URL: http://localhost:9993