This workflow runs STARsolo with the appropriate parameters for the particular scRNA-seq technology. Currently, 'indrop_v2', '10x_v1', '10x_v2', '10x_v3' and 'cellseq192' are supported. Optionally, variants can be called suing the GATK RNA-seq workflow. Duplicates are identified using the UB tag for each cell barcode separately. Final variants are hard-filtered, annotated with SNPEff and passed through the R package, SNPRelate, for PCA, MDS and dendrograms.
-
Download the pipeline using
git clone [email protected]:vari-bbc/scRNAseq.git new_proj_dir_name
orgit clone https://github.com/vari-bbc/scRNAseq.git new_proj_dir_name
depending on if you have an SSH key set up with GitHub or not, respectively. -
Put fastq files or symlinks into 'raw_data/'.
-
Fill out 'samples.tsv':
- sample Sample name; If more than one row has the same sample name, they will be merged.
- fq1 R1 filename
- fq2 R2 filename
- RG Optional. If provided, read groups will not be inferred from fastq headers. Provide in the style specified for --outSAMattrRGline option in STAR. e.g. 'ID:zzz ”DS:z z”' or 'ID:yyy DS:yyyy'
-
Fill out 'bin/config.yaml' to indicate the location of index files, the scRNA-seq technology etc. See config file comments for more details.
For variant calling, set 'call_variants' to True. To variant call only a subset of the cell barcodes, specify only those barcodes in the 'sample_decoder' file. See config file for more info.
-
Run
sbatch bin/run_snakemake.sh
.
snakemake -l
: Print all the rules and a description of what it does.