This is the documentation for the Slide-tags pipeline (Snakemake-based). It is designed to run either on a cluster (UGE 8.5.5) or in a local environment, and it processes data from raw BCL files through to spatial analysis.
Edit the main_env.yml and cellbender_env.yml files located in the workflow/config folder. Change the name and prefix fields to the name and path of the conda environment you want to create, then create the environments:
conda env create -f workflow/config/main_env.yml
conda env create -f workflow/config/cellbender_env.yml
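If you would rather script the edit than open the files by hand, something like the following could work. It is only a sketch: set_conda_env_target is a hypothetical helper, not part of the pipeline, and it assumes each file already has top-level name: and prefix: keys, each at the start of its own line. The environment name and path in the usage line are placeholders.

```shell
# Hypothetical helper (not part of the pipeline): rewrite the top-level
# "name:" and "prefix:" keys of a conda environment file in place.
# Assumes both keys already exist at the start of a line, and that the new
# values contain no "|", "&" or "\" characters (sed would misread them).
set_conda_env_target() (
  yml="$1"; env_name="$2"; env_prefix="$3"
  tmp="$(mktemp)" || exit 1
  sed -e "s|^name:.*|name: ${env_name}|" \
      -e "s|^prefix:.*|prefix: ${env_prefix}|" "$yml" > "$tmp" || exit 1
  # Write back through the original file so its permissions are kept.
  cat "$tmp" > "$yml" && rm -f "$tmp"
)

# Example usage (placeholder name and path):
# set_conda_env_target workflow/config/main_env.yml slidetag_main /path/to/env/slidetag_main
```

All other lines of the file (channels, dependencies) pass through unchanged.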
The pipeline is structured as follows:
- env: Folder to store the conda environments installed for the pipeline.
- pkgs: Folder to store the packages used in the pipeline, including:
  - bcl2fastq2_v2.20.0
  - CellBender-0.2.2
  - CellBender-0.3.0
  - cellranger-7.2.0
  - cellranger-8.0.1
  - cellranger-arc-2.0.2
  - cellranger-atac-2.1.0
  - google-cloud-sdk
- reference: Folder to store the reference genome data used in the pipeline, including:
  - refdata-arc-GRCh38-2020-A
  - refdata-gex-GRCh38-2024-A
  - refdata-arc-mm10-2020-A
  - refdata-gex-GRCm39-2024-A
  - refdata-gex-mm10-2020-A
  - refdata-cellranger-vdj-GRCh38
  - refdata-cellranger-vdj-GRCm38
  - custom_refdata (folder for custom reference data)
  - Index_Info (folder for the index information from 10x)
  - probesets (folder for the custom probesets)
- workflow: Download the workflow folder from this GitHub repository.
- data: Each run creates a folder named by BCL, which contains the following subfolders:
  - fastq: Contains FASTQ data, generated by cellranger mkfastq or bcl2fastq, organized by RNA/SB/ATAC Index.
  - count: Contains the count data, generated by cellranger or cellbender, organized by RNA/ATAC Index.
    - outs
    - cellbender_outs
  - spatial: Contains SBcounts.h5 and seurat.qs files, organized by RNA/SB Index.
    - Positions
    - SBcounts
  - log: Organized by run sequence; the current run folder is recorded in working.
    - input: Contains the metadata files for cellranger etc.
    - main: Contains log files for parsing the Google sheet and for the cluster.
    - mkfastq_logs: Split by Lane number.
    - counts_logs: Split by RNA/ATAC Index.
    - spatial_logs: Split by RNA/SB Index.
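As a rough illustration of the layout above, the top-level folders could be created like this before installing packages and references. This is a sketch under assumptions: make_slidetag_layout is a hypothetical helper, the root path is a placeholder, and the pipeline may create data and its per-run subfolders on its own, in which case this step is unnecessary.

```shell
# Hypothetical helper: create the top-level folders of the layout.
# The workflow folder is not created here because it comes from the repository.
make_slidetag_layout() (
  root="$1"
  for d in env pkgs reference data; do
    mkdir -p "$root/$d" || exit 1
  done
)

# Example usage (placeholder path):
# make_slidetag_layout /path/to/slidetag
```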
- CLUSTER_PATH: Path to the cluster bin (currently using UGE 8.5.5); leave blank if running locally.
- CONDA_PATH: Path to the conda bin.
- ENV_PATH: Path to the main conda environment for python, julia, and R.
- PKG_PATH: Path to the package folder for the pipeline.
- BASE_DATA_PATH: Path where the pipeline stores processed data, including the outputs of mkfastq, RNAcounts, etc.
- BCL_MAIN_PATH: Path to the BCL data; used as the default main path for input BCLs in the Google sheet.
- WORKFLOW_PATH: Path to the workflow folder; DO NOT include workflow itself in the path.
- GOOGLE_SHEET_ID: Google sheet ID for the sample metadata. Put your google_key.json file in the workflow/config folder. Here is a Google sheet demo.
- GOOGLE_CLOUD_BUCKET: Google Cloud bucket to store the FASTQ and BAM files.
- PUCK_PATH: Path to the puck coordinate CSV files.
- PUCK_IN: Path to the Slide-seq puck barcode files.
- REF_PATH: Path to the reference genome data.
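Because most of these settings are folder paths, a quick check that the folders exist can catch typos before a run. The snippet below is a minimal sketch, not part of the pipeline: check_config_paths is a hypothetical helper, and the values are passed by hand as NAME=path pairs because the format of the pipeline's own configuration file is not shown here.

```shell
# Hypothetical pre-flight check: report which configured folders are missing.
# Arguments are NAME=path pairs. Prints one line per missing folder and
# returns non-zero if any folder is missing.
check_config_paths() (
  status=0
  for pair in "$@"; do
    name="${pair%%=*}"   # text before the first "="
    path="${pair#*=}"    # text after the first "="
    if [ ! -d "$path" ]; then
      echo "missing: $name -> $path"
      status=1
    fi
  done
  exit "$status"
)

# Example usage (placeholder paths):
# check_config_paths "PKG_PATH=/path/to/pkgs" "REF_PATH=/path/to/reference"
```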
CHUNK_SIZE_MKFASTQ = 1
CHUNK_SIZE_RNACOUNTS = 1
CHUNK_SIZE_CELLBENDER = 1
CHUNK_SIZE_SBCOUNT = 10
CHUNK_SIZE_POSITION = 10
mem_gb: 128
disk_mb: 8192
runtime_min: 60*60*3
threads: 68
Add the workflow folder to your PATH:
echo 'export PATH="$PATH:/Path/to/slidetag/workflow"' >> ~/.bashrc
source ~/.bashrc
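Running the setup more than once would append the same line to ~/.bashrc each time. A small guard such as the one below avoids that. It is a sketch only: add_workflow_to_path is a hypothetical helper, and the workflow path in the usage line is a placeholder.

```shell
# Hypothetical helper: append the export line to an rc file only if that
# exact line is not already present, so repeated setup runs do not duplicate it.
add_workflow_to_path() (
  workflow_dir="$1"; rc_file="$2"
  # The \$PATH stays literal so it is expanded when the rc file is sourced.
  line="export PATH=\"\$PATH:${workflow_dir}\""
  grep -qxF -- "$line" "$rc_file" 2>/dev/null || printf '%s\n' "$line" >> "$rc_file"
)

# Example usage (placeholder path):
# add_workflow_to_path /Path/to/slidetag/workflow ~/.bashrc
```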
Run the pipeline after filling in the sample information in the Google sheet:
slidetag_pipe.sh -bcl bcl_name -ra
Get help message:
slidetag_pipe.sh -h
Usage: Slidetag Pipeline [options]
Required:
-bcl [value] Input BCL name.
Options:
-h Display this help message.
-ra, -run_all Run mkfastq, RNAcounts, SBcounts and Spatial analysis.
-mk, -run_mkfastq Run cellranger mkfastq or bcl2fastq.
-cr, -run_RNAcounts Run cellranger count or cellranger arc.
-cb, -run_cellbender Run cellbender based on Cellranger count results.
-sb, -run_SBcounts Run Spatial beads counts.
-sp, -run_spatial Run Spatial analysis for cell positionings.
-us, -use_sheet Get input sheets from the current working run.
-mv, -mv_file Move results to store path.
-gb, -generate_bam Generate bams when running Cellranger.
-uf, -upload_fastq Upload fastqs to google bucket.
-ub, -upload_bam Upload bams to google bucket.
-rf, -rm_fastq Remove local fastqs.
-rb, -rm_bam Remove local bams.
-df, -download_fastq Download fastqs from google bucket.
-db, -download_bam Download bams from google bucket.
-f, -force Force to re-run selected jobs.
-ec [value], -expected_cells From cellbender parameters.
-td [value], -total_droplets_included
FALSE or NONE at default for the above parameters.
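According to the help message, -ra covers mkfastq, RNAcounts, SBcounts and Spatial analysis. To run those stages one at a time instead, for example to inspect the logs between stages, a loop like the following might be used. It is a sketch under assumptions: run_stages and the RUNNER variable are hypothetical, only flags listed in the help message are used, and running the stages in this order as separate calls is assumed rather than confirmed.

```shell
# Sketch: call the pipeline once per stage, in the order listed for -ra.
# RUNNER is a hypothetical variable. When unset it defaults to "echo", so the
# commands are only printed; set RUNNER= (empty) to actually execute them.
run_stages() (
  bcl="$1"
  for stage in -mk -cr -sb -sp; do
    ${RUNNER-echo} slidetag_pipe.sh -bcl "$bcl" "$stage" || exit 1
  done
)

# Dry run (prints the four commands):
# run_stages bcl_name
# Real run:
# RUNNER= run_stages bcl_name
```

Stopping at the first failing stage keeps later stages from running on incomplete outputs.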