The workflow script that runs the tools is workflows/kf_single_cell_10x_refinement.cwl
SoupX is used to subtract the RNA background, and scDblFinder is used to score and predict doublets. Decontaminated outputs are aggregated using the Seurat R package from the Satija lab at the New York Genome Center. Erin Reichenbee of DBHi contributed heavily to the original workflow design.
- SoupX 1.6.2
- scDblFinder 1.12.0
- Seurat 4.3.0.1
- SeuratObject 4.1.3
- tidyverse docker base 4.2.3
output_basename
: basename used to name output files
sample_name
: used as prefix for finding fastqs to analyze, e.g. 1k_PBMCs_TotalSeq_B_3p_LT_antibody if the names of the underlying fastqs are of the form 1k_PBMCs_TotalSeq_B_3p_LT_antibody_S1_L001_I1_001.fastq.gz, one per input fastq in the same order
counts_matrix_raw
: h5 format raw feature matrix file from Cellranger or equivalent
counts_matrix_filtered
: h5 format filtered feature matrix file from Cellranger or equivalent
counts_cluster
: CSV containing cluster information from Cellranger or equivalent if available
align_qc_rds
: Align QC file from D3b 10X alignment workflow
seurat_raw_rds
: Seurat raw rds file from D3b 10X alignment workflow
soupx_rplots
: PDF R plot made by SoupX
soupx_rds
: R object with SoupX results
scdblfinder_plot
: PDF cluster plots generated by scDblFinder
scdblfinder_doublets
: TSV containing scoring matrix with doublets marked by scDblFinder
scdblfinder_summary
: Summary stats of number and percent of doublets in library
seurat_filtered_rds
: RDS file containing filtered 10X data based on Seurat QC (align workflow), SoupX, and scDblFinder results
seurat_filtered_summary
: TSV file summarizing number of cells removed at each step, if relevant
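
As an illustration of how the parameters above fit together, a job file for the workflow might look like the sketch below. It assumes that output_basename through seurat_raw_rds are the workflow inputs, that the two name parameters are plain strings, and that the remaining inputs are single files. The job file name, the output_basename value, and every file path are hypothetical placeholders, so check the types against the CWL file itself.

```yaml
# Hypothetical job file (e.g. inputs.yaml). All paths are placeholders.
output_basename: pbmc_1k_refined                  # assumed to be a string input
sample_name: 1k_PBMCs_TotalSeq_B_3p_LT_antibody   # example prefix from the description above
counts_matrix_raw:
  class: File
  path: /path/to/raw_feature_bc_matrix.h5
counts_matrix_filtered:
  class: File
  path: /path/to/filtered_feature_bc_matrix.h5
counts_cluster:                                   # described as "if available"; assumed optional
  class: File
  path: /path/to/clusters.csv
align_qc_rds:
  class: File
  path: /path/to/align_qc.rds
seurat_raw_rds:
  class: File
  path: /path/to/seurat_raw.rds
```

With a CWL runner such as cwltool (an assumption; any conformant runner should work), a file like this would be passed as the second argument after the workflow path: cwltool workflows/kf_single_cell_10x_refinement.cwl inputs.yaml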