TinDaisy2 is a CWL pipeline for calling somatic variants from tumor and normal whole genome and exome data. It is illustrated below.
The following variant callers are the basis of TinDaisy2 variant calls:
- VarScan.v2.3.8: SNP and indel calls
- Strelka2 v2.9.10: SNP and indel calls
- Pindel (Sept 2018): indel calls only
- mutect-1.1.7: SNP calls only
The following filters are then applied:
- Normal and tumor VAF
- Indel length
- Normal and tumor read depth
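The per-caller filters above can be sketched roughly as follows. This is an illustrative sketch only, not the TinDaisy2 implementation: the `Call` record, the filter names, and every threshold value are hypothetical, and the real values are defined in the processing description files.

```python
from dataclasses import dataclass


@dataclass
class Call:
    """Minimal stand-in for one variant call (hypothetical structure)."""
    ref: str
    alt: str
    tumor_depth: int
    tumor_alt_reads: int
    normal_depth: int
    normal_alt_reads: int


# Hypothetical thresholds, chosen only for illustration.
MIN_VAF_TUMOR = 0.05
MAX_VAF_NORMAL = 0.02
MIN_DEPTH = 14
MAX_INDEL_LENGTH = 100


def vaf(alt_reads, depth):
    """Variant allele frequency: fraction of reads supporting the alt allele."""
    return alt_reads / depth if depth > 0 else 0.0


def failed_filters(call):
    """Return the names of the filters that this call fails (empty if none)."""
    failed = []
    if call.tumor_depth < MIN_DEPTH or call.normal_depth < MIN_DEPTH:
        failed.append("read_depth")
    tumor_vaf = vaf(call.tumor_alt_reads, call.tumor_depth)
    normal_vaf = vaf(call.normal_alt_reads, call.normal_depth)
    if tumor_vaf < MIN_VAF_TUMOR or normal_vaf > MAX_VAF_NORMAL:
        failed.append("vaf")
    # Indel length is taken as the difference between REF and ALT lengths.
    if abs(len(call.ref) - len(call.alt)) > MAX_INDEL_LENGTH:
        failed.append("indel_length")
    return failed
```

A somatic call is expected to have a sufficiently high VAF in the tumor and a near-zero VAF in the normal, which is why the two VAF thresholds point in opposite directions.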
Variant calls from all callers are then merged into one VCF file. Variants called by two or more callers are retained. After merging, the following processing steps are applied:
- Sequential SNPs on same haplotype merged into DNP, TNP, and QNP
- Annotation with VEP. Based on annotation, the following filters are applied:
  - Population allele frequency
  - Variant classification (e.g., retain exon only)
  - Presence in dbSNP database of common variants. Those in COSMIC and ClinVar databases are optionally retained
- SNP variants in proximity to indels are excluded
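As an illustration of the first step in the list above, adjacent SNPs can be combined into a single multi-nucleotide variant. The sketch below is a simplification and not the TinDaisy2 algorithm: it assumes the input SNPs are already known to lie on the same haplotype (determining that is not shown), that they are sorted by position, and that QNP denotes a four-base substitution. The function name and `MAX_RUN` are hypothetical.

```python
MAX_RUN = 4  # assumed longest merged run: DNP = 2, TNP = 3, QNP = 4 bases


def merge_adjacent_snps(snps):
    """Merge runs of adjacent single-base substitutions.

    snps: list of (chrom, pos, ref, alt) tuples, sorted by chrom and pos,
          all assumed to be on the same haplotype.
    Returns a new list in which each run of adjacent SNPs (up to MAX_RUN
    bases) is replaced by one record starting at the first position.
    """
    merged = []
    for chrom, pos, ref, alt in snps:
        if merged:
            m_chrom, m_pos, m_ref, m_alt = merged[-1]
            is_adjacent = m_chrom == chrom and m_pos + len(m_ref) == pos
            if is_adjacent and len(m_ref) < MAX_RUN:
                # Extend the previous record by one base.
                merged[-1] = (m_chrom, m_pos, m_ref + ref, m_alt + alt)
                continue
        merged.append((chrom, pos, ref, alt))
    return merged
```

For example, SNPs `A>T` at position 100 and `C>G` at position 101 become the single DNP `AC>TG` at position 100, while a SNP at position 105 is left unchanged.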
Variants which fail a filter are retained in the VCF file but are flagged as having failed that filter.
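The merge and flagging behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline's code: variants are keyed by `(chrom, pos, ref, alt)`, records are plain dictionaries, and the function names are hypothetical. Only the two-caller minimum and the flag-rather-than-drop behavior come from the text.

```python
from collections import defaultdict

MIN_CALLERS = 2  # from the text: variants called by two or more callers are retained


def merge_calls(calls_by_caller):
    """Merge calls from several callers.

    calls_by_caller: {caller_name: iterable of (chrom, pos, ref, alt)}
    Returns {variant_key: sorted list of callers} for variants with
    support from at least MIN_CALLERS callers.
    """
    support = defaultdict(set)
    for caller, calls in calls_by_caller.items():
        for key in calls:
            support[key].add(caller)
    return {k: sorted(v) for k, v in support.items() if len(v) >= MIN_CALLERS}


def apply_filter(records, name, passes):
    """Flag failing records with the filter name instead of removing them."""
    for rec in records:
        if not passes(rec):
            rec["filter"].append(name)
    return records


def clean(records):
    """Keep only records that passed every filter (no flags recorded)."""
    return [rec for rec in records if not rec["filter"]]
```

Because failing variants are flagged rather than removed, the full set of merged calls stays available for inspection, and a filtered subset can be derived from it at any time by selecting the unflagged records.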
Three output files are generated:
- Output VCF - contains all variants which were called by 2 or 3 callers.
- Clean VCF - contains only variants which passed all filters
- Clean MAF - MAF file corresponding to Clean VCF
The specific parameter values and database versions used are defined in processing description files associated with a specific pipeline version.
- Version 2.7.0 - Adding QC of VEP output
- Version 2.6.2 - Bugfix to VCF headers. Using updated VEP v99.
- Version 2.6.1 - Adds bypass_classification parameter. Also introducing -ffpe variant
VAF Rescue is an optional variant of the TinDaisy2 workflow that implements a position-aware VAF filter, which applies different parameters (min_vaf_tumor=0) based on location as defined in a BED file. Variants with tumor VAF > 0 are retained in regions given by a Rescue BED file. This file may be specific to cancer types.
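The position-aware filter described above could look something like the sketch below. It is an assumption-laden illustration, not the TinDaisy2 code: the function names and the default threshold outside rescue regions are hypothetical, and only the rescue behavior (tumor VAF > 0 is retained inside regions from the BED file) is taken from the text. BED intervals are treated as 0-based and half-open, and variant positions as 1-based, following the usual BED and VCF conventions.

```python
DEFAULT_MIN_VAF_TUMOR = 0.05  # hypothetical threshold outside rescue regions
RESCUE_MIN_VAF_TUMOR = 0.0    # from the text: min_vaf_tumor=0 inside rescue regions


def load_bed(lines):
    """Parse BED lines into a list of (chrom, start, end) tuples."""
    regions = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


def in_rescue_region(chrom, pos, regions):
    """True if the 1-based position falls inside any 0-based half-open BED interval."""
    return any(c == chrom and start < pos <= end for c, start, end in regions)


def passes_tumor_vaf(chrom, pos, tumor_vaf, regions):
    """Apply the relaxed threshold inside rescue regions, the default elsewhere."""
    if in_rescue_region(chrom, pos, regions):
        return tumor_vaf > RESCUE_MIN_VAF_TUMOR
    return tumor_vaf >= DEFAULT_MIN_VAF_TUMOR
```

The linear scan over regions keeps the sketch short; for a large BED file an interval index would be a more practical choice.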
TinDaisy defines a CWL workflow and algorithms associated with it. Running it requires a CWL workflow engine such as cwltool, Rabix Executor, or Cromwell.
Install TinDaisy with:
git clone --recurse-submodules https://github.com/ding-lab/TinDaisy.git
Examples of how to run workflows using Cromwell are provided in the ./testing directory. Production runs are performed using CromwellRunner, a simple workflow manager that allows command-line management of jobs in a Cromwell workflow engine environment. It was developed primarily for the Wash U RIS and MGI systems.
TinDaisy2 is configured through a YAML configuration file. A template of such a file is in cwl/workflows/tindaisy2.template.yaml, and example configuration files for the MGI and compute1 systems can be found in testing/cromwell-simple/yaml.
Examples for running TinDaisy2 in the Wash U MGI and compute1 environments can be found in the testing/cromwell-simple directory.
TinDaisy relies on a number of modules to implement the various filtering and processing steps. In most cases, each module provides the underlying algorithms, tools to generate the docker image, and CWL definitions needed to implement it. The TinDaisy module also defines CWL tools whose algorithms are implemented in the TinDaisy-Core project.
Modules used by TinDaisy2 are: