This subworkflow is part of the Kids First DRC Somatic Variant Workflow and can also be run standalone. Strelka2 is a variant caller that calls single nucleotide variants and small insertions and deletions.
This subworkflow performs the following steps:
- Run the Strelka2 variant caller tool
- Merge the SNV and Indel results
- Reheader merged VCF with Sample IDs provided
- Hard filter the resulting VCF on `PASS`
- Annotate the `PASS` VCF using the annotation sub workflow - most relevant information is here!
- Rename outputs to fit a standard format
This workflow runs Strelka2 v2.9.3, which calls single nucleotide variants (SNV) and insertions/deletions (INDEL).
indexed_reference_fasta: {type: File, secondaryFiles: [.fai, ^.dict]}
reference_dict: File
hg38_strelka_bed: {type: File, secondaryFiles: ['.tbi']}
manta_small_indels: {type: File?, secondaryFiles: ['.tbi']}
use_manta_small_indels: {type: boolean?, default: false}
type: File
secondaryFiles: |
  ${
    // Directory portion of the input location
    var dpath = self.location.replace(self.basename, "")
    if(self.nameext == '.bam'){
      // BAM index: <nameroot>.bai
      return {"location": dpath+self.nameroot+".bai", "class": "File"}
    }
    else{
      // CRAM index: <basename>.crai
      return {"location": dpath+self.basename+".crai", "class": "File"}
    }
  }
doc: "tumor BAM or CRAM"
input_tumor_name: string
type: File
secondaryFiles: |
  ${
    // Directory portion of the input location
    var dpath = self.location.replace(self.basename, "")
    if(self.nameext == '.bam'){
      // BAM index: <nameroot>.bai
      return {"location": dpath+self.nameroot+".bai", "class": "File"}
    }
    else{
      // CRAM index: <basename>.crai
      return {"location": dpath+self.basename+".crai", "class": "File"}
    }
  }
doc: "normal BAM or CRAM"
input_normal_name: string
exome_flag: {type: ['null', string], doc: "set to 'Y' for exome mode"}
output_basename: string
select_vars_mode: {type: ['null', {type: enum, name: select_vars_mode, symbols: ["gatk", "grep"]}], doc: "Choose 'gatk' for SelectVariants tool, or 'grep' for grep expression", default: "gatk"}
tool_name: {type: string?, doc: "String to describe what tool was run as part of file name", default: "strelka2_somatic"}
# VEP params
vep_cache: {type: 'File', doc: "tar gzipped cache from ensembl/local converted cache", "sbg:suggestedValue": {class: File, path: 6332f8e47535110eb79c794f,
name: homo_sapiens_merged_vep_105_indexed_GRCh38.tar.gz}}
vep_ram: {type: 'int?', default: 32, doc: "In GB, may need to increase this value depending on the size/complexity of input"}
vep_cores: {type: 'int?', default: 16, doc: "Number of cores to use. May need to increase for really large inputs"}
vep_buffer_size: {type: 'int?', default: 1000, doc: "Increase or decrease to balance speed and memory usage"}
dbnsfp: { type: 'File?', secondaryFiles: [.tbi,^.readme.txt], doc: "VEP-formatted plugin file, index, and readme file containing dbNSFP annotations" }
dbnsfp_fields: { type: 'string?', doc: "csv string with desired fields to annotate. Use ALL to grab all"}
merged: { type: 'boolean?', doc: "Set to true if merged cache used", default: true }
cadd_indels: { type: 'File?', secondaryFiles: [.tbi], doc: "VEP-formatted plugin file and index containing CADD indel annotations" }
cadd_snvs: { type: 'File?', secondaryFiles: [.tbi], doc: "VEP-formatted plugin file and index containing CADD SNV annotations" }
run_cache_existing: { type: 'boolean?', doc: "Run the check_existing flag for cache" }
run_cache_af: { type: 'boolean?', doc: "Run the allele frequency flags for cache" }
# annotation vars
genomic_hotspots: { type: 'File[]?', doc: "Tab-delimited BED formatted file(s) containing hg38 genomic positions corresponding to hotspots", "sbg:suggestedValue": [{class: File, path: 607713829360f10e3982a423, name: tert.bed}] }
protein_snv_hotspots: { type: 'File[]?', doc: "Column-name-containing, tab-delimited file(s) containing protein names and amino acid positions corresponding to hotspots", "sbg:suggestedValue": [{class: File, path: 66980e845a58091951d53984, name: kfdrc_protein_snv_cancer_hotspots_20240718.txt}] }
protein_indel_hotspots: { type: 'File[]?', doc: "Column-name-containing, tab-delimited file(s) containing protein names and amino acid position ranges corresponding to hotspots", "sbg:suggestedValue": [{class: File, path: 663d2bcc27374715fccd8c6f, name: protein_indel_cancer_hotspots_v2.ENS105_liftover.tsv}] }
retain_info: {type: 'string?', doc: "csv string with INFO fields that you want to keep", default: "gnomad_3_1_1_AC,gnomad_3_1_1_AN,gnomad_3_1_1_AF,gnomad_3_1_1_nhomalt,gnomad_3_1_1_AC_popmax,gnomad_3_1_1_AN_popmax,gnomad_3_1_1_AF_popmax,gnomad_3_1_1_nhomalt_popmax,gnomad_3_1_1_AC_controls_and_biobanks,gnomad_3_1_1_AN_controls_and_biobanks,gnomad_3_1_1_AF_controls_and_biobanks,gnomad_3_1_1_AF_non_cancer,gnomad_3_1_1_primate_ai_score,gnomad_3_1_1_splice_ai_consequence,MBQ,TLOD,HotSpotAllele"}
retain_fmt: {type: 'string?', doc: "csv string with FORMAT fields that you want to keep"}
retain_ann: { type: 'string?', doc: "csv string of annotations (within the VEP CSQ/ANN) to retain as extra columns in MAF", default: "HGVSg" }
echtvar_anno_zips: {type: 'File[]?', doc: "Annotation ZIP files for echtvar anno"}
bcftools_strip_columns: {type: 'string?', doc: "csv string of columns to strip if needed to avoid conflict, e.g. INFO/AF"}
bcftools_public_filter: {type: 'string?', doc: "Will hard filter final result to create a public version", default: FILTER="PASS"|INFO/HotSpotAllele=1}
gatk_filter_name: {type: 'string[]', doc: "Array of names for each filter tag to add, recommend: [\"NORM_DP_LOW\", \"GNOMAD_AF_HIGH\"]"}
gatk_filter_expression: {type: 'string[]', doc: "Array of filter expressions to establish criteria to tag variants with. Recommend: [\"vc.getGenotype('\" + inputs.input_normal_name + \"').getDP() <= 7\", \"gnomad_3_1_1_AF != '.' && gnomad_3_1_1_AF > 0.001\"]"}
disable_hotspot_annotation: { type: 'boolean?', doc: "Disable Hotspot Annotation and skip this task.", default: false }
maf_center: {type: 'string?', doc: "Sequencing center of variant called", default: "."}
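
The `secondaryFiles` expressions on the tumor and normal alignment inputs decide which index file must sit next to each BAM or CRAM. As a rough illustration only, the same naming rule can be mirrored in Python to check file names before submitting a run. The helper name `expected_index_path` and the example paths are hypothetical and are not part of the workflow.

```python
from pathlib import PurePosixPath


def expected_index_path(aligned_path: str) -> str:
    """Mirror the naming rule of the secondaryFiles expression (illustrative sketch)."""
    p = PurePosixPath(aligned_path)
    if p.suffix == ".bam":
        # BAM: nameroot + ".bai", so the .bam extension is replaced
        return str(p.with_suffix(".bai"))
    # Anything else is treated as CRAM: basename + ".crai", extension kept
    return str(p.with_name(p.name + ".crai"))


# Hypothetical example paths
print(expected_index_path("/data/tumor.bam"))    # /data/tumor.bai
print(expected_index_path("/data/normal.cram"))  # /data/normal.cram.crai
```

Note that, like the expression itself, this sketch treats any extension other than `.bam` as CRAM.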
Recommended reference inputs - all file references can be obtained here*
Secondary files needed for each reference file will be a sub-bullet point.
- For recommended values for the annotation inputs, see the annotation subworkflow docs.
- `exome_flag`
  - `Y` if exome
  - `N` or leave blank if WGS
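
The sample-specific inputs can be assembled programmatically. The sketch below builds the filter arrays recommended in the `gatk_filter_name` and `gatk_filter_expression` docs, plus `exome_flag`, as a fragment of a job file. The function name `sample_specific_inputs` and the sample ID `BS_NORMAL01` are hypothetical, and the fragment is not a complete set of workflow inputs.

```python
import json


def sample_specific_inputs(normal_name: str, exome: bool = False) -> dict:
    """Build the recommended filter arrays for a given normal sample name (sketch)."""
    inputs = {
        "input_normal_name": normal_name,
        "gatk_filter_name": ["NORM_DP_LOW", "GNOMAD_AF_HIGH"],
        "gatk_filter_expression": [
            # Tag variants where the normal sample has low read depth
            f"vc.getGenotype('{normal_name}').getDP() <= 7",
            # Tag variants that are common in gnomAD
            "gnomad_3_1_1_AF != '.' && gnomad_3_1_1_AF > 0.001",
        ],
    }
    if exome:
        # Set to 'Y' for exome mode; omitted (left blank) for WGS
        inputs["exome_flag"] = "Y"
    return inputs


# Hypothetical normal sample ID
print(json.dumps(sample_specific_inputs("BS_NORMAL01", exome=True), indent=2))
```

The two arrays are positional: the first name tags variants matching the first expression, and so on, so they should always have the same length.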
strelka2_prepass_vcf: {type: File, outputSource: rename_strelka_samples/reheadered_vcf}
strelka2_protected_outputs: {type: 'File[]', outputSource: rename_protected/renamed_files}
strelka2_public_outputs: {type: 'File[]', outputSource: rename_public/renamed_files}
- `strelka2_prepass_vcf`: Combined SNV + INDEL file with renamed Sample IDs. Has all soft `FILTER` values generated by the variant caller. Use this file if you believe important variants are being left out when using the algorithm's `PASS` filter.
- `strelka2_protected_outputs`: Array of files containing the MAF format of `PASS` hits, the `PASS` VCF with annotation pipeline soft `FILTER`-added values, and the VCF index.
- `strelka2_public_outputs`: Same as above, except the MAF and VCF have had entries with soft `FILTER` values removed.
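
To make the difference between the protected and public outputs concrete, here is a minimal sketch of the default `bcftools_public_filter` expression, `FILTER="PASS"|INFO/HotSpotAllele=1`. It assumes the expression is applied as an inclusion filter, so a record is kept if either condition holds. The record layout and the function name `passes_public_filter` are hypothetical; the workflow itself uses bcftools, not this code.

```python
def passes_public_filter(filter_value: str, info: dict) -> bool:
    """Keep a record if FILTER is PASS or it is flagged as a hotspot allele (sketch)."""
    is_pass = filter_value == "PASS"
    # Accept the flag whether INFO values were parsed as int or left as str
    is_hotspot = info.get("HotSpotAllele") in (1, "1")
    return is_pass or is_hotspot


# Hypothetical, simplified records
records = [
    {"id": "var1", "filter": "PASS", "info": {}},
    {"id": "var2", "filter": "NORM_DP_LOW", "info": {"HotSpotAllele": 1}},
    {"id": "var3", "filter": "GNOMAD_AF_HIGH", "info": {"HotSpotAllele": 0}},
]

public_ids = [r["id"] for r in records if passes_public_filter(r["filter"], r["info"])]
print(public_ids)  # ['var1', 'var2']
```

Under this reading, a soft-filtered variant such as `var2` still reaches the public outputs when it is a known hotspot allele.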