Skip to content

Documentation related to the bug resulting in drop of specificity/precision in Mutect2 after v.4.1.8.1

License

Notifications You must be signed in to change notification settings

ddrichel/Mutect2_calling_performance_bug

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Mutect2 calling performance bug

Documentation related to the bug resulting in drop of specificity/precision in Mutect2 between GATK v.4.1.8.1 and v4.1.9.0.

Description

The bug was discovered by investigating a strong drop in precision for SNVs (~10% for tumor-normal mode, ~20% in tumor-only mode) between GATK releases 4.1.7.0 and 4.2.6.0.

With more testing using the HCC1395 somatic benchmark (https://pubmed.ncbi.nlm.nih.gov/34504347/) and sequencing data provided by the Somatic Mutation Working Group (Fudan University WES tumor-normal data set, 2 x 100x coverage), the drop in performance can be traced to changes between 4.1.8.1 and 4.1.9.0.

The reads were aligned to GRCh38.primary_assembly.genome: http://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_33/GRCh38.primary_assembly.genome.fa.gz

The alignment was performed using bwa: bwa mem -M -O 6 -A 1 -B 4 -t 7 -E 1 -k 19 with a subsequent run of picard MarkDuplicates.

The calling was done with essentially default parameters:

tools/gatk-${version}/gatk Mutect2  --normal-sample WES_FD_N --output $outvcf --intervals $wesbed  --interval-padding 0 --input $inbam_t --input $inbam_n --reference $ref
tools/gatk-${version}/gatk FilterMutectCalls --output ${outvcf%.vcf}_filtered.vcf --variant $outvcf --intervals $wesbed --reference $ref --stats ${outvcf}.stats --threshold-strategy OPTIMAL_F_SCORE --f-score-beta 1.0

som.py was used for calculating performance metrics.

The performance metrics for selected GATK releases:

FD_TN_4170_filter_FD_TN_4181_filter_FD_TN_4190_filter_FD_TN_4200_filter_FD_TN_4260_filter

Curiously, we do not observe such a substantial drop in precision in WGS data, neither in tumor-only nor in tumor-normal mode. In the foillowing, our "v04" corresponds to GATK 4.1.7.0 and out "v05" corresponds to GATK 4.2.6.0:

Tumor-normal:

WGS_FD_tumor-normal_reference_workflow_v04_WGS_FD_tumor-normal_reference_workflow_v05

Tumor-only:

WGS_FD_tumor_reference_workflow_v04_WGS_FD_tumor_reference_workflow_v05

Reproducing the bug

For tumor-normal WES analysis, aligned reads are available from Google Drive:

Normal tissue:

Tumor tissue:

Calling was performed with Mutect2 from various releases of GATK, for v4.1.8.1 and v4.1.9.0 specifically using the commands:

wesbed=intervals/S07604624_Covered_human_all_v6_plus_UTR.liftover.to.hg38_merged_allowed_contigs_intersect_HighConfidence.bed
outvcf=vcfs/WES_FD_TN_4181.vcf.gz
$tools/gatk-4.1.8.1/gatk Mutect2  --normal-sample WES_FD_N --output $outvcf  --intervals $wesbed   --interval-padding 0 --input $inbam_t --input $inbam_n --reference $ref
$tools/gatk-4.1.8.1/gatk FilterMutectCalls  --output ${outvcf%.vcf.gz}_filtered.vcf.gz  --variant $outvcf --intervals $wesbed --reference $ref --stats ${outvcf}.stats --threshold-strategy OPTIMAL_F_SCORE --f-score-beta 1.0
wesbed=intervals/S07604624_Covered_human_all_v6_plus_UTR.liftover.to.hg38_merged_allowed_contigs_intersect_HighConfidence.bed
outvcf=vcfs/WES_FD_TN_4190.vcf.gz
$tools/gatk-4.1.9.0/gatk Mutect2  --normal-sample WES_FD_N --output $outvcf  --intervals $wesbed   --interval-padding 0 --input $inbam_t --input $inbam_n --reference $ref
$tools/gatk-4.1.9.0/gatk FilterMutectCalls  --output ${outvcf%.vcf.gz}_filtered.vcf.gz  --variant $outvcf --intervals $wesbed --reference $ref --stats ${outvcf}.stats --threshold-strategy OPTIMAL_F_SCORE --f-score-beta 1.0

The output as vcf files are available here:

The combined (SNVs and INDELs) benchmark reference benchmark/high-confidence_sSNV_sINDEL_in_HC_regions_v1.2.vcf.gz was created using src/get_benchmark_calls.sh.

For reproducing performance metrics callsets_performance/WES_FD_TN_4181_filter_som_py.stats.csv and callsets_performance/WES_FD_TN_4190_filter_som_py.stats.csv, change to src/ directory and:

To generate the following figure, change to src/ and execute plot_performance_metrics.sh.

figures/WES_FD_TN_4181_filter_WES_FD_TN_4190_filter.png

Original bug report

See broadinstitute/gatk#7921

About

Documentation related to the bug resulting in drop of specificity/precision in Mutect2 after v.4.1.8.1

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published