FilterMutectCalls crashes when vcf is empty (when running on a sge cluster run) #2832

Closed
waemm opened this issue May 24, 2019 · 4 comments

Comments


waemm commented May 24, 2019

Hi Brad et al,

When running bcbio on an SGE cluster I get the following error. It looks like this is due to the VCF file being empty for one of the small chromosome scaffolds, "chr14_GL000009v2_random". This issue is similar to #2829, but I'm currently running on the hg38 build with mostly default parameters.
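
To confirm that a per-region VCF really contains no variant records, the non-header lines can be counted directly. This is a minimal sketch using only the Python standard library; the function name `count_vcf_records` is hypothetical and not part of bcbio or GATK.

```python
import gzip


def count_vcf_records(path):
    """Count non-header lines in a gzip- or bgzip-compressed VCF file."""
    records = 0
    # bgzip output is a series of gzip members, which gzip.open reads transparently.
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # Header and meta-information lines start with '#'; skip blank lines too.
            if line.strip() and not line.startswith("#"):
                records += 1
    return records
```

A result of 0 for the `-raw.vcf.gz` file passed to FilterMutectCalls means that the region produced no candidate calls.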

Details below, thanks again for a fantastic tool!
Warren

command:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xms681m -Xmx6818m -XX:+UseSerialGC -Djava.io.tmpdir=/shared/pipeline-user/test_data/cll_5_tumor/work/bcbiotx/tmpl2aen8h4 -jar /shared/pipeline-user/bcbio/anaconda/share/gatk4-4.1.2.0-1/gatk-package-4.1.2.0-local.jar FilterMutectCalls --reference /shared/pipeline-user/bcbio/genomes/Hsapiens/hg38/seq/hg38.fa --variant /shared/pipeline-user/test_data/cll_5_tumor/work/bcbiotx/tmpl2aen8h4/CLL006_tumor-chr14_GL000009v2_random_0_122919-raw.vcf.gz --output /shared/pipeline-user/test_data/cll_5_tumor/work/bcbiotx/tmpl2aen8h4/CLL006_tumor-chr14_GL000009v2_random_0_122919-raw-filt.vcf.gz

some of the output:
12:40:05.745 INFO FilterMutectCalls - Done initializing engine
12:40:05.907 INFO ProgressMeter - Starting traversal
12:40:05.907 INFO ProgressMeter - Current Locus Elapsed Minutes Variants Processed Variants/Minute
12:40:05.909 INFO FilterMutectCalls - Starting pass 0 through the variants
12:40:05.980 INFO FilterMutectCalls - Finished pass 0 through the variants
12:40:06.005 INFO FilterMutectCalls - Starting pass 1 through the variants
12:40:06.017 INFO FilterMutectCalls - Shutting down engine
[May 24, 2019 12:40:06 PM UTC] org.broadinstitute.hellbender.tools.walkers.mutect.filtering.FilterMutectCalls done. Elapsed time: 0.02 minutes.
Runtime.totalMemory()=691404800
java.lang.IllegalArgumentException: log10 p: Values must be non-infinite and non-NAN

YAML:
`details:
- algorithm:
    aligner: bwa
    ensemble:
      numpass: 2
    mark_duplicates: true
    recalibrate: true
    remove_lcr: true
    svcaller:
    - cnvkit
    - lumpy
    - manta
    tools_on:
    - gemini
    - damage_filter
    variant_regions: /shared/pipeline-user/bcbio/genomes/Hsapiens/hg38/coverage/capture_regions/Exome-NGv3.bed
    variantcaller:
    - mutect2
    - vardict
    vcfanno:
    - gemini
    - somatic
  analysis: variant2
  description: CLL006_tumor
  files:
  - /shared/pipeline-user/test_data/input/ERR315865_1.fastq.gz
  - /shared/pipeline-user/test_data/input/ERR315865_2.fastq.gz
  genome_build: hg38
  metadata:
    phenotype: tumor`

Collaborator

roryk commented May 24, 2019

Thanks. I know this is a silly request, but do you think you could pass on the empty VCF file so I can test and fix this? It is helpful to have the actual failing file so I can be sure I reproduce the problem.

Author

waemm commented May 24, 2019

CLL006_tumor-chr14_GL000009v2_random_0_122919-raw.vcf.gz

Hi @roryk, sorry for the multiple edits. Is there a way around this bug? Will using "--force-single" prevent the pipeline from splitting jobs by chromosome? Thanks.

Here is the file and the two commands I used:

mkdir -p /shared/pipeline-user/test_data/cll_5/work/bcbiotx/tmpjgip15hg

java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xms681m -Xmx6818m -XX:+UseSerialGC -Djava.io.tmpdir=/shared/pipeline-user/test_data/cll_5/work/bcbiotx/tmpjgip15hg -jar /shared/pipeline-user/bcbio/anaconda/share/gatk4-4.1.2.0-1/gatk-package-4.1.2.0-local.jar Mutect2 --annotation ClippingRankSumTest --annotation DepthPerSampleHC --reference /shared/pipeline-user/bcbio/genomes/Hsapiens/hg38/seq/hg38.fa --annotation MappingQualityRankSumTest --annotation MappingQualityZero --annotation QualByDepth --annotation ReadPosRankSumTest --annotation RMSMappingQuality --annotation FisherStrand --annotation MappingQuality --annotation DepthPerAlleleBySample --annotation Coverage --read-validation-stringency LENIENT -I /shared/pipeline-user/test_data/cll_5/work/align/CLL006_tumor/CLL006_tumor-sort-recal.bam --tumor-sample CLL006_tumor -L /shared/pipeline-user/test_data/cll_5/work/mutect2/chr14_GL000009v2_random/CLL006_tumor-chr14_GL000009v2_random_0_122919-regions-nolcr.bed --interval-set-rule INTERSECTION -O /shared/pipeline-user/test_data/cll_5/work/bcbiotx/tmpjgip15hg/CLL006_tumor-chr14_GL000009v2_random_0_122919-raw.vcf.gz

java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xms681m -Xmx6818m -XX:+UseSerialGC -Djava.io.tmpdir=/shared/pipeline-user/test_data/cll_5/work/bcbiotx/tmpjgip15hg -jar /shared/pipeline-user/bcbio/anaconda/share/gatk4-4.1.2.0-1/gatk-package-4.1.2.0-local.jar FilterMutectCalls --reference /shared/pipeline-user/bcbio/genomes/Hsapiens/hg38/seq/hg38.fa --variant /shared/pipeline-user/test_data/cll_5/work/bcbiotx/tmpjgip15hg/CLL006_tumor-chr14_GL000009v2_random_0_122919-raw.vcf.gz --output /shared/pipeline-user/test_data/cll_5/work/bcbiotx/tmpjgip15hg/CLL006_tumor-chr14_GL000009v2_random_0_122919-raw-filt.vcf.gz

Also, I have had some similar issues with other samples in this project. Sometimes they seem to work when I run them locally, but in this case and a few others they always crash the pipeline. Weird bug!

Member

chapmanb commented May 25, 2019

Warren;
Thanks much for this report and apologies about the issue. This looks related to #2829, and we were able to reproduce it when the MuTect2-associated .stats file has 0 callable reads. The latest development version has a workaround which avoids this by flooring this stat at 1. It also leaves the intermediate files in the work directory (instead of the transactional directory) to make it easier to debug if it still fails. If you still have issues after updating, please share the input *.vcf.gz and *.vcf.gz.stats files to FilterMutectCalls and we can investigate more. Thanks again and hope this gets your analysis done.
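
For readers who want to see what flooring the stat at 1 could look like, here is a rough sketch. It is not the code in bcbio. It assumes the .stats file is a tab-separated table with one `statistic<TAB>value` row per line and a row named `callable`; the function name `floor_callable` is hypothetical.

```python
def floor_callable(stats_text, minimum=1.0):
    """Return the stats text with the 'callable' value raised to at least `minimum`.

    Assumed layout (not confirmed by this thread): a tab-separated table,
    one 'statistic<TAB>value' row per line, with a row named 'callable'.
    """
    fixed = []
    for line in stats_text.splitlines():
        parts = line.split("\t")
        if len(parts) == 2 and parts[0] == "callable":
            try:
                value = float(parts[1])
            except ValueError:
                value = None  # leave unparseable rows untouched
            if value is not None and value < minimum:
                line = "callable\t%s" % minimum
        fixed.append(line)
    return "\n".join(fixed) + "\n"
```

Applied to the text of a stats file with `callable` at 0.0, this rewrites the row to 1.0 and leaves every other row as it was. The intent is that FilterMutectCalls no longer sees a zero count, which appears to be what triggers the `log10 p` error.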

Author

waemm commented May 28, 2019

Much appreciated @chapmanb and @roryk for your work on this! Run completed with no problems!
