Error reported at the GATHERPILEUPSUMMARIES step causing the workflow to abort #1094

Closed
Tracked by #1096
jaybee84 opened this issue Jun 9, 2023 · 5 comments
Labels
bug Something isn't working

jaybee84 commented Jun 9, 2023

Description of the bug

While running sarek v3.2.1 with Mutect2 on two batches of WES files, I am repeatedly getting the following error:

Error executing process > 'NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_SOMATIC_ALL:BAM_VARIANT_CALLING_SOMATIC_MUTECT2:GATHERPILEUPSUMMARIES_NORMAL (XX-A)'
.
.
.
A USER ERROR has occurred: Bad input: format error in 'XX-A.mutect2.pileups.table' at line 0: premature end of table: header line not found

For one batch, this error occurs for the same sample every time I run the workflow; for the other batch, it occurs for different samples in different runs.

Command used and terminal output

nextflow run nf-core/sarek \
    --input samplesheet.csv \
    --wes true \
    --igenomes_base sage-igenomes/igenomes \
    --genome GATK.GRCh38 \
    --tools strelka,mutect2 \
    --intervals custom_withChr_GRCh38_sorted.bed \
    --outdir ./batch1/

Relevant files

No response

System information

No response

@jaybee84 jaybee84 added the bug ("Something isn't working") label Jun 9, 2023
Author

jaybee84 commented Jun 9, 2023

I investigated and found that the error is mainly caused by the GATHERPILEUPSUMMARIES process overwriting its own input file.

The input file to GATHERPILEUPSUMMARIES has the following format:

$ head XX-A.mutect2.pileups.table
#<METADATA>SAMPLE=XX-A
contig  position        ref_count       alt_count       other_alt_count allele_frequency
chr1    69428   3       0       0       0.023
chr1    69761   0       0       0       0.053
chr1    69849   0       0       0       0.035
chr1    69968   0       0       0       0.012
chr1    943937  327     0       0       0.074
chr1    944101  237     0       0       0.019
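
As a quick sanity check on a table like the one above, here is a minimal sketch. The function name `check_pileup_table` is hypothetical (it is not part of sarek or GATK), and the check is based only on the layout shown in this comment: optional `#` metadata lines followed by a column header starting with `contig`.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of sarek or GATK.
# Returns 0 if the file looks like a pileup summary table, judging only
# by the layout shown above; returns 1 otherwise.
check_pileup_table() {
    local table="$1"

    # An empty or missing file is consistent with an error at "line 0".
    if [ ! -s "$table" ]; then
        echo "empty or missing: $table" >&2
        return 1
    fi

    # The first line that is not a '#' metadata line should be the header.
    local header
    header=$(awk '!/^#/ { print; exit }' "$table")

    case "$header" in
        contig*position*ref_count*alt_count*) return 0 ;;
        *) echo "no column header found in: $table" >&2; return 1 ;;
    esac
}
```

Running `check_pileup_table XX-A.mutect2.pileups.table` before and after the failing step should show whether the table was already empty going in or only became empty afterwards.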

If I run .command.sh for this step manually, I can reproduce the same error:

$ docker run -v "/<pwd>:/work" -w "/work" quay.io/biocontainers/gatk4:4.4.0.0--py36hdfd78af_0 bash .command.sh
Using GATK jar /usr/local/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx9830M -jar /usr/local/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar GatherPileupSummaries --I JH-2-091-854A2-A.mutect2.pileups.table --O JH-2-091-854A2-A.mutect2.pileups.table --sequence-dictionary Homo_sapiens_assembly38.dict --tmp-dir .
22:08:53.803 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/usr/local/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
22:08:53.998 INFO  GatherPileupSummaries - ------------------------------------------------------------
22:08:54.001 INFO  GatherPileupSummaries - The Genome Analysis Toolkit (GATK) v4.4.0.0
22:08:54.001 INFO  GatherPileupSummaries - For support and documentation go to https://software.broadinstitute.org/gatk/
22:08:54.001 INFO  GatherPileupSummaries - Executing as root@73d08ea446c4 on Linux v5.10.76-linuxkit amd64
22:08:54.002 INFO  GatherPileupSummaries - Java runtime: OpenJDK 64-Bit Server VM v17.0.3-internal+0-adhoc..src
22:08:54.003 INFO  GatherPileupSummaries - Start Date/Time: June 9, 2023 at 10:08:53 PM GMT
22:08:54.003 INFO  GatherPileupSummaries - ------------------------------------------------------------
22:08:54.003 INFO  GatherPileupSummaries - ------------------------------------------------------------
22:08:54.005 INFO  GatherPileupSummaries - HTSJDK Version: 3.0.5
22:08:54.005 INFO  GatherPileupSummaries - Picard Version: 3.0.0
22:08:54.007 INFO  GatherPileupSummaries - Built for Spark Version: 3.3.1
22:08:54.008 INFO  GatherPileupSummaries - HTSJDK Defaults.COMPRESSION_LEVEL : 2
22:08:54.009 INFO  GatherPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
22:08:54.010 INFO  GatherPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
22:08:54.012 INFO  GatherPileupSummaries - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
22:08:54.013 INFO  GatherPileupSummaries - Deflater: IntelDeflater
22:08:54.014 INFO  GatherPileupSummaries - Inflater: IntelInflater
22:08:54.015 INFO  GatherPileupSummaries - GCS max retries/reopens: 20
22:08:54.015 INFO  GatherPileupSummaries - Requester pays: disabled
22:08:54.016 INFO  GatherPileupSummaries - Initializing engine
22:08:54.316 INFO  GatherPileupSummaries - Done initializing engine
22:08:54.630 INFO  GatherPileupSummaries - Shutting down engine
[June 9, 2023 at 10:08:54 PM GMT] org.broadinstitute.hellbender.tools.walkers.contamination.GatherPileupSummaries done. Elapsed time: 0.01 minutes.
Runtime.totalMemory()=184549376
***********************************************************************

A USER ERROR has occurred: Bad input: format error in 'XX-A.mutect2.pileups.table' at line 0: premature end of table: header line not found

***********************************************************************
Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.
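
The "header line not found" at line 0 message is what one would expect from an empty file. The sketch below is an illustration only: it shows, with plain shell redirection, how writing to the same path that is being read can leave the file empty. Whether GatherPileupSummaries opens its output in a comparable way is an assumption here, not something confirmed from the GATK source; `truncation_demo` is a made-up name.

```shell
#!/usr/bin/env bash
# Illustration only: same-path read/write with shell redirection.
# That GATK behaves similarly internally is an assumption.
truncation_demo() {
    local table
    table=$(mktemp)
    printf '#<METADATA>SAMPLE=XX-A\ncontig\tposition\n' > "$table"

    # The shell opens (and truncates) the output file before head starts,
    # so head reads an already empty file and writes nothing back.
    head -n 2 "$table" > "$table"

    if [ -s "$table" ]; then echo "intact"; else echo "empty"; fi
    rm -f "$table"
}
```

Calling `truncation_demo` prints `empty`, because the contents are gone before the reader ever sees them.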

When I look at the .command.sh file from this step, I see that both the input and output files have the same name:

$ cat .command.sh
#!/bin/bash -euo pipefail
gatk --java-options "-Xmx9830M" GatherPileupSummaries \
    --I XX-A.mutect2.pileups.table \
    --O XX-A.mutect2.pileups.table \
    --sequence-dictionary Homo_sapiens_assembly38.dict \
    --tmp-dir . \


cat <<-END_VERSIONS > versions.yml
"NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_SOMATIC_ALL:BAM_VARIANT_CALLING_SOMATIC_MUTECT2:GATHERPILEUPSUMMARIES_NORMAL":
    gatk4: $(echo $(gatk --version 2>&1) | sed 's/^.*(GATK) v//; s/ .*$//')
END_VERSIONS
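
To spot this in other work directories, a rough sketch like the following could be used. `io_paths_collide` is a hypothetical helper; it assumes the script passes paths with `--I` and `--O` as separate whitespace-delimited tokens, as in the .command.sh above, and it only compares the strings (it does not resolve symlinks).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) if a command script passes the
# same path to --O as to any --I. Plain string comparison only.
io_paths_collide() {
    local script="$1"
    awk '
        {
            for (i = 1; i < NF; i++) {
                if ($i == "--I") inputs[$(i + 1)] = 1   # remember every input
                if ($i == "--O") output = $(i + 1)      # remember the output
            }
        }
        END { exit !(output != "" && (output in inputs)) }
    ' "$script"
}
```

For example, `io_paths_collide .command.sh && echo "output would overwrite an input"` can be run inside a task's work directory.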

If I change the output file name as shown below, the error goes away:

$ cat .command.sh
#!/bin/bash -euo pipefail
gatk --java-options "-Xmx9830M" GatherPileupSummaries \
    --I XX-A.mutect2.pileups.table \
    --O XX-A-gathered.mutect2.pileups.table \
    --sequence-dictionary Homo_sapiens_assembly38.dict \
    --tmp-dir . \


cat <<-END_VERSIONS > versions.yml
"NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_SOMATIC_ALL:BAM_VARIANT_CALLING_SOMATIC_MUTECT2:GATHERPILEUPSUMMARIES_NORMAL":
    gatk4: $(echo $(gatk --version 2>&1) | sed 's/^.*(GATK) v//; s/ .*$//')
END_VERSIONS

Author

jaybee84 commented Jun 9, 2023

I looked at the original code in the nf-core/sarek repo, and it appears to expect a {prefix} to be added to the output file name. I am not sure why that does not happen during execution.

@jaybee84 jaybee84 changed the title from "Error reported at the GATHERPILEUPSUMMARIES step leading the workflow to abort" to "Error reported at the GATHERPILEUPSUMMARIES step causing the workflow to abort" Jun 9, 2023
@maxulysse maxulysse mentioned this issue Jun 12, 2023
10 tasks
@maxulysse maxulysse self-assigned this Jun 12, 2023
@maxulysse
Member

OK, so the issue is most likely that this process should not run at all when there is only one interval.
I'm looking into a fix right now
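
The idea of skipping the gather step for a single interval can be sketched in shell as follows. This is not the sarek implementation or the actual fix; `plan_gather` is a hypothetical function that only prints a decision, given an output name followed by the per-interval tables for one sample.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the sarek code: decide whether gathering is
# needed. Usage: plan_gather OUTPUT INPUT [INPUT...]
plan_gather() {
    local output="$1"
    shift

    if [ "$#" -eq 0 ]; then
        echo "error: no input tables" >&2
        return 1
    fi

    # A single table is already complete, so there is nothing to gather.
    if [ "$#" -eq 1 ]; then
        echo "skip"
        return 0
    fi

    # With several tables, refuse an output name that matches an input.
    local table
    for table in "$@"; do
        if [ "$table" = "$output" ]; then
            echo "error: output would overwrite input $table" >&2
            return 1
        fi
    done

    echo "gather"
}
```

With the file from this issue, `plan_gather XX-A.mutect2.pileups.table XX-A.mutect2.pileups.table` prints `skip`, which is the case where running GatherPileupSummaries is unnecessary.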

@maxulysse
Member

@jaybee84 #1098 should fix it.
It'll be in dev soon, so it's coming in the next release. We will probably have a patch soon.

@jaybee84
Author

Thanks @maxulysse!
