The pipeline should fail early when --no_intervals is used with joint germline calling #1282

Open
FriederikeHanssen opened this issue Oct 11, 2023 · 5 comments
Labels
bug Something isn't working

@FriederikeHanssen
Contributor

Description of the bug

As the title states: we used to have this check, but apparently not anymore. See here: https://nfcore.slack.com/archives/CGFUX04HZ/p1697021363120009

Command used and terminal output

No response

Relevant files

No response

System information

No response

@FriederikeHanssen FriederikeHanssen added the bug Something isn't working label Oct 11, 2023
@asp8200
Contributor

asp8200 commented Oct 11, 2023

@FriederikeHanssen: You mentioned that "GenomicsDB doesn’t work without intervals."

In that connection, I'd like to point out that we now have two subworkflows for joint-germline variant calling: one with GATK/haplotypecaller and one with Sentieon/haplotyper. The Sentieon/haplotyper subworkflow for joint-germline variant calling doesn't use GenomicsDB, and as far as I can tell it works fine with the option --no_intervals, although we do not have a CI test for that at the moment.

We do have this test, which gives:

[0e/b77807] process > NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_GERMLINE_ALL:BAM_VARIANT_CALLING_SENTIE... [100%] 4 of 4 ✔

with the Sentieon command being:

sentieon driver  -r genome.fasta -t 2 -i testT.converted.cram --interval chr22_2-15000.bed  --algo Haplotyper -d dbsnp_146.hg38.vcf.gz  --emit_mode gvcf testT.haplotyper.chr22_2-15000.g.vcf.gz

If I do something similar but with --no_intervals added, that is,

nextflow run main.nf -profile test_cache,software_license,docker --sentieon_extension --input ./tests/csv/3.0/mapped_joint_bam.csv --tools sentieon_haplotyper --step variant_calling --joint_germline --outdir results --sentieon_haplotyper_emit_mode gvcf --no_intervals --nucleotides_per_second 20 --wes true

then I get:

[24/648d40] process > NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_GERMLINE_ALL:BAM_VARIANT_CALLING_SENTIEON_HAPLOTYPER:SENTIEON_HAPLOTYPER (testN)             [100%] 2 of 2 ✔

and the Sentieon command looks like this:

sentieon driver  -r genome.fasta -t 2 -i testN.converted.cram   --algo Haplotyper -d dbsnp_146.hg38.vcf.gz  --emit_mode gvcf testN.haplotyper.g.vcf.gz

I guess we want to keep the possibility of running the joint-germline Sentieon/haplotyper subworkflow with --no_intervals, right?

Should I perhaps add a test for that?

@FriederikeHanssen
Contributor Author

Yes, of course. We only want an early fail for the one that uses the GenomicsDB route.
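
To make the intended behaviour concrete, here is a minimal sketch of such a check, written in Python purely for illustration (the pipeline itself is Nextflow, and this is not sarek code). The function name is hypothetical, and it assumes that `haplotypecaller` in `--tools` together with `--joint_germline` is what selects the GenomicsDB route, while `sentieon_haplotyper` does not use GenomicsDB.

```python
# Hypothetical sketch of the early-fail rule discussed in this thread.
# Assumption: joint germline calling with GATK haplotypecaller goes through
# GenomicsDB, whereas the Sentieon haplotyper route does not.

def check_joint_germline_params(tools, joint_germline, no_intervals):
    """Raise early if the GenomicsDB route is requested without intervals.

    tools          -- comma-separated value of --tools, e.g. "haplotypecaller"
    joint_germline -- True if --joint_germline was given
    no_intervals   -- True if --no_intervals was given
    """
    selected = {t.strip().lower() for t in tools.split(",") if t.strip()}
    uses_genomicsdb = joint_germline and "haplotypecaller" in selected

    if uses_genomicsdb and no_intervals:
        raise ValueError(
            "--joint_germline with haplotypecaller uses GenomicsDB, "
            "which does not work with --no_intervals"
        )


if __name__ == "__main__":
    # Sentieon route: allowed with --no_intervals, so this passes silently.
    check_joint_germline_params("sentieon_haplotyper", True, True)

    # GATK/GenomicsDB route: rejected before any process would start.
    try:
        check_joint_germline_params("haplotypecaller", True, True)
    except ValueError as err:
        print(f"Early fail: {err}")
```

The point of the sketch is only that the check keys on the GenomicsDB route rather than on `--joint_germline` alone, so the Sentieon run shown above keeps working.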

@asp8200
Contributor

asp8200 commented Oct 11, 2023

> Yes, of course. We only want an early fail for the one that uses the GenomicsDB route.

Should I add a pytest for that as mentioned above?

@cmatKhan
Contributor

repeat #1434

@cmatKhan
Contributor

cmatKhan commented Aug 3, 2024

I commented this in the other issue, but I'm going to put it here because this one has some discussion: rather than failing, the GATK workflow could use CombineGVCFs.

https://gatk.broadinstitute.org/hc/en-us/articles/360037053272-CombineGVCFs
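
As a rough illustration of that suggestion, the sketch below assembles a CombineGVCFs command line in Python. The helper name and all file names are hypothetical; only the `-R`, `--variant` and `-O` arguments are taken from the GATK tool documentation linked above, and no interval argument is passed. How this would be wired into the pipeline is not something this thread settles.

```python
# Hypothetical helper: builds (but does not run) a GATK CombineGVCFs command.
# File names are placeholders, not files from the sarek test data.

def combine_gvcfs_command(reference, gvcfs, output):
    """Return the argument list for merging per-sample gVCFs without intervals."""
    if not gvcfs:
        raise ValueError("at least one gVCF is required")

    cmd = ["gatk", "CombineGVCFs", "-R", reference]
    for gvcf in gvcfs:
        # One --variant argument per input gVCF.
        cmd += ["--variant", gvcf]
    cmd += ["-O", output]
    return cmd


if __name__ == "__main__":
    args = combine_gvcfs_command(
        "genome.fasta",
        ["sampleA.g.vcf.gz", "sampleB.g.vcf.gz"],
        "cohort.g.vcf.gz",
    )
    print(" ".join(args))
```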
