Nf core dev / variant calling bugs / ideas #489
Conversation
I'd rather keep the default for
Why not changing the modules to a
Sounds like a good idea
Not seeing any other solution there for now
I see what you mean there
I don't understand the advantages of this. can you elaborate?
I agree that this should be changed upstream
Not sure about a solution, but I also haven't given it any thought. Honestly, not very happy about changing the template code. Can we maybe take this discussion to #tools before making any changes here? Looks like a release there will happen soon anyways, so it would maybe be good to get it in there now
Joint variant calling subworkflow needs a bit of work still anyways. So fine with merging it here and then working on the changes in a separate PR.
the bcftools sort in the If a user is on a shared file system that doesn't have write access to
I agree that we can change the upstream priority for
ah now I see. Yes we had the same issue with GATK tools in the past and had to add
Pretty sure we are the only ones using it at the moment :D If the raredisease pipeline needs it they would run into the same problem with WGS data.
I think the default should be more, so that it's easier to restrict when needed.
What about a params.scratchdir so that we can set any scratch directive with process.scratch = "${params.scratchdir}"
This already exists and can be provided in a custom config/institutional config etc. Not sure if process configurations should be command line parameters
Like (or something):
yep it can also be set globally for all pipelines like here: https://github.com/nf-core/configs/blob/91b5875b44b01edfd2992e19adec0d1b552feef8/conf/cfc.config#L13 or just for your pipeline run for all processes:
Docu: https://www.nextflow.io/docs/latest/process.html#scratch
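For reference, a minimal sketch of how the scratch directive could be set from a custom config, with an optional parameter wired in as suggested above. The file name `custom.config` and the parameter `params.scratchdir` are assumptions for illustration only; `scratchdir` is not an existing pipeline option. Only the `scratch` directive itself comes from the Nextflow docs linked above.

```groovy
// custom.config (hypothetical file name), passed with: nextflow run <pipeline> -c custom.config

params {
    // Hypothetical parameter from the discussion above, not an existing option.
    // false keeps tasks running directly in the work directory.
    scratchdir = false
}

process {
    // Applies to all processes.
    // true   -> run each task in a node-local temporary directory
    // string -> treated as the path (or variable name) to use as scratch
    scratch = params.scratchdir
}
```

With this in place, something like `--scratchdir /path/on/node` on the command line would switch scratch on for a single run, while the default leaves the behaviour unchanged. Whether a process directive should be exposed as a command line parameter at all is the open question raised above.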
I am confused. What do you mean :D Lasse Folkersen is implementing the module right now, so I can add it to the subworkflow soon, but nothing going on in this PR as far as I see.... it's all still in nf-core/modules
Sorry, replied in the wrong PR...
Co-authored-by: FriederikeHanssen <[email protected]>
Markdown linting is failing
To keep the code consistent with lots of contributors, we run automated code consistency checks.
Once you push these changes the test should pass, and you can hide this comment 👍 We highly recommend setting up markdownlint in your code editor so that this formatting is done automatically on save. Ask about it on Slack for help! Thanks again for your contribution!
PR checklist
The following changes were made after trying to run the variant_calling pipeline (specifically --tools haplotypecaller). I have re-added the gatk4/genotypegvcfs module from nf-core (which was updated to include interval index files).
An overview of the changes file by file:
concatenateVCFs.sh: By default, bcftools sort writes its temporary files to /tmp (or /scratch); however, it makes sense to me to keep the temporary sorting files in the work directory (.).
conf/base.config: GENOTYPEGVCFS is listed as a medium_priority tool, but for WGS it was constantly failing the first 1 or 2 attempts. It now gets the default high_priority mem/cpus.
conf/modules.config: Added the GENOTYPEGVCFS module to publish the final vcf file. Edited the HAPLOTYPECALLER module to not publish the gvcfs by default (could be set to true, but they are huge for a WGS sample) and to name them "{meta.id}.g" to differentiate them from the final vcf file.
lib/WorkflowMain.groovy: The Nextflow pipeline requires an input file, but this is in direct contrast to the automatic restart based on the results/../.csv used in sarek.nf to determine the step and input files. Not elegant, but works to restart if the appropriate step is named.
subworkflows/local/germline_variant_calling.nf: Include GENOTYPEGVCFS and account for both genotyping with intervals (work on the gvcfs after concat) and without intervals (work on the gvcfs without concat), and create a new channel for the GENOTYPEGVCFS input. This should eventually be linked with the logic for joint variant calling. The current behaviour is to ALWAYS do single-sample variant calling and, if joint variant calling is requested, to do that as well (not ideal).
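To make the conf/modules.config item above more concrete, here is a rough sketch of what such a block could look like, assuming the nf-core `ext.prefix` and `publishDir` conventions. The process selectors and the output paths are placeholders chosen for illustration and are not taken from this PR.

```groovy
// Sketch only: selectors and paths below are assumptions, not the actual PR contents.
process {

    withName: 'HAPLOTYPECALLER' {
        // Name the gVCFs "<sample>.g" so they are distinguishable from the final vcf.
        ext.prefix = { "${meta.id}.g" }
        publishDir = [
            // gVCFs are huge for a WGS sample, so do not publish them by default.
            enabled: false,
            mode: 'copy',
            path: { "${params.outdir}/variant_calling/${meta.id}/haplotypecaller" } // placeholder path
        ]
    }

    withName: 'GENOTYPEGVCFS' {
        // Publish the final, genotyped vcf file.
        publishDir = [
            mode: 'copy',
            path: { "${params.outdir}/variant_calling/${meta.id}/haplotypecaller" } // placeholder path
        ]
    }
}
```

Setting `enabled: true` for the HAPLOTYPECALLER block would publish the gVCFs again for users who want to keep them.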