Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update CNV WDLs to WDL 1.0. #6502

Closed
samuelklee opened this issue Mar 16, 2020 · 7 comments
Closed

Update CNV WDLs to WDL 1.0. #6502

samuelklee opened this issue Mar 16, 2020 · 7 comments

Comments

@samuelklee
Copy link
Contributor

No description provided.

@ldgauthier
Copy link
Contributor

Are these NUM_INTERVALS=${dollar}(grep -v '@' ${interval_list} | wc -l) shenanigans in ScatterIntervals solved in WDL 1.0?

@samuelklee
Copy link
Contributor Author

If you're talking about the need for dollar, I'm not sure---looks like there are other instances of this hack in other WDLs, some of which have purportedly been updated to 1.0.

I don't think we would need this hack (in the gCNV WDLs, at least) if it weren't for the bash-related shenanigans in that task, which are required due to the wonky output style of IntervalListTools, though...

@ldgauthier
Copy link
Contributor

We can certainly modify IntervalListTools to make the behavior more intuitive (e.g. add a new mode for INTERVAL_COUNT_WITH_OVREFLOW), but I'm not sure I understand the issue. We're just trying to avoid calling the GATK command if the scatter is going to be a noop?

The GATK SplitIntervals has slightly different behavior that's helpful in some cases.

@ldgauthier
Copy link
Contributor

From gatk-workflows/gatk4-germline-cnvs#2
PreprocessIntervals doesn't need a sequence dictionary arg if it has a reference fasta

@samuelklee
Copy link
Contributor Author

samuelklee commented Mar 16, 2020

We already added the functionality needed for the gCNV workflow to IntervalListTools in broadinstitute/picard#1208. The issue is that the tool outputs each scattered interval list to a separate directory if the number of scatters is greater than 1, but it just outputs to a file (essentially a noop) if we don't need to scatter. Not sure the reason for this design, but it makes things difficult from the perspective of WDL. Would be easier if the expected output was always Array[File]+.

I don't really see why IntervalListTools needs to create those intermediate directories (nor why the naming scheme is determined by DecimalFormat("0000")---don't think this is documented, either), but I am not sure if that behavior is expected by now or if it is safe to modify it.

Pretty sure SplitIntervals is just calling the same backend class used by IntervalListTools. Perhaps that tool might've been spun off before we exposed the Picard tools? See e.g. #5392 (comment). I think we should try to avoid writing such custom/utility GATK tools unless really warranted.

@samuelklee
Copy link
Contributor Author

Actually note broadinstitute/picard#1208 (comment) concerning the naming scheme of the output directories.

@mwalker174
Copy link
Contributor

Resolved via #6506

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants