-
Notifications
You must be signed in to change notification settings - Fork 594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove NuMTs from MT pipeline and updates wdl to GATK4.1.1.0 #5847
Conversation
Codecov Report
@@ Coverage Diff @@
## master #5847 +/- ##
==============================================
- Coverage 87.062% 80.262% -6.8%
+ Complexity 32179 30528 -1651
==============================================
Files 1979 1979
Lines 147498 147500 +2
Branches 16215 16215
==============================================
- Hits 128414 118386 -10028
- Misses 13172 23406 +10234
+ Partials 5912 5708 -204
|
${m2_extra_filtering_args} \ | ||
--max-alt-allele-count ${max_alt_allele_count} \ | ||
--mitochondria-mode \ | ||
${"--autosomal-coverage " + autosomal_coverage} \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we ok with not having an autosomal coverage in some cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thought was that if a user doesn't have this information it should be optional so they can still run the pipeline. But if you don't provide an input here you don't get the polymorphic NuMT filter, so I'm open to making it required. Do you have a strong opinion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gotcha that makes sense to me, also your parameter_meta
👯 is great!
} | ||
File wgs_aligned_input_bam_or_cram | ||
Int? autosomal_coverage | ||
Float? autosomal_coverage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want to have a path for passing in a coverage file as well or jsut a float?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean a file with only that float in it? Or do you mean Picard's coverage metrics file? And if it's the latter, would I have to put the code that reads the file and extracts that value directly in GATK or in the WDL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about Picard's metrics file in the case where a non data delivery person wants to run from start to end (they don't get the coverage automatically put into their data model). It's probably not worth the trouble at this point though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah you're right, let's leave that as future work.
@@ -156,76 +150,43 @@ workflow MitochondriaPipeline { | |||
} | |||
|
|||
task SubsetBam { | |||
File input_bam | |||
String input_bam | |||
String cram_basename = basename(input_bam, ".cram") | |||
String basename = basename(cram_basename, ".bam") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you do basename(basename(input_bam, ".cram"), ".bam")
here?
@@ -156,76 +150,43 @@ workflow MitochondriaPipeline { | |||
} | |||
|
|||
task SubsetBam { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be worth renaming to SubsetBamToMT
since the command is pretty specific
gatk --java-options "-Xmx2500m" AddOriginalAlignmentTags \ | ||
gatk PrintReads \ | ||
${"-R " + ref_fasta} \ | ||
-L chrM \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we expect this to be run on hg19? chrM
wont work if it is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put it as an input to the workflow: String contig_name = "chrM"
which my understanding from the spec means that a user can overwrite this value.
|
||
# runtime | ||
Int? preemptible_tries | ||
Int disk_size = 50 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hard coded disk?
} | ||
output { | ||
File raw_vcf = "${output_vcf}" | ||
File faw_vcf_idx = "${output_vcf_index}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is that supposed to be raw_vcf_idx
?
Boolean compress | ||
Float? vaf_cutoff | ||
|
||
String output_vcf = "output" + if compress then ".vcf.gz" else ".vcf" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we really need to allow the pipeline to output uncompressed vcfs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only look at uncompressed vcfs (the files are already tiny and it makes looking at things in Terra/Notebooks smoother). In fact I think I hardcoded compress = false
when calling this task. Should I wire that argument through as an option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ehhh it just irks me when I see uncompressed output (even if I know its tiny). Tools you use in Terra/Notebooks should be able to read compressed vcfs :P. I would lean towards wiring it through so people have an option to changing it.
} | ||
|
||
call Filter { | ||
input: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
too many spaces before input:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good as long as tests pass
Could I find any further details about how to filter the NuMTs? |
@yzhang88 I'd recommend looking on the GATK forum. There's been some discussion in the comments on this here: https://gatkforums.broadinstitute.org/gatk/discussion/comment/61646 |
Thank you! |
Tests are not passing because I'm now using NIO in the WDL. I'll need to fix that, but the WDL itself should be ready for review.
The changes: