Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove NuMTs from MT pipeline and updates wdl to GATK4.1.1.0 #5847

Merged
merged 7 commits into from
Apr 2, 2019

Conversation

meganshand
Copy link
Contributor

Tests are not passing because I'm now using NIO in the WDL. I'll need to fix that, but the WDL itself should be ready for review.

The changes:

  • Updates the pipeline for the new Mutect2 Filtering scheme and pulls filtering after the liftover and recombining of the VCF.
  • Makes the subsetting of the WGS bam fast by using PrintReads over just chrM instead of traversing the whole bam for NuMT mates.
  • Moves polymorphic NuMTs based on autosomal coverage to a filter (it was an annotation before)
  • Adds option to hard filter by VAF

@meganshand meganshand requested a review from jsotobroad March 29, 2019 19:49
@codecov-io
Copy link

codecov-io commented Mar 29, 2019

Codecov Report

Merging #5847 into master will decrease coverage by 6.8%.
The diff coverage is n/a.

@@              Coverage Diff               @@
##              master     #5847      +/-   ##
==============================================
- Coverage     87.062%   80.262%    -6.8%     
+ Complexity     32179     30528    -1651     
==============================================
  Files           1979      1979              
  Lines         147498    147500       +2     
  Branches       16215     16215              
==============================================
- Hits          128414    118386   -10028     
- Misses         13172     23406   +10234     
+ Partials        5912      5708     -204
Impacted Files Coverage Δ Complexity Δ
...dorientation/CollectF1R2CountsIntegrationTest.java 0.714% <0%> (-99.286%) 1% <0%> (-14%)
...kers/filters/VariantFiltrationIntegrationTest.java 0.826% <0%> (-99.174%) 1% <0%> (-25%)
.../walkers/bqsr/BaseRecalibratorIntegrationTest.java 1.031% <0%> (-98.969%) 1% <0%> (-7%)
...s/variantutils/VariantsToTableIntegrationTest.java 1.042% <0%> (-98.958%) 1% <0%> (-21%)
...ers/vqsr/FilterVariantTranchesIntegrationTest.java 1.053% <0%> (-98.947%) 1% <0%> (-5%)
...on/FindBreakpointEvidenceSparkIntegrationTest.java 1.754% <0%> (-98.246%) 1% <0%> (-6%)
...bender/tools/spark/PileupSparkIntegrationTest.java 2.041% <0%> (-97.959%) 2% <0%> (-13%)
...tute/hellbender/tools/FlagStatIntegrationTest.java 2.083% <0%> (-97.917%) 1% <0%> (-5%)
...rs/variantutils/SelectVariantsIntegrationTest.java 0.25% <0%> (-97.75%) 1% <0%> (-70%)
...alkers/rnaseq/SplitNCigarReadsIntegrationTest.java 2.5% <0%> (-97.5%) 1% <0%> (-9%)
... and 159 more

${m2_extra_filtering_args} \
--max-alt-allele-count ${max_alt_allele_count} \
--mitochondria-mode \
${"--autosomal-coverage " + autosomal_coverage} \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we ok with not having an autosomal coverage in some cases?

Copy link
Contributor Author

@meganshand meganshand Apr 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thought was that if a user doesn't have this information it should be optional so they can still run the pipeline. But if you don't provide an input here you don't get the polymorphic NuMT filter, so I'm open to making it required. Do you have a strong opinion?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotcha that makes sense to me, also your parameter_meta 👯 is great!

}
File wgs_aligned_input_bam_or_cram
Int? autosomal_coverage
Float? autosomal_coverage
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to have a path for passing in a coverage file as well or jsut a float?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean a file with only that float in it? Or do you mean Picard's coverage metrics file? And if it's the latter, would I have to put the code that reads the file and extracts that value directly in GATK or in the WDL?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about Picard's metrics file in the case where a non data delivery person wants to run from start to end (they don't get the coverage automatically put into their data model). It's probably not worth the trouble at this point though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah you're right, let's leave that as future work.

@@ -156,76 +150,43 @@ workflow MitochondriaPipeline {
}

task SubsetBam {
File input_bam
String input_bam
String cram_basename = basename(input_bam, ".cram")
String basename = basename(cram_basename, ".bam")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you do basename(basename(input_bam, ".cram"), ".bam") here?

@@ -156,76 +150,43 @@ workflow MitochondriaPipeline {
}

task SubsetBam {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be worth renaming to SubsetBamToMT since the command is pretty specific

gatk --java-options "-Xmx2500m" AddOriginalAlignmentTags \
gatk PrintReads \
${"-R " + ref_fasta} \
-L chrM \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we expect this to be run on hg19? chrM wont work if it is

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put it as an input to the workflow: String contig_name = "chrM" which my understanding from the spec means that a user can overwrite this value.


# runtime
Int? preemptible_tries
Int disk_size = 50
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hard coded disk?

}
output {
File raw_vcf = "${output_vcf}"
File faw_vcf_idx = "${output_vcf_index}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is that supposed to be raw_vcf_idx?

Boolean compress
Float? vaf_cutoff

String output_vcf = "output" + if compress then ".vcf.gz" else ".vcf"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we really need to allow the pipeline to output uncompressed vcfs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only look at uncompressed vcfs (the files are already tiny and it makes looking at things in Terra/Notebooks smoother). In fact I think I hardcoded compress = false when calling this task. Should I wire that argument through as an option?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ehhh it just irks me when I see uncompressed output (even if I know its tiny). Tools you use in Terra/Notebooks should be able to read compressed vcfs :P. I would lean towards wiring it through so people have an option to changing it.

}

call Filter {
input:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

too many spaces before input:

Copy link
Contributor

@jsotobroad jsotobroad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good as long as tests pass

@meganshand meganshand merged commit 85c4f9f into master Apr 2, 2019
@meganshand meganshand deleted the ms_no_numt branch April 2, 2019 13:22
@yzhang88
Copy link

yzhang88 commented Nov 15, 2019

Could I find any further details about how to filter the NuMTs?

@meganshand
Copy link
Contributor Author

@yzhang88 I'd recommend looking on the GATK forum. There's been some discussion in the comments on this here: https://gatkforums.broadinstitute.org/gatk/discussion/comment/61646
Or you can search the rest of the GATK forum/open a new thread as needed. Thanks!

@yzhang88
Copy link

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants