First draft for CEMBA type caller #5762

Merged
merged 18 commits into from
Apr 11, 2019

Conversation

benjamincarlin
Contributor

This is a first draft/iteration on my CEMBA Type Caller in order to jump start the review process. Here are some of my own suggestions to add:

Add Tests with a sample methylated bam as input and a corresponding vcf as expected exact match output
Add constants for VCF header names
Add constants for VCF header descriptions
Change program group
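
One way the "constants for VCF header names and descriptions" items could look is sketched below. This is a minimal illustration only: the class name, the key `EXAMPLE_COV`, and its description are hypothetical placeholders, not the names used by the tool or by `GATKVCFConstants`. The only thing taken as given is the general shape of a VCF `##INFO` header line.

```java
/**
 * Hypothetical constants holder (an assumption for illustration).
 * The key and description below are placeholders, not the tool's real ones.
 */
final class MethylationVcfConstantsSketch {
    private MethylationVcfConstantsSketch() {}

    // Placeholder INFO key and description, defined once so that the header
    // and the per-record annotations cannot drift apart.
    static final String EXAMPLE_COVERAGE_KEY = "EXAMPLE_COV";
    static final String EXAMPLE_COVERAGE_DESCRIPTION =
            "Placeholder: reads supporting the example call";

    /** Formats a VCF INFO header line from its four standard fields. */
    static String infoHeaderLine(final String id, final String number,
                                 final String type, final String description) {
        return "##INFO=<ID=" + id
                + ",Number=" + number
                + ",Type=" + type
                + ",Description=\"" + description + "\">";
    }
}
```

Keeping the key and its description side by side means a test can assert on the constant rather than on a string literal repeated in several places.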

@lbergelson lbergelson self-assigned this Mar 5, 2019
Member

@lbergelson lbergelson left a comment

@benjamincarlin This looks good! I have a number of comments but they're all pretty much about style and typos. This needs some tests though. If you want help writing those I'm happy to help at some point.

I think this is just reproducing what's in your pipeline right now, but it seems like there's a lot that could be done to make this more robust.

  1. You might want to compute a score based on the base qualities instead of just the count.
  2. It would be a good idea to use some sort of strand bias measure to differentiate between SNPs and methylation sites. SNPs should be close to evenly balanced across both strands, while methylation will only be happening on a single strand, so it should be detectable in most cases.
  3. There's work going on in the bam spec right now to support a sane way to represent methylation in bam. I don't know whether that ties into your work. You might want a similar tool to this that writes the methylation calls into the bam file. I know you've had some bams with methylation tags already and I'd be curious to talk about how they are represented.
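
Suggestions 1 and 2 above can be sketched roughly as follows. This is a hedged illustration, not code from the PR: the class and method names are hypothetical, the quality score is simply the sum of per-base probabilities of a correct call derived from Phred qualities, and the strand check is a crude threshold on the forward-strand fraction rather than a proper statistical test, which a real implementation would more likely use.

```java
/** Hypothetical helper for illustration only; not part of GATK or this PR. */
final class MethylationEvidenceSketch {
    private MethylationEvidenceSketch() {}

    /**
     * Quality-weighted alternative to a raw read count: each base contributes
     * its probability of being a correct call, P = 1 - 10^(-Q/10).
     */
    static double qualityWeightedCount(final byte[] baseQualities) {
        double score = 0.0;
        for (final byte q : baseQualities) {
            score += 1.0 - Math.pow(10.0, -q / 10.0);
        }
        return score;
    }

    /** Fraction of supporting reads on the forward strand, or NaN if there are none. */
    static double forwardStrandFraction(final int forwardCount, final int reverseCount) {
        final int total = forwardCount + reverseCount;
        return total == 0 ? Double.NaN : (double) forwardCount / total;
    }

    /**
     * Crude heuristic (an assumption, not a statistical test): evidence is treated
     * as strand-specific when the forward fraction is further than `tolerance`
     * from the 0.5 expected for a strand-balanced SNP.
     */
    static boolean looksStrandSpecific(final int forwardCount, final int reverseCount,
                                       final double tolerance) {
        final double fraction = forwardStrandFraction(forwardCount, reverseCount);
        return !Double.isNaN(fraction) && Math.abs(fraction - 0.5) > tolerance;
    }
}
```

For example, two bases with qualities 10 and 20 contribute 0.9 + 0.99 = 1.89 rather than a flat count of 2, and 10 forward reads against 0 reverse reads are flagged as strand-specific at a tolerance of 0.3, while a 5/5 split is not.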

@benjamincarlin
Contributor Author

@lbergelson Thanks for the review and your time! I responded to your comments, so please let me know if there is anything else I should do!

Collaborator

@cmnbroad cmnbroad left a comment

@benjamincarlin A few comments on GATK conventions for this.

@benjamincarlin benjamincarlin requested a review from cmnbroad March 18, 2019 14:26
Collaborator

@cmnbroad cmnbroad left a comment

Mostly minor comments. Per discussion with @benjamincarlin, this needs at least one test that exercises the tool and verifies the output. @benjamincarlin is working on getting a test file that we can put in the public repo. Once we have that, I can help with the test. Also I think many of @lbergelson's original suggestions still apply, but with a test we could at least get this to @Experimental stage.

@benjamincarlin benjamincarlin requested a review from cmnbroad March 20, 2019 15:37
@benjamincarlin
Contributor Author

@cmnbroad Thank you again! I have responded to or fixed all of your review comments and will hopefully obtain some test data later this afternoon!

Collaborator

@cmnbroad cmnbroad left a comment

Getting close - a few comments, mostly on GATK code, doc, and naming conventions.

@benjamincarlin benjamincarlin requested a review from cmnbroad April 3, 2019 17:39
Collaborator

@cmnbroad cmnbroad left a comment

@benjamincarlin A couple of remaining minor cleanup changes, and one other request now that I've seen the test data. It looks like the test BAM you're using was aligned with bismark. Since the tool will run and produce results on any BAM/reference combination, the tool doc should have a section that explicitly states any assumed requirements/prerequisites (i.e., bisulfite sequenced, methylation-aware aligner, reference requirements, whatever they are), and the command-line summary and one-line summary should refer to these.

@benjamincarlin benjamincarlin requested a review from cmnbroad April 9, 2019 14:00
@cmnbroad cmnbroad marked this pull request as ready for review April 10, 2019 13:38
Collaborator

@cmnbroad cmnbroad left a comment

@benjamincarlin This looks good now. I just noticed, though, that it's been in PR draft mode through the whole review cycle. I moved it out of draft, but let's squash it down (and rebase on master while you're at it) and let the PR branch tests run through once on Travis, and then we'll get it merged. thx.

@benjamincarlin benjamincarlin force-pushed the bc-methylation-walker branch from 4255650 to 81334de Compare April 11, 2019 15:40
@codecov-io

codecov-io commented Apr 11, 2019

Codecov Report

Merging #5762 into master will decrease coverage by 6.732%.
The diff coverage is 83.951%.

@@              Coverage Diff               @@
##              master    #5762       +/-   ##
==============================================
- Coverage     86.833%    80.1%   -6.732%     
+ Complexity     32275    30638     -1637     
==============================================
  Files           1987     1990        +3     
  Lines         149022   149103       +81     
  Branches       16472    16477        +5     
==============================================
- Hits          129400   119432     -9968     
- Misses         13609    23863    +10254     
+ Partials        6013     5808      -205

| Impacted Files | Coverage Δ | Complexity Δ | |
| --- | --- | --- | --- |
| ...institute/hellbender/utils/help/HelpConstants.java | 4.167% <ø> (ø) | 1 <0> (ø) | ⬇️ |
| ...ute/hellbender/utils/variant/GATKVCFConstants.java | 75% <ø> (ø) | 4 <0> (ø) | ⬇️ |
| ...cmdline/programgroups/MethylationProgramGroup.java | 100% <100%> (ø) | 3 <3> (?) | |
| .../walkers/MethylationTypeCallerIntegrationTest.java | 9.091% <9.091%> (ø) | 1 <1> (?) | |
| ...ellbender/tools/walkers/MethylationTypeCaller.java | 95.522% <95.522%> (ø) | 11 <11> (?) | |
| ...dorientation/CollectF1R2CountsIntegrationTest.java | 0.714% <0%> (-99.286%) | 1% <0%> (-14%) | |
| ...kers/filters/VariantFiltrationIntegrationTest.java | 0.826% <0%> (-99.174%) | 1% <0%> (-25%) | |
| .../walkers/bqsr/BaseRecalibratorIntegrationTest.java | 1.031% <0%> (-98.969%) | 1% <0%> (-7%) | |
| ...s/variantutils/VariantsToTableIntegrationTest.java | 1.042% <0%> (-98.958%) | 1% <0%> (-21%) | |
| ...ers/vqsr/FilterVariantTranchesIntegrationTest.java | 1.053% <0%> (-98.947%) | 1% <0%> (-5%) | |

... and 163 more

@codecov-io

Codecov Report

Merging #5762 into master will decrease coverage by 49.627%.
The diff coverage is 80.247%.

@@               Coverage Diff                @@
##              master     #5762        +/-   ##
================================================
- Coverage     86.833%   37.206%   -49.627%     
+ Complexity     32275     18148     -14127     
================================================
  Files           1987      1990         +3     
  Lines         149022    149103        +81     
  Branches       16472     16477         +5     
================================================
- Hits          129400     55475     -73925     
- Misses         13609     88681     +75072     
+ Partials        6013      4947      -1066

| Impacted Files | Coverage Δ | Complexity Δ | |
| --- | --- | --- | --- |
| ...institute/hellbender/utils/help/HelpConstants.java | 4.167% <ø> (ø) | 1 <0> (ø) | ⬇️ |
| ...ute/hellbender/utils/variant/GATKVCFConstants.java | 75% <ø> (ø) | 4 <0> (ø) | ⬇️ |
| ...cmdline/programgroups/MethylationProgramGroup.java | 0% <0%> (ø) | 0 <0> (?) | |
| .../walkers/MethylationTypeCallerIntegrationTest.java | 9.091% <9.091%> (ø) | 1 <1> (?) | |
| ...ellbender/tools/walkers/MethylationTypeCaller.java | 95.522% <95.522%> (ø) | 11 <11> (?) | |
| ...ls/variant/writers/GVCFBlockCombiningIterator.java | 0% <0%> (-100%) | 0% <0%> (-1%) | |
| ...ls/walkers/genotyper/HeterogeneousPloidyModel.java | 0% <0%> (-100%) | 0% <0%> (-14%) | |
| ...nder/utils/downsampling/FractionalDownsampler.java | 0% <0%> (-100%) | 0% <0%> (-17%) | |
| ...park/pathseq/MarkedOpticalDuplicateReadFilter.java | 0% <0%> (-100%) | 0% <0%> (-4%) | |
| ...otypecaller/RandomLikelihoodCalculationEngine.java | 0% <0%> (-100%) | 0% <0%> (-6%) | |

... and 1227 more

@codecov-io

codecov-io commented Apr 11, 2019

Codecov Report

Merging #5762 into master will increase coverage by 0.006%.
The diff coverage is 96.296%.

@@               Coverage Diff               @@
##              master     #5762       +/-   ##
===============================================
+ Coverage     86.833%   86.839%   +0.006%     
- Complexity     32275     32293       +18     
===============================================
  Files           1987      1990        +3     
  Lines         149022    149103       +81     
  Branches       16472     16477        +5     
===============================================
+ Hits          129400    129479       +79     
- Misses         13609     13610        +1     
- Partials        6013      6014        +1

| Impacted Files | Coverage Δ | Complexity Δ | |
| --- | --- | --- | --- |
| ...institute/hellbender/utils/help/HelpConstants.java | 4.167% <ø> (ø) | 1 <0> (ø) | ⬇️ |
| ...ute/hellbender/utils/variant/GATKVCFConstants.java | 75% <ø> (ø) | 4 <0> (ø) | ⬇️ |
| .../walkers/MethylationTypeCallerIntegrationTest.java | 100% <100%> (ø) | 3 <3> (?) | |
| ...cmdline/programgroups/MethylationProgramGroup.java | 100% <100%> (ø) | 3 <3> (?) | |
| ...ellbender/tools/walkers/MethylationTypeCaller.java | 95.522% <95.522%> (ø) | 11 <11> (?) | |
| ...dinstitute/hellbender/engine/AlignmentContext.java | 89.189% <0%> (+2.703%) | 22% <0%> (+1%) | ⬆️ |

Collaborator

@cmnbroad cmnbroad left a comment

Looks like all tests passed.

@cmnbroad cmnbroad dismissed lbergelson’s stale review April 11, 2019 18:27

Merging as @Experimental in Louis' absence.

@cmnbroad cmnbroad merged commit 8ab6604 into master Apr 11, 2019
@cmnbroad cmnbroad deleted the bc-methylation-walker branch April 11, 2019 18:34