Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Abolish unfilled likelihoods and revamp VariantAnnotator #6172

Merged
merged 8 commits into from
Nov 25, 2019

Conversation

davidbenjamin
Copy link
Contributor

Here's the rundown of this PR:

We had this awkward implementation of AlleleLikelihoods called UnfilledReadLikelihoods, which didn't contain likelihoods but did have pileups. This was necessary because the annotation engine expects likelihoods but VariantAnnotator only has the reads. Several annotations had fallback code to annotate a likelihoods object that had no likelihoods. @vruano This is the class that you disliked so much in your recent code review of #5783

A few issues with this state of things:

  • Torturing the definition of AlleleLikelihoods, which forced the class to have methods like hasLikelihoods().
  • VariantAnnotator only applied the few annotations that had custom pileup-based fallback code.
  • Lots more annotation code for the fallback mode.

So the first step was the option that @lbergelson and @jamesemery liked most: create a regular likelihoods object in VariantAnnotator by hard-assigning of each read to the allele it best matches. This is exactly what all the custom fallback modes were doing in effect, but now it's implemented in one place instead of six or so. This lets us delete UnfilledLikelihoods and also lets VariantAnnotator apply any annotation.

@ldgauthier Since the most non-trivial aspect is the new integration test I'm inclined to assign you the review, but a case could be made for someone on the engine team.

This completely broke the VariantAnnotator tests, which were based on exact matches. This had been an issue before and has always been a bit of a nuisance, but now overhauling the tests became completely unavoidable. So, I rewrote all the tests and wrote a rigorous test based on concordance with annotations from Mutect2.

If I were reviewing I would start with the new code in VariantAnnotator that constructs the likelihoods object from the reads and verify that it is just a more polished version of the fallback code that several annotations used to have. Then I would look at the new VariantAnnotator integration tests. Some of the tolerances are fairly liberal but it's worth noting that much of the old exact match "truth" annotations were completely bogus. This is better than what we had before by a long shot but it's still use at your own risk.

Copy link
Contributor

@ldgauthier ldgauthier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions about preserving old test behavior.

final ArgumentsBuilder args = new ArgumentsBuilder()
.addVCF(inputVCF)
.addOutput(outputVCF)
.addArgument(DbsnpArgumentCollection.DBSNP_LONG_NAME, dbsnp_138_b37_20_21_vcf);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we supposed to match alleles with dbSNP now? There certainly was a time when we didn't, i.e. a different ALT at the same position would still get annotated.

Copy link
Contributor Author

@davidbenjamin davidbenjamin Nov 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's unchanged. The dbSNP annotation still cares only about position.

@davidbenjamin davidbenjamin force-pushed the db_unfilled_likelihoods branch from 0cef5f2 to 5ffdb83 Compare November 24, 2019 05:22
@davidbenjamin
Copy link
Contributor Author

Back to @ldgauthier

Copy link
Contributor

@ldgauthier ldgauthier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants