Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conditional code paths for ReadLikelihoods method that always returns true #4865

Closed
davidbenjamin opened this issue Jun 8, 2018 · 8 comments
Assignees

Comments

@davidbenjamin
Copy link
Contributor

ReadLikelihoods has the following method:

public boolean hasFilledLikelihoods() { return true; }

The annotations engine has several code paths that could only be taken if this were to return false. For example:

  • In DepthPerAlleleBySample
if (likelihoods.hasFilledLikelihoods()) {
    counts = annotateWithLikelihoods(vc, g, alleles, likelihoods);
 } else { 30 lines of code}
  • In RankSumTest
if (likelihoods.hasFilledLikelihoods()) {
    fillQualsFromLikelihood(vc, likelihoods, refQuals, altQuals, refLoc);
} else { 11 quintuply-nested lines of code}
  • There's also a bunch of code in all the StrandBiasTest implementations.

Can this method, and all the unused code paths, just be deleted? @droazen @ldgauthier

@droazen
Copy link
Contributor

droazen commented Jun 8, 2018

@davidbenjamin I don't think it can be removed. That method (ReadLikelihoods.hasFilledLikelihoods()) is overridden in UnfilledReadsLikelihoods to return false. It was deliberately added by @jamesemery in his port of VariantAnnotator -- the else clauses you mention are effectively the "VariantAnnotator" code path.

@jamesemery Can you comment?

@jamesemery
Copy link
Collaborator

@droazen That is correct. It was a necessary step to avoid having to make a class equality check on the likelihoods object itself. @davidbenjamin I am open to suggestions if you have an idea of how better to encapsulate the separation between these two likelihood objects.

@davidbenjamin
Copy link
Contributor Author

Thanks for indulging me on this. To me it seems like UnfilledReadsLikelihoods diverges too much from ReadsLikelihoods to extend it. In effect it's letting ReadsLikelihoods sometimes be a wrapper for something that is not a ReadsLikelihoods.

I haven't worked this out but I would hope that it's possible to construct a ReadsLikelihoods from a pileup. I mean, the idea of pileup calling is that you use just a single base for the likelihoods and not the whole read (via Pair-HMM), so we should be able to fill the likelihoods from the base qualities.

@davidbenjamin
Copy link
Contributor Author

Another point is that the code is already implicitly assigning likelihoods to reads in order to determine which allele each read supports. A method to create a ReadsLikelihoods from a pileup would just make this explicit.

@magicDGS
Copy link
Contributor

I have some code to do that in a test project - if you are interested on it, I can submit a PR with my proposal. It will be nice for an idea that I have in mind, and if it is used also in GATK it would have more support (as a single developer, my reviews are not as good as in a team). Just let me know if you wanna port the code to some utility class!

@droazen
Copy link
Contributor

droazen commented Jun 12, 2018

@jamesemery Can you weigh in with your thoughts? I seem to recall that there were a lot of nuances surrounding this issue (for example, consistency issues between VariantAnnotator and other annotation-producing tools).

@jamesemery
Copy link
Collaborator

@davidbenjamin That would work and is already more or less what we are doing for VariantAnnotator. The reason we went through trouble of making two classes in the first place was because the pileup was not entirely adequate without actually doing the assembly and genotyping as indels and a number of other effects cause problems that become expensive unless we are in a tool like haplotype caller.
Currently, UnfilledReadsLiklihoods represents the "likelihood" of each allele by doing just what you said, looking at only the base pileup. This means that VariantAnnoator is currently only able to add annotations to SNPs as a result, just like it did in gatk3 (indeed including some of the same UnifiedGenotyper bugs as well).

@davidbenjamin
Copy link
Contributor Author

Closed by #6172.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants