SelectVariants and VariantFiltration not updating AC, AN and AF for --setFilteredGtToNocall #1871

Merged
merged 1 commit into from
Sep 15, 2016

Conversation

ronlevine
Contributor

Fixes #1731

Summary of changes

  • SelectVariants.java
    • createVCFHeaderLineList() - ensure allele count INFO annotations (AC, AN, AF) are in the output VCF header if --setFilteredGtToNocall
    • setFilteredGenotypeToNocall() - recompute AC, AN and AF if filtered genotypes are set to no-call, add them to the genotype builder.
  • VariantFiltration.java
    • initializeVcfWriter() - ensure allele count INFO annotations (AC, AN, AF) are in the output VCF header if --setFilteredGtToNocall
    • makeGenotypes() - recompute AC, AN and AF if filtered genotypes are set to no-call, add them to the genotype builder.
  • GATKVariantContextUtils.java
    • incrementChromosomeCountsInfo() - count the total and alternate alleles for a genotype
    • updateChromosomeCountsInfo() - update AC, AN and AF with the computed count of total and alternate alleles
  • selectVariantsInfoField.vcf* and variantFiltrationInfoField.vcf*
  • VariantFiltrationIntegrationTest.java
  • SelectVariantsIntegrationTest.java
    • MD5 changed in testSetFilteredGtoNocall() because AC, AN and AF are always in the VCF if --setFilteredGtToNocall
  • GATKVariantContextUtilsUnitTest.java
    • Test new GATKVariantContextUtils methods
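
The recomputation described in the summary can be sketched in plain Java. This is a minimal illustration, not the code from this PR: `ChromosomeCountsSketch`, `Counts`, `setFilteredToNoCall` and `recompute` are hypothetical names, alleles are modelled as strings with `"."` standing for a no-call, and reporting AF as 0.0 when no alleles are called is an assumption made for the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the PR's logic; it does not use GATK or htsjdk classes. */
final class ChromosomeCountsSketch {
    static final String NO_CALL = ".";

    /** Recomputed AN plus per-alternate-allele AC and AF. */
    static final class Counts {
        final int an;
        final Map<String, Integer> ac;
        final Map<String, Double> af;

        Counts(final int an, final Map<String, Integer> ac, final Map<String, Double> af) {
            this.an = an;
            this.ac = ac;
            this.af = af;
        }
    }

    /** Replaces the alleles of every filtered genotype with no-calls of the same ploidy. */
    static List<List<String>> setFilteredToNoCall(final List<List<String>> genotypes,
                                                  final List<Boolean> filtered) {
        final List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < genotypes.size(); i++) {
            final List<String> alleles = genotypes.get(i);
            result.add(filtered.get(i) ? Collections.nCopies(alleles.size(), NO_CALL) : alleles);
        }
        return result;
    }

    /** Counts called alleles (AN) and alternate alleles (AC), then derives AF = AC / AN. */
    static Counts recompute(final List<List<String>> genotypes, final List<String> altAlleles) {
        final Map<String, Integer> ac = new LinkedHashMap<>();
        for (final String alt : altAlleles) {
            ac.put(alt, 0);
        }
        int an = 0;
        for (final List<String> alleles : genotypes) {
            for (final String allele : alleles) {
                if (NO_CALL.equals(allele)) {
                    continue; // no-calls contribute to neither AN nor AC
                }
                an++;
                ac.computeIfPresent(allele, (key, count) -> count + 1);
            }
        }
        final Map<String, Double> af = new LinkedHashMap<>();
        for (final String alt : altAlleles) {
            // Assumption for this sketch: AF is 0.0 when nothing is called.
            af.put(alt, an == 0 ? 0.0 : ac.get(alt) / (double) an);
        }
        return new Counts(an, ac, af);
    }

    public static void main(final String[] args) {
        final List<List<String>> genotypes = Arrays.asList(
                Arrays.asList("A", "T"), Arrays.asList("T", "T"), Arrays.asList("A", "A"));
        final List<Boolean> filtered = Arrays.asList(false, true, false);
        final Counts counts = recompute(setFilteredToNoCall(genotypes, filtered), Arrays.asList("T"));
        System.out.println("AN=" + counts.an + " AC=" + counts.ac + " AF=" + counts.af);
    }
}
```

With three diploid samples and the middle one filtered, AN drops from 6 to 4 and AC for `T` drops from 3 to 1, which is why stale AC, AN and AF values in the INFO field would be wrong after genotypes are set to no-call.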

@ronlevine
Contributor Author

@droazen Please choose a reviewer.

@coveralls
Collaborator

coveralls commented Jun 1, 2016

Coverage Status

Coverage increased (+0.04%) to 82.044% when pulling 2b0d306 on rhl_sv_vf_update_ac into 911d5fb on master.

@droazen
Contributor

droazen commented Jun 21, 2016

@akiezun Would you have time to review this one this week, since you've been looking at Ron's other PRs?

@droazen droazen assigned akiezun and unassigned droazen Jun 21, 2016
@@ -177,6 +178,9 @@

private final List<Allele> diploidNoCallAlleles = Arrays.asList(Allele.NO_CALL, Allele.NO_CALL);

// Number of called alleles
int calledAlleles = 0;
Contributor

@akiezun akiezun Jul 14, 2016

Remove this field. It is state that is not carried over between apply() calls, so it is useless as a field. Turn it into a local variable and pass it directly to the methods that need it.

Contributor Author

@ronlevine ronlevine Aug 8, 2016

Moved to makeGenotypes().
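
The change requested in this thread can be illustrated with a small sketch. It is not the PR's code: `LocalCounterSketch`, `countCalledAlleles` and `annotate` are hypothetical names, and genotypes are modelled as lists of allele strings with `"."` as a no-call. The point is only that a per-variant counter lives as a local inside the per-variant method and is passed to the helpers that need it.

```java
import java.util.List;

/** Hypothetical per-variant processing; the names are illustrative, not GATK's. */
final class LocalCounterSketch {
    static final String NO_CALL = ".";

    /** Called once per variant; nothing has to survive between calls. */
    static String apply(final List<List<String>> genotypes) {
        // A local, so every variant starts from zero without any reset logic.
        final int calledAlleles = countCalledAlleles(genotypes);
        return annotate(calledAlleles);
    }

    /** Counts every allele that is not a no-call. */
    static int countCalledAlleles(final List<List<String>> genotypes) {
        int called = 0;
        for (final List<String> alleles : genotypes) {
            for (final String allele : alleles) {
                if (!NO_CALL.equals(allele)) {
                    called++;
                }
            }
        }
        return called;
    }

    /** Receives the count as a parameter instead of reading shared state. */
    static String annotate(final int calledAlleles) {
        return "AN=" + calledAlleles;
    }
}
```

Because the counter is never stored on the object, two consecutive `apply` calls cannot leak counts into each other.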

@akiezun
Contributor

akiezun commented Jul 14, 2016

looks good. small edits. back to you @ronlevine. Please rebase before next round.

@akiezun akiezun assigned ronlevine and unassigned akiezun Jul 14, 2016
@droazen
Contributor

droazen commented Aug 3, 2016

@ronlevine Will you have time to address Adam's comments on this PR?

@ronlevine
Contributor Author

Yes. It will have to wait until Monday.

@ronlevine ronlevine force-pushed the rhl_sv_vf_update_ac branch 2 times, most recently from 9104222 to 9ebdaf3 Compare August 8, 2016 17:20
@ronlevine
Contributor Author

@droazen Please take a look. Note that I removed the functional programming in SelectVariants.setFilteredGenotypeToNocall(). I could not think of a clean way to implement the added logic.

@ronlevine ronlevine assigned droazen and unassigned ronlevine Aug 8, 2016
@coveralls
Collaborator

Coverage Status

Coverage increased (+0.03%) to 82.233% when pulling 9ebdaf3 on rhl_sv_vf_update_ac into a830c13 on master.

@coveralls
Collaborator

Coverage Status

Coverage increased (+0.03%) to 82.236% when pulling 9ebdaf3 on rhl_sv_vf_update_ac into a830c13 on master.

@ronlevine ronlevine added the secon label Aug 8, 2016
@droazen droazen assigned lbergelson and unassigned droazen Aug 22, 2016
public static void updateChromosomeCountsInfo(final Map<Allele, Integer> calledAltAlleles, final int calledAlleles,
final VariantContextBuilder builder) {
Utils.nonNull(calledAltAlleles, "Called alternate alleles can not be null");
Utils.nonNull(builder, "Variant context builder can not be null");
Member

you might want a guard against negative calledAlleles here

Contributor Author

Done
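
A guard of the kind suggested in this thread might look like the following. This is a sketch under assumptions, not the merged code: `AlleleFrequencySketch` and `alleleFrequencies` are hypothetical names, plain `java.util` types replace `Allele` and `VariantContextBuilder`, and `Objects.requireNonNull` stands in for the `Utils.nonNull` calls shown in the diff.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical helper showing argument validation before computing AF. */
final class AlleleFrequencySketch {
    static Map<String, Double> alleleFrequencies(final Map<String, Integer> calledAltAlleles,
                                                 final int calledAlleles) {
        Objects.requireNonNull(calledAltAlleles, "Called alternate alleles can not be null");
        // The guard under discussion: a negative allele count is never valid.
        if (calledAlleles < 0) {
            throw new IllegalArgumentException("Called alleles can not be negative: " + calledAlleles);
        }
        final Map<String, Double> frequencies = new LinkedHashMap<>();
        for (final Map.Entry<String, Integer> entry : calledAltAlleles.entrySet()) {
            // Assumption for this sketch: report 0.0 rather than dividing by zero.
            final double af = calledAlleles == 0 ? 0.0 : entry.getValue() / (double) calledAlleles;
            frequencies.put(entry.getKey(), af);
        }
        return frequencies;
    }
}
```

Failing fast here keeps a bad count from silently producing a negative AF in the output VCF.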

@lbergelson
Member

@ronlevine It looks to me that there could be some refactoring done here to separate out the AF update logic into a single function call that could be used in multiple places rather than having two sets of fairly complicated and highly similar logic. Could you see if that's doable?

@lbergelson lbergelson assigned ronlevine and unassigned lbergelson Aug 24, 2016
@ronlevine
Contributor Author

@lbergelson I refactored the common logic into a utility class. Please take a look.

@ronlevine ronlevine assigned lbergelson and unassigned ronlevine Sep 1, 2016
@ronlevine
Contributor Author

ronlevine commented Sep 2, 2016

@lbergelson Per your suggestion, I made setFilteredGenotypeToNocall() a void function. This is all set for review.

@@ -1349,4 +1351,119 @@ private static int indexOfSameAllele(final VariantContext vc, final Allele allel

return -1;
}

/**
* Add chromosome Counds (AC, AN and AF) to the VCF header lines
Member

typo: counds

Member

Is chromosome counts established nomenclature that I'm not aware of? It seems like allele counts is more accurate to me since this is all per allele.

Contributor Author

Kind of. There are the classes ChromosomeCounts and ChromosomeCountConstants.

Member

So there are. Wait on doing those renames until I talk with a few people here about what the right name is. I don't want it to be inconsistent, but I think it's confusing how it is at the moment.

  1. Also, I noticed that VariantContextUtils has calculateChromosomeCounts, which seems to be very similar to the new methods in this PR. Did we know about it going into these changes? Are we duplicating code that already exists?
  2. I also noticed that ChromosomeCountConstants seems to be totally redundant with ChromosomeCounts. Could you remove ChromosomeCountConstants while you're in this code? It looks like add ChromosomeCounts should probably just addAll ChromosomeCounts.getDescriptions as well.

Contributor Author

  1. Removed getCalledChromosomeCounts and updateChromosomeCountsInfo and replaced them with VariantContextUtils.calculateChromosomeCounts.
  2. Removed ChromosomeCountConstants.

@lbergelson
Member

@ronlevine Sorry for the slow response. I like it much more with the refactoring you did. I had a few minor comments. I want to ask @vdauwera about the names on Monday. I find them confusing, but if it's standard nomenclature then we should keep them. So don't do any renaming until I get back to you about that.

@lbergelson
Member

@ronlevine I consulted with Geraldine. Keep calling things chromosome counts. That seems to be the accepted name.

@ronlevine
Contributor Author

@lbergelson Incorporated your comments. Back to you.

@lbergelson
Member

@ronlevine Looks good to me. Feel free to merge after squashing/rebasing if there aren't any issues with the rebase. Ignore the codacy failure, it's something I was testing but I don't think it's that useful.

@lbergelson lbergelson removed their assignment Sep 15, 2016
@coveralls
Collaborator

Coverage Status

Coverage increased (+0.01%) to 79.985% when pulling 88acbda on rhl_sv_vf_update_ac into 4ec1b85 on master.

@ronlevine ronlevine merged commit 7882c82 into master Sep 15, 2016
@ronlevine ronlevine deleted the rhl_sv_vf_update_ac branch September 15, 2016 18:56
Development

Successfully merging this pull request may close these issues.

SelectVariants and VariantFiltration not updating AC, AN and AF for --setFilteredGtToNocall
5 participants