Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Removed undocumented mid-p correction to p-values in exact test of Hardy-Weinberg equilibrium and updated corresponding tests. #7394

Merged
merged 6 commits into from
Nov 19, 2021

Conversation

samuelklee
Copy link
Contributor

@samuelklee samuelklee commented Aug 4, 2021

Work is split into two commits:

  • Removed undocumented mid-p correction to p-values in exact test of Hardy-Weinberg equilibrium and updated corresponding unit tests.
  • Updated expected ExcessHet values in integration test resources and added an update toggle to GnarlyGenotyperIntegrationTest.

Various scout cleanups as well.

We now report the same value as ExcHet in bcftools. Note that previous values of 3.0103 (corresponding to mid-p values of 0.5) will now be 0.0000. See discussion below and in linked issue for additional details.

Closes #7392.

@samuelklee samuelklee force-pushed the sl_excess_het_calc branch 6 times, most recently from 17e096d to 0ece016 Compare August 4, 2021 14:38
@samuelklee
Copy link
Contributor Author

samuelklee commented Aug 4, 2021

@ldgauthier, @meganshand and I discussed whether we might want to switch from the one-sided p-values currently calculated by ExcessHet to mid p-values (which are used in the Hail implementation, see also https://www2.unil.ch/popgen/teaching/SISG14/Graffelman_Moreno_SAGMB_2013.pdf). She said you probably should make the call.

As discussed by that reference, there can be significant differences between one-sided and mid p-values. However, since this PR already introduces differences between the old one-sided p-values and the corrected one-sided p-values, perhaps we want to go a step further and just switch over? Advantages would include consistency with Hail, as well as better power and type-1 error rate, at least according to that reference. But the test would no longer be specific to excess heterozygosity, which might not be desirable.

I would also be curious to see what differences the correction or the switch would have in practice, given that the filter threshold is relatively conservative. Not sure I'm set up to rerun filtering on a large dataset, though.

@gatk-bot
Copy link

gatk-bot commented Aug 4, 2021

Travis reported job failures from build 35283
Failures in the following jobs:

Test Type JDK Job ID Logs
integration openjdk11 35283.12 logs
variantcalling openjdk8 35283.4 logs
integration openjdk8 35283.2 logs

@gatk-bot
Copy link

gatk-bot commented Aug 4, 2021

Travis reported job failures from build 35285
Failures in the following jobs:

Test Type JDK Job ID Logs
integration openjdk11 35285.12 logs
variantcalling openjdk8 35285.4 logs
integration openjdk8 35285.2 logs

@samuelklee
Copy link
Contributor Author

samuelklee commented Aug 4, 2021

Oof, looks like there are now a bunch of broken integration tests that check ExcessHet for whatever reason. So let's definitely decide on whether we want to make the switch to mid p-values before I go through those. EDIT: Actually, what’s SOP here? Do I have to go through and recalculate ExcessHet for every single VCF/GenomicsDB in the repo?

If we stick with the one-sided p-values now calculated here, then I guess one bonus is we’ll no longer have ExcessHet Phred scores of 3.0103 (which result from that short circuit returning a p-value of 0.5) everywhere.

}

@DataProvider(name = "smallSets")
public Object[][] counts() {
return new Object[][]{
{1, 0, 0, 0.5},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consequential test changes start here; most of the other changes in the class are just the result of scout-level cleanup.

@samuelklee
Copy link
Contributor Author

samuelklee commented Aug 4, 2021

Also just realized there’s yet another implementation in htsjdk, HardyWeinbergCalculation at https://github.com/samtools/htsjdk/blob/master/src/main/java/htsjdk/tribble/util/popgen/HardyWeinbergCalculation.java, so just a reminder to myself to check against that. Looks like a two-sided p-value of sorts is calculated there—I think this is P_{2\alpha} from Wigginton, although I need to double check.

EDIT: Yup, it is, and furthermore the implementation appears to be correct. Phew! Added one more test to guard against a possible overflow issue that came up with that implementation, although it doesn't appear we have the same issue here. Will also note that 1) tests for the htsjdk implementation are pretty slim and don't actually cover very much, and 2) I don't see why we need to have two copies of this implementation, when all that essentially differs is the choice of p-value returned---we could certainly consolidate and expose the option of which p-value to return.

Finally, I will also note that there is an implementation in bcftools. I have not checked it for correctness, but it appears to allow the calculation of both the one-sided p-value intended by ExcessHet, as well as what Wigginton calls P_{HWE}. So with that, the aforementioned implementations have covered every p-value discussed by that paper—and then one!

//Check if we observed the highest possible number of hets
if (hetCount == rareCopies) {
return rightPval;
}
return rightPval + StatUtils.sum(Arrays.copyOfRange(probs, hetCount + 1, probs.length)) / mysum;
return Math.min(1., rightPval + StatUtils.sum(Arrays.copyOfRange(probs, hetCount + 1, probs.length)) / mysum);
Copy link
Contributor Author

@samuelklee samuelklee Aug 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, seems like we should use something from QualityUtils for converting this to Phred scale in the calculateEH method, but will let the reviewer decide. Not sure if we will want to e.g. change the details of capping; there seems to be some inconsistencies in magic numbers used in the code and in documentation. Will let reviewer decide if this is worth filing an issue to clean up later.

@samuelklee samuelklee force-pushed the sl_excess_het_calc branch 3 times, most recently from fd35aed to daaa90a Compare August 5, 2021 01:58
@samuelklee
Copy link
Contributor Author

Also added a little something to prevent output of negative-zero scores (since all of those 3.0103s actually became -0.0000s before this fix), see e.g. https://gatk.broadinstitute.org/hc/en-us/community/posts/360056519272-ExcessHet-value-is-0-0000. Not sure if this is something that should be done at the QualityUtils level either.

@broadinstitute broadinstitute deleted a comment from gatk-bot Aug 5, 2021
@broadinstitute broadinstitute deleted a comment from gatk-bot Aug 5, 2021
@broadinstitute broadinstitute deleted a comment from gatk-bot Aug 5, 2021
@broadinstitute broadinstitute deleted a comment from gatk-bot Aug 5, 2021
@broadinstitute broadinstitute deleted a comment from gatk-bot Aug 5, 2021
@samuelklee
Copy link
Contributor Author

Ah, now that I'm going through and re-updating test resources for a rebase, I see that I missed responding to this:

Now that I think more about it, I'm surprised none of the GenotypeGVCFs tests changed. I guess they must all be too few samples.

I think some of them did? E.g. I'm having to re-update src/test/resources/org/broadinstitute/hellbender/tools/walkers/GenotypeGVCFs/expected/gvcf.basepairResolution.includeNonVariantSites.vcf.

@samuelklee
Copy link
Contributor Author

samuelklee commented Sep 7, 2021

OK, thanks @ldgauthier, I think I've addressed all the comments but one. A little TODO list for my benefit:

  • Updated the GATK version in the ExcessHet documentation to 4.2.2.0, but we'll see if I need to revisit that.
  • Not quite sure about the ReducibleAnnotation business. Let me know how to make these changes, or else happy to punt and file an issue.
  • Also not sure I've parsed the results of the Jenkins tests, at least in terms of comparing how many sites get hard filtered with/out the change. Where should I be looking at to see the baseline result for that step? Also looks like a lot of results for https://gotc-jenkins.dsp-techops.broadinstitute.org/job/warp-workflow-tests/11755/ were call-cached, is that to be expected? Haven't looked at these tests before, so maybe you can walk me through them at some point. But I guess we can be sure that the overall results don't change too much (at least for 50 samples), which is a good start.
  • Didn't quite get to making those plots of the change in decision boundary, will do that tomorrow or later this week. EDIT: Nevermind, took like 5 minutes to throw them together (albeit using the slow python implementation and some for loops...), see below.
  • Hmm, looks like my own PR Exposed Smith-Waterman parameters in HaplotypeCaller, Mutect2, and FilterAlignmentArtifacts. #6885 might've introduced a few more exact match test failures...grr.

Here are some plots for N = 50, 100, and 500 samples showing (in black) those counts that previously fell under the 3E-6 threshold with the mid-p correction but now pass without it. As you can see, not much to sweat from these "theoretical" plots, but good to convolve with the actual allele frequency spectrum and get an idea of how many sites occupy these black squares in practice (as well as start us down the road of reexamining the threshold itself):

image
image
image

@gatk-bot

This comment has been minimized.

@gatk-bot

This comment has been minimized.

@ldgauthier
Copy link
Contributor

Based on gs://broad-gotc-test-results/staging/joint_genotyping/exome/scientific/2021-09-03-11-25-15/gather_vcfs_low_memory/small_callset_high_threshold.vcf.gz (from the console output) there are slightly fewer variants filtered with ExcessHet now, which is expected since you said it was an across-the-board shift. Expected (old) has 4335 and actual (new) has 4133 -- no new things, just some now pass. If you can calculate a new equivalent threshold I'd rather use that, but otherwise I'm not overly concerned about the changes.

I'm not concerned about the Jenkins call caching unless it's for the GenotypeGVCFs task where ExcessHet actually gets calculated.

For the ReducibleAnnotation comments, if you just revert your changes (statics, visibility, etc.) and open an issue I'm fine with that. Admittedly this could be another target for refactoring.

@samuelklee
Copy link
Contributor Author

Thanks for those numbers! I'm not sure we can shift the threshold in a uniquely defined way, since things are a function of the allele-frequency spectrum and the number of (non-missing) samples. So definitely glad to hear the impact is minimal and everything behaves as expected.

I'll revert the changes tomorrow and will merge after your final thumbs up, thanks!

@samuelklee
Copy link
Contributor Author

@ldgauthier did you end up getting your other branches in already? If not, let me know when would be a good time to rebase this one.

@ldgauthier
Copy link
Contributor

I think we're planning on cutting a release early next week to use in a Warp update. I don't plan on adding anything else, so you could probably rebase now. I don't want to merge before the release though.

…rdy-Weinberg equilibrium and updated corresponding unit tests.
…dded an update toggle to GnarlyGenotyperIntegrationTest. (Then reverted changes to some test resources in a rebase.)
…sIntegrationTest to resolve rebase conflicts; uncommented tests and updated resources for GnarlyGenotyperIntegrationTest.
@samuelklee
Copy link
Contributor Author

Seems like there weren't any exact-match tests to update after the latest release, so this should be ready pending your approval, @ldgauthier!

Copy link
Contributor

@ldgauthier ldgauthier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've held this up long enough, but it might be worth asking @gbrandt6 if she wants to take a look at the ExcessHet documentation changes in that class.

@samuelklee
Copy link
Contributor Author

Thanks @ldgauthier. @gbrandt6 I’d appreciate it if you want to take a look, but I might ask if you can do it by Friday afternoon—I’m out after then through all of next week. Would like to merge before I head out to avoid any more rebasing and/or updating of exact-match tests. Happy to look at any changes to the docs you might make in a subsequent PR, though!

@samuelklee samuelklee removed the request for review from meganshand November 17, 2021 20:53
@samuelklee
Copy link
Contributor Author

Actually going to go ahead and merge. Got my booster earlier today and might be knocked out tomorrow. Again, happy to look at subsequent doc changes.

@samuelklee samuelklee merged commit f06971a into master Nov 19, 2021
@samuelklee samuelklee deleted the sl_excess_het_calc branch November 19, 2021 04:01
@gbrandt6
Copy link
Contributor

@samuelklee I didn't get a chance to take a look, I'll see if we need to make any doc changes next week and let you know

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Possible bug in ExcessHet p-value calculation?
6 participants