-
Notifications
You must be signed in to change notification settings - Fork 594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Removed undocumented mid-p correction to p-values in exact test of Hardy-Weinberg equilibrium and updated corresponding tests. #7394
Conversation
src/main/java/org/broadinstitute/hellbender/tools/walkers/annotator/ExcessHet.java
Show resolved
Hide resolved
17e096d
to
0ece016
Compare
@ldgauthier, @meganshand and I discussed whether we might want to switch from the one-sided p-values currently calculated by ExcessHet to mid p-values (which are used in the Hail implementation, see also https://www2.unil.ch/popgen/teaching/SISG14/Graffelman_Moreno_SAGMB_2013.pdf). She said you probably should make the call. As discussed by that reference, there can be significant differences between one-sided and mid p-values. However, since this PR already introduces differences between the old one-sided p-values and the corrected one-sided p-values, perhaps we want to go a step further and just switch over? Advantages would include consistency with Hail, as well as better power and type-1 error rate, at least according to that reference. But the test would no longer be specific to excess heterozygosity, which might not be desirable. I would also be curious to see what differences the correction or the switch would have in practice, given that the filter threshold is relatively conservative. Not sure I'm set up to rerun filtering on a large dataset, though. |
Oof, looks like there are now a bunch of broken integration tests that check ExcessHet for whatever reason. So let's definitely decide on whether we want to make the switch to mid p-values before I go through those. EDIT: Actually, what’s SOP here? Do I have to go through and recalculate ExcessHet for every single VCF/GenomicsDB in the repo? If we stick with the one-sided p-values now calculated here, then I guess one bonus is we’ll no longer have ExcessHet Phred scores of 3.0103 (which result from that short circuit returning a p-value of 0.5) everywhere. |
} | ||
|
||
@DataProvider(name = "smallSets") | ||
public Object[][] counts() { | ||
return new Object[][]{ | ||
{1, 0, 0, 0.5}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consequential test changes start here; most of the other changes in the class are just the result of scout-level cleanup.
Also just realized there’s yet another implementation in htsjdk, HardyWeinbergCalculation at https://github.com/samtools/htsjdk/blob/master/src/main/java/htsjdk/tribble/util/popgen/HardyWeinbergCalculation.java, so just a reminder to myself to check against that. Looks like a two-sided p-value of sorts is calculated there—I think this is P_{2\alpha} from Wigginton, although I need to double check. EDIT: Yup, it is, and furthermore the implementation appears to be correct. Phew! Added one more test to guard against a possible overflow issue that came up with that implementation, although it doesn't appear we have the same issue here. Will also note that 1) tests for the htsjdk implementation are pretty slim and don't actually cover very much, and 2) I don't see why we need to have two copies of this implementation, when all that essentially differs is the choice of p-value returned---we could certainly consolidate and expose the option of which p-value to return. Finally, I will also note that there is an implementation in bcftools. I have not checked it for correctness, but it appears to allow the calculation of both the one-sided p-value intended by ExcessHet, as well as what Wigginton calls P_{HWE}. So with that, the aforementioned implementations have covered every p-value discussed by that paper—and then one! |
//Check if we observed the highest possible number of hets | ||
if (hetCount == rareCopies) { | ||
return rightPval; | ||
} | ||
return rightPval + StatUtils.sum(Arrays.copyOfRange(probs, hetCount + 1, probs.length)) / mysum; | ||
return Math.min(1., rightPval + StatUtils.sum(Arrays.copyOfRange(probs, hetCount + 1, probs.length)) / mysum); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, seems like we should use something from QualityUtils for converting this to Phred scale in the calculateEH method, but will let the reviewer decide. Not sure if we will want to e.g. change the details of capping; there seems to be some inconsistencies in magic numbers used in the code and in documentation. Will let reviewer decide if this is worth filing an issue to clean up later.
510e87e
to
4547321
Compare
src/test/java/org/broadinstitute/hellbender/tools/walkers/annotator/ExcessHetUnitTest.java
Show resolved
Hide resolved
fd35aed
to
daaa90a
Compare
src/test/java/org/broadinstitute/hellbender/tools/walkers/annotator/ExcessHetUnitTest.java
Show resolved
Hide resolved
Also added a little something to prevent output of negative-zero scores (since all of those 3.0103s actually became -0.0000s before this fix), see e.g. https://gatk.broadinstitute.org/hc/en-us/community/posts/360056519272-ExcessHet-value-is-0-0000. Not sure if this is something that should be done at the QualityUtils level either. |
Ah, now that I'm going through and re-updating test resources for a rebase, I see that I missed responding to this:
I think some of them did? E.g. I'm having to re-update |
6490258
to
64049cd
Compare
OK, thanks @ldgauthier, I think I've addressed all the comments but one. A little TODO list for my benefit:
Here are some plots for N = 50, 100, and 500 samples showing (in black) those counts that previously fell under the 3E-6 threshold with the mid-p correction but now pass without it. As you can see, not much to sweat from these "theoretical" plots, but good to convolve with the actual allele frequency spectrum and get an idea of how many sites occupy these black squares in practice (as well as start us down the road of reexamining the threshold itself): |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Based on gs://broad-gotc-test-results/staging/joint_genotyping/exome/scientific/2021-09-03-11-25-15/gather_vcfs_low_memory/small_callset_high_threshold.vcf.gz (from the console output) there are slightly fewer variants filtered with ExcessHet now, which is expected since you said it was an across-the-board shift. Expected (old) has 4335 and actual (new) has 4133 -- no new things, just some now pass. If you can calculate a new equivalent threshold I'd rather use that, but otherwise I'm not overly concerned about the changes. I'm not concerned about the Jenkins call caching unless it's for the GenotypeGVCFs task where ExcessHet actually gets calculated. For the ReducibleAnnotation comments, if you just revert your changes (statics, visibility, etc.) and open an issue I'm fine with that. Admittedly this could be another target for refactoring. |
Thanks for those numbers! I'm not sure we can shift the threshold in a uniquely defined way, since things are a function of the allele-frequency spectrum and the number of (non-missing) samples. So definitely glad to hear the impact is minimal and everything behaves as expected. I'll revert the changes tomorrow and will merge after your final thumbs up, thanks! |
@ldgauthier did you end up getting your other branches in already? If not, let me know when would be a good time to rebase this one. |
I think we're planning on cutting a release early next week to use in a Warp update. I don't plan on adding anything else, so you could probably rebase now. I don't want to merge before the release though. |
…rdy-Weinberg equilibrium and updated corresponding unit tests.
…dded an update toggle to GnarlyGenotyperIntegrationTest. (Then reverted changes to some test resources in a rebase.)
…sIntegrationTest to resolve rebase conflicts; uncommented tests and updated resources for GnarlyGenotyperIntegrationTest.
1880ce8
to
9b93d3c
Compare
…al of code in ReducibleAnnotation stubs.
Seems like there weren't any exact-match tests to update after the latest release, so this should be ready pending your approval, @ldgauthier! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've held this up long enough, but it might be worth asking @gbrandt6 if she wants to take a look at the ExcessHet documentation changes in that class.
Thanks @ldgauthier. @gbrandt6 I’d appreciate it if you want to take a look, but I might ask if you can do it by Friday afternoon—I’m out after then through all of next week. Would like to merge before I head out to avoid any more rebasing and/or updating of exact-match tests. Happy to look at any changes to the docs you might make in a subsequent PR, though! |
Actually going to go ahead and merge. Got my booster earlier today and might be knocked out tomorrow. Again, happy to look at subsequent doc changes. |
@samuelklee I didn't get a chance to take a look, I'll see if we need to make any doc changes next week and let you know |
Work is split into two commits:
Various scout cleanups as well.
We now report the same value as ExcHet in bcftools. Note that previous values of 3.0103 (corresponding to mid-p values of 0.5) will now be 0.0000. See discussion below and in linked issue for additional details.
Closes #7392.