-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Retain all source IDs on VariantContext merge + test #9032
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
In GATK3, when merging variants, the IDs of all the source VCFs were retained. This code path seems like it intended that, since the variantSources set is generated, but it doesnt get used for anything. This PR will use that set to set the source of the resulting merged VC.
…rve existing behavior when storeAllVcfSources=false
In GATK3, when merging variants, the IDs of all the source VCFs were retained. This code path seems like it intended that, since the variantSources set is generated, but it doesnt get used for anything. This PR will use that set to set the source of the resulting merged VC.
…rve existing behavior when storeAllVcfSources=false
…_merge2 # Conflicts: # src/main/java/org/broadinstitute/hellbender/utils/variant/GATKVariantContextUtils.java
Closed
lbergelson
approved these changes
Nov 8, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. Now how do I run the tests...
Tests ran successfully here: #9040 |
Closed
Thank you! |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This feature was initially opened in this PR: #8750, after which @lbergelson and @droazen made comments here; #8752.
The driving use-case is that we took over the GATK3 MergeVariantsAndGenotypes tool at DISCVRseq and users have been requesting the older behavior on VCF merges, such as: BimberLab/DISCVRSeq#313.
The original PR has been languishing since March and I'm hoping to finalize this feature. Because I cant write to the GATK repo and b/c @lbergelson made some suggestions on a GATK-based branch I am going to put every together into one clean PR, which responds to the code review from the thread above.
To recap background:
In GATK3, when merging variants, the IDs of all the source VCFs were retained. The GATK4 code path seems like it intended to do this, since the variantSources set is generated, but that variant isnt used for anything (I assume GATK3 code was partially carried forward to incompletely refactored?).
This PR is designed to allow code to opt-in to the old GATK behavior of retaining the IDs of source VCFs in the ID field. It will not change the default behavior for existing code.
I dont think I can kick off the test suite, but these tests did pass here: Retain all source IDs on VariantContext merge + test #8752.
Again, @lbergelson and @droazen both reviewed the original PR and seemed fine with it in principle. The primary concern raised by @droazen was to avoid changing the current defaults and to not create additional burden (such as adding sorts). I believe this addresses both of those concerns. @jamesemery commented on the thread at one point as well.
Is there anything I can do to help move this forward? Thanks for your time.