Retain all source IDs on VariantContext merge + test #9032

Merged
merged 16 commits into from
Nov 13, 2024

bbimber
Contributor

@bbimber bbimber commented Nov 4, 2024

This feature was initially opened in this PR: #8750, after which @lbergelson and @droazen made comments here: #8752.

The driving use case is that we took over the GATK3 MergeVariantsAndGenotypes tool at DISCVRSeq, and users have been requesting the older behavior on VCF merges, such as: BimberLab/DISCVRSeq#313.

The original PR has been languishing since March and I'm hoping to finalize this feature. Because I can't write to the GATK repo, and because @lbergelson made some suggestions on a GATK-based branch, I am going to put everything together into one clean PR, which responds to the code review from the thread above.

To recap background:

  • In GATK3, when merging variants, the IDs of all the source VCFs were retained. The GATK4 code path seems like it intended to do this, since the variantSources set is generated, but that variable isn't used for anything (I assume the GATK3 code was partially carried forward or incompletely refactored?).

  • This PR is designed to allow code to opt in to the old GATK3 behavior of retaining the IDs of source VCFs in the ID field. It will not change the default behavior for existing code.

  • I don't think I can kick off the test suite, but these tests did pass here: Retain all source IDs on VariantContext merge + test #8752.
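To make the opt-in behavior concrete, here is a minimal, self-contained sketch of the idea. It is not the code in GATKVariantContextUtils. The class name, method name, and delimiter are hypothetical, and it assumes the existing default keeps only the first input's source name. Only the storeAllVcfSources flag and the variantSources set come from this PR.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: NOT the GATK implementation.
final class SourceMergeSketch {

    // Hypothetical delimiter; the real code may join names differently.
    static final String DELIMITER = "-";

    /**
     * Returns the source label for a merged record.
     *
     * @param sourcesInMergeOrder source name of each input record, in merge order
     * @param storeAllVcfSources  opt-in flag; when false, the assumed default applies
     */
    static String mergedSource(List<String> sourcesInMergeOrder, boolean storeAllVcfSources) {
        if (sourcesInMergeOrder.isEmpty()) {
            throw new IllegalArgumentException("at least one input record is required");
        }
        if (!storeAllVcfSources) {
            // Assumed default path: behavior is unchanged, only the first source is kept.
            return sourcesInMergeOrder.get(0);
        }
        // Opt-in path: keep each distinct name once, in first-seen order,
        // so no additional sort is required.
        Set<String> variantSources = new LinkedHashSet<>(sourcesInMergeOrder);
        return String.join(DELIMITER, variantSources);
    }

    public static void main(String[] args) {
        List<String> sources = Arrays.asList("cohortA", "cohortB", "cohortA");
        System.out.println(mergedSource(sources, false)); // prints "cohortA"
        System.out.println(mergedSource(sources, true));  // prints "cohortA-cohortB"
    }
}
```

Because the flag defaults to the old path, existing callers would see no change, and using an insertion-ordered set avoids the extra sorting cost raised in the earlier review.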

Again, @lbergelson and @droazen both reviewed the original PR and seemed fine with it in principle. The primary concern raised by @droazen was to avoid changing the current defaults and to not create additional burden (such as adding sorts). I believe this addresses both of those concerns. @jamesemery commented on the thread at one point as well.

Is there anything I can do to help move this forward? Thanks for your time.

bbimber and others added 16 commits March 25, 2024 06:35
In GATK3, when merging variants, the IDs of all the source VCFs were retained. This code path seems like it intended that, since the variantSources set is generated, but it doesn't get used for anything. This PR will use that set to set the source of the resulting merged VC.
…rve existing behavior when storeAllVcfSources=false
…_merge2

# Conflicts:
#	src/main/java/org/broadinstitute/hellbender/utils/variant/GATKVariantContextUtils.java
@bbimber bbimber mentioned this pull request Nov 5, 2024
Member

@lbergelson lbergelson left a comment

This looks good to me. Now how do I run the tests...

@lbergelson
Member

Tests ran successfully here: #9040

@lbergelson lbergelson merged commit 18707c6 into broadinstitute:master Nov 13, 2024
20 checks passed
@lbergelson lbergelson mentioned this pull request Nov 13, 2024
@bbimber bbimber deleted the lb_source_merge2 branch November 13, 2024 19:01
@bbimber
Contributor Author

bbimber commented Nov 13, 2024

Thank you!
