Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle trailing reference blocks in CombineGVCFs. #4615

Merged
merged 1 commit into from
Apr 20, 2018
Merged

Conversation

cmnbroad
Copy link
Collaborator

Fixes #4572.

@cmnbroad cmnbroad requested a review from jamesemery March 30, 2018 02:26
@droazen
Copy link
Contributor

droazen commented Mar 30, 2018

@jamesemery please review

Copy link
Collaborator

@jamesemery jamesemery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not sure we want to abandon prevPos entirely, given that it could represent a gap between the last stack of variantcontexts and a previous real variant.

if ( !variantContextsOverlappingCurrentMerge.isEmpty() ) {
logger.warn("You have asked for an interval that cuts in the middle of one or more gVCF blocks. Please note that this will cause you to lose records that don't end within your interval.");
// finish off the last blocks
final SimpleInterval lastInterval = new SimpleInterval(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm... So this change is mostly agnostic and is definately important for the case where the assumption that the prevPos must be on the right contig (which wasn't checked before). I'm not convinced this works universally in the case where there is a gap before the last contig. So imagine there are still reads queued to be closed at prevPos, a gap in coverage, then a stack of variant contexts at the last position. It doesn't strike me that this should handle closing that last interval if it needs to generate its own stops (for example if BREAK_BANDS_LONG_NAME) is set then it may well generate breaks in the gap between prevpos and the last stack of VCs. I think we should prefer to use prevPos in these cases and only switch to variantContextsOverlappingCurrentMerge.get(0) in the case where prevpos is on the wrong contig/starts later than it should.

Copy link
Collaborator Author

@cmnbroad cmnbroad Apr 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jamesemery Having thought about this a bit more, I'm not sure that problem is new to this change. As far as I can tell, this implementation of the tool only emits coverage for territory that is covered by the inputs, and doesn't pad out territory beyond that. In onTraversalSuccess, if prevPos is on a different contig than the remaining variants to be merged, we won't close out (pad out) the previous contig, but I don't think the tool ever does that ?). We might want to fix that, but I think thats a separate/bigger issue ? Otherwise, the variants that are remaining are all guaranteed to be on the same contig (right ?) Sorry if I'm missing something, but please push back if you think I am.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You know, you are right. I was thinking of the code being like a normal apply() call where you are holding onto something "next." Your fix looks good.

@@ -74,7 +74,10 @@
//Testing allele-specific annotations
{new File[]{getTestFile("NA12878.AS.chr20snippet.g.vcf"), getTestFile("NA12892.AS.chr20snippet.g.vcf")}, getTestFile("testAlleleSpecificAnnotations.vcf"), Arrays.asList("-G", "Standard", "-G", "AS_Standard"), b37_reference_20_21},
//Testing allele-specific annotations missing AS_Standard Group
{new File[]{getTestFile("NA12878.AS.chr20snippet.g.vcf"), getTestFile("NA12892.AS.chr20snippet.g.vcf")}, getTestFile("testAlleleSpecificAnnotationsNoGroup.vcf"), Arrays.asList("-G", "Standard", "-G", "AS_Standard"), b37_reference_20_21},};
{new File[]{getTestFile("NA12878.AS.chr20snippet.g.vcf"), getTestFile("NA12892.AS.chr20snippet.g.vcf")}, getTestFile("testAlleleSpecificAnnotationsNoGroup.vcf"), Arrays.asList("-G", "Standard", "-G", "AS_Standard"), b37_reference_20_21},
//Test that trailing reference blocks are emitted with correct intervals
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could demonstrate the issue i'm talking about if you combined two files that are already banded in step with eachother and you set BREAK_BANDS_LONG_NAME to some short value. you would expect a gap at the end of the contig given your change.

@jamesemery
Copy link
Collaborator

Feel free to merge it then 👍

@cmnbroad cmnbroad merged commit 54295ab into master Apr 20, 2018
@cmnbroad cmnbroad deleted the cn_combine_gvcfs branch April 20, 2018 13:24
cwhelan pushed a commit to cwhelan/gatk-linked-reads that referenced this pull request May 25, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants