-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle trailing reference blocks in CombineGVCFs. #4615
Conversation
@jamesemery please review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still not sure we want to abandon prevPos entirely, given that it could represent a gap between the last stack of variantcontexts and a previous real variant.
if ( !variantContextsOverlappingCurrentMerge.isEmpty() ) { | ||
logger.warn("You have asked for an interval that cuts in the middle of one or more gVCF blocks. Please note that this will cause you to lose records that don't end within your interval."); | ||
// finish off the last blocks | ||
final SimpleInterval lastInterval = new SimpleInterval( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm... So this change is mostly agnostic and is definately important for the case where the assumption that the prevPos must be on the right contig (which wasn't checked before). I'm not convinced this works universally in the case where there is a gap before the last contig. So imagine there are still reads queued to be closed at prevPos, a gap in coverage, then a stack of variant contexts at the last position. It doesn't strike me that this should handle closing that last interval if it needs to generate its own stops (for example if BREAK_BANDS_LONG_NAME) is set then it may well generate breaks in the gap between prevpos and the last stack of VCs. I think we should prefer to use prevPos in these cases and only switch to variantContextsOverlappingCurrentMerge.get(0) in the case where prevpos is on the wrong contig/starts later than it should.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jamesemery Having thought about this a bit more, I'm not sure that problem is new to this change. As far as I can tell, this implementation of the tool only emits coverage for territory that is covered by the inputs, and doesn't pad out territory beyond that. In onTraversalSuccess, if prevPos is on a different contig than the remaining variants to be merged, we won't close out (pad out) the previous contig, but I don't think the tool ever does that ?). We might want to fix that, but I think thats a separate/bigger issue ? Otherwise, the variants that are remaining are all guaranteed to be on the same contig (right ?) Sorry if I'm missing something, but please push back if you think I am.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You know, you are right. I was thinking of the code being like a normal apply()
call where you are holding onto something "next." Your fix looks good.
@@ -74,7 +74,10 @@ | |||
//Testing allele-specific annotations | |||
{new File[]{getTestFile("NA12878.AS.chr20snippet.g.vcf"), getTestFile("NA12892.AS.chr20snippet.g.vcf")}, getTestFile("testAlleleSpecificAnnotations.vcf"), Arrays.asList("-G", "Standard", "-G", "AS_Standard"), b37_reference_20_21}, | |||
//Testing allele-specific annotations missing AS_Standard Group | |||
{new File[]{getTestFile("NA12878.AS.chr20snippet.g.vcf"), getTestFile("NA12892.AS.chr20snippet.g.vcf")}, getTestFile("testAlleleSpecificAnnotationsNoGroup.vcf"), Arrays.asList("-G", "Standard", "-G", "AS_Standard"), b37_reference_20_21},}; | |||
{new File[]{getTestFile("NA12878.AS.chr20snippet.g.vcf"), getTestFile("NA12892.AS.chr20snippet.g.vcf")}, getTestFile("testAlleleSpecificAnnotationsNoGroup.vcf"), Arrays.asList("-G", "Standard", "-G", "AS_Standard"), b37_reference_20_21}, | |||
//Test that trailing reference blocks are emitted with correct intervals |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could demonstrate the issue i'm talking about if you combined two files that are already banded in step with eachother and you set BREAK_BANDS_LONG_NAME to some short value. you would expect a gap at the end of the contig given your change.
Feel free to merge it then 👍 |
Fixes #4572.