After looking at this again (years later), I don't think this is a real thing. Whether they occur in the same sample or not, variants get represented based on the SW alignment of the assembled haplotype to the reference haplotype. The S1 GGAGTC allele will get aligned to the reference as a G->GT insertion. When it's genotyped against the reads from the other haplotype it probably used to exhibit different behavior because HC didn't have spanning deletions, but thanks to Chris W. now it does! (#4963) I'm closing this since we don't have a real test case and I don't believe our whiteboard scribbles anymore.
@vdauwera commented on Tue Nov 15 2016
@vdauwera commented on Sat Mar 07 2015
Originally from @eitanbanks
For example, we should split:
into:
If we don't do this then we're going to run into problems with GenotypeGVCFs. Imagine if sample1 is multi-allelic and has the original record above at position N; and sample2 is bi-allelic and only has the insertion (so its record would be at position N+1). Because GenotypeGVCFs runs over each position in the gVCF/NAVS, it will genotype the same insertion separately for the 2 samples (because they occur in records at different positions).
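To make the proposed split concrete, here is a minimal Python sketch. It is not GATK code: the function names `trim_allele` and `split_multiallelic`, the position 100, and the bases are all invented for illustration, and it ignores genotypes and likelihoods, which a real splitting procedure would also have to carry over. It only shows how trimming the bases shared by REF and ALT moves the insertion from position N to N+1.

```python
def trim_allele(pos, ref, alt):
    """Trim bases shared by REF and ALT, keeping at least one base in each."""
    # Trim shared trailing bases first; this does not move the position.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Then trim shared leading bases; each one shifts the position right by 1.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


def split_multiallelic(pos, ref, alts):
    """Return one trimmed (pos, ref, alt) tuple per ALT allele."""
    return [trim_allele(pos, ref, alt) for alt in alts]


# Hypothetical multi-allelic record at N = 100:
# a 1-base deletion (CA -> C) plus a T insertion (CA -> CAT).
records = split_multiallelic(100, "CA", ["C", "CAT"])
print(records)  # [(100, 'CA', 'C'), (101, 'A', 'AT')]
```

After the split, the insertion sits at N+1 as A -> AT, which is the same record a bi-allelic sample carrying only the insertion would have.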
@vdauwera commented on Thu Jul 16 2015
May be solved by the spanning deletion fix. @eitanbanks do you still want methods to look at this? They need a concrete example.
@eitanbanks commented on Fri Jul 17 2015
This is not solved by the spanning deletions fix. Do you want me to create two sample gVCFs that illustrate this problem?
@ldgauthier commented on Fri Jul 31 2015
Here's the example Eric came up with when we discussed this:
(I have no idea why Github rotated this)
If there is a het-non-ref sample (like S1), its alleles can be represented differently in the gVCF than a sample with a bi-allelic variant. Then when they get genotyped together, the same allele can show up at two different positions in the combined VCF, i.e. the T insertion is listed at position 325 for S1 and 326 for S2, but it's the same variant. This is probably what happened in the ExAC example (#1072).
It would be great for someone to write a HC (gVCF mode) unit test for this with some artificial reads so we can start working on a splitting procedure.
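As a rough illustration of the S1/S2 situation described above, the following Python sketch groups per-sample records by position, the way a tool that walks the gVCFs site by site would see them. The positions 325 and 326 come from the example; the bases, the record layout, and the function name `alleles_by_position` are assumptions made up for this sketch, and nothing here reflects how GenotypeGVCFs is actually implemented.

```python
from collections import defaultdict

# Hypothetical per-sample records as (pos, ref, alts); the bases are invented.
# S1 is het-non-ref: a deletion and a T insertion share one record at 325.
# S2 carries only the T insertion, so its record is anchored at 326.
sample_records = {
    "S1": [(325, "CA", ["C", "CAT"])],
    "S2": [(326, "A", ["AT"])],
}


def alleles_by_position(sample_records):
    """Group each sample's REF/ALT alleles under the position of its record."""
    sites = defaultdict(dict)
    for sample, records in sample_records.items():
        for pos, ref, alts in records:
            sites[pos][sample] = (ref, alts)
    return dict(sites)


sites = alleles_by_position(sample_records)
for pos in sorted(sites):
    print(pos, sites[pos])
```

In this toy input, CA -> CAT at 325 and A -> AT at 326 both insert a T after the A at 326, yet a position-keyed walk files them under two different sites, so each would be genotyped with only one of the two samples.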
@vdauwera commented on Mon Nov 14 2016
Does anyone still care about this? If so, should it go into the GATK4 repo?
@ldgauthier commented on Tue Nov 15 2016
I care, I just don't have the bandwidth to work on it. Please move to GATK4.