Hi Guys,
Could I get you to have a quick look at the excerpt from the VCF 4.1/4.2 spec below:
REF - reference base(s): Each base must be one of A,C,G,T,N (case insensitive). Multiple bases are permitted. The value in the POS field refers to the position of the first base in the String. For simple insertions and deletions in which either the REF or one of the ALT alleles would otherwise be null/empty, the REF and ALT Strings must include the base before the event (which must be reflected in the POS field), unless the event occurs at position 1 on the contig in which case it must include the base after the event; (<- The grammar here is stupidly confusing... Me) this padding base is not required (although it is permitted) for e.g. complex substitutions or other events where all alleles have at least one base represented in their Strings.
Then look at:
chr7 140924681 . CC GG . PASS PC=D;VT=Complex;VD=BRAF|CCDS5863.1|r.83delGinsc|c.22delGinsC|p.G8R|protein_coding:CDS:exon:indel:inframe_variant:substitution:non_synonymous_codon|SO:0000010:SO:0000316:SO:0000147:SO:1000032:SO:0001650:SO:1000002:SO:0001583;VC=missense;VW=BRAF|CCDS5863.1|r.83delGinsc|c.22delGinsC|p.G8R|protein_coding:CDS:exon:indel:inframe_variant:substitution:non_synonymous_codon|SO:0000010:SO:0000316:SO:0000147:SO:1000032:SO:0001650:SO:1000002:SO:0001583 GT:PP:NP:PB:NB:PD:ND:PR:NR:PU:NU:TG:VG ./.:0:0:0:0:42:43:42:43:0:0:1:0 ./.:90:70:116:106:181:154:181:154:116:106:2:3
chr7 140924680 . ACC AGG . PASS PC=D;VT=Complex;VD=BRAF|CCDS5863.1|r.83_84delGGinscc|c.22_23delGGinsCC|p.G8P|protein_coding:CDS:exon:indel:inframe_variant:substitution:non_synonymous_codon|SO:0000010:SO:0000316:SO:0000147:SO:1000032:SO:0001650:SO:1000002:SO:0001583;VC=missense;VW=BRAF|CCDS5863.1|r.83_84delGGinscc|c.22_23delGGinsCC|p.G8P|protein_coding:CDS:exon:indel:inframe_variant:substitution:non_synonymous_codon|SO:0000010:SO:0000316:SO:0000147:SO:1000032:SO:0001650:SO:1000002:SO:0001583 GT:PP:NP:PB:NB:PD:ND:PR:NR:PU:NU:TG:VG ./.:0:0:0:0:42:43:42:43:0:0:1:0 ./.:90:70:116:106:181:154:181:154:116:106:2:3
When working out what the change is (i.e. the alt bases), are you always expecting the first base of a complex variant to be unmodified? If I remember correctly, all our classifying is done by POS and string length difference. If I understand the spec above correctly, a complex variant may or may not have a leading unmodified base, which could mess up the classification or reconstruction of our (CGP) complex variants.
We have noticed that bcftools norm does something a little kooky with the following VCF/BCF:
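To make the concern concrete, here is a minimal Python sketch of why length difference alone cannot tell the two records above apart. The function names are hypothetical and this is not Vagrent's actual classification code, just an illustration of the first-base check under the reading of the spec given above.

```python
def length_difference(ref: str, alt: str) -> int:
    """Length of ALT minus length of REF (0 for same-length substitutions)."""
    return len(alt) - len(ref)


def has_optional_padding(ref: str, alt: str) -> bool:
    """True if both alleles have more than one base and share their first base.

    Hypothetical helper: in that case the first base may be an unmodified
    padding base rather than part of the change.
    """
    return len(ref) > 1 and len(alt) > 1 and ref[0].upper() == alt[0].upper()


# The two records from above: (POS, REF, ALT)
records = [(140924681, "CC", "GG"), (140924680, "ACC", "AGG")]

for pos, ref, alt in records:
    # Both records have a length difference of 0; only the first-base
    # check separates the padded form from the unpadded one.
    print(pos, ref, alt, length_difference(ref, alt), has_optional_padding(ref, alt))
```

Both records have a length difference of zero, so a classifier keyed on POS and length difference sees two same-length events at different positions; only the first-base comparison shows that `ACC`/`AGG` carries a leading unmodified base.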
bcftools norm -f GCA_000001405.15_GRCh38_full_analysis_set.fna tmp.bcf
from:
chr22 28687932 . TG T
chr7 140719345 . GTCA GGGG
chr7 140924667 . CCGC CTTA
chr7 140924681 . CC GG
chr7 140924680 . ACC AGG
chr7 140924700 . CCAT C
chr7 140924700 . CCA C
chr7 140924700 . C CTT
to:
chr22 28687932 . TG T
chr7 140719346 . TCA GGG
chr7 140924668 . CGC TTA
chr7 140924681 . CC GG
chr7 140924681 . CC GG
chr7 140924700 . CCAT C
chr7 140924700 . CCA C
chr7 140924700 . C CTT
Technically both are correct, but now whatever tools are processing these files have to check the first base of each string to see if there is unmodified base padding.
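As a rough illustration of that check, the sketch below trims shared trailing and leading bases from a REF/ALT pair while keeping at least one base in each allele. The function name is hypothetical, and this is only an approximation of the trimming seen in the example above: it does not consult the reference sequence or left-align indels as bcftools norm does.

```python
def trim_shared_bases(pos: int, ref: str, alt: str) -> tuple[int, str, str]:
    """Trim bases shared by REF and ALT, keeping at least one base in each.

    Hypothetical helper. Trailing shared bases are removed first, then
    leading ones; POS moves right by one for every leading base removed.
    """
    # Trailing bases: POS is unaffected.
    while len(ref) > 1 and len(alt) > 1 and ref[-1].upper() == alt[-1].upper():
        ref, alt = ref[:-1], alt[:-1]
    # Leading bases: each one removed shifts POS by one.
    while len(ref) > 1 and len(alt) > 1 and ref[0].upper() == alt[0].upper():
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# (POS, REF, ALT) taken from the "from:" list above
examples = [
    (28687932, "TG", "T"),
    (140719345, "GTCA", "GGGG"),
    (140924667, "CCGC", "CTTA"),
    (140924681, "CC", "GG"),
    (140924680, "ACC", "AGG"),
    (140924700, "CCAT", "C"),
]

for pos, ref, alt in examples:
    print((pos, ref, alt), "->", trim_shared_bases(pos, ref, alt))
```

Simple insertions and deletions such as `TG`/`T` are left alone because one allele is already a single base, so the padding base the spec requires is preserved. After trimming, `ACC`/`AGG` at 140924680 and `CC`/`GG` at 140924681 collapse to the same record.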
VEP seems to assume the whole alt sequence for a complex variant represents the total change and presents it as such to the user. I might have missed some combinations, so by all means have a play around. If I'm right, you will have to interrogate the first base of what we call complex to distinguish between the two patterns, otherwise we would be blind :-(
I thought I would mention it, as anyone using Vagrent on complex variants might not be getting what they expect, depending on their interpretation of the spec.
Me :)