You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This would be useful in a "samtools" library like we had discussed which could be an alternative to pysam.
Applications:
1. finding the location (and quality score) of an insert from pileup
2. do 1. for VCF files with a CIGAR field
3. programatically drop SNPs that occur only on the ends of reads
for 2., it's necessary to analyse the cigar string in order to compare
freebayes and our variant caller. consider the entry:
KC807175.1_Houston 18379 . TA TATAA 119040 . ...CIGAR=1M3I1M;DP=4432;DPB=10946;TYPE=ins;technology.ILLUMINA=1;technology.IONTORRENT=0;technology.L454=0...
which could be
TA---
TATAA
but according to the cigar string is actually
T---A
TATAA
So it's necessary to find out where the insertion (or deletion) actually happens.
One concern with 1. and 3. is that this could take a long time in high depth areas.
Right now the computation is really just addition (it's not necessary to recreate the alignment strings) so maybe this isn't a problem.
To make it faster:
* filter for reads with an 'X' or 'I' in the cigar string
* if there are multiple bases supporting the SNP(s)/insert(s), look for all of that information at the same time.
* liberal use of generators
There is an alternative implementation (haven't used at all) here
Missing currently is the accomadation of the MD flag, described in the linked ipython notebook
Edit: bwa doesn't produce any MD flags, because they weren't added to bwa until version 0.76
but they can be forced by using samtools calmd: samtools calmd <bam> <ref> produces the same output as samtools view but with the MD tags included. Supposedly, "The MD tag is for SNP/indel calling without looking at the reference.". I'm not sure if we need the MD tag, especially if bwa discriminates between M/=/X in the cigar string
The text was updated successfully, but these errors were encountered:
I'm going to say "No" to this. Freebayes does supply total quality for a variant. This kind of processing would be error-prone and (therefore) very time-consuming to get right.
Closing this, but it is a useful reference and may some day need to revisited.
Relevant to:
VDBWRAIR/ngs_mapper#166
VDBWRAIR/ngs_mapper#150
https://github.com/VDBWRAIR/VarCompare/issues/1
This would be useful in a "samtools" library like we had discussed which could be an alternative to
pysam
.Applications:
1. finding the location (and quality score) of an insert from pileup
2. do 1. for VCF files with a
CIGAR
field3. programatically drop SNPs that occur only on the ends of reads
for 2., it's necessary to analyse the cigar string in order to compare
freebayes and our variant caller. consider the entry:
KC807175.1_Houston 18379 . TA TATAA 119040 . ...CIGAR=1M3I1M;DP=4432;DPB=10946;TYPE=ins;technology.ILLUMINA=1;technology.IONTORRENT=0;technology.L454=0...
which could be
but according to the cigar string is actually
So it's necessary to find out where the insertion (or deletion) actually happens.
One concern with 1. and 3. is that this could take a long time in high depth areas.
Right now the computation is really just addition (it's not necessary to recreate the alignment strings) so maybe this isn't a problem.
To make it faster:
* filter for reads with an 'X' or 'I' in the cigar string
* if there are multiple bases supporting the SNP(s)/insert(s), look for all of that information at the same time.
* liberal use of generators
There is an alternative implementation (haven't used at all) here
Missing currently is the accomadation of the
MD
flag, described in the linked ipython notebookEdit: bwa doesn't produce any
MD
flags, because they weren't added tobwa
until version 0.76but they can be forced by using
samtools calmd
:samtools calmd <bam> <ref>
produces the same output assamtools view
but with the MD tags included. Supposedly, "The MD tag is for SNP/indel calling without looking at the reference.". I'm not sure if we need the MD tag, especially if bwa discriminates between M/=/X in the cigar stringThe text was updated successfully, but these errors were encountered: