Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Analyze alignment from SAM/BAM file #66

Closed
averagehat opened this issue Nov 10, 2015 · 1 comment
Closed

Analyze alignment from SAM/BAM file #66

averagehat opened this issue Nov 10, 2015 · 1 comment
Assignees
Labels

Comments

@averagehat
Copy link
Contributor

Relevant to:
VDBWRAIR/ngs_mapper#166
VDBWRAIR/ngs_mapper#150
https://github.com/VDBWRAIR/VarCompare/issues/1

This would be useful in a "samtools" library like we had discussed which could be an alternative to pysam.

Applications:
1. finding the location (and quality score) of an insert from pileup
2. do 1. for VCF files with a CIGAR field
3. programatically drop SNPs that occur only on the ends of reads

for 2., it's necessary to analyse the cigar string in order to compare
freebayes and our variant caller. consider the entry:

KC807175.1_Houston 18379 . TA TATAA 119040 . ...CIGAR=1M3I1M;DP=4432;DPB=10946;TYPE=ins;technology.ILLUMINA=1;technology.IONTORRENT=0;technology.L454=0...
which could be

TA---
TATAA

but according to the cigar string is actually

T---A
TATAA

So it's necessary to find out where the insertion (or deletion) actually happens.

One concern with 1. and 3. is that this could take a long time in high depth areas.
Right now the computation is really just addition (it's not necessary to recreate the alignment strings) so maybe this isn't a problem.

To make it faster:
* filter for reads with an 'X' or 'I' in the cigar string
* if there are multiple bases supporting the SNP(s)/insert(s), look for all of that information at the same time.
* liberal use of generators

There is an alternative implementation (haven't used at all) here

Missing currently is the accomadation of the MD flag, described in the linked ipython notebook

Edit: bwa doesn't produce any MD flags, because they weren't added to bwa until version 0.76

but they can be forced by using samtools calmd:
samtools calmd <bam> <ref> produces the same output as samtools view but with the MD tags included. Supposedly, "The MD tag is for SNP/indel calling without looking at the reference.". I'm not sure if we need the MD tag, especially if bwa discriminates between M/=/X in the cigar string

@averagehat
Copy link
Contributor Author

I'm going to say "No" to this. Freebayes does supply total quality for a variant. This kind of processing would be error-prone and (therefore) very time-consuming to get right.

Closing this, but it is a useful reference and may some day need to revisited.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant