-
Notifications
You must be signed in to change notification settings - Fork 416
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extract mapped genome sequence from mappy API #126
Comments
Not possible at the moment, but it should not be hard to add this functionality. |
Added. You can now: import mappy as mp
a = mp.Aligner("test/MT-human.fa")
seq = a.seq("MT_human", 100, 250)
for hit in a.map(mp.revcomp(seq)):
print(hit) I have only tested it on toy examples. Let me know if it has issues. |
I have tested this feature and it works perfectly for my test cases. |
After some more extensive testing it does not look as though this feature is working correctly. When working on some larger genomes the sequence extraction appears to be broken. As a test case extracting the following genomic location from the linked zebrafish genome returns None, but this is a valid genomic location and I can confirm that it contains only standard bases. Genomic location: 4:70000000-70000100 Through a binary search I've found that sequence is only able to be extracted for each chromosome up to position 58871916. This is true for every chromosome in this particular fasta file. |
It is a bug. It is now fixed via 96b132c. |
I am using the
mappy
API with the sole end goal of extracting the genomic sequence for a read mapping. Currently, I am usingpyfaidx
to take the mapped coordinates from themappy.Alignment
object and extract the associated genomic sequence. This seems like a bit of extra work since the sequence in theory may be extract-able from themappy.Aligner
ormappy.Alignment
objects (though there are likely reasons this may not be true). This would also allow my code not to load the same reference fasta file twice (once for mapping inmappy
and the second time for randomly extracting genome sequence). Is there access to this right now via the python API (maybe undocumented)? Or would it be possible to add this feature either directly or indirectly to themappy.Alignment
object?The text was updated successfully, but these errors were encountered: