indels #26
I think this is ready for review @standage. It's not much of a change (I realized I had accidentally included the indel_flag function in marker.py in the previous branch when it should have been here... oops! This is why I should only work on one branch at a time...). I also added a test to ensure the indel flag is created if the length allele is not present in the dictionary. One note I want to make. I updated the test_flank_anno() file. If you look at this file, you can see that there are a decent number of sequences that are flagged as |
Looks good!
Unless I'm missing something, the number of reads for each sequence is 100. So I'm not sure how to interpret that. But apart from the uncertainty there I don't see anything problematic with the output.
Ahh sorry, I was looking at a different file... I had made Reads all to be 100 in the test files.
When I first saw that a while back, I figured you had just set the number of reads to 100 for testing purposes. At what point do sequences get filtered out by number of reads?
It's up to the user to do that beforehand; lusSTR doesn't do any filtering. That could be something we implement in the future, I suppose.
Yeah, I mean in that case I can't think of any concerns about the output as is.
Should I go ahead and merge this?
Oh, yeah go ahead. I got distracted.
This PR addresses the possibility of indels within the flanking sequence of the UAS region when using STRait Razor output (full-length sequences). The initial idea is to create a dictionary of known STR length alleles for each locus; if the calculated length allele is not among the known alleles for that locus, the sequence is flagged in the flanking report as containing a possible indel, for the user to evaluate manually.
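For readers skimming the thread, the lookup described above can be sketched roughly as follows. This is only an illustration of the idea, not the code in this PR: the function name `possible_indel_flag`, the `KNOWN_ALLELES` table, the locus names, the allele values, and the flag text are all hypothetical placeholders.

```python
# Hypothetical lookup table: locus name -> known length-based alleles.
# Locus names and allele values are placeholders, not a real reference list.
KNOWN_ALLELES = {
    "LOCUS_A": {"8", "9", "9.3", "10"},
    "LOCUS_B": {"12", "13", "14"},
}


def possible_indel_flag(locus, length_allele, known=KNOWN_ALLELES):
    """Return a flag string when the allele is not known for the locus.

    Alleles are compared as strings so that microvariants such as "9.3"
    and whole-number alleles such as 9 are handled uniformly.
    """
    # A locus missing from the table has no known alleles in this sketch,
    # so every sequence at that locus gets flagged.
    if str(length_allele) not in known.get(locus, set()):
        return "Possible indel"
    return ""


# One flag per sequence, suitable for a column in a flanking report.
rows = [("LOCUS_A", "9.3"), ("LOCUS_A", "11"), ("LOCUS_B", 13)]
flags = [possible_indel_flag(locus, allele) for locus, allele in rows]
```

In this sketch an unknown allele is only flagged, never corrected or dropped, which matches the intent that the user evaluates flagged sequences manually.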