Reimplement and rename get_annotation function #22
Ready for review and merge!
Also, renamed the |
Ok, are you done changing things? I didn't want to merge it if you were still working on it. All the tests passed, and I also ran the ~1300 sequences for which we have "known" bracketed annotations; all of them matched as well.
Well, I'm on a roll. 😀 Since you're actively responding, let me get one more change in there. Then it will be mergeable.
hahaha exactly why I asked!
Ok, this last commit renamed the |
NOW are you ready? ;)
This update replaces the `get_annotation` function with the new `collapse_tandem_repeat` and `collapse_all_repeats` functions.
I tried several ways to clean up the regex-based splitting approach, but this approach leads to unintuitive off-by-one errors for internal tandem repeats (e.g. 10 empty strings representing 11 copies of the repeat). Correcting for this resulted in opaque code any way I sliced it.
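To illustrate the off-by-one behavior described above, here is a minimal sketch using Python's `re.split`. The `AGAT` repeat unit and the `CC`/`GG` flanking sequences are made up for the example and are not taken from the project's data.

```python
import re

# Hypothetical input: 11 tandem copies of "AGAT" between two flanks.
sequence = "CC" + "AGAT" * 11 + "GG"

# Splitting on the repeat yields the two flanks plus one empty string
# for each boundary between adjacent copies.
pieces = re.split("AGAT", sequence)
empty_count = pieces.count("")

print(pieces[0], pieces[-1])  # the flanks: CC GG
print(empty_count)            # 10 empty strings for 11 copies
```

Since N adjacent copies produce only N - 1 empty strings between the flanks, any code that counts empty strings has to add one back, which is where the bookkeeping gets awkward.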
So I started from scratch and went with a recursive function that finds the first instance of the repeat in the sequence, collapses it, and then calls itself to repeat the process on the remaining sequence. This approach is a bit more concise and clear, and (most importantly) it works perfectly as a drop-in replacement for `get_annotation`!
I also added some doctests to the docstrings of the two new functions, so I updated `make test` in the Makefile to find and run these doctests.
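For readers skimming the thread, the recursive idea described above could look roughly like the sketch below. This is not the code from the PR: the signature, the `[REPEAT]count` output format, the space-separated tokens, and the example sequence are all assumptions made for illustration.

```python
def collapse_tandem_repeat(sequence, repeat):
    """Collapse runs of `repeat` into a bracketed form (illustrative sketch).

    The "[AGAT]3" output format and the signature are assumptions.

    >>> collapse_tandem_repeat("CCAGATAGATAGATGG", "AGAT")
    'CC [AGAT]3 GG'
    """
    if not repeat:
        return sequence  # guard: an empty repeat would never terminate
    start = sequence.find(repeat)
    if start == -1:
        return sequence  # base case: no further copies of the repeat
    # Count consecutive copies beginning at the first occurrence.
    count = 0
    end = start
    while sequence.startswith(repeat, end):
        count += 1
        end += len(repeat)
    prefix = sequence[:start]
    collapsed = f"[{repeat}]{count}"
    # Recurse on whatever follows the run just collapsed.
    remainder = collapse_tandem_repeat(sequence[end:], repeat)
    return " ".join(part for part in (prefix, collapsed, remainder) if part)
```

Because the example lives in the docstring, it can also be checked with `python -m doctest` on the containing file. Counting copies directly with `startswith` sidesteps the empty-string bookkeeping that the split-based approach requires.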