Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reimplement and rename get_annotation function #22

Merged
merged 4 commits into from
May 27, 2020

Conversation

standage
Copy link
Member

This update replaces the get_annotation function with the new collapse_tandem_repeat and collapse_all_repeats functions.

I tried several ways to clean up the regex-based splitting approach, but this approach leads to unintuitive off-by-one errors for internal tandem repeats (i.e. 10 empty strings representing 11 copies of the repeat). Correcting for this resulted in opaque code any way I sliced it.

So I started from scratch and went with a recursive function that finds the first instance of the repeat in the sequence, collapses it, and then calls itself to repeat the process on the remaining sequence. This approach is a bit more concise and clear, and (most importantly) it works perfectly as a drop-in replacement for get_annotation!

I also added some doctests to the docstrings of the two new functions, so I updated make test in the Makefile to find and run these doctests.

@standage standage requested a review from rnmitchell May 27, 2020 18:00
@standage
Copy link
Member Author

Ready for review and merge!

@standage
Copy link
Member Author

Also, renamed the loci_need_split_anno function to collapse_repeats_by_length. Made some minor changes to variable names, but otherwise left that function alone.

@rnmitchell
Copy link
Contributor

Ok, are you done changing things? I didn't want to merge it if you were still working on it.

All the tests passed and I used the ~1300 sequences we have "known" bracketed annotation for and all matched as well.

@standage
Copy link
Member Author

Ok, are you done changing things? I didn't want to merge it if you were still working on it.

Well, I'm on a roll. 😀 Since you're actively responding, let me get one more change in there. Then it will be mergeable.

@rnmitchell
Copy link
Contributor

hahaha exactly why I asked!

@standage
Copy link
Member Author

Ok, this last commit renamed the split_string method to sequence_to_bracketed_form, updated all the places in the code where this was called, and changed a few long variable names to get everything to fit on one 99-character line.

@rnmitchell
Copy link
Contributor

NOW are you ready? ;)

@standage
Copy link
Member Author

doit

@rnmitchell rnmitchell closed this May 27, 2020
@rnmitchell rnmitchell reopened this May 27, 2020
@rnmitchell rnmitchell merged commit 763b688 into master May 27, 2020
@standage standage deleted the refactor/get_annotation branch May 27, 2020 19:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants