Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New "marker" module #24

Merged
merged 27 commits into from
May 29, 2020
Merged

New "marker" module #24

merged 27 commits into from
May 29, 2020

Conversation

standage
Copy link
Member

@standage standage commented May 27, 2020

This update implements a new "marker" module that delegates all handling of UAS and non-UAS sequences to dedicated object classes. Generic behavior is defined in a base STRMarker class, and then locus-specific special handling of the sequence and flanks is delegated to subclasses such as STRMarker_FGA and STRMarker_CSF1PO. This doesn't decrease the lines of code per se, but the new organization makes it much easier to reason about the base functionality and the locus-specific special rules. Subsequent improvements to these bits of code will hopefully be much easier now.

@standage
Copy link
Member Author

Ok, the locus-specific stuff has been migrated, and most of the tests are passing. Still need to troubleshoot some bugs. But IMO the annot and marker modules are much more tractable now. This is a big reorganization—I'm curious about your impressions.

@rnmitchell
Copy link
Contributor

I pulled your most recent commit and have a ton of failed tests now? Are you getting that too? I was going through code before I realized you had made the commit and found an error in the FGA function (I fixed it manually). After that I pulled the most recent commit and had a bunch fail.

@standage
Copy link
Member Author

Yes, I'm seeing a bunch of tests fail now too. My latest troubleshooting has focused on test_suite.py::test_annotate_full_nocombine. I've made a lot of improvements there, but something caused...basically all of test_marker.py to fail. I think the remaining issues have a lot to do with forward/reverse sequences being handled incorrectly.

@rnmitchell
Copy link
Contributor

rnmitchell commented May 29, 2020

Yeah, looks like the forward and UAS sequences are getting switched around. Loci that are failing need to be RCed.

@standage
Copy link
Member Author

The recent spike in failing tests for test_marker.py was due mostly to changing uas=True by default to uas=False by default in STRMarkerObject, to match the CLI. Many of the locus-specific tests use UAS sequences, so once I marked them correctly things worked again.

But yes, many of the remaining failures still look like revcom issues.

@rnmitchell
Copy link
Contributor

The output is also switched... you have uas.sequence, forward_sequence, canon, annotation, reverse_annotation ... when the column headers are uas sequence, forward_seq, canon, forward_annotation, UAS annotation.

You added a function that changed the uas.sequence to the RC and forward_sequence to the UAS output sequence.. so those are switched.

@standage
Copy link
Member Author

Regarding uas sequence vs forward sequence: I used to have marker.uas_sequence return the input sequence, and marker.forward_sequence do the ReverseCompNeeded check. But...that's backwards right? If the user provides a non-UAS ForenSeq sequence that doesn't match the orientation of the UAS, then matching the UAS strand requires reverse complementation. Making that change certainly made lots of errors go away.

But I think I've detected another possible issue just now: no reverse complementation should be done when uas=True. Somehow during all this refactoring, the question of trimming for UAS compatibility became decoupled from orientatino for UAS compatibility.

@standage
Copy link
Member Author

Regarding the forward vs reverse annotation, having it in the order I have now results in fewer errors, so...yeah, still need to figure out why things are mixed up.

@standage
Copy link
Member Author

Here's an example where I could use your help @rnmitchell. Check out the new test_strobj_D1S1656 function in test_marker.py. I grabbed the input data from 2800M_formatted_full.csv and the expected output from 2800M_full_anno.txt. I can't figure out what's going wrong with the STRMarker_D1S1656::annotation method, derived from D1_annotation in master.

@standage
Copy link
Member Author

Ok, I was able to figure out the issue with the D1 locus. Only issues with the flanking annotations remain!!!

@rnmitchell
Copy link
Contributor

Ok great!! I was actually just going to work on it, so I'm glad you got it working. I've been having issues with git and it's been a huge pain to stash away changes I've made to pull your newest commits.

@rnmitchell
Copy link
Contributor

Ok awesome!! So I have one D3 sequence where the annotation is not matching to what was called previously (and what is "known"). Sequence: TCTATCTGTCTGTCTATCTATCTATCTATCTATCTATCTATCTATCTAACTAACTATCTATCTA and should be bracketed annotation as: TCTA [TCTG]2 [TCTA]9 [ACTA]2 [TCTA]2 but is being annotated as TCTA [TCTG]2 [TCTA]9 ACTA ACTA [TCTA]2

My guess is that D3 should be annotated using the sequence block approach and not a prioritized list of variants, but I need to look further into it. Just wanted to give you a heads up!

@standage
Copy link
Member Author

Is the D3 sequence above a full forward strand sequence?

@rnmitchell
Copy link
Contributor

It is a UAS region forward sequence (the sequences I've been checking with are from the UAS).

@standage
Copy link
Member Author

Ok, I added a new regression test for D3. Working on the subtleties.

@standage
Copy link
Member Author

Ok, new update fixes this case. Want to run it against your big data set again?

@standage standage requested a review from rnmitchell May 29, 2020 19:34
@standage standage marked this pull request as ready for review May 29, 2020 19:34
@standage
Copy link
Member Author

Updated the PR description. Ready for review and, if it passes the bigly private dataset, merge.

@rnmitchell
Copy link
Contributor

yes! it worked! woo hoo!

@rnmitchell rnmitchell marked this pull request as draft May 29, 2020 19:42
@rnmitchell rnmitchell marked this pull request as ready for review May 29, 2020 19:42
@rnmitchell
Copy link
Contributor

It wasn't letting me review it for some reason, but it looks good! So I'm going to merge. Thanks for your help!!

@rnmitchell rnmitchell merged commit 27ef0f6 into master May 29, 2020
@standage standage deleted the markermodule branch May 29, 2020 19:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants