Move marker plot generation to filter step for STRs #69
else:
bracketed_form = (
f"{collapse_repeats_by_length(self.uas_sequence[:break_point], 4)} "
f"{collapse_repeats_by_length(self.uas_sequence[break_point:], 4)}"
)
elif "TTTT" in self.uas_sequence:
break_point = self.uas_sequence.index("TTTT") + 14
bracketed_form = (
f"{collapse_repeats_by_length(self.uas_sequence[:break_point], 4)} "
f"{collapse_repeats_by_length(self.uas_sequence[break_point:], 4)}"
)
bracketed_form = (
f"{collapse_repeats_by_length(self.uas_sequence[:break_point], 4)} "
f"{collapse_repeats_by_length(self.uas_sequence[break_point:], 4)}"
)
else:
bracketed_form = ""
Once again ran into an issue with this marker. This time, the sequence seemed to be off by tens of bases on one side (e.g. the sequence started ~50 bases early on the 5' side, so it did not contain the entire repeat region and was missing the GGGCTGCCTA and TTTT sequences the code originally searched for). The code now simply drops these sequences; they are garbage.
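To illustrate the fallback described above, here is a minimal sketch, not the actual lusSTR code. The helper names find_break_point and split_or_drop are hypothetical, the offset of 10 for GGGCTGCCTA is an assumption (only the TTTT offset of 14 appears in the diff), and the collapse_repeats_by_length step is left out.

```python
def find_break_point(sequence, anchors):
    """Return the split index for `sequence`, or None if no anchor is present.

    `anchors` is a list of (motif, offset) pairs tried in order.
    Hypothetical helper, for illustration only.
    """
    for motif, offset in anchors:
        position = sequence.find(motif)
        if position != -1:
            return position + offset
    return None


def split_or_drop(sequence, anchors):
    """Split at the first anchor found; return "" when the read has no anchor."""
    break_point = find_break_point(sequence, anchors)
    if break_point is None:
        # Neither anchor is present (e.g. a shifted or truncated read):
        # produce an empty bracketed form so the sequence is dropped later.
        return ""
    return f"{sequence[:break_point]} {sequence[break_point:]}"


# The offset for GGGCTGCCTA is assumed; 14 for TTTT comes from the diff.
ANCHORS = [("GGGCTGCCTA", 10), ("TTTT", 14)]
```

A read without either motif, such as split_or_drop("ACGTACGT", ANCHORS), yields an empty string, which downstream code can treat as "nothing to annotate".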
Rebecca emphatically declares that these sequences cannot be redeemed.
lusSTR/scripts/repeat.py
if bf != "":
    for block in bf.split(" "):
        if block == repeat:
            if 1 > longest:
                longest = 1
        match = re.match(r"\[" + repeat + r"\](\d+)", block)
        if match:
            length = int(match.group(1))
            if length > longest:
                longest = length
Code is dropping these garbage sequences.
This was a simple and straightforward change, but I have a suggestion for improvement: instead of indenting the code in a conditional block, short-circuit the inverse condition.
if bf == "":
    return 0
for block in bf.split(" "):
    ### blah blah blah
This has a few benefits. First of all, you see what happens right away if bf is empty: in the current code, you have to read all the way to the end of the block to find out that...nothing happens. Second, it keeps the nesting and indentation to a minimum: this is not only good for legibility, it also (at least in this case) makes for cleaner diffs in code review. My suggestion above would result in a simple two-line diff; the diff for the current code looks like this.
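For reference, a minimal runnable sketch of the short-circuit version. The function name longest_repeat_run and its signature are assumptions for illustration, not necessarily what repeat.py uses; only the loop body is taken from the snippet under review.

```python
import re


def longest_repeat_run(bf, repeat):
    """Return the largest copy number of `repeat` among the blocks of `bf`.

    Hypothetical name and signature; the loop mirrors the reviewed snippet.
    """
    # Guard clause: an empty bracketed form has no blocks to inspect.
    if bf == "":
        return 0
    longest = 0
    for block in bf.split(" "):
        # A bare motif counts as a single copy.
        if block == repeat:
            longest = max(longest, 1)
        # A bracketed motif such as "[AGAT]11" carries its copy number.
        match = re.match(r"\[" + repeat + r"\](\d+)", block)
        if match:
            longest = max(longest, int(match.group(1)))
    return longest
```

With the guard at the top, the empty case is settled in the first two lines and the loop stays at a single level of indentation.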
makes sense! I'll update.
straitrazor_B9B_S9_L001_R1_001_straitrazor_marker_plots.pdf
if datatype == "lusplus":
    final_df = final_df.drop("CE_Allele", axis=1)
Ran into a problem with this code because the marker plots use the CE allele for plotting. Removing this drop so the plots can actually be created.
Ready for your review @standage
if locus == "D12S391" and kit == "powerseq":
if locus == "D12S391" and kit == "powerseq" and software == "straitrazor":
if "." in str(marker.canonical):
check_sr += 1
if check_sr > 10:
Looking through the code, I realized that if I'm going to run a check on STRait Razor data, I should make sure that STRait Razor was actually used (and not GeneMarker).
Looks good overall! I have one suggestion for improvement.
Moving the marker plot generation code to the filter step in order to create a third set of plots containing only the true alleles. In the marker plots containing all alleles, each allele will now be classified by color as either BelowAT, Real Typed, or Stutter.