Move marker plot generation to filter step for STRs #69

rnmitchell · 2024-01-10T18:14:35Z

Moving the marker plot generation code to the filter step in order to create a third set of plots containing only the true alleles.

In the marker plots containing all alleles, each allele is now color-coded as BelowAT, ~~Real~~ Typed, or Stutter.

rnmitchell · 2024-01-10T18:20:24Z

lusSTR/scripts/marker.py

-            else:
+                bracketed_form = (
+                    f"{collapse_repeats_by_length(self.uas_sequence[:break_point], 4)} "
+                    f"{collapse_repeats_by_length(self.uas_sequence[break_point:], 4)}"
+                )
+            elif "TTTT" in self.uas_sequence:
                break_point = self.uas_sequence.index("TTTT") + 14
-            bracketed_form = (
-                f"{collapse_repeats_by_length(self.uas_sequence[:break_point], 4)} "
-                f"{collapse_repeats_by_length(self.uas_sequence[break_point:], 4)}"
-            )
+                bracketed_form = (
+                    f"{collapse_repeats_by_length(self.uas_sequence[:break_point], 4)} "
+                    f"{collapse_repeats_by_length(self.uas_sequence[break_point:], 4)}"
+                )
+            else:
+                bracketed_form = ""


Once again ran into an issue with this marker. This time the sequence was offset by tens of bases on one side (e.g. it started ~50 bases early on the 5' side, so it did not contain the entire repeat region and was missing both the GGGCTGCCTA and the TTTT sequences that the code originally searched for). The code now simply drops these sequences; they are garbage.

standage · 2024-01-12T02:44:02Z

Rebecca emphatically declares that these sequences cannot be redeemed.
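For readers following the thread, the fallback behavior described above can be sketched roughly as follows. This is only an illustration, not the lusSTR implementation: `split_on_anchor` and `ANCHORS` are hypothetical names, the search order and the offset for GGGCTGCCTA are assumptions (only the `TTTT` offset of 14 appears in the diff), and the sketch skips the `collapse_repeats_by_length` step entirely.

```python
# Hypothetical sketch of "split at an anchor motif, else give up".
# Only the TTTT offset (14) is taken from the diff; the GGGCTGCCTA
# offset and the search order are assumptions for illustration.
ANCHORS = (
    ("GGGCTGCCTA", 10),  # assumed offset: length of the motif
    ("TTTT", 14),        # offset shown in the diff
)


def split_on_anchor(sequence):
    """Split a sequence at the first anchor found; return "" if none is present."""
    for motif, offset in ANCHORS:
        if motif in sequence:
            break_point = sequence.index(motif) + offset
            return f"{sequence[:break_point]} {sequence[break_point:]}"
    # Neither anchor is present (e.g. the read is shifted by tens of bases),
    # so the sequence is treated as unusable and an empty form is returned.
    return ""
```

Returning an empty string keeps the "garbage sequence" signal explicit, so downstream code can test for `""` rather than guess at a break point.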

rnmitchell · 2024-01-10T18:20:55Z

lusSTR/scripts/repeat.py

+    if bf != "":
+        for block in bf.split(" "):
+            if block == repeat:
+                if 1 > longest:
+                    longest = 1
+            match = re.match(r"\[" + repeat + r"\](\d+)", block)
+            if match:
+                length = int(match.group(1))
+                if length > longest:
+                    longest = length


Code is dropping these garbage sequences.

standage · 2024-01-12T02:42:24Z

This was a simple and straightforward change, but I have a suggestion for improvement: instead of indenting the code in a conditional block, short-circuit the inverse condition.

    if bf == "":
        return 0
    for block in bf.split(" "):
        ### blah blah blah

This has a few benefits. First of all, you see what happens right away if bf is empty: in the current code, you have to read all the way to the end of the block to find out that...nothing happens. Second, it keeps the nesting and indentation to a minimum: this is not only good for legibility, it also (at least in this case) makes for cleaner diffs in code review. My suggestion above would result in a simple two-line diff: the diff for the current code looks like this.

makes sense! I'll update.
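The guard-clause suggestion in this thread could look something like the sketch below. It is written as a standalone function for illustration only: the name `longest_repeat_run`, the `longest = 0` initialization, and the use of `re.escape` are assumptions, not necessarily what was committed to `lusSTR/scripts/repeat.py`.

```python
import re


def longest_repeat_run(bf, repeat):
    """Return the longest run of `repeat` in a bracketed form (0 if bf is empty)."""
    # Guard clause: handle the empty bracketed form up front so the
    # main loop needs no extra level of indentation.
    if bf == "":
        return 0
    longest = 0  # assumed starting value
    for block in bf.split(" "):
        # A bare repeat unit (no brackets) counts as a run of length 1.
        if block == repeat:
            longest = max(longest, 1)
        # A bracketed block such as "[AGAT]11" is a run of length 11.
        match = re.match(r"\[" + re.escape(repeat) + r"\](\d+)", block)
        if match:
            longest = max(longest, int(match.group(1)))
    return longest
```

For example, `longest_repeat_run("[AGAT]11 AGAC [AGAT]3 AGAT", "AGAT")` evaluates to `11`, while an empty bracketed form returns `0` without entering the loop.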

rnmitchell · 2024-01-11T13:56:34Z

straitrazor_B9B_S9_L001_R1_001_straitrazor_marker_plots.pdf
Example of the new marker plots

rnmitchell · 2024-01-11T14:59:54Z

lusSTR/scripts/filter_settings.py

-    if datatype == "lusplus":
-        final_df = final_df.drop("CE_Allele", axis=1)


Ran into a problem with this code: the marker plots use the CE allele for plotting, so I'm removing this drop so that the plots can actually be created.

rnmitchell · 2024-01-11T15:32:01Z

Ready for your review @standage

rnmitchell · 2024-01-11T16:50:49Z

lusSTR/wrappers/convert.py

-        if locus == "D12S391" and kit == "powerseq":
+        if locus == "D12S391" and kit == "powerseq" and software == "straitrazor":
            if "." in str(marker.canonical):
                check_sr += 1
                if check_sr > 10:


Looking through the code, I realized that if I'm going to run a check on STRait Razor data, I should make sure that STRait Razor was actually used (and not GeneMarker).

rnmitchell · 2024-01-11T20:11:46Z

nist_NTD003_S7_L001_001_strresults_filtered_marker_plots.pdf
Newest plots

standage

Looks good overall! I have one suggestion for improvement.

standage · 2024-01-12T02:42:24Z

lusSTR/scripts/repeat.py

+    if bf != "":
+        for block in bf.split(" "):
+            if block == repeat:
+                if 1 > longest:
+                    longest = 1
+            match = re.match(r"\[" + repeat + r"\](\d+)", block)
+            if match:
+                length = int(match.group(1))
+                if length > longest:
+                    longest = length


This was a simple and straightforward change, but I have a suggestion for improvement: instead of indenting the code in a conditional block, short-circuit the inverse condition.

    if bf == "":
        return 0
    for block in bf.split(" "):
        ### blah blah blah

This has a few benefits. First of all, you see what happens right away if bf is empty: in the current code, you have to read all the way to the end of the block to find out that...nothing happens. Second, it keeps the nesting and indentation to a minimum: this is not only good for legibility, it also (at least in this case) makes for cleaner diffs in code review. My suggestion above would result in a simple two-line diff: the diff for the current code looks like this.

standage · 2024-01-12T02:44:02Z

lusSTR/scripts/marker.py

-            else:
+                bracketed_form = (
+                    f"{collapse_repeats_by_length(self.uas_sequence[:break_point], 4)} "
+                    f"{collapse_repeats_by_length(self.uas_sequence[break_point:], 4)}"
+                )
+            elif "TTTT" in self.uas_sequence:
                break_point = self.uas_sequence.index("TTTT") + 14
-            bracketed_form = (
-                f"{collapse_repeats_by_length(self.uas_sequence[:break_point], 4)} "
-                f"{collapse_repeats_by_length(self.uas_sequence[break_point:], 4)}"
-            )
+                bracketed_form = (
+                    f"{collapse_repeats_by_length(self.uas_sequence[:break_point], 4)} "
+                    f"{collapse_repeats_by_length(self.uas_sequence[break_point:], 4)}"
+                )
+            else:
+                bracketed_form = ""


Rebecca emphatically declares that these sequences cannot be redeemed.

rnmitchell added 2 commits January 10, 2024 13:11

changed how marker plots are created [skip ci]

506690e

remove print statement [skip ci]

2a6fd83

rnmitchell commented Jan 10, 2024

rnmitchell added 2 commits January 11, 2024 06:40

added AT line [skip ci]

504c142

updated test_suite [skip ci]

677f2c5

rnmitchell added 2 commits January 11, 2024 09:52

updated tests

fdc0086

removed hashed code

f0873dc

rnmitchell marked this pull request as ready for review January 11, 2024 14:57

rnmitchell requested a review from standage January 11, 2024 14:57

rnmitchell commented Jan 11, 2024

updated convert

3a24edf

rnmitchell commented Jan 11, 2024

rnmitchell added 2 commits January 11, 2024 13:06

fixed copy warning

922a539

added allele name to typed alleles plot

e534317

standage requested changes Jan 12, 2024

updated empty bf code

3d478ac

standage approved these changes Jan 12, 2024

standage merged commit 03aa9e2 into master Jan 12, 2024
2 checks passed

standage deleted the markerplots branch January 12, 2024 14:56
