Move marker plot generation to filter step for STRs #69
Changes from 6 commits
506690e
2a6fd83
504c142
677f2c5
fdc0086
f0873dc
3a24edf
922a539
e534317
3d478ac
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -388,12 +388,18 @@ def convert(self): | |
else: | ||
if "GGGCTGCCTA" in self.uas_sequence: | ||
break_point = self.uas_sequence.index("GGGCTGCCTA") + 10 | ||
else: | ||
bracketed_form = ( | ||
f"{collapse_repeats_by_length(self.uas_sequence[:break_point], 4)} " | ||
f"{collapse_repeats_by_length(self.uas_sequence[break_point:], 4)}" | ||
) | ||
elif "TTTT" in self.uas_sequence: | ||
break_point = self.uas_sequence.index("TTTT") + 14 | ||
bracketed_form = ( | ||
f"{collapse_repeats_by_length(self.uas_sequence[:break_point], 4)} " | ||
f"{collapse_repeats_by_length(self.uas_sequence[break_point:], 4)}" | ||
) | ||
bracketed_form = ( | ||
f"{collapse_repeats_by_length(self.uas_sequence[:break_point], 4)} " | ||
f"{collapse_repeats_by_length(self.uas_sequence[break_point:], 4)}" | ||
) | ||
else: | ||
bracketed_form = "" | ||
Comment on lines -391 to +402
Once again ran into an issue with this marker. This time, the sequence seemed to be off by tens of bases on one side (e.g. the sequence started ~50 bases early on the 5' side, thus not containing the entire repeat region, and was missing the ...

Rebecca emphatically declares that these sequences cannot be redeemed.
||
return bracketed_form | ||
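The guarded break-point logic in this hunk can be sketched on its own as follows. This is only an illustration under stated assumptions: `collapse_by_length` is a simplified, hypothetical stand-in for the project's `collapse_repeats_by_length` (whose real behavior may differ), and `bracket_around_anchor` is a made-up name for the branch shown in the diff.

```python
from itertools import groupby


def collapse_by_length(sequence, n):
    """Hypothetical stand-in for collapse_repeats_by_length (assumption).

    Splits the sequence into chunks of length n and collapses runs of
    identical adjacent chunks into "[UNIT]count" form.
    """
    units = [sequence[i:i + n] for i in range(0, len(sequence), n)]
    parts = []
    for unit, run in groupby(units):
        count = len(list(run))
        parts.append(f"[{unit}]{count}" if count > 1 else unit)
    return " ".join(parts)


def bracket_around_anchor(uas_sequence):
    """Bracket the sequence on both sides of a break point.

    Anchors and offsets are copied from the diff. If neither anchor is
    present, return an empty string instead of raising ValueError from
    str.index().
    """
    for anchor, offset in (("GGGCTGCCTA", 10), ("TTTT", 14)):
        if anchor in uas_sequence:
            break_point = uas_sequence.index(anchor) + offset
            return (
                f"{collapse_by_length(uas_sequence[:break_point], 4)} "
                f"{collapse_by_length(uas_sequence[break_point:], 4)}"
            )
    return ""
```

With this sketch, a sequence containing neither anchor yields `""`, which is the case the new `else` branch handles.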
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -163,15 +163,16 @@ def repeat_copy_number(bf, repeat): | |
The input is a sequence string collapsed to bracketed sequence form. | ||
""" | ||
longest = 0 | ||
for block in bf.split(" "): | ||
if block == repeat: | ||
if 1 > longest: | ||
longest = 1 | ||
match = re.match(r"\[" + repeat + r"\](\d+)", block) | ||
if match: | ||
length = int(match.group(1)) | ||
if length > longest: | ||
longest = length | ||
if bf != "": | ||
for block in bf.split(" "): | ||
if block == repeat: | ||
if 1 > longest: | ||
longest = 1 | ||
match = re.match(r"\[" + repeat + r"\](\d+)", block) | ||
if match: | ||
length = int(match.group(1)) | ||
if length > longest: | ||
longest = length | ||
Code is dropping these garbage sequences.

This was a simple and straightforward change, but I have a suggestion for improvement: instead of indenting the code in a conditional block, short-circuit the inverse condition.

    if bf == "":
        return 0
    for block in bf.split(" "):
        ### blah blah blah

This has a few benefits. First of all, you see what happens right away if ...

Makes sense! I'll update.
||
return str(longest) | ||
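For reference, the early-return form suggested in the review could look roughly like the sketch below. The loop body is taken from the diff. One deliberate difference is an assumption on my part: the reviewer's snippet returns the integer `0`, whereas this sketch returns the string `"0"` so that the empty case has the same type as the `str(longest)` returned on the normal path.

```python
import re


def repeat_copy_number(bf, repeat):
    """Return the longest run of `repeat` in a bracketed sequence form.

    The input is a sequence string collapsed to bracketed sequence form.
    """
    # Short-circuit the empty case up front instead of nesting the loop.
    # Returning "0" (a string) is an assumption made for type consistency.
    if bf == "":
        return "0"
    longest = 0
    for block in bf.split(" "):
        # A bare, unbracketed repeat counts as a single copy.
        if block == repeat:
            if 1 > longest:
                longest = 1
        # A bracketed block such as "[ATCT]10" carries its copy number.
        match = re.match(r"\[" + repeat + r"\](\d+)", block)
        if match:
            length = int(match.group(1))
            if length > longest:
                longest = length
    return str(longest)
```

For example, `repeat_copy_number("ATCT [GTCT]2 [ATCT]3", "ATCT")` returns `"3"`, and an empty bracketed form returns `"0"` without entering the loop.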
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,24 +1,24 @@ | ||
SampleID,Locus,LUS_Plus,Reads,allele_type,parent_allele1,parent_allele2,allele1_ref_reads,allele2_ref_reads,perc_noise,perc_stutter,CE_Allele | ||
Sample1,D4S2408,10_10_0,1022,real_allele,,,,,,, | ||
Sample1,D4S2408,9_9_0,116,-1_stutter/+1_stutter,10_10_0,8_8_0,1022.0,1050.0,,, | ||
Sample1,D4S2408,8_8_0,1050,real_allele,,,,,,, | ||
Sample1,D8S1179,14_12_1_0,869,real_allele,,,,,,, | ||
Sample1,D8S1179,13_11_1_0,184,-1_stutter,14_12_1_0,,869.0,,,0.212, | ||
Sample1,D8S1179,12_10_1_0,37,-2_stutter,14_12_1_0,,869.0,,,0.201, | ||
Sample1,D9S1122,13_11,948,real_allele,,,,,,, | ||
Sample1,D9S1122,12_10,108,-1_stutter,13_11,,948.0,,,0.114, | ||
Sample1,D9S1122,11_11,991,real_allele,,,,,,, | ||
Sample1,D9S1122,10_10,87,-1_stutter,11_11,,991.0,,,0.088, | ||
Sample1,FGA,23_15_3_0,1436,real_allele,,,,,,, | ||
Sample1,FGA,22_14_3_0,262,-1_stutter,23_15_3_0,,1436.0,,,0.182, | ||
Sample1,FGA,21_13_3_0,48,BelowAT,,,,,0.013,, | ||
Sample1,FGA,20_12_3_0,1750,real_allele,,,,,,, | ||
Sample1,FGA,18_10_3_0,181,real_allele,,,,,,, | ||
Sample1,FGA,17_9_3_0,15,BelowAT,,,,,0.004,, | ||
Sample1,PENTA D,15_15,50,real_allele,,,,,,, | ||
Sample1,PENTA D,13_13,1000,real_allele,,,,,,, | ||
Sample1,PENTA E,7_7,505,real_allele,,,,,,,7.0 | ||
Sample1,TH01,7_7,2197,real_allele,,,,,,, | ||
Sample1,TH01,6_6,1632,real_allele,,,,,,, | ||
Sample1,TH01,5_5,66,BelowAT,,,,,0.017,, | ||
Sample1,TPOX,11_11,15,BelowAT,,,,,1.0,,11.0 | ||
SampleID,Locus,CE_Allele,LUS_Plus,Reads,allele_type,parent_allele1,parent_allele2,allele1_ref_reads,allele2_ref_reads,perc_noise,perc_stutter | ||
Sample1,D4S2408,10.0,10_10_0,1022,real_allele,,,,,, | ||
Sample1,D4S2408,9.0,9_9_0,116,-1_stutter/+1_stutter,10_10_0,8_8_0,1022.0,1050.0,, | ||
Sample1,D4S2408,8.0,8_8_0,1050,real_allele,,,,,, | ||
Sample1,D8S1179,14.0,14_12_1_0,869,real_allele,,,,,, | ||
Sample1,D8S1179,13.0,13_11_1_0,184,-1_stutter,14_12_1_0,,869.0,,,0.212 | ||
Sample1,D8S1179,12.0,12_10_1_0,37,-2_stutter,14_12_1_0,,869.0,,,0.201 | ||
Sample1,D9S1122,13.0,13_11,948,real_allele,,,,,, | ||
Sample1,D9S1122,12.0,12_10,108,-1_stutter,13_11,,948.0,,,0.114 | ||
Sample1,D9S1122,11.0,11_11,991,real_allele,,,,,, | ||
Sample1,D9S1122,10.0,10_10,87,-1_stutter,11_11,,991.0,,,0.088 | ||
Sample1,FGA,23.0,23_15_3_0,1436,real_allele,,,,,, | ||
Sample1,FGA,22.0,22_14_3_0,262,-1_stutter,23_15_3_0,,1436.0,,,0.182 | ||
Sample1,FGA,21.0,21_13_3_0,48,BelowAT,,,,,0.013, | ||
Sample1,FGA,20.0,20_12_3_0,1750,real_allele,,,,,, | ||
Sample1,FGA,18.0,18_10_3_0,181,real_allele,,,,,, | ||
Sample1,FGA,17.0,17_9_3_0,15,BelowAT,,,,,0.004, | ||
Sample1,PENTA D,15.0,15_15,50,real_allele,,,,,, | ||
Sample1,PENTA D,13.0,13_13,1000,real_allele,,,,,, | ||
Sample1,PENTA E,7.0,7_7,505,real_allele,,,,,, | ||
Sample1,TH01,7.0,7_7,2197,real_allele,,,,,, | ||
Sample1,TH01,6.0,6_6,1632,real_allele,,,,,, | ||
Sample1,TH01,5.0,5_5,66,BelowAT,,,,,0.017, | ||
Sample1,TPOX,11.0,11_11,15,BelowAT,,,,,1.0, |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,28 +1,28 @@ | ||
Locus,CE Allele,Allele Seq,Reads | ||
D4S2408,8,ATCTATCTATCTATCTATCTATCTATCTATCT,1000 | ||
D4S2408,9,ATCTATCTATCTATCTATCTATCTATCTATCTATCT,1357 | ||
D4S2408,10,ATCTATCTATCTATCTATCTATCTATCTATCTATCTATCT,900 | ||
D8S1179,12,TCTATCTATCTGTCTATCTATCTATCTATCTATCTATCTATCTATCTA,26 | ||
D8S1179,12,TCTATCTGTCTATCTATCTATCTATCTATCTATCTATCTATCTATCTA,11 | ||
D8S1179,13,TCTATCTGTCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTA,95 | ||
D8S1179,13,TCTATCTATCTGTCTATCTATCTATCTATCTATCTATCTATCTATCTATCTA,89 | ||
D8S1179,14,TCTATCTATCTGTCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTA,739 | ||
D8S1179,14,TCTATCTGTCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTA,130 | ||
D9S1122,10,TAGATAGATAGATAGATAGATAGATAGATAGATAGATAGA,87 | ||
D9S1122,11,TAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGA,991 | ||
D9S1122,12,TAGATCGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGA,108 | ||
D9S1122,13,TAGATCGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGA,948 | ||
FGA,17,TTTCTTTCTTTCTTTTTTCTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTCCTTCCTTCC,15 | ||
FGA,18,TTTCTTTCTTTCTTTTTTCTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTCCTTCCTTCC,181 | ||
FGA,20,TTTCTTTCTTTCTTTTTTCTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTCCTTCCTTCC,1750 | ||
FGA,21,TTTCTTTCTTTCTTTTTTCTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTCCTTCCTTCC,48 | ||
FGA,22,TTTCTTTCTTTCTTTTTTCTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTCCTTCCTTCC,262 | ||
FGA,23,TTTCTTTCTTTCTTTTTTCTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTCCTTCCTTCC,1436 | ||
PentaD,13,AAAAGAAAGAAAAGAAAAGAAAAGAAAAGAAAAGAAAAGAAAAGAAAAGAAAAGAAAAGAAAAGAAAAGA,1000 | ||
PentaD,15,AAAAGAAAGAAAAGAAAAGAAAAGAAAAGAAAAGAAAAGAAAAGAAAAGAAAAGAAAAGAAAAGAAAAGA,50 | ||
PentaE,7,AAAGAAAAGAAAAGAAAAGAAAAGAAAAGAAAAGA,505 | ||
TH01,5,AATGAATGAATGAATGAATG,66 | ||
TH01,6,AATGAATGAATGAATGAATGAATG,1632 | ||
TH01,7,AATGAATGAATGAATGAATGAATGAATG,2197 | ||
TPOX,11,AATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATG,15 | ||
vWA,16,TCTATCTGTCTGTCTGTCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCCATCTA,6 | ||
D4S2408,8.0,ATCTATCTATCTATCTATCTATCTATCTATCT,1000 | ||
D4S2408,9.0,ATCTATCTATCTATCTATCTATCTATCTATCTATCT,1357 | ||
D4S2408,10.0,ATCTATCTATCTATCTATCTATCTATCTATCTATCTATCT,900 | ||
D8S1179,12.0,TCTATCTATCTGTCTATCTATCTATCTATCTATCTATCTATCTATCTA,26 | ||
D8S1179,12.0,TCTATCTGTCTATCTATCTATCTATCTATCTATCTATCTATCTATCTA,11 | ||
D8S1179,13.0,TCTATCTGTCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTA,95 | ||
D8S1179,13.0,TCTATCTATCTGTCTATCTATCTATCTATCTATCTATCTATCTATCTATCTA,89 | ||
D8S1179,14.0,TCTATCTATCTGTCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTA,739 | ||
D8S1179,14.0,TCTATCTGTCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTA,130 | ||
D9S1122,10.0,TAGATAGATAGATAGATAGATAGATAGATAGATAGATAGA,87 | ||
D9S1122,11.0,TAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGA,991 | ||
D9S1122,12.0,TAGATCGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGA,108 | ||
D9S1122,13.0,TAGATCGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGATAGA,948 | ||
FGA,17.0,TTTCTTTCTTTCTTTTTTCTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTCCTTCCTTCC,15 | ||
FGA,18.0,TTTCTTTCTTTCTTTTTTCTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTCCTTCCTTCC,181 | ||
FGA,20.0,TTTCTTTCTTTCTTTTTTCTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTCCTTCCTTCC,1750 | ||
FGA,21.0,TTTCTTTCTTTCTTTTTTCTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTCCTTCCTTCC,48 | ||
FGA,22.0,TTTCTTTCTTTCTTTTTTCTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTCCTTCCTTCC,262 | ||
FGA,23.0,TTTCTTTCTTTCTTTTTTCTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTTTCTCCTTCCTTCC,1436 | ||
PentaD,13.0,AAAAGAAAGAAAAGAAAAGAAAAGAAAAGAAAAGAAAAGAAAAGAAAAGAAAAGAAAAGAAAAGAAAAGA,1000 | ||
PentaD,15.0,AAAAGAAAGAAAAGAAAAGAAAAGAAAAGAAAAGAAAAGAAAAGAAAAGAAAAGAAAAGAAAAGAAAAGA,50 | ||
PentaE,7.0,AAAGAAAAGAAAAGAAAAGAAAAGAAAAGAAAAGA,505 | ||
TH01,5.0,AATGAATGAATGAATGAATG,66 | ||
TH01,6.0,AATGAATGAATGAATGAATGAATG,1632 | ||
TH01,7.0,AATGAATGAATGAATGAATGAATGAATG,2197 | ||
TPOX,11.0,AATGAATGAATGAATGAATGAATGAATGAATGAATGAATGAATG,15 | ||
vWA,16.0,TCTATCTGTCTGTCTGTCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCTATCCATCTA,6 |
Ran into a problem with this code because the marker plots use the CE allele for plotting; removing it so the plots can actually be created.