indels #26

rnmitchell · 2020-06-01T10:59:48Z

This PR addresses the possibility of indels within the flanking sequence of the UAS region when using STRait Razor output (full-length sequences). The initial idea is to create a dictionary of known STR length alleles for each locus; if the calculated length allele is not in the list of known alleles, the sequence is flagged in the flanking report as containing a possible indel so the user can evaluate it manually.
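
A rough sketch of that idea, for illustration only. The locus table, its allele values, the function signature, and the exact flag text are placeholders and are not taken from the lusSTR source:

```python
# Hypothetical table of known length-based alleles per locus.
# The values are illustrative placeholders, not lusSTR's actual dictionary.
KNOWN_LENGTH_ALLELES = {
    "D8S1179": {"12", "13", "14"},
    "TH01": {"6", "7", "9.3"},
}


def indel_flag(locus, length_allele, known=KNOWN_LENGTH_ALLELES):
    """Return a flag string if the length allele is not known for the locus.

    Alleles are compared as strings so microvariants such as "9.3" can be
    looked up without floating-point comparisons. A locus with no table
    entry is left unflagged (an assumption made for this sketch).
    """
    alleles = known.get(locus)
    if alleles is None:
        return ""
    if str(length_allele) not in alleles:
        return "Possible indel or partial sequence"
    return ""
```

With these placeholder values, `indel_flag("TH01", "8.2")` returns the flag text, while `indel_flag("TH01", "9.3")` returns an empty string.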

rnmitchell · 2020-06-15T13:03:02Z

I think this is ready for review @standage. It's not much of a change (I realized I had accidentally included the indel_flag function in marker.py in the previous branch when it should have been here... oops! This is why I should only work on one branch at a time...). I also added a test to ensure the indel flag is created if the length allele is not present in the dictionary.

One note I want to make. I updated the test_flank_anno() file. If you look at this file, you can see that there are a decent number of sequences that are flagged as possible indel or partial sequence. You can also see that the majority of these sequences have 1 read and therefore are likely crap (and would be filtered out anyway), so I don't see this as a huge deal, but I wanted to see if you had any opinion on the number of sequences that are flagged (may also be a question for RebJ).

standage

Looks good!

standage · 2020-06-15T17:49:01Z

> You can also see that the majority of these sequences have 1 read and therefore are likely crap...

Unless I'm missing something, the number of reads for each sequence is 100. So I'm not sure how to interpret that. But apart from the uncertainty there I don't see anything problematic with the output.

rnmitchell · 2020-06-15T18:13:04Z

Ahh sorry, I was looking at a different file... I had set Reads to 100 for every sequence in the test files.

standage · 2020-06-15T18:16:07Z

When I first saw that a while back, I figured you had just set the number of reads to 100 for testing purposes. At what point do sequences get filtered out by number of reads?

rnmitchell · 2020-06-15T18:17:26Z

It's up to the user to do that beforehand; lusSTR doesn't do any filtering. That could be something we implement in the future, I suppose.
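
If filtering by read count were added later, it might look something like the sketch below. This is not existing lusSTR functionality: the function name, the `min_reads` default, and the record layout (dicts with a `Reads` key) are all assumptions made for illustration.

```python
def filter_by_reads(records, min_reads=2):
    """Keep only records whose 'Reads' count is at least min_reads.

    `records` is assumed to be a list of dicts, one per sequence, each with
    a 'Reads' entry. The threshold of 2 is an arbitrary placeholder.
    """
    return [rec for rec in records if int(rec["Reads"]) >= min_reads]
```

With the default threshold, a sequence with 1 read would be dropped and one with 100 reads would be kept.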

standage · 2020-06-15T18:19:03Z

Yeah, I mean in that case I can't think of any concerns about the output as is.

standage · 2020-06-15T19:58:36Z

Should I go ahead and merge this?

rnmitchell · 2020-06-15T19:59:36Z

Oh, yeah go ahead. I got distracted.

initial commit

f2d420f

rnmitchell mentioned this pull request Jun 10, 2020

Fix format and annotation issues #27

Merged

Rebecca Mitchell added 5 commits June 11, 2020 09:47

updated length allele dictionaries

90d98b4

Merge branch 'master' into indels

fe34c5c

added indel flag for length alleles not in dict

791099a

added test for testing indel flag

74b16aa

fixed style error

5233d3c

rnmitchell marked this pull request as ready for review June 15, 2020 13:03

rnmitchell requested a review from standage June 15, 2020 13:03

standage approved these changes Jun 15, 2020

standage merged commit bf7e4b4 into master Jun 15, 2020

standage deleted the indels branch June 15, 2020 20:01
