Sequence motif analysis #176
After some more discussion, the goal has been rephrased as: how to correlate ambiguous positions in the final sequence for each template. The result could end up looking like the graph shown below. Above, you see where in the sequence the ambiguous nodes are located; below, the arrows indicate how good the support is for a link between two ambiguous positions.
flowchart LR;
A1-.->A2;
A1==>B2;
B1-->A2;
A2==>A3;
A2-.->B3;
A3-->A4;
A3-.->B4;
B2-->B3;
B3==>B4;
This could be hard to complete fully. The intent is to run this algorithm after the whole alignment has been done, because all positions for all reads are then known. The ambiguous positions should be identified by the code (first possibility <75% score?) and connections between positions should be found in the placed reads.
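A minimal sketch of those two steps, under stated assumptions: placed reads are represented as hypothetical `(start, sequence)` tuples without gaps, a position counts as ambiguous when its most common residue covers less than 75% of the reads at that position, and support for a link is simply the number of reads that contain both residues. The names `ambiguous_positions` and `link_support` are illustrative and not part of the project.

```python
from collections import Counter, defaultdict
from itertools import combinations


def ambiguous_positions(reads, threshold=0.75):
    """Positions where the most common residue has less than `threshold` of the coverage."""
    columns = defaultdict(Counter)
    for start, sequence in reads:
        for offset, residue in enumerate(sequence):
            columns[start + offset][residue] += 1
    result = []
    for position in sorted(columns):
        counts = columns[position]
        top = counts.most_common(1)[0][1]
        if top / sum(counts.values()) < threshold:
            result.append(position)
    return result


def link_support(reads, positions):
    """Count, per pair of ambiguous positions, how often two residues share a read."""
    support = Counter()
    for start, sequence in reads:
        # Only positions covered by this read can contribute a link.
        covered = [p for p in positions if start <= p < start + len(sequence)]
        for a, b in combinations(covered, 2):
            support[((a, sequence[a - start]), (b, sequence[b - start]))] += 1
    return support


# Hypothetical placed reads, purely for illustration.
example_reads = [
    (0, "GAKL"), (0, "GAKL"), (0, "GSKV"), (0, "GSKV"), (2, "KLT"), (2, "KVT"),
]
example_ambiguous = ambiguous_positions(example_reads)
example_support = link_support(example_reads, example_ambiguous)
```

In this example positions 1 and 3 are ambiguous, and only the reads that span both of them contribute to the link counts. The counts could then be mapped to the arrow styles in the graph above.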
Added ambiguous positions annotation in sequence consensus, added ambiguity threshold to batchfile, added range warnings to batchfiles.
For analysis over multiple ambiguous nodes, the following idea came to mind: the user can select a single position, which removes all traces except the ones coming from that position. The higher-order traces from this position are then shown as well. To give the user some feedback on which nodes have a good level of higher-order information, there should be a bar showing the sum of all higher-order traces for each position, or something similar.
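One way this selection and the summary bar could be computed is sketched below. It rests on an assumption not stated in the issue: a trace is "higher order" when it links two ambiguous positions that are not neighbours in the ordered list of ambiguous positions. Traces are assumed to be stored as a mapping from `((position, residue), (position, residue))` to a read count, and all names are hypothetical.

```python
def traces_from(traces, selected):
    """Keep only the traces that have an endpoint at the selected position."""
    return {
        link: count
        for link, count in traces.items()
        if selected in (link[0][0], link[1][0])
    }


def higher_order_totals(traces, ambiguous):
    """Sum, per position, the support of traces that skip at least one ambiguous position."""
    rank = {position: i for i, position in enumerate(sorted(ambiguous))}
    totals = {position: 0 for position in ambiguous}
    for ((a, _), (b, _)), count in traces.items():
        if abs(rank[a] - rank[b]) > 1:  # assumption: non-neighbouring means higher order
            totals[a] += count
            totals[b] += count
    return totals


# Hypothetical trace counts, purely for illustration.
example_positions = [1, 3, 7]
example_traces = {
    ((1, "A"), (3, "L")): 4,
    ((1, "A"), (7, "T")): 3,
    ((3, "L"), (7, "T")): 5,
    ((1, "S"), (7, "Q")): 2,
}
selected_traces = traces_from(example_traces, 3)
bar_values = higher_order_totals(example_traces, example_positions)
```

The values in `bar_values` are what the bar would display per position; a position with a total of zero has no higher-order information to explore.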
There are some recent cases with polyclonal datasets that have multiple sequences on a single template, which need some way to find the motifs in the reads. This means that variants have to be tracked to see which ones correlate. For ideas see: https://meme-suite.org/meme/.
A naive algorithm to create such results would be to go over all reads and combine the ones that fit together (using fuzzy matching based on the alignment) into patches of sequence. All patches (with at least 2% of all reads, or some other cutoff) can then be presented at the right location. This would allow the user to see the bigger picture of the alignment, with the number of sequencing mistakes drastically reduced.
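A rough sketch of such a naive patching pass, under several assumptions: reads are hypothetical `(start, sequence)` tuples already placed on the template, "fitting together" means overlapping a patch by at least `min_overlap` positions with at most `max_mismatch_fraction` mismatches against the patch consensus, and each read joins the first patch it fits. The class and parameter names are illustrative only.

```python
from collections import Counter


class Patch:
    """A group of mutually compatible reads with per-position residue counts."""

    def __init__(self):
        self.columns = {}  # template position -> Counter of residues
        self.reads = 0

    def compare(self, start, sequence):
        """Return (overlapping positions, mismatches against the patch consensus)."""
        overlap = mismatches = 0
        for offset, residue in enumerate(sequence):
            column = self.columns.get(start + offset)
            if column:
                overlap += 1
                if column.most_common(1)[0][0] != residue:
                    mismatches += 1
        return overlap, mismatches

    def add(self, start, sequence):
        for offset, residue in enumerate(sequence):
            self.columns.setdefault(start + offset, Counter())[residue] += 1
        self.reads += 1

    def consensus(self):
        """Start position and majority sequence (contiguous because reads must overlap)."""
        first, last = min(self.columns), max(self.columns)
        residues = (self.columns[p].most_common(1)[0][0] for p in range(first, last + 1))
        return first, "".join(residues)


def build_patches(reads, min_overlap=3, max_mismatch_fraction=0.1, min_read_fraction=0.02):
    """Greedily merge reads into patches and keep patches above the read cutoff."""
    patches = []
    for start, sequence in sorted(reads):
        for patch in patches:
            overlap, mismatches = patch.compare(start, sequence)
            if overlap >= min_overlap and mismatches <= max_mismatch_fraction * overlap:
                patch.add(start, sequence)
                break
        else:
            patch = Patch()
            patch.add(start, sequence)
            patches.append(patch)
    cutoff = min_read_fraction * len(reads)
    return [(*p.consensus(), p.reads) for p in patches if p.reads >= cutoff]


# Hypothetical reads from two variants on the same template.
example_patch_reads = [
    (0, "GAKLTE"), (0, "GAKLTE"), (2, "KLTEQW"), (0, "GSKVTE"), (2, "KVTEQW"),
]
example_patches = build_patches(example_patch_reads)
```

Because the merge is greedy and order dependent, the outcome can change with the read order and the chosen thresholds; a real implementation would likely need a more robust clustering step.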
Example reads
Should compress to the following
The main missing parts right now are: