changing minDR and maxDR give unexpected output #81
Comments
Hi Michelle, This is very concerning and obviously not the correct behavior. I'll try to look into this quickly. Connor
Hi Michelle, I've looked a bit into your observations and I can at least answer some of your questions. First, I think part of the problem that the Now the repeated subsequences in an individual read are almost never (at least with 100bp data) going to contain the full sequence of the direct repeat so setting A lot of the logic in crass is dedicated to clustering reads that appear to contain various partial matches to the same direct repeat and trying to determine the proper boundaries of the repeat and spacer. Now I'm realizing that this behavior is not clear as the I'm still looking into your other problems.
Hi Michelle, I've just uploaded some new code to the GitHub repo which fixes your second bug/question. Crass should no longer output duplicate DR groups. I also think I understand why changing the DR sizes affects the number of CRISPRs. Basically, the many partial DR sequences that are initially found get grouped using a simple single-linkage clustering algorithm that is quite parameter sensitive. Changing the
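To illustrate why single-linkage clustering can be this sensitive, here is a minimal sketch in Python. It is not Crass's actual code: the `single_linkage` function, the use of Hamming distance, and the `max_mismatches` threshold are all assumptions made for illustration. It uses the two 37 bp DRs reported in this issue and shows that a threshold of one mismatch is enough to collapse them into a single group.

```python
from itertools import combinations


def hamming(a, b):
    """Count mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def single_linkage(seqs, max_mismatches):
    """Hypothetical single-linkage grouping: two sequences share a cluster
    if a chain of pairs, each within max_mismatches, connects them."""
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if hamming(seqs[i], seqs[j]) <= max_mismatches:
            parent[find(i)] = find(j)

    clusters = {}
    for i, seq in enumerate(seqs):
        clusters.setdefault(find(i), []).append(seq)
    return list(clusters.values())


# The two direct repeats from this issue; they differ at one position.
DR_A = "GTTCCAATTAATCTTAAACCCTACTAGGGATTGAAAC"
DR_B = "GTTCCAATTAATCTTAAACCCTATTAGGGATTGAAAC"

strict = single_linkage([DR_A, DR_B, DR_A], max_mismatches=0)
loose = single_linkage([DR_A, DR_B, DR_A], max_mismatches=1)
print(len(strict), len(loose))
```

With exact matching the two DR types stay apart; allowing a single mismatch merges everything, which is the kind of threshold effect that makes the number of reported groups swing with small parameter changes.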
Hi Connor, Thanks for looking into these issues. I'm glad that the duplicate DR groups bug has now been fixed. It seems like there should be separate arguments for the initial DR size search (whose optimal value would vary with the length of your reads) and the final DR cleaning step. This seems like it wouldn't be too hard to implement. At the very least, the documentation should make clearer what the existing parameters do. The harder issue is what to do about the DR clustering. At least for my dataset, separating out these two highly similar DR types is essential. It would be ideal to have a separate parameter that controls the sensitivity of the clustering. I'm not quite sure how to do this, but I'll think about it some more. I understand this software isn't your priority right now. I can try to contribute, at least on the documentation end, in the next couple of weeks.
Hi Connor,
I have a subset of some data in the file test.txt. Each sequence in this fasta file contains one of two DRs which differ by only a single bp: GTTCCAATTAATCTTAAACCCTACTAGGGATTGAAAC vs GTTCCAATTAATCTTAAACCCTATTAGGGATTGAAAC (they were "grep-ed" from a metagenome).
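For reference, a quick check of the two sequences quoted above confirms the length and the single differing position. This is only a small sketch; the variable names are chosen here for illustration.

```python
# The two direct repeats quoted in this issue.
DR_A = "GTTCCAATTAATCTTAAACCCTACTAGGGATTGAAAC"
DR_B = "GTTCCAATTAATCTTAAACCCTATTAGGGATTGAAAC"

# Collect (0-based position, base in DR_A, base in DR_B) for every mismatch.
mismatches = [
    (i, a, b) for i, (a, b) in enumerate(zip(DR_A, DR_B)) if a != b
]

print(len(DR_A), len(DR_B))  # 37 37
print(mismatches)            # [(23, 'C', 'T')]
```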
Since these two DRs are 37 bp long, I adjusted minDR and maxDR like so (crass will throw an error if they are equal):

When I do this, CRASS finds 0 CRISPRs. However, if I leave minDR and maxDR at their defaults of {23,47}, CRASS finds 3 CRISPRs and two of them are identical. When minDR and maxDR are {27,40}, CRASS finds 1 CRISPR. Ironically, none of these are the correct number of CRISPRs in this dataset.

There are two issues here:
1. Can you explain to me why adjusting minDR and maxDR affects CRASS' ability to find CRISPRs even when the DRs are within the range set by the arguments? Is there another parameter I need to think about adjusting if I am adjusting minDR and maxDR?
2. CRASS will sometimes create duplicate DR groups if the clusters converge on the same consensus DR. This is a fairly undesirable outcome.
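One way to picture the desired behavior for the second issue is to collapse groups that end up with the same consensus. The sketch below is not taken from Crass: the `(consensus, reads)` tuple layout, the `merge_by_consensus` function, and the read names are all hypothetical.

```python
from collections import defaultdict


def merge_by_consensus(groups):
    """Merge DR groups whose consensus sequences are identical.

    groups is an assumed layout: a list of (consensus, reads) tuples.
    """
    merged = defaultdict(list)
    for consensus, reads in groups:
        merged[consensus].extend(reads)
    return dict(merged)


DR_A = "GTTCCAATTAATCTTAAACCCTACTAGGGATTGAAAC"
DR_B = "GTTCCAATTAATCTTAAACCCTATTAGGGATTGAAAC"

# Three groups, two of which converged on the same consensus DR.
groups = [
    (DR_A, ["read1", "read2"]),
    (DR_B, ["read3"]),
    (DR_A, ["read4"]),
]

merged = merge_by_consensus(groups)
print(len(merged))
```

Note that this only catches exact duplicates; the two DRs above differ by one base and so remain separate groups.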
test.txt
Thanks,
Michelle