Set the Z value dynamically according to the database used #103
Comments
Hi @blakesweeney. Just for the record, I added the esl-seqstat command to rnacentral-import-pipeline. The idea is to put this file somewhere I can download it and parse the results.
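For reference, parsing the esl-seqstat results into a Z value could look roughly like the sketch below. It assumes the summary contains a `Total # residues:` line and that Z is given in millions of nucleotides; the function names and the `SAMPLE` text are made up for illustration and are not real pipeline output.

```python
import re


def total_residues(seqstat_output: str) -> int:
    """Return the residue count from an esl-seqstat style summary."""
    # Assumption: the summary has a line like "Total # residues:    12345".
    match = re.search(r"^Total # residues:\s+(\d+)", seqstat_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Total # residues' line found")
    return int(match.group(1))


def z_value(residues: int) -> float:
    """Convert a residue count to megabases (assumed unit for Z)."""
    return residues / 1_000_000


# Hypothetical summary text, for illustration only.
SAMPLE = """\
Format:              FASTA
Alphabet type:       RNA
Number of sequences: 3
Total # residues:    2500000
Smallest:            20
Largest:             1500000
Average length:      833333.3
"""

if __name__ == "__main__":
    print(z_value(total_residues(SAMPLE)))
```

Running this once per database file would give a per-database number that the search could look up when only that database is selected.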
Hey @blakesweeney! There is a problem running the esl-seqstat command in pdbe: $ esl-seqstat pdbe-0.fasta We also have this F character on lines 7466 and 12603. Any suggestions on how to solve this without fixing it manually?
Without looking at those sequences, I'm betting they are tRNA and the F character is the amino acid attached to it. There are likely other cases with different characters as well. The easiest thing to do would be to exclude those sequences from search, but I'm not sure that is a good idea. Another choice is to strip those characters off the sequence, which has other possible issues. I'd lean toward doing a very crude modification of the sequences to strip off things that are not ACGU, from the end of tRNA sequences only, but that is something that @AntonPetrov would need to weigh in on.
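A minimal sketch of that crude stripping, assuming the caller has already decided the sequence is a tRNA (how to detect that is not covered here). The function name is hypothetical. Only a trailing run of non-ACGU characters is removed; characters inside the sequence are left alone.

```python
import re

# Matches one or more characters other than A, C, G, U at the end of the string.
_TRAILING_NON_ACGU = re.compile(r"[^ACGU]+$", re.IGNORECASE)


def strip_trailing_non_acgu(sequence: str) -> str:
    """Remove a trailing run of non-ACGU characters, keeping the rest as is."""
    return _TRAILING_NON_ACGU.sub("", sequence)


if __name__ == "__main__":
    # Made-up sequence ending in an amino acid letter.
    print(strip_trailing_non_acgu("GCGGAUUUAGCUCCAF"))
```

Note that this would also strip a trailing T or N, since the rule as stated is "not ACGU".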
This is not a new problem: in previous releases we generated a special fasta file for the old search (the Is it possible to continue excluding some sequences from sequence search as before?
Sure, we can exclude them like we do currently. I'll add that filtering step to this export as well.
This is for RNAcentral/rnacentral-sequence-search#103. We should only have parsable sequences in the sequence search dataset. This should select only the sequences that nhmmer can work with.
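One way such a filter could work is sketched below. This is not the actual export code: the allowed alphabet (ACGU, T, and the IUPAC nucleotide ambiguity codes) is an assumption about what nhmmer accepts, and `parse_fasta` and `parsable_records` are hypothetical helper names.

```python
from typing import Iterable, Iterator, Tuple

# Assumption: these are the residue characters nhmmer can work with.
ALLOWED = frozenset("ACGUTRYMKSWHBVDN")


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def parsable_records(records: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Keep only non-empty sequences made entirely of allowed characters."""
    for header, seq in records:
        if seq and set(seq.upper()) <= ALLOWED:
            yield header, seq


if __name__ == "__main__":
    # Made-up records; the second one contains an F and is dropped.
    example = [">a", "ACGU", ">b", "ACGF", ">c", "acgn", "uu"]
    for header, seq in parsable_records(parse_fasta(example)):
        print(f">{header}\n{seq}")
```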
If someone searches only in miRBase, the Z value should be miRBase-specific.