Set the Z value dynamically according to the database used #103

carlosribas · 2020-01-09T17:12:21Z

If someone searches just in miRBase, it should be miRBase-specific

carlosribas · 2020-02-03T14:21:38Z

Hi @blakesweeney. Just for the record, I added the esl-seqstat command to rnacentral-import-pipeline. The idea is to put its output file somewhere I can download it and parse the results.
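A minimal sketch of the parsing step described above, assuming the Z value is nhmmer's `-Z` option (database size in megabases) and that the esl-seqstat report contains a `Total # residues:` line. The function names and the sample report (with made-up numbers) are hypothetical and are not taken from rnacentral-import-pipeline.

```python
import re


def total_residues(seqstat_output: str) -> int:
    """Return the residue count from an esl-seqstat text report.

    Assumes the report has a line starting with 'Total # residues:'.
    """
    match = re.search(r"^Total # residues:\s+(\d+)", seqstat_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Total # residues' line found")
    return int(match.group(1))


def z_value_megabases(seqstat_output: str) -> float:
    """Convert the residue count to megabases (millions of nucleotides)."""
    return total_residues(seqstat_output) / 1_000_000


# Hypothetical report for one database; the numbers are invented.
EXAMPLE_REPORT = """\
Format:              FASTA
Alphabet type:       RNA
Number of sequences: 3
Total # residues:    2500000
"""

z = z_value_megabases(EXAMPLE_REPORT)
```

With one such report per database, the search could look up the value for whichever databases the user selected instead of using a single fixed Z.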

carlosribas · 2020-02-06T14:41:20Z

Hey @blakesweeney! There is a problem running the esl-seqstat command on the pdbe file:

$ esl-seqstat pdbe-0.fasta
Parse failed (sequence file pdbe-0.fasta):
Line 6316: illegal character F

We also have this F character on lines 7466 and 12603. Any suggestions on how to solve this without fixing it manually?

blakesweeney · 2020-02-07T10:03:54Z

Without looking at those sequences, I'm betting they are tRNAs and the F character is the amino acid attached to them. There are likely other cases with different characters as well. The easiest thing to do would be to exclude those sequences from search, but I'm not sure that is a good idea. Another choice is to strip those characters off the sequence, which has other possible issues. I'd lean toward a very crude modification of the sequences that strips anything that is not ACGU from the end of tRNA sequences only, but that is something that @AntonPetrov would need to weigh in on.
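The "crude modification" option could look roughly like the sketch below. It assumes sequences use the RNA alphabet and that the caller has already decided the record is a tRNA; the function name is hypothetical and the example sequence is invented for illustration.

```python
import re

# Matches a run of characters other than A, C, G, U at the very end of the string.
TRAILING_NON_ACGU = re.compile(r"[^ACGU]+$")


def strip_trailing_non_acgu(sequence: str) -> str:
    """Drop trailing non-ACGU characters; internal characters are left untouched."""
    return TRAILING_NON_ACGU.sub("", sequence.upper())


# Invented example: a trailing F, as in the parse failure reported above.
cleaned = strip_trailing_non_acgu("GCGGAUUUAGCUCAGCCAF")
```

Only the end of the sequence is touched, so an unexpected character in the middle of a sequence would still fail to parse and would need to be handled separately.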

AntonPetrov · 2020-02-07T10:15:35Z

This is not a new problem: in previous releases we generated a special fasta file for the old search (the _excluded file contained all the exceptional sequences): http://ftp.ebi.ac.uk/pub/databases/RNAcentral/releases/13.0/sequences/.internal/

Is it possible to continue excluding some sequences from sequence search as before?

blakesweeney · 2020-02-07T10:22:01Z

Sure, we can exclude them like we do currently. I'll add that filtering step to this export as well.

This is for RNAcentral/rnacentral-sequence-search#103. We should only have parsable sequences in the sequence search dataset. This should select only the sequences that nhmmer can work with.
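A rough sketch of such a filtering step, splitting FASTA records into a searchable set and an excluded set. The allowed alphabet below (IUPAC nucleotide codes) is an assumption about what the downstream tools accept, and the function names are hypothetical; this is not the pipeline's actual implementation.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)

# Assumed set of parsable characters: IUPAC nucleotide codes, with T and U.
ALLOWED = set("ACGTURYKMSWBDHVN")


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_parsable(records: Iterable[Record]) -> Tuple[List[Record], List[Record]]:
    """Separate records whose sequences use only ALLOWED characters from the rest."""
    kept, excluded = [], []
    for header, sequence in records:
        if sequence and set(sequence.upper()) <= ALLOWED:
            kept.append((header, sequence))
        else:
            excluded.append((header, sequence))
    return kept, excluded


# Invented records for illustration.
fasta = [">seq1", "ACGU", "ACGU", ">seq2", "GCGGAUF", ">seq3", "ACGN"]
kept, excluded = split_parsable(read_fasta(fasta))
```

Writing the `excluded` list to its own file would mirror the `_excluded` file mentioned above for earlier releases.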

carlosribas self-assigned this Jan 9, 2020

blakesweeney self-assigned this Feb 4, 2020

carlosribas added a commit that referenced this issue Feb 6, 2020

#103 - Set the e-value according to the database used

5a5a649
