Generate a representative subset of tRNAs #104

Open
AntonPetrov opened this issue Jan 10, 2020 · 1 comment

@AntonPetrov
Member

Some tRNA sequences get a very large number of hits, which causes problems for the sequence search. For example, this sequence currently crashes it: GCGGAAGUAGUUCAGUGGUAGAACACCACCUUGCCAAGGUGGGGGUCGCGGGUUCGAAUCCCGUCUUCCGCUCCA.

This is not surprising given that we have >4 million tRNA sequences.

Let's try the same strategy as we used for rRNAs. For example, the following query whitelists ~110,000 tRNAs and excludes the millions of sequences that are found only in ENA or Rfam:

https://rnacentral.org/search?q=rna_type:%22tRNA%22%20and%20(expert_db:%22gtRNAdb%22%20or%20expert_db:%22refseq%22%20or%20expert_db:%22ensembl%22%20or%20expert_db:%22hgnc%22%20or%20expert_db:%22flybase%22%20or%20expert_db:%22wormbase%22%20or%20expert_db:%22pombase%22%20or%20expert_db:%22TAIR%22%20or%20expert_db:%22SGD%22%20or%20expert_db:%22MGI%22%20or%20expert_db:%22dictybase%22%20or%20expert_db:%22PDBe%22)
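For reference, a minimal sketch of how the query string above could be assembled from a list of expert databases, so that databases can be added or removed without hand-editing the URL. The function names `build_whitelist_query` and `build_search_url` are hypothetical helpers, not part of any RNAcentral code; only the query syntax and the base URL are taken from the link above.

```python
from urllib.parse import quote

# Expert databases named in the query above.
EXPERT_DBS = [
    "gtRNAdb", "refseq", "ensembl", "hgnc", "flybase", "wormbase",
    "pombase", "TAIR", "SGD", "MGI", "dictybase", "PDBe",
]


def build_whitelist_query(rna_type, expert_dbs):
    """Return a text-search query: one RNA type restricted to the given expert DBs."""
    db_clause = " or ".join(f'expert_db:"{db}"' for db in expert_dbs)
    return f'rna_type:"{rna_type}" and ({db_clause})'


def build_search_url(query):
    """Percent-encode the query, leaving ':' and parentheses as in the original link."""
    return "https://rnacentral.org/search?q=" + quote(query, safe=":()")


query = build_whitelist_query("tRNA", EXPERT_DBS)
url = build_search_url(query)
print(url)
```

Dropping or adding an entry in `EXPERT_DBS` changes the whitelist without touching the rest of the query.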

@blakesweeney - would it be possible to create a set of, say, 5 whitelist-trna files and make all-except-rrna-trna instead of all-except-rrna files?
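To make the request concrete, here is a rough sketch of the partitioning I have in mind. It is only an illustration under assumptions: records are modelled as `(id, rna_type, sequence)` tuples, the IDs are placeholders, `partition_records` is a hypothetical helper, and rRNAs are assumed to be handled separately by the existing rRNA whitelist files. The actual export code may work quite differently.

```python
def partition_records(records, trna_whitelist, n_files=5):
    """Split records into whitelist-trna buckets and an all-except-rrna-trna remainder.

    records: iterable of (seq_id, rna_type, sequence) tuples (assumed shape).
    trna_whitelist: set of tRNA IDs to keep, e.g. the result of the query above.
    """
    whitelist_files = [[] for _ in range(n_files)]
    remainder = []
    kept = 0
    for seq_id, rna_type, sequence in records:
        if rna_type == "tRNA":
            if seq_id in trna_whitelist:
                # Round-robin so the whitelist files end up roughly equal in size.
                whitelist_files[kept % n_files].append((seq_id, sequence))
                kept += 1
            # tRNAs that are not whitelisted are left out of the search files.
        elif rna_type != "rRNA":
            # Everything that is neither rRNA nor tRNA goes to the remainder.
            remainder.append((seq_id, sequence))
    return whitelist_files, remainder


# Placeholder records, for illustration only.
records = [
    ("ID1", "tRNA", "GCGG"),
    ("ID2", "tRNA", "ACGU"),
    ("ID3", "rRNA", "GGCC"),
    ("ID4", "miRNA", "UUAA"),
    ("ID5", "tRNA", "CCGG"),
]
whitelist_files, remainder = partition_records(records, {"ID1", "ID5"}, n_files=2)
```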

@carlosribas
Contributor

The sequence Anton mentioned now works without problems. However, we currently have >7 million tRNA sequences, so it may still be worth creating that whitelist to reduce the number of FASTA files and improve performance.

@blakesweeney let me know if I should create that. If not, we can close this ticket.

3 participants