for genbank, we do --name-from-first, so we get output like this:
CP001941.1 Aciduliprofundum boonei T4...
for gtdb, we do trickier name setting, so we get output like this:
GCF_000025665 s__Aciduliprofundum boonei
with the main difference here being that the GCF_ identifier points to the identifier for the whole genome, not just the first sequence. That seems better.
We could add an optional identifier string to signatures. Hrm. Ref sourmash-bio/sourmash#268 for more such questions.
For wort I generated a name closer to GTDB: GCF_000246355.1 Leptospira kirschneri serovar Mozdok str. 'B 81/7 type 3/Tsaratsovo' strain=B 81/7 type 3/Tsaratsovo, CLC_glsol0
I generate it from assembly_summary.txt, using assembly_accession, organism_name, infraspecific_name (if there is one) and finally a comma and asm_name.
This example is pretty much the worst case I found: long name, with ' in the middle (so I need to escape properly in the shell). But the crucial point is using GCF_000246355.1 in the first position, because --name-from-first in NCBI assemblies is a mess for our use cases.
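A rough sketch of that naming scheme in Python, in case it helps anyone following along. This is not the actual wort code: the function names (`signature_name`, `identifier`) are made up for illustration, and the way the example row is split into `organism_name` and `infraspecific_name` is my guess from the output above. It assumes each row of assembly_summary.txt has already been parsed into a dict keyed by column name.

```python
import shlex


def signature_name(row):
    """Build 'accession organism [infraspecific], asm_name' from one row.

    `row` is assumed to be a dict keyed by assembly_summary.txt column names.
    """
    parts = [row["assembly_accession"], row["organism_name"]]
    infraspecific = row.get("infraspecific_name", "").strip()
    if infraspecific:  # only included if there is one
        parts.append(infraspecific)
    return " ".join(parts) + ", " + row["asm_name"]


def identifier(name):
    """Return the first whitespace-delimited token, i.e. the accession."""
    return name.split(" ", 1)[0]


# Hypothetical row; the field split is inferred from the example name above.
example_row = {
    "assembly_accession": "GCF_000246355.1",
    "organism_name": "Leptospira kirschneri serovar Mozdok str. 'B 81/7 type 3/Tsaratsovo'",
    "infraspecific_name": "strain=B 81/7 type 3/Tsaratsovo",
    "asm_name": "CLC_glsol0",
}

name = signature_name(example_row)

# shlex.quote handles the embedded single quotes when the name is passed
# as a shell argument.
quoted = shlex.quote(name)
print(quoted)
```

Because the accession is always the first token, `identifier(name)` recovers the whole-genome identifier no matter how messy the rest of the name is.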
we've standardized over the last two years on putting the identifier first, as above, and we use this for pretty much everything (including sourmash taxonomy, and picklists). Everything seems to work fine ;). Closing as resolved 🎉 !
ref #7