NCBI Taxon ID included in the final_table.tsv file? #29
@cpavloud I found out about the ncbi-taxonomist tool. I think we could use it. Would you like to have a look and share your thoughts?
I am not sure how it would work exactly (the ncbi-taxonomist page does not provide very good examples or explanations), but we could give it a try.
Think of a while loop that starts from the end of the taxonomy in each row of the … Assuming we are looking for …, it would return: …
So, for example, if you have these classifications in the … you would search for … and get the last line for each of your searches?
I would search for … If I did not get a hit, I would continue with …
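The fallback loop described above could be sketched in Python roughly as follows. This is only an illustration: `resolve_taxon_id` is a hypothetical name, and `lookup` stands in for whatever actually queries NCBI (ncbi-taxonomist, Bio.Entrez or a local table), returning a Taxon ID string or `None` when there is no hit.

```python
from typing import Callable, Optional, Tuple


def resolve_taxon_id(
    classification: str,
    lookup: Callable[[str], Optional[str]],
) -> Optional[Tuple[str, str]]:
    """Walk a semicolon-separated lineage from the most specific rank upwards.

    Returns (matched_name, taxon_id) for the deepest rank that `lookup`
    resolves, or None if no rank resolves. `lookup` is a hypothetical
    callable standing in for the real NCBI query.
    """
    # Split the lineage and drop empty ranks (e.g. a trailing ';').
    ranks = [r.strip() for r in classification.split(";") if r.strip()]
    # Start at the end (species) and move towards the root (domain).
    while ranks:
        name = ranks.pop()
        taxon_id = lookup(name)
        if taxon_id is not None:
            return name, taxon_id
    # No rank in the lineage had a hit.
    return None
```

For example, with a toy lookup table such as `{"Capniidae": "placeholder-id"}.get`, the lineage ending in Allocapnia aurora falls back from Allocapnia aurora and Allocapnia to Capniidae and returns `("Capniidae", "placeholder-id")`; the ID there is a made-up placeholder, not a real NCBI Taxon ID.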
@cpavloud, have a look. Would that be OK?
If there were no NCBI taxonomy IDs for …
Exactly!
This feature is now ready and will be part of … The issue is now resolved.
Re-opening the issue:
This is definitely useful for ITS (#52).
One thing that has been requested is to enhance the final_table.tsv file so that, in addition to the columns it already has, it includes the NCBI Taxon ID for each ASV/OTU and the accession number of the sequence that was its closest match in the reference database used. The NCBI Taxon ID could then be used as the taxonConceptID when submitting data to GBIF/OBIS in the DwC-A format (as discussed here).
For example, instead of the current final_table.tsv file, which looks like this:
OTU_id,ERR0000008,ERR0000009,Classification
Otu1,1123,2,Eukaryota;Arthropoda;Insecta;Plecoptera;Capniidae;Allocapnia;Allocapnia aurora
Otu2,3,0,Eukaryota;Porifera;Demospongiae;Hadromerida;Polymastiidae;Polymastia;Polymastia littoralis
Ideally, it could look something like this:
OTU_id,ERR0000008,ERR0000009,Classification,Accession_number,NCBI_Taxon_ID
Otu1,1123,2,Eukaryota;Arthropoda;Insecta;Plecoptera;Capniidae;Allocapnia;Allocapnia aurora,JN200445,608846
Otu2,3,0,Eukaryota;Porifera;Demospongiae;Hadromerida;Polymastiidae;Polymastia;Polymastia littoralis,NC_023834,1473587
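A minimal sketch of adding the two proposed columns with Python's standard `csv` module, assuming the accession numbers and Taxon IDs have already been collected into dictionaries keyed by OTU/ASV id (how they are produced is left open in this issue). `add_ncbi_columns`, `rewrite_table` and the `NA` placeholder for missing values are hypothetical choices. The sample rows above are comma-separated although the file is named final_table.tsv, so the delimiter is a parameter (tab by default).

```python
import csv
from typing import Dict, Iterable, List


def add_ncbi_columns(
    rows: Iterable[List[str]],
    accessions: Dict[str, str],
    taxon_ids: Dict[str, str],
    missing: str = "NA",
) -> List[List[str]]:
    """Append Accession_number and NCBI_Taxon_ID columns to table rows.

    `rows` includes the header row; the first column is assumed to be the
    OTU/ASV id. Ids absent from a dictionary get the `missing` placeholder.
    """
    rows = list(rows)
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    out = [header + ["Accession_number", "NCBI_Taxon_ID"]]
    for row in body:
        otu_id = row[0]
        out.append(row + [accessions.get(otu_id, missing), taxon_ids.get(otu_id, missing)])
    return out


def rewrite_table(
    src: str,
    dst: str,
    accessions: Dict[str, str],
    taxon_ids: Dict[str, str],
    delimiter: str = "\t",
) -> None:
    """Read the table at `src`, add the two columns and write it to `dst`."""
    with open(src, newline="") as fin:
        rows = list(csv.reader(fin, delimiter=delimiter))
    with open(dst, "w", newline="") as fout:
        writer = csv.writer(fout, delimiter=delimiter, lineterminator="\n")
        writer.writerows(add_ncbi_columns(rows, accessions, taxon_ids))
```

Writing to a separate `dst` file keeps the original table untouched, so the new columns can be checked before replacing it.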
If it is not possible to retrieve the accession number and/or the NCBI Taxon ID, I think we can find some workarounds.
Perhaps it will be possible to retrieve the NCBI Taxon ID using the Bio.Entrez module of Biopython.
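A rough sketch of that idea using Biopython's `Entrez.esearch` against the `taxonomy` database, followed by `Entrez.read`. The helper names are hypothetical, the `[Scientific Name]` field restriction is one possible query choice, and the sketch simply takes the first returned ID, which may be wrong for homonyms (the same name used in different kingdoms). It needs network access and a contact email, which NCBI asks for on E-utilities requests.

```python
from typing import Any, Mapping, Optional


def scientific_name_term(name: str) -> str:
    """Build an Entrez Taxonomy query restricted to the scientific-name field."""
    return f'"{name}"[Scientific Name]'


def first_id(record: Mapping[str, Any]) -> Optional[str]:
    """Return the first Taxonomy ID from a parsed esearch record, or None."""
    ids = record.get("IdList", [])
    return str(ids[0]) if ids else None


def fetch_taxon_id(name: str, email: str) -> Optional[str]:
    """Look up an NCBI Taxon ID for a scientific name (requires network)."""
    # Imported lazily so the helpers above work without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # contact address sent with each E-utilities request
    handle = Entrez.esearch(db="taxonomy", term=scientific_name_term(name))
    try:
        record = Entrez.read(handle)
    finally:
        handle.close()
    return first_id(record)
```

Called as `fetch_taxon_id("Allocapnia aurora", "you@example.org")`, it should return the Taxon ID as a string, or `None` when there is no match. Since NCBI rate-limits E-utilities, a pipeline would probably want to cache results per name rather than query once per OTU.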