Run against EukProt database did not complete successfully and EukProt update request #66

Open
mkarlik93 opened this issue Jul 12, 2024 · 1 comment

@mkarlik93

Hi,

I have been trying to run EUKulele with the EukProt database, but the run breaks at the taxonomic estimation step.

Here is the log output:

Traceback (most recent call last):
  File "/home/karlicki/miniconda3/envs/EUKulele/lib/python3.7/site-packages/EUKulele/manage_steps.py", line 587, in manageTaxEstimation
    est_err_file,est_out_file)
  File "/home/karlicki/miniconda3/envs/EUKulele/lib/python3.7/site-packages/EUKulele/tax_placement.py", line 280, in place_taxonomy
    tax_table, classes = read_in_taxonomy(tax_file)
  File "/home/karlicki/miniconda3/envs/EUKulele/lib/python3.7/site-packages/EUKulele/tax_placement.py", line 69, in read_in_taxonomy
    tax_out.source_id = tax_out.source_id.astype(str)
  File "/home/karlicki/miniconda3/envs/EUKulele/lib/python3.7/site-packages/pandas/core/generic.py", line 5460, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'source_id'

Taxonomic estimation did not complete successfully. Check log file for details.
AttributeError("'DataFrame' object has no attribute 'source_id'")

Could you please help with that?

Also, the provided EukProt database contains around 700 records, whereas the latest V3 contains more than 900. Would it be possible to update the database too?

We would love to run the tool on freshwater metatranscriptomes, and we need both MMETSP and EukProt working to get a comprehensive taxonomic annotation.

Thanks a lot!

@akrinos
Member

akrinos commented Aug 31, 2024

Hi @mkarlik93 !! So sorry for the delayed reply, a lot has happened this summer.

I have made and will soon deploy all the requested changes to EUKulele -

  • first, the taxonomy table does at present need to have a column called "Source_ID" (any case is accepted) in order to run properly - this has been fixed in the newly downloaded EukProt taxonomy table
  • the new EukProt version is updated to use the datasets from EukProt v03.2021_11_22, with its 1100+ entries
  • EukProt should now download properly in bioconda version 2.0.9 of EUKulele
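If you want to check an existing taxonomy table before re-running, something along these lines might help. It is only a rough sketch using the Python standard library, not EUKulele's own code: the function names `find_source_id_column` and `check_taxonomy_table` are made up for illustration, the tab delimiter is an assumption about the table format, and the sample rows are placeholder data. It looks for a header matching "Source_ID" in any case, which is the column the traceback above fails on.

```python
import csv
import io


def find_source_id_column(header):
    """Return the header name that matches 'source_id' in any case, or None."""
    for name in header:
        if name.strip().lower() == "source_id":
            return name
    return None


def check_taxonomy_table(handle, delimiter="\t"):
    """Read only the header row and confirm a Source_ID column is present.

    The tab delimiter is an assumption; pass another one if your table differs.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader, [])
    column = find_source_id_column(header)
    if column is None:
        raise ValueError(f"no Source_ID column found; columns are: {header}")
    return column


# Placeholder tables (hypothetical contents, not real EukProt records).
good_table = io.StringIO("Source_ID\tSupergroup\tSpecies\nID_1\tGroupA\tExample sp.\n")
bad_table = io.StringIO("Name\tSupergroup\tSpecies\nID_1\tGroupA\tExample sp.\n")

print(check_taxonomy_table(good_table))

try:
    check_taxonomy_table(bad_table)
except ValueError as err:
    print(err)
```

For a file on disk, open it with `open(path, newline="")` and pass the handle to `check_taxonomy_table`. If the check raises, the table is probably the older one missing the column, and re-downloading the database after updating EUKulele would be the thing to try.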
