Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding binning-only flag, simplify recipes #192

Merged
merged 20 commits into from
Mar 12, 2024
Merged

Adding binning-only flag, simplify recipes #192

merged 20 commits into from
Mar 12, 2024

Conversation

rhysnewell
Copy link
Owner

@rhysnewell rhysnewell commented Jan 4, 2024

Todo:

  • add binning only flag
  • add extra binners flag
  • add skip taxonomy
  • add skip singlem
  • parallelise singlem

extra binners flags splits the skip-binners flag in two for ease of understanding. Default binners are now metabat, vamb, semibin, rosella whilst users have to explicitly specify that they want to run concoct and maxbin (via --extra-binners)

Reasoning is that maxbin and concoct often slow down to a single thread for days at a time. To be fair, rosella used to do that too but is fixed now :) probably

Version bumped to 0.9.0 due to breaking changes to skip-binners argument, version bumped but not released

@rhysnewell rhysnewell requested review from wwood and AroneyS January 11, 2024 07:30
aviary/aviary.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@AroneyS AroneyS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Just a few comments. Do you want me to run the tests on our end?

@rhysnewell
Copy link
Owner Author

Yep, testing on your end. I just slapped together a rework for the singlem script as well. I think the read container class in that script would be useful to include in other scripts as it simplifies a lot of the ifelse statements that those scripts use. If you could check on your end specifically checking if #181 and #182 are fixed?

@AroneyS
Copy link
Collaborator

AroneyS commented Jan 15, 2024

Most of the tests succeed here. Just that the diversity symlink has changed. diversity now symlinks to data/singlem_out/singlem_appraisal.tsv instead of linking diversity/singlem_out to data/singlem_out.

Should we set the symlink diversity to data/singlem_out?

@AroneyS
Copy link
Collaborator

AroneyS commented Jan 15, 2024

I don't have the data to test #181 and #182. @wwood?

@wwood
Copy link
Collaborator

wwood commented Jan 15, 2024

I don't have any test data for these, but could run them on the datasets where the issues arose.

Do we have a multi-sample short read dataset we use for testing? I guess not?

@rhysnewell
Copy link
Owner Author

rhysnewell commented Jan 15, 2024

Most of the tests succeed here. Just that the diversity symlink has changed. diversity now symlinks to data/singlem_out/singlem_appraisal.tsv instead of linking diversity/singlem_out to data/singlem_out.

Should we set the symlink diversity to data/singlem_out?

Oh that shouldn't happen, that would mean the gtdbtk link is incorrect too. Let me fix

@AroneyS
Copy link
Collaborator

AroneyS commented Jan 15, 2024

This fixes the test paths (since it is now aviary_out/diversity rather than aviary_out/diversity/singlem_out.
But aviary_out/diversity/metagenome.combined_otu_table.csv is empty.

CITATION.cff Show resolved Hide resolved
@rhysnewell
Copy link
Owner Author

@AroneyS @wwood I think this is good to go, I've tested on some datasets and it ran without issue. Also fixes #195

@AroneyS

This comment was marked as resolved.

@AroneyS
Copy link
Collaborator

AroneyS commented Mar 10, 2024

I realised that aviary_out/diversity/metagenome.combined_otu_table.csv is empty due to The SINGLEM_METAPACKAGE_PATH environment variable, which points to the default data directory, is not set.. Are we still ok letting SingleM fail silently for errors like that?

@rhysnewell
Copy link
Owner Author

Ah, no I'll fix that up

@rhysnewell
Copy link
Owner Author

@wwood I'm getting this error with a fresh install of singlem + db. What database versions are compatible and how do ensure the correct version is downloaded?

03/10/2024 11:30:43 PM INFO: SingleM v0.16.0
03/10/2024 11:30:43 PM INFO: Retrieval successful. Location of backpack is: /scratch/dbs/singlem
Traceback (most recent call last):
  File "/scratch/conda/56a97b6fc94fa6c4aba96ad6ffd34bfe_/bin/singlem", line 658, in <module>
    singlem.pipe.SearchPipe().run(
  File "/scratch/conda/56a97b6fc94fa6c4aba96ad6ffd34bfe_/lib/python3.10/site-packages/singlem/pipe.py", line 63, in run
    metapackage = self._parse_packages_or_metapackage(**kwargs)
  File "/scratch/conda/56a97b6fc94fa6c4aba96ad6ffd34bfe_/lib/python3.10/site-packages/singlem/pipe.py", line 105, in _parse_packages_or_metapackage
    return Metapackage.acquire_default()
  File "/scratch/conda/56a97b6fc94fa6c4aba96ad6ffd34bfe_/lib/python3.10/site-packages/singlem/metapackage.py", line 71, in acquire_default
    backpack = zenodo_backpack.acquire(env_var_name=DATA_ENVIRONMENT_VARIABLE, version=DATA_DEFAULT_VERSION)
  File "/scratch/conda/56a97b6fc94fa6c4aba96ad6ffd34bfe_/lib/python3.10/site-packages/zenodo_backpack/__init__.py", line 121, in acquire
    if version != zb.data_version_string():
  File "/scratch/conda/56a97b6fc94fa6c4aba96ad6ffd34bfe_/lib/python3.10/site-packages/zenodo_backpack/__init__.py", line 71, in data_version_string
    return self.contents[DATA_VERSION]
KeyError: 'data_version'

@wwood
Copy link
Collaborator

wwood commented Mar 11, 2024

That error is usually because the payload_directory, not the base directory, of the backpack. I'll write some code so it'll work either way, but maybe worth fixing in Aviary? Or is this an error specific to your install?

@rhysnewell
Copy link
Owner Author

This is the aviary install, but I guess I'm not quite sure what you are asking. Should the SINGLEM_METAPACKAGE_PATH variable be pointing to the .zb directory or the extracted contents of this folder? Currently, aviary moves payload_directory to a user specified location, so is using the latter

@wwood
Copy link
Collaborator

wwood commented Mar 11, 2024

It should point to the .zb directory, because the version is recorded in the CONTENTS.json file within that, rather than the one in the payload directory.

@rhysnewell
Copy link
Owner Author

@AroneyS Okay, it should error properly and install the singlem db appropriately now

@AroneyS
Copy link
Collaborator

AroneyS commented Mar 11, 2024

I still get the env variable error. Are you running the test/test_integration.py tests?

@rhysnewell
Copy link
Owner Author

Yep, integration tests are running without issue. SingleM is producing a sensible OTU table. You may need to redownload the singlem DB and reset the aviary configuration?

@AroneyS
Copy link
Collaborator

AroneyS commented Mar 11, 2024

singlem_pipe_reads is working for me but not singlem_appraise
I guess its due to using bash -c?

I also get SingleM Errored, please check data/singlem_out/singlem_log.txt but no error.

@rhysnewell
Copy link
Owner Author

True, I hadn't caught that singlem_appraise was also busted but silently. I've reworked it, tested on my end and it works as expected

@AroneyS
Copy link
Collaborator

AroneyS commented Mar 11, 2024

Tests work on my end now

@rhysnewell rhysnewell merged commit 7195d25 into dev Mar 12, 2024
2 checks passed
@rhysnewell rhysnewell deleted the ISS-185 branch March 12, 2024 00:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants